Background

Speech recognition is the ultimate goal of cochlear implantation. Auditory speech perception in CI users depends on fundamental acoustic and psychoacoustic phenomena, including adequate frequency and temporal resolution of the acoustic signal and pitch and temporal discrimination. Auditory learning also plays an essential role in speech perception and depends on environmental exposure and memory processes [1].

Cochlear implant patients differ in their ability to perceive speech [2]. Many factors contribute to the variability in CI outcomes, such as differences in auditory perceptual resolution of the acoustic signal and in cognitive and linguistic capabilities [3]. One such higher-level process is working memory (WM), the temporary storage mechanism for signal awareness, sensory perception, and information retrieval from long-term memory. Speech requires WM to encode, store, and retrieve phonological and lexical representations of words for speech perception and production [4].

CI patients continue to show deficits in basic auditory processes, such as temporal and amplitude discrimination, gap detection, and frequency discrimination. Frequency discrimination is essential for speech perception, especially in demanding listening conditions, for the identification and localization of auditory signals, and for the appreciation of music.

Many studies have shown poor frequency discrimination skills in CI users, which can be explained by poor pitch perception. The limited number of electrodes may limit accurate harmonic representation of the acoustic signal. Spectral resolution is further limited by the spread and interaction of current between adjacent electrodes and by uneven neural survival along the length of the cochlea [5].

The major limitation of frequency resolution is that electrical stimulation has a restricted capacity to convey temporal fine structure (TFS) cues, which contribute to the processing of linguistic information and are predictive of speech perception, especially in noisy situations [6].

These limited frequency discrimination abilities of CI users, together with the fact that currently used speech testing materials have reached ceiling effects [7], have led to an increasing need for more difficult tests that provide fine-grained information on the perception of consonants and vowels. The Arabic consonant speech discrimination test is expected to provide a realistic tool for the assessment of speech perception and for phoneme perception analysis [8].

The aims of this study were to evaluate behavioral speech discrimination in CI users using consonant discrimination materials, to evaluate working memory in these patients, and to assess its possible effect on their consonant discrimination abilities.

Methods

This study was conducted on 55 CI patients. They were divided according to the onset of hearing loss into two groups:

Group I: Prelinguals, which included 35 CI children implanted before the age of 5 years who had been using their CI for at least 3 years. Their aided free field thresholds had to be ≤ 40 dB HL at 250, 500, 1000, 2000, and 4000 Hz. They had received audiological and phoniatric rehabilitation for at least 3 months, and their IQ levels were 80 or more on the Arabic version of the Stanford Binet test [9].

Group II: Postlinguals, which included 20 CI patients with aided free field thresholds of ≤ 40 dB HL at 250, 500, 1000, 2000, and 4000 Hz. They had received audiological rehabilitation for at least 3 months, and their IQ levels were 80 or more on the Arabic version of the Stanford Binet test [9].

Medical ethics were considered, and the patients were informed that they would be part of a research study. Adult patients and children’s caregivers were asked to sign a written consent.

All tested patients were subjected to the following:

  1. Complete history taking.

  2. Measurement of aided free field thresholds at 250, 500, 1000, 2000, and 4000 Hz using a Madsen Itera audiometer in a sound-treated room.

  3. Psychometry: intelligence quotient (IQ) using the Arabic version of the Stanford Binet test [9].

  4. Speech recognition testing, as follows:

    4.1. Phonetically balanced word lists. Each patient was tested with Arabic monosyllabic phonetically balanced words appropriate for age: either the phonetically balanced word discrimination lists [10] or the phonetically balanced kindergarten word lists (PBKG) [11]. The words were presented to the patients through loudspeakers. Each test consists of 25 monosyllabic words, and scores are given as the percentage of correctly identified words.

    4.2. Arabic consonant discrimination lists. The original test consisted of 2 lists of high-frequency-weighted CVC words [12]. The consonant classes (fricatives, stops, nasals, and laterals) are represented in a sense word list of 120 words and a non-sense word list of 156 words. All words of both lists were analyzed in the frequency domain and tested on patients with high frequency hearing loss for validity and reliability.

A modified version of these lists was developed to shorten the test without changing the frequency of representation of consonants and vowels [8].

For the sensible word list, 24 words were chosen for fricatives, 18 for stops, 4 for nasals, and 4 for laterals, giving a total of 50 words.

For the non-sensible word list, 26 words were chosen for fricatives, 18 for stops, 4 for nasals, and 4 for laterals, giving a total of 52 words.

Although the original test includes both sensible and non-sensible syllables, only sensible syllables were used in this study. This choice was made because both pre- and postlingual hearing loss patients may have more difficulty with meaningless sounds, which could bias the error pattern.

The words were presented to the patients through loudspeakers at 40 dB SL, as this level was found to be the most comfortable for all patients. Scores are given as the percentages of correctly identified phonemes and words.

Phonemic errors were categorized and scored as follows:

Probability of error for each class of consonant = \( \frac{\text{Total errors for a class of consonant across subjects}}{\text{Total stimuli for that class across subjects}} \times 100\% \)
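As an illustration of this formula, the sketch below shows one way the per-class error probability could be computed from pooled responses; the data layout and field names are hypothetical and not taken from the study.

```python
# Hypothetical illustration of the per-class error-probability formula above.
# Each response record notes the consonant class of the stimulus and whether
# the subject identified it correctly; the data layout is assumed, not the study's.
from collections import defaultdict

def error_probability_by_class(responses):
    """responses: iterable of (consonant_class, correct) tuples pooled across subjects."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for consonant_class, correct in responses:
        totals[consonant_class] += 1
        if not correct:
            errors[consonant_class] += 1
    # Probability of error = total errors / total stimuli for that class, in percent
    return {c: 100.0 * errors[c] / totals[c] for c in totals}

# Example usage with made-up data:
responses = [("fricative", False), ("fricative", True), ("stop", True), ("nasal", True)]
print(error_probability_by_class(responses))  # {'fricative': 50.0, 'stop': 0.0, 'nasal': 0.0}
```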

Working memory assessment

Forty patients (20 prelinguals and 20 postlinguals) completed items of the Test of Memory and Learning-2 (TOMAL-2) [13]. This test was designed to verify the presence of memory problems in patients and to estimate the severity of the disorder [14]. The original TOMAL-2 was translated and modified to suit the Egyptian culture and environment and to overcome the differences in language structure between English and Arabic [13].

Only the items testing auditory memory were used, which are as follows:

  (1) Memory for stories (MFS), which requires verbal recall of a short story heard from the examiner. It provides a measure of meaningful recall.

  (2) Word selective reminding (WSR), which is a verbal free-recall task in which the subject tries to repeat a word list presented verbally. It tests learning and immediate recall in verbal memory.

  (3) Object recall (OR), where the examinee is asked to recall a series of pictures introduced to him/her.

  (4) Paired recall (PR), which is a verbal paired-associate learning task in which a list of word pairs is recalled when the first word of each pair is provided.

  (5) Digits forward (DF), which measures low-level recall of a sequence of numbers.

  (6) Digits backward (DB), in which the examinee recalls the numbers in reverse order.

  (7) Letters forward (LF), which is a language-related analog to the digit span task using letters instead of numbers.

  (8) Letters backward (LB), which is a language-related analog to the digits backward task using letters.

  (9) Verbal delayed recall subtests, in which the patient is asked to recall the items from the MFS and WSR subtests after 30 min to assess learning and the decay of memory.

  (10) The following indices of the TOMAL-2 test were calculated:

The verbal memory index (VMI) assesses memory for information presented verbally and reproduced in a sequential manner. This task is important in diagnosing learning disabilities with primary deficits in speech and language. The index comprises memory for stories, word selective reminding, object recall, and paired recall.

The associative recall index (ARI) consists of memory for stories and paired recall.

The verbal delayed recall index (VDRI) serves as a measure of “forgetting.” It assesses memory for information presented verbally for recall (memory for stories and word selective reminding) after a delayed period.

Statistical analysis of the data

Data were fed to the computer and analyzed using IBM SPSS software package version 20.0 (Armonk, NY: IBM Corp). Qualitative data were described using number and percent. Quantitative data were described using range (minimum and maximum), mean, and standard deviation. The chi-square test was used to compare categorical variables between the two groups, with Monte Carlo correction applied when more than 20% of the cells had an expected count less than 5. The Student t test was used to compare normally distributed quantitative variables between the two groups. The Pearson coefficient was used to correlate two normally distributed quantitative variables. Significance of the obtained results was judged at the 5% level.
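For readers who wish to reproduce this kind of analysis outside SPSS, a minimal sketch using Python and SciPy is shown below; the variable names and example arrays are illustrative placeholders, not data from this study, and the Monte Carlo correction step is not included.

```python
# Illustrative re-implementation of the analyses described above using SciPy.
# The arrays below are made-up placeholders, not data from this study.
import numpy as np
from scipy import stats

# Chi-square test for a 2 x 2 contingency table (e.g., sex by group)
contingency = np.array([[12, 23], [8, 12]])
chi2, p_chi, dof, expected = stats.chi2_contingency(contingency)

# Student t test comparing a normally distributed variable between two groups
prelingual_scores = np.array([62.0, 70.0, 55.0, 48.0, 66.0])
postlingual_scores = np.array([74.0, 81.0, 69.0, 77.0, 72.0])
t_stat, p_t = stats.ttest_ind(prelingual_scores, postlingual_scores)

# Pearson correlation between two normally distributed variables
# (e.g., a working memory index and a speech discrimination score)
wm_index = np.array([85.0, 92.0, 78.0, 88.0, 95.0])
speech_score = np.array([60.0, 72.0, 55.0, 68.0, 80.0])
r, p_r = stats.pearsonr(wm_index, speech_score)

print(f"chi-square p = {p_chi:.3f}, t test p = {p_t:.3f}, Pearson r = {r:.2f} (p = {p_r:.3f})")
```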

Results

Demographic data of the studied cases (Table 1)

Table 1 Distribution of the studied cases according to their demographic data

Age at implant, duration, and CI brand (Table 2)

Table 2 Distribution of the studied cases according to age at implant, duration, and CI brand

Aided free field thresholds of studied cases (Fig. 1)

Fig. 1
figure 1

Mean and standard deviation of aided free field thresholds of prelingual and postlingual CI patients

Speech discrimination scores (Tables 3, 4)

Figure 2 shows that, among all classes of consonants tested by the modified Arabic discrimination lists, fricatives are the most affected, with a probability of error of 25.14 ± 13.11% for prelinguals and 32.43 ± 12.02% for postlinguals.

Table 3 Comparison between PBWL scores and high frequency words discrimination scores for prelingual group (n = 35)
Table 4 Comparison between PBWL scores and high frequency words discrimination scores for postlingual group (n = 20)
Fig. 2
figure 2

Bar chart describing the probability of error of consonants in pre- and postlingual CI users

Working memory scores

Forty patients (20 prelinguals and 20 postlinguals) completed the items of TOMAL-2 that assess auditory memory. Figure 3 shows the performance of all prelingual and postlingual CI patients on memory for stories (MFS), word selective reminding (WSR), object recall (OR), paired recall (PR), digits forward (DF), letters forward (LF), digits backward (DB), letters backward (LB), memory for stories delayed (MFSD), and word selective reminding delayed (WSRD). Table 5 shows the comparison of working memory item scores between prelinguals and postlinguals. The scores were higher in postlinguals, although the difference reached significance only for the digits forward, digits backward, and letters backward subtests.

Fig. 3
figure 3

Percentage of prelingual and postlingual patients lying in each category of TOMAL-2 scores

Table 5 Comparison between prelingual and postlingual according to scaled scores of TOMAL-2 test

Correlation between speech discrimination scores and working memory indices scores (Tables 6, 7, Figs. 4, 5)

The indices used were the verbal memory index (VMI), verbal delayed recall index (VDRI), and associative recall index (ARI). These indices were correlated with scores on the phonetically balanced word list (PBWL), and the results are shown in Table 6 and Fig. 4. The indices were also correlated with scores on the high frequency discrimination list, and the results are shown in Table 7 and Fig. 5. These results showed a positive correlation between working memory indices and speech perception scores on both lists. The correlation was much stronger with the high frequency list than with the phonetically balanced word list (PBWL).

Table 6 Correlation between PBWL scores with different indices of TOMAL–2 test
Table 7 Correlation between high frequency list scores with different indices of TOMAL–2 test
Fig. 4
figure 4

Scatter diagram and best fit lines for the correlation between PBWL scores and TOMAL-2 indices

Fig. 5
figure 5

Scatter diagram for the correlation between high frequency list scores and TOMAL-2 indices

Discussion

Cochlear implants have produced satisfactory progress in speech perception for children and adults with severe to profound hearing loss. However, speech perception and language development show considerable variability among CI users.

The goal of this study was to assess the consonant discrimination ability of CI users and to determine whether working memory recall abilities have an impact on consonant discrimination performance.

The present results showed that speech discrimination scores for PBWLs in patients with CI are lower than in normal hearing (NH) individuals, with a wide range of variability despite pure tone thresholds being better than 40 dB HL. The PBWL scores agreed with those reported by Turgeon [15], who showed that scores for monosyllabic phonetically balanced words in adult CI users range from very poor (even zero) to excellent (92% correct), with a mean of 54% and wide variability (standard deviation of 33). Similarly, Zhang’s results for consonant-nucleus-consonant (CNC) words varied across CI patients, with a mean and standard deviation of 58.04% ± 28.66 [16].

A paired t test was used to compare the performance of CI patients on phonetically balanced word lists appropriate to their age with their performance on the modified Arabic consonant speech sounds, in both pre- and postlingual patients. The scores on high frequency weighted materials were significantly lower than those on PBWL, and the most affected class of consonants was the fricatives. Fricatives have a high frequency spectrum, especially the voiceless ones, as the noise segment for the voiceless fricatives is wider than for the voiced ones [17]. The noise segment begins at 2000 Hz for the /ς/ consonant and below 3000 Hz for the /s/ consonant, then extends upward to 10,000 Hz. The friction noise for the /ɵ/ sound is very weak across the spectrum and begins at 4000 Hz. For the /f/ sound, the friction noise begins at 6000-8000 Hz. The energy of the voiced fricatives /ħ/, /x/, /δ/, /z/, /ε/, and /ʁ/ appears as formant-like bands that extend into those of the following vowel. The noise source of /ħ/ has a cut-off frequency at 2000 Hz, while that of /z/ is at 4000 Hz and that of /δ/ at 6000 Hz [18].

Stieler et al. assessed perception of 5 Ling phonemes (aa, uu, ii, ss, and sh) in children with CI. The tested patients had difficulties differentiating phonemes based on the high-frequency fricatives (ss, sh) even when they had normal thresholds in free field (20–30 dB SPL; 0.25–6 kHz) [19].

Peng et al. found that children with bilateral CI were able to discriminate consonant contrasts using fine-grained spectral-temporal cues above chance level, but more poorly than their NH peers. While the electrodogram outputs suggest that the CI provides some access to the spectral cues that distinguish consonants, it is likely that these cues were too coarse to provide the saliency needed to reach performance similar to NH peers [20].

The temporal information of the speech signal can be decomposed into envelope (2–50 Hz), periodicity (50–500 Hz), and temporal fine structure (500–10,000 Hz). The envelope consists of the slow variations in the speech signal. Periodicity corresponds to the vibration of the vocal cords and conveys fundamental frequency (F0) information. Temporal fine structure (TFS) consists of the fast fluctuations in the signal and contributes to pitch perception, sound localization, and binaural segregation of sound sources. All stimulation strategies represent high-frequency sounds only by place coding. The stimulation rate in every implant is constant, between 500 and 3500 pulses/s. Low-frequency sounds are represented by both temporal and place coding [21].
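To make these definitions concrete, the sketch below illustrates one common way of separating a band-limited signal into its envelope and temporal fine structure using the Hilbert transform; the synthetic test signal and cut-off values are illustrative assumptions, not part of the study or of any specific CI processing strategy.

```python
# Illustrative envelope / temporal fine structure (TFS) decomposition of a signal
# using the Hilbert transform; the synthetic "speech-like" signal is a placeholder.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

fs = 16000                               # sampling rate in Hz
t = np.arange(0, 0.5, 1 / fs)
# Toy signal: a 1500-Hz carrier amplitude-modulated at 4 Hz
signal = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1500 * t)

analytic = hilbert(signal)
envelope = np.abs(analytic)              # slow amplitude variations (envelope)
tfs = np.cos(np.angle(analytic))         # fast fluctuations (temporal fine structure)

# Optionally smooth the envelope below ~50 Hz, matching the envelope band quoted above
b, a = butter(4, 50 / (fs / 2), btype="low")
envelope_smooth = filtfilt(b, a, envelope)

print(envelope_smooth[:5], tfs[:5])
```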

Poorer consonant discrimination in CI users is attributed to difficulty in recognizing pitch differences compared to normal hearing, which leads to difficulties in music appreciation, understanding speech in noise, and understanding tonal languages. This may be due to the lack of sharp frequency tuning in electric hearing and the defective ability of CI users to discriminate the fundamental frequency (F0) of complex sounds [22].

One of the most important cues for higher-level speech perception is voice characteristics, namely F0 and vocal tract length (VTL). VTL perception is severely impaired in CI users due to channel interactions and smeared spectral resolution [23]. This degradation in resolution may also affect central resolution and memory storage.

Assessment of working memory using the TOMAL-2 test revealed that most of the patients in both groups fell into the deficient and very deficient categories, indicating that WM in CI patients was poorer than normal. This may be attributed to the deficient auditory sensory input, which disrupts accurate encoding of the verbal information necessary for phonological processing and memory [24]. One assumption is that WM processes poorly defined pitch contrasts within the short time of its operation, so these contrasts remain poorly defined in their final storage and in future recall over a longer temporal domain.

Davidson et al. compared WM in NH children and children with CI. Children with CIs scored significantly lower on simple and complex verbal WM tasks compared with their NH age mates; moreover, the verbal WM deficits of the CI group persisted even with good audibility. Children with CIs have deficits in WM related to storing and processing verbal information, and these deficits extend to receptive vocabulary and verbal reasoning [24]. For both pre- and postlingually deaf individuals, auditory deprivation occurs after a period of absent sensory input. This process entails degeneration of the auditory system, both peripherally and centrally, including degeneration of spiral ganglion cells [25]. Auditory deprivation during critical developmental phases leads to atypical development of executive functions [26].

On comparison of memory scores between prelinguals and postlinguals, the scores were higher in postlinguals, although the difference reached significance only for the digits forward, digits backward, and letters backward subtests. In postlingually deaf adults, the neural pathways in the brain have been shaped by acoustic sound perception before the onset of deafness. The degree of success with a CI depends on how the brain compares the new signal with what was heard previously [27].

Moberly et al. found that adults with CIs performed on par with NH peers on measures of verbal WM that did not explicitly tax phonological skills: forward and backward digit span. However, on WM tasks that placed greater demands on phonological capacities, such as serial recall of words, CI users were less accurate, suggesting that poor phonological sensitivity accounted for the difference in performance. Thus, these deficits can be attributed to a problem in storage, not processing [28].

Functional neuroimaging studies show that, in adults, a network of prefrontal, parietal, and anterior cingulate regions is activated in WM contexts, and activation in these regions increases with increasing working memory load [29]. Many studies report less activation in children than in adults in these regions, with the amount of activation increasing with age and mirroring improvements in behavioral performance [30]. In contrast, other studies report more diffuse patterns of activation in children during WM tasks, reflecting that neural activation becomes progressively more focused with development [31].

We found a significant positive correlation between WM indices (verbal memory index, associative recall index, and verbal delayed recall index) and speech perception scores on both lists. The correlation was much stronger with the high frequency list than with the PBWL. Talebi et al. assessed CI children’s memory performance, speech perception, and speech production. They found a positive and significant relation between memory scores and auditory perception, and between memory and speech intelligibility. These results emphasize that CI children’s memory performance has a significant effect on their speech production [32].

Speech perception and language processing depend on fast and efficient phonological coding of auditory input in verbal short-term memory, i.e., stable phonological representations. Thus, verbal short-term memory operates as a link between auditory speech input and the stored language knowledge in long-term memory [33].

The degraded auditory input of CI children results in underspecified phonological representations in their verbal short-term memory. Because phonological WM is important for rapidly encoding and processing the degraded and underspecified speech signals transmitted to the auditory nerve by a CI, this in turn affects speech perception, verbal and visuospatial reasoning abilities, and ultimately language and academic performance [34].

Tao et al. found significant correlations between disyllable recognition and digit span scores in adult CI users. WM performance was significantly poorer for CI than for NH participants, suggesting that CI users experience limited WM capacity and efficiency [35]. In postlingual CI users, prolonged hearing loss can lead to degeneration of long-term phonological representations. Signal degradation impedes listeners’ ability to recover phonological representations, even when those representations remain intact internally [28].

Conclusions

The variability in CI users’ outcomes is reflected in a wide range of performance on speech perception tasks. CI patients still lag behind normal hearing peers in speech recognition performance, especially consonant discrimination. Frequency discrimination using high frequency weighted words can provide a realistic tool for the assessment of speech perception by CI users.

Working memory skills appear to be delayed in CI users due to periods of auditory deprivation and the relatively degraded auditory message delivered by the implant. Differences in WM seem to contribute to the large individual differences in the spoken language outcomes of adult and pediatric CI users.