The assessment of speech intelligibility plays an important role in fields such as audiology, psychoacoustics, and telecommunications, among others. The use of sentence materials to test speech intelligibility has many advantages over using other types of speech stimuli, such as words or syllables, because sentences are more representative of real everyday communicative situations than are words or syllables. On the other hand, these types of stimuli have some disadvantages. For example, if the experimenter uses different experimental conditions, such as different signal-to-noise ratios (SNRs) or other listening conditions, the same speech materials cannot be repeated with the same listener.

To address this issue, Kalikow, Stevens, and Elliot (1977) developed a test of speech perception in noise (SPIN) consisting of eight lists of sentences equivalent in intelligibility and tested in different conditions of background babble noise. Thus, an experimenter can select some of these lists and use them in different experimental listening conditions simulating those encountered in everyday speech communication.

The SPIN sentences have another valuable characteristic. Each 50-sentence list contains 25 high-predictability (HP) sentences and 25 low-predictability (LP) sentences. The HP sentences are constructed in such a way that the final word can somehow be predicted by the preceding context, and the LP sentences are constructed in such a way that the final word cannot be predicted by the context. Each HP sentence has its corresponding LP sentence, so that the same final word appears in both the HP sentence and its corresponding LP sentence. The listeners must respond by providing the final word or key word. Comparing the performance of individuals on the recognition of these two types of sentences makes it possible to assess the separate effects of auditory acuity and linguistic knowledge, expressed as the capability of using the preceding context to recognize the final word. Thus, the contribution of either sensory or cognitive processing to the total score obtained by the listener can be estimated by comparing performances on the HP and LP sentences. The assumption is that the HP sentences produce higher scores than do the LP sentences, especially in adverse listening conditions. In these situations, when the acoustical cues and bottom-up processing are not enough to accomplish speech perception, top-down processing (linguistic knowledge or the use of context) can facilitate this identification.

The evaluation of speech intelligibility is especially important in adverse listening conditions that simulate everyday listening situations, such as background noise at different signal-to noise levels (Dubno, Ahlstrom, & Horwitz, 2000; Gordon-Salant & Fitzgibbons, 1999, 2001, 2004; Gordon-Salant, Fitzgibbons, & Friedman, 2007; Humes, Burk, Coughlin, Busey, & Strauser, 2007; Kalikow et al., 1977), fast speech (Gordon-Salant & Fitzgibbons, 1999, 2001, 2004; Gordon-Salant et al., 2007; Humes et al., 2007), same versus different speakers’ voices (Goy, Pichora-Fuller, van Lieshout, Singh, & Schneider, 2007), or some speech distortions such as jitter (Pichora-Fuller, Schneider, MacDonald, Pass, & Brown, 2007) or noise-vocoded speech (Sheldom, Pichora-Fuller, & Schneider, 2008).

Another area in which the measurement of speech intelligibility is especially relevant is clinical audiology. The SPIN test has demonstrated its clinical utility in measuring the effects of linguistic cues on speech discrimination in studies by Bilger, Nuetzel, Rabinowitz, and Rzeczkowsky (1984) and Hutcherson, Dirks, and Morgan (1979) for listeners with sensorineural hearing loss or in the study by Del Dot, Hickson, and O’Connell (1992) for listeners using hearing aids.

There are many other situations in which testing the effects of linguistic knowledge is a relevant issue. For instance, the study by Elliot (1979) evaluated from what age children are able to use contextual or linguistic cues to achieve speech perception in noise, while the study by Mayo, Florentine, and Buus (1997) determined how age of acquisition influences second-language speech perception. In the latter study, differences in the recognition of HP and LP sentences, especially in noise conditions, would indicate the degree to which the nonnative listeners had mastered the ability to profit from the semantic and syntactic information provided by the context.

Another research area of interest is the recognition of speech in noise in elderly listeners. The differences these listeners show for the HP versus LP sentences have been extensively studied (Dubno et al., 2000; Gordon-Salant & Fitzgibbons, 1997; Perry & Wingfield, 1994; Pichora-Fuller, 2008; Pichora-Fuller, Schneider, & Daneman, 1995; Sommers & Danielson, 1999; Wingfield, Tun, & McCoy, 2005). In these listeners, decreases in sensory information due to loss of hearing acuity, especially in adverse listening conditions, can be compensated by information provided by the context (Pichora Fuller, 2008).

Thus, the SPIN test has been applied to a variety of experimental conditions and types of listeners in the English language, and it has proved to be a useful tool in psycholinguistics, psychoacoustics, and audiology. The objective of the present study was to develop a test to measure the intelligibility of speech in noise for the Spanish language similar to the test developed by Kalikow et al. (1977) for the English language in an experiment conducted to measure the intelligibility of a pool of sentences with different signal-to-noise ratios (SNRs). These sentences were used in a previous study (Cervera & Gonzalez-Alvarez, 2010). In that study, six lists of HP sentences were first generated. These lists had equivalent predictability for the final word. They were also equivalent in length, phonetic content of the sentence, and frequency of the final word. In addition, each HP sentence had its corresponding LP sentence generated by using the same final word but with an LP preceding context, producing six corresponding LP lists.

In the present study, our aim was to assess the intelligibility of these sentences in normal-hearing listeners in three different SNR conditions (0 dB, +5 dB, and +10 dB) using babble noise. We hypothesized that the performance of the listeners on the HP sentences would be higher than the performance on the LP sentences. At the same time, among the three SNR conditions, the +10-dB SNR condition would produce higher scores than the +5-dB SNR condition, and the latter would produce higher scores than the 0-dB SNR condition.

The ultimate objective was to create a set of final lists (hereafter referred to as forms) of equal intelligibility to be used as a test of speech intelligibility in noise for the Spanish language. These forms consist of 50 sentences each (25 HP and 25 LP). The forms must also have equivalent phonetic content, because this characteristic is very important in audiology.

Method

Participants

The participants in the experiment were 474 undergraduate students, 394 from the University of Valencia and 80 from the University of Jaume I. Of these students, 291 were female and 183 were male. Their ages ranged from 21 to 30 years, with a mean age of 23.1 years (SD = 2.6). They received partial credit for a course requirement. None of the participants reported having any hearing or language problems, and they were native speakers of Castilian Spanish.

Stimuli

The stimuli consisted of 150 HP sentences and 150 LP sentences. These sentences were generated in a previous study (Cervera & Gonzalez-Alvarez, 2010). The HP sentences consisted of sentences whose final word was in some way predictable from the preceding context, with values of between 10% and 90% predictability (e.g., “Ata el regalo con una cinta”; “Tie the present with a ribbon”). The 150 sentences were grouped in six lists so that the predictability of all of the six lists was equivalent. These lists were also equivalent in length (all of them had from seven to ten syllables), phonetic content (with regard to both the whole sentence and only the last word or key word), syllabic structure, word stress, and frequency of the final word.

In addition, each HP sentence had its corresponding LP sentence, generated using the same final word, but with an LP preceding context. An example would be “Ahora voy a decir cinta” (“Now I am going to say ribbon”). Thus, six lists of 25 HP sentences and six lists of 25 LP sentences were created.

These lists were recorded by a native Castilian Spanish female speaker who was accustomed to recording for experimental or clinical purposes. The speaker was required to repeat each sentence 3 times. In addition, the duration of the utterance had to be from 1,800 to 2,000 ms. The clearest production of each sentence recording was selected. The recording took place in a soundproof room, using a Sennheiser HMD 224 microphone set at 15 cm from the lips and directly digitalized in the computer using an Edirol UA-5 sound card, with a sampling frequency of 11.025 kHz, and then the signal was low-pass filtered at 5.5 kHz to prevent aliasing.

The speech materials were edited with Adobe Audition sound editor software. First, each sentence was excised from the recorded list of sentences, creating WAV files of 1,800–2,000 ms of duration. Visual inspection of the waveform and the spectrogram was used to determine optimal points at which to excise the sentence. Then the intensity of each stimulus was also adjusted so that it would have an equal root-mean square (RMS) across the entire sentence. The final words of the sentences were also equal in intensity.

To create the masking condition, we used babble noise. The babble noise was generated by mixing 12 voices (six males and six females) reading a text. The recording conditions and digitalization of the signal were the same as in the case of the sentence stimuli. The babble noise was mixed with each sentence, creating each of the three SNR conditions, 0-dB, +5-dB, and +10-dB SNR, by manipulating the overall RMS of both the signal and the babble noise. These manipulations were performed using Adobe Audition Pro software.

Procedure

The six lists of HP sentences and the six lists of LP sentences were presented in three different conditions of background noise (0-dB, +5-dB, +10-dB SNR) to a group of 474 listeners. Each individual was presented randomly with one of the following combinations of the HP and LP lists: list 1 (HP) with list 2 (LP), list 2 (HP) with list 3 (LP), list 3 (HP) with list 4 (LP), list 4 (HP) with list 5 (LP), list 5 (HP) with list 6 (LP), and list 6 (HP) with list 1 (LP). By means of these combinations, each participant was presented with both HP and LP sentences, but with no repetition of the final word of the sentence. At the same time, each individual was presented with only one of the three SNR conditions randomly. Thus, 79 individuals completed each of the six list combinations described above. Of each of these 79 participants, 26 of them were presented with the 0-SNR condition, 26 of them were presented with the +5-SNR condition, and 27 of them were presented with the +10-SNR condition.

The listeners participated in the experiment in a sound-attenuated laboratory with six cabins. Each cabin contained a Pentium PC with Sennheiser headphones. Before the experiment began, participants were instructed to listen to the sentence and enter the last word of the sentence, using the computer keyboard. The administration of the stimulus and the registration of the responses made by the listener were performed by a Java program developed specifically for this experiment.

Results

Percent correct scores

Figure 1 shows the mean scores (expressed in percentages) obtained by the listeners on the perceptual test for both the HP and the LP sentences in the three conditions of background noise, 0 dB, +5 dB, and +10 dB SNR. For each condition of SNR and for HP and LP sentences, the percentiles of the data obtained by the listeners were calculated as well (see Table 1).

Fig. 1
figure 1

Means of the percentage of correct identification scores in three signal-to-noise ratio (SNR) conditions for high-predictability sentences and low-predictability sentences. Error bars indicate standard errors

Table 1 Percentiles corresponding to the percent correct scores obtained by the listeners in the three signal-to-noise (SNR) conditions for high-predictability (HP) and low-predictability (LP) sentences

As can be observed in Fig. 1, the HP sentences presented higher perceptual scores than did the LP sentences, as was hypothesized. At the same time, the perceptual scores were higher for the highest SNR condition, +10 dB, followed by the +5-dB condition, and the lower scores correspond to the 0-dB condition, as was hypothesized.

With the aim of testing whether the differences between the LP and the HP sentences and the differences in the three conditions of SNR were significant, we submitted the data to a two-way ANOVA with percentage of correct scores obtained on the intelligibility test as a dependent measure and type of sentence (HP or LP) and the three SNR conditions (0 db, +5 dB, and +10 dB) as factors or independent variables.

We found significant main effects of type of sentence, F(1) = 3,005.19, p < .01, η 2 = .11, and SNR condition, F(2) = 2,234.43, p < .01, η 2 = .16. The interaction was also significant, F(1, 2) = 77.66, p < .01, η 2 = .007. A posteriori comparisons of the levels of the SNR factor, by means of the Tukey test, showed significant differences between 0 and +5 dB (p < .01), 0 and +10 dB (p < .01), and +5 and +10 dB (p < .01).

To confirm that the six HP sentence lists did not differ statistically on their intelligibility values, a one-way ANOVA was conducted with correct scores (expressed in percentages) as a dependent variable and list the sentences belonged to (list) as an independent variable with six levels. The results showed no significant effects of list, F(5) = 2.14, p > .05, η 2 = .069. Thus, the six HP sentences lists did not differ on their percent correct scores.

The same analysis was carried out for the LP sentences. Responses on the intelligibility test (expressed in percentages of correct scores) were used as a dependent measure. A one-way ANOVA was performed with list the sentences belonged to (list), with six levels, as a factor. We found no significant effects of list, F(5) = 1.48, p > .05, η 2 = .04. Thus, the six lists of LP sentences did not differ with regard to their percent correct scores.

Creation of final forms

The second step was to create final forms that would contain both HP and LP sentences. These forms were the result of combining one of the HP sentences lists with one LP sentence list in the following manner: list 1 (HP) with list 2 (LP), called form 1; list 2 (HP) with list 3 (LP), called form 2; list 3 (HP) with list 4 (LP), called form 3; list 4 (HP) with list 5 (LP), called form 4; list 5 (HP) with list 6 (LP), called form 5; and list 6 (HP) with list 1 (LP), called form 6. Thus, the final test instrument contains six forms of 50 sentences each. Within each form, the order of presentation of the HP and LP sentences was randomized. This manner of presentation is the same as that used in the SPIN sentences by Kalikow et al. (1977).

Previously, we confirmed that the HP lists were equivalent in their percent correct scores, as were the LP lists. The next question was to find out whether the final six forms (resulting from the combination of one HP sentence list and one LP sentence list) continued to be equivalent in percent correct scores and phonetic content (the predictability of the final forms did not have to be measured, because this characteristic concerns only the HP sentences and it was tested in the previous study by Cervera & Gonzalez-Alvarez, 2010).

Percent correct scores of the final forms of the test

The means and standard deviations of the percent correct scores obtained in the present experiment, for each of the six final forms of sentences across the three SNR conditions, were calculated (see Table 2). In order to have forms that were equivalent in their percent correct scores, a one-way ANOVA was conducted with values of intelligibility (expressed as percentages of correct scores) as a dependent variable and form the sentences belonged to as an independent variable with six levels. The results showed no significant effects of the form to which the sentences belonged, F(5) = 1.49, p > .05, η 2 = .25. Thus, it can be concluded that the six final forms of sentences (containing 25 HP and 25 LP sentences) did not differ in their percent correct scores.

Table 2 Means and standard deviations for the percent correct scores averaged for the three signal-to-noise ratios in the six forms (combination of high-predictability and low-predictability sentences)

Phonetic content of the final forms of the test

Another aim of the present study was for the final forms of 50 sentences to have similar equivalent phonetic content. The phonetic counts in each phonetic category were performed separately for the last word of the sentences and for the whole sentence (the preceding context plus the last word). In these counts, only content words (verbs, nouns, and adjectives) were taken into account, while articles, prepositions, and adverbs were not considered. The phonetic counts were calculated by counting the number of occurrences of segments in each phoneme class (occlusives, fricatives, nasals, liquids, and vowels). Phonetic counts were performed by the authors, who had training in this task.

A distribution of frequencies for each phoneme class was obtained for each of the 300 sentences (150 HP sentences and 150 LP sentences), for the whole sentence and for the final word in the sentences alone. Table 3 shows the number of occurrences of each phoneme class for each of the six final forms.

Table 3 Number of occurrences in each phoneme class for the last word and the whole sentence for each form (combination of high-predictability [HP] and low-predictability sentences)

In order to test whether all the forms had equivalent phonetic contents, a chi-square analysis was performed. Two separate tests were performed, for the final words or key words alone and for the whole sentences. Phoneme class (occlusives, fricatives, nasals, liquids, and vowels) and form (six levels) were included as factors in both cases. The chi-square value was not significant for the whole sentences, χ 2(20) = 8.13, p > .05, or for the final words of the sentence, χ 2(20) = 9.66, p > .05. Thus, the six final forms were equivalent in their phonetic content, whether the whole sentence was considered or only the final words. Finally, the definitive forms containing both HP and LP sentences are presented in the Appendix.

Discussion

Our objective was to generate forms of HP and LP Spanish sentences equivalent in intelligibility (measured as percent correct scores obtained in the perceptual task). These types of sentences have many applications, especially in psycholinguistics and audiology. In psycholinguistics, they could be especially useful in those circumstances in which it would be interesting to assess the sensory or bottom-up processing and the cognitive (effective use of context) or top-down processing capabilities of listeners during language processing. Some examples would be elderly listeners with age-related hearing loss but with intact top-down processing skills, children learning a second language who are not yet completely able to use context to accomplish speech perception, or individuals learning a second language with different levels of language proficiency. In audiology, these sentences can be useful for evaluating hearing aids in different SNR conditions simulating a variety of everyday communicative situations.

As in the case of the SPIN sentences (Kalikow et al., 1977) for the English language, the sentences developed in the present study for the Spanish language are easy to administer. The duration is short (about 10 min per form). The response required by the listeners is simple, because he or she has to respond only with the final word of the sentence. Besides intelligibility (percent correct scores), other characteristics are also controlled, such as phonetic content, sentence length, final word stress, and final word frequency, all of which are quite relevant in audiological evaluation. The utility of the SPIN sentences (Kalikow et al., 1977) has been demonstrated by their utility in audiology and psycholinguistics. For the audiological evaluation of Spanish-speaking listeners or research conducted with Spanish-speaking listeners, it is necessary to have similar speech materials for the Spanish language.