Lexical stress refers to the pattern of stressed and unstressed syllables within single words. For example, the word pencil is pronounced with more stress on the first syllable (/′pen/) than on the second syllable (/səl/). The present study employed a recognition memory task to investigate two research questions: first, whether English speakers use foreign lexical stress cues to recognize newly learned Spanish words and, second, the extent to which similarity to English word forms influences the recognition process.

Lexical stress cues in English and Spanish

Acoustic measurements of lexical stress include pitch, intensity, and duration of word segments (Curtin, Campbell, & Hufnagle, 2012; Peperkamp, Vendelin, & Dupoux, 2010). In English, stressed syllables contain vowels that are pronounced with higher pitch and intensity and are longer than vowels in unstressed syllables (Fear, Cutler, & Butterfield, 1995). Typically, unstressed syllables contain the vowel schwa or a short form of a vowel (Cutler & Norris, 1988). In contrast, stressed syllables contain only full vowels, making stressed syllables longer than unstressed ones (Van Donselaar, Koster, & Cutler, 2005). In the Singapore English variety spoken by our participants, vowels in unstressed syllables are not reduced as much as in Standard English (Low, Grabe, & Nolan, 2000). However, both Standard and Singapore English reduce vowel duration in unstressed syllables, in comparison with Spanish. In polysyllabic Spanish words, vowels in both stressed and unstressed syllables are full vowels and have similar durations (Soto-Faraco, Sebastián-Gallés, & Cutler, 2001; Van Donselaar et al., 2005). As a result, in Spanish, stressed syllables are differentiated from unstressed ones by pitch and intensity changes (Soto-Faraco et al., 2001; Toro, Sebastián-Gallés, & Mattys, 2009).

Soto-Faraco et al. (2001) suggested that lexical stress cues may be critical in Spanish because they can be used to disambiguate many otherwise identical words existing in the Spanish lexicon (e.g., TÉRmino [clause] vs. terMIno [I finish] vs. termiNÓ [he finished]). Soto-Faraco et al. showed that the auditory prime prinCI facilitated the recognition of the written word prinCIpio (beginning) but that the same prime inhibited lexical access to the word PRINcipe (prince). In contrast, Cooper, Cutler, and Wales (2002) showed that a prime such as ADmi facilitated the identification of the English word ADmiral but that the same prime did not inhibit a word such as admiRAtion. Taken together, these findings suggest that lexical stress is a more constraining feature in Spanish than in English during word recognition.

There is also evidence suggesting that lexical stress may not be as significant during word recognition in English. For example, Creel, Tanenhaus, and Aslin (2006) asked English-speaking participants to memorize nonwords associated with nonsense figures. After learning the word–figure associations, participants engaged in a four-alternative forced choice (4AFC) task where the target (e.g., /BOsapeI/) and competitor (/BOsapaI/) shared onset and lexical stress patterns or shared onset but mismatched on lexical stress (/KAdazu/ and /kaDAzei/). Lexical stress mismatches did not reduce the level of confusion between target and competitor, relative to the condition where target and competitor matched for stress and onset, suggesting that English speakers use segmental information but not lexical stress during word recognition.

One inference that can be drawn from these findings is that, for English speakers, suprasegmental lexical stress cues may be treated as “noise” and, therefore, not encoded into long-term memory when a foreign language is encountered for the first time. However, research using the recognition memory paradigm has shown that “peripheral” and contextual information is retained even when the task is to recognize the more “central” information. For example, people retain indexical properties of spoken words, such as voice attributes: recognition performance differs between a condition in which the same talker produces a word in both the study and test phases and a condition in which a different talker produces the same word at test (e.g., Goh, 2005; Goldinger, 1996). Similar results have been obtained with melody recognition, where the timbre or format of a melody was either kept the same or changed between the study and test phases (e.g., Lim & Goh, 2012, 2013; Peretz, Gaudreau, & Bonnel, 1998). In all cases, the same condition, whether it is talker, timbre, or format, elicited more yes responses (i.e., the stimuli were recognized as previously studied) than did the different (or opposite) condition during recognition, suggesting that these features are retained in memory.

The focus of the present study is on whether English speakers encode pitch and intensity cues when they memorize Spanish words. Note that, in Spanish, the quality and duration of the vowel are the same in stressed and unstressed syllables. Thus, changes in vowel quality and duration cannot be used to cue lexical stress. Following the design used in the previous studies on recognition memory, English speakers memorized spoken Spanish words (e.g., DUcha) and were then tested with words spoken with the same lexical stress (i.e., DUcha) or the opposite stress (i.e., duCHA). If English speakers are sensitive to the suprasegmental cues of pitch and intensity, they should retain the lexical stress patterns of the studied Spanish words and be able to discriminate DUcha from duCHA. However, what would happen when English speakers are asked to study cognate words that differ in lexical stress between English and Spanish (e.g., LOcal and loCAL, respectively)? Would they remember loCAL (correct in Spanish) or LOcal (incorrect in Spanish but correct in English)? Since the retention of foreign lexical stress cues may depend on word-form similarity with the native language, it is important to also consider the similarities and differences between English and Spanish word forms and their potential influences on recognition performance.

Native language influences

English and Spanish differ in lexical stress distributions. In English, stress tends to fall on the first syllable of a word (Arciuli & Cupples, 2006; Cutler & Carter, 1987; Jusczyk, Houston, & Newsome, 1999); hence, English has a relatively fixed trochaic-stress pattern (Van Donselaar et al., 2005). English speakers seem to be sensitive to such a distributional stress bias. For example, English-speaking adults stress the first syllable of disyllabic nonwords (that simulate nouns in sentences) because 94 % of disyllabic words possess trochaic stress (Kelly & Bock, 1988). Sanders, Neville, and Woldorff (2002) showed that English speakers spotted phonemes more accurately when these were located in trochaic words; in contrast, Spanish speakers did not show this lexical stress bias.

Moreover, there are differences in the extent to which lexical stress assignment is associated with structural properties of the syllables. In English, consonant clusters in a syllable determine lexical stress (Kelly, 2004). This is not the case for Spanish. For example, the cognates dragón (dragon) and cristal (crystal) are stressed on the first syllable in English and on the second syllable in Spanish, showing that syllabic weight is not an indicator of lexical stress in Spanish. However, although lexical stress placement in Spanish words has been labeled as nonpredictable (Peperkamp et al., 2010), stress placement is not totally free in Spanish. Disyllabic nouns ending with a vowel tend to be trochaic, while those ending with a consonant tend to be iambic (Archibald, 1993; Gutiérrez-Palma & Palma-Reyes, 2008). For example, cognate words such as poni (pony), polo, and kilo are all pronounced with trochaic stress in both English and Spanish, whereas words such as actor, doctor, and local are trochaic in English but iambic in Spanish. Guion, Harada, and Clark (2004) showed that proficient Spanish–English bilinguals did not use syllabic structure to place lexical stress in (English) nonwords as monolingual English speakers did, and Archibald showed that native Spanish speakers were influenced by Spanish lexical stress patterns while reading English words aloud. This suggests that the structure of the first language has a strong effect on the lexical stress processing of the second language. The second research question of this study was whether English lexical stress patterns would affect lexical stress recognition of newly learned Spanish words.

The influence of the native language on the processing of foreign words could be directly examined by looking at the differential effects of cognate and noncognate words. Cognate words are words that have forms that are perceptually similar in different languages (De Groot & Nas, 1991). English and Spanish share multiple cognates. However, many of them differ in the position of lexical stress (e.g., ACtor and acTOR, in English and Spanish, respectively). It is possible that Spanish cognate words might activate their English translations and affect the encoding and subsequent recognition of the Spanish words. Evidence for the automatic activation of cognate words in two languages was found by Strijkers, Costa, and Thierry (2010), who used electroencephalography measures to examine cognate and word frequency effects during the course of lexical access. Spanish–Catalan bilinguals and Catalan–Spanish bilinguals performed a picture-naming task in Spanish. For both groups of bilinguals, cognates and high-frequency words were named faster than noncognates and low-frequency words. The amplitudes of event-related potentials (ERPs) for cognates started to diverge from those for noncognates as early as 190 ms after picture onset. Likewise, ERPs for high-frequency words diverged from those for low-frequency words at 180 ms after picture presentation. These results led Strijkers et al. to hypothesize that, upon uttering or hearing a cognate word, both the current lexical representation and its translation are strongly activated. As a result, cognates behave like high-frequency words, since both are activated often. Moreover, they suggested that the cognate effect may just be a word frequency effect, wherein cognate words are high-frequency words and noncognates are low-frequency words. As such, Strijkers et al.’s findings suggest that our participants may activate English translations upon hearing Spanish cognates and that this may affect the extent to which foreign stress patterns are encoded and recognized.

In our experiments, participants studied Spanish trochaic- and iambic-stress cognates (e.g., MANgo and soLAR, respectively). Note that all the English counterparts had trochaic stress (i.e., MANgo, SOlar). By using cognates, we investigated whether the English lexical representations could affect the memorization of the lexical stress patterns of Spanish words. We were particularly interested in observing whether Spanish words, such as loCAL, would be incorrectly recognized when the word LOcal was presented, since LOcal has the same trochaic stress as in English.

The present approach

The general design of the recognition memory task used in the present study is summarized in Table 1. Participants memorized Spanish cognates with trochaic (e.g., MANgo) and iambic stress (loCAL) and noncognates with trochaic (SAStre) and iambic (viaJAR) stress. During the recognition phase, test words were presented in three conditions: same stress, where words had the same stress patterns as were studied; opposite stress, where the stress was switched (manGO, VIAjar); and nonstudied words (which also included cognates and noncognates, both with trochaic and iambic stress).

Table 1 General design and stimuli examples

For ease of exposition, we outline the predictions for the proportion of yes responses here, but the same general logic applies to the analysis of response latencies. Response latencies were measured because chronometric measurements have traditionally been used in the study of the course of lexical access (Strijkers et al., 2010). Lexical access is an automatic process, and reaction time is a better index of automatic activation processes than is accuracy (Johnson & Hasher, 1987). Moreover, response latencies are capable of revealing differences between groups that are not evident using accuracy or error analyses (e.g., Ehrich & Meuter, 2009).

To determine whether foreign lexical stress is retained in memory and used in the recognition process, the critical comparisons are between performance in the same-stress and opposite-stress conditions. If lexical stress is used, the proportion of yes responses in the same-stress condition should be higher than that in the opposite-stress condition. However, if the foreign stress is not used, there should be no difference between the same-stress and opposite-stress conditions, since the segmental information is the same in these two conditions.

The extent to which there is a native-language (English) bias on lexical stress processing will be examined by the pattern of results with respect to the cognates and noncognates and words with trochaic and iambic stress. If activation of the English lexical representation influences the recognition process, yes responses to cognates should be higher and quicker than noncognates, because cognates have an existing lexical representation and a higher word frequency than the (newly learned) noncognates (Strijkers et al., 2010). Similarly, if English suprasegmental similarity has an influence, yes responses should be higher and quicker to trochaic words than to iambic words. Additionally, if both segmental and suprasegmental similarity interact, the strongest influence might be expected for trochaic cognates, since these are the closest match with English words.

Two experiments were originally conducted, but they are presented as a single study, since the general pattern of results was similar across both experiments. The difference between the two experiments and the motivation for running both are described in the Design and procedure subsection below.

Method

Participants

A total of 134 students from the National University of Singapore who were native Singapore-English speakers participated for course credit or as volunteers. As a requirement of the country’s education system, most of them were also bilingual in Mandarin Chinese. All reported no hearing impairment and had not studied Spanish or another Romance language. Fifty of these participants did not do the main experiments but participated in tasks for stimuli selection, as described below. The rest participated in the two main experiments, with 42 in each one.

Materials

An initial list of 355 Spanish words, comprising cognates and noncognates that were disyllabic and contained from four to seven phonemes, was obtained by consulting Spanish dictionaries. The words were digitally recorded in 16-bit mono, 44.1-kHz, .wav format, and root-mean-square amplitude levels were normalized across items. Words were spoken by a female native Castilian-Spanish speaker.

Twenty-one participants listened to the list of words and were asked to guess the meaning of each word according to its phonological similarity with any known English word. Whenever a Spanish word reminded them of an English word, they were asked to type the English word on the keyboard; otherwise, they were to type “x” to indicate its lack of similarity with any known English word. This was done to assess the proportion of participants who perceived the words as cognates and noncognates. We selected only those cognate words that could be matched with their corresponding English counterparts (e.g., mango) reliably. The mean correct matching rate between the chosen Spanish cognates and their intended English counterparts was 90 % (SD = 6). Additionally, 20 different participants were asked to rate the chosen cognate words for familiarity on a 7-point scale (with 7 indicating that the participant knows the meaning of the word). The average rating was 6.89 (SD = 0.12). Thus, the selected cognate words were clearly associated with well-known English words. Conversely, noncognates (e.g., mujer) were chosen such that most participants responded “x” (M = 90 %, SD = 6) when asked about their similarity to English words. Noncognates were therefore reliably treated as nonwords by the participants.

The selected stimuli consisted of 72 Spanish words (18 trochaic-stress cognates, 18 iambic-stress cognates, 18 trochaic-stress noncognates, and 18 iambic-stress noncognates [see the Appendix]). Note that all the English cognate counterparts (e.g., MANgo, LOcal) had trochaic stress, which was confirmed by checking against the MRC (Wilson, 1988) and CELEX (Baayen, Piepenbrock, & van Rijn, 1993) databases. Hence, if the participants had studied the Spanish word loCAL (iambic stress) but recognized the incorrect word LOcal (trochaic stress, as in English) as a studied word, it could be argued that they disregarded the Spanish lexical stress pattern and were biased toward the English pattern.

For each of the 72 words, three tokens were recorded by the same Spanish speaker who recorded the original 355 words. The first token (e.g., POni) was used in the study phase of the experiment. The second token was pronounced with the same lexical stress as the first token, and it was used in the recognition phase (POni). Different tokens for each word were necessary in order to create tokens that were not physically identical to each other in the study and recognition phases. The third token was pronounced with opposite lexical stress (poNI) and was also used in the recognition phase.

In order to ensure that the intended lexical stress of the selected words was perceived properly, 9 different participants heard the words of the first token and indicated the stressed syllable of each word by typing 1 (first part of the word) or 2 (last part of the word). For example, the correct response for SAStre is 1. Across all words, the average correct stress assignment was .78 (SD = .16). There were no differences between the correct attribution of lexical stress for cognates (M = .76, SD = .18) and noncognates (M = .80, SD = .15), as shown by a between-subjects analysis by items, F(1, 71) = 1.08, MSE = .03, p = .30. The second and third tokens were not subjected to the lexical stress identification task. However, we measured pitch and intensity levels (as in Peperkamp et al., 2010; Reinisch, Jesse, & McQueen, 2010, 2011) and found that words’ stressed syllables had higher pitch and intensity than did unstressed syllables in all three tokens (see the analyses below). Table 2 shows the characteristics of the vowels in stressed and unstressed syllables of the three spoken tokens.

Table 2 Mean values for pitch, amplitude, and duration of vowels in stressed and unstressed syllables of the tokens

For all acoustic measures, separate comparisons were made between the first and second tokens and between the first and third tokens. This was done to show that for lexical stress, the first token used in the study phase was similar to the second token but differed from the third token used in the test phase, as intended by the experimental manipulations.

Pitch

The average pitch between the first and second tokens was very similar, F(1, 71) = 2.62, MSE = 250.69, p = .11, and both had stressed syllables higher in pitch than did unstressed syllables, F(1, 71) = 80.76, MSE = 972.63, p < .001. No interaction between tokens and syllabic stress was found, F(1, 71) = 2.67, MSE = 259, p = .11, which indicates that the first and second tokens were similar in terms of pitch differences between stressed and unstressed syllables. The comparison between the first and third tokens revealed a significant interaction, F(1, 71) = 150.93, MSE = 464.52, p < .001. Simple effects indicated that the syllables of the first token were opposite in stress to the corresponding syllables of the third token (all ps < .001), as intended. For example, pan in PANda was higher in pitch than pan in panDA, and da in PANda was lower in pitch than in panDA.

Intensity

The average loudness of the words in the first and second tokens was statistically equal, F < 1. Stressed syllables were louder than unstressed syllables, F(1, 71) = 14.98, MSE = 60.28, p < .001. No interaction between tokens and intensity levels in stressed and unstressed syllables was found, F < 1. In contrast, a significant interaction indicated that the syllables of the first and third tokens differed as intended, F(1, 71) = 26.16, MSE = 20.36, p < .001. Simple effects analyses showed that unstressed syllables of the first token (da in PANda) were softer than their counterpart in the third token (da in panDA), p < .001, and stressed syllables in the first token (pan in PANda) were marginally louder than those in the third token (pan in panDA), p = .06.

Duration

The duration analyses showed that the first and second tokens had similar syllabic durations (all ps > .16). The comparison between the first and the third tokens showed that the words had the same average duration, F < 1. However, there was a significant effect of syllabic duration, F(1, 71) = 6.26, MSE = .01, p = .02, and a significant interaction between token and syllable, F(1, 71) = 83.80, MSE = .00, p < .001. Simple effects revealed no differences in duration between stressed and unstressed syllables within the words of the first token, p = .19, but a significantly longer duration in stressed syllables than in unstressed syllables in the third token, p < .001. Ideally, third tokens should have had vowels of similar duration across stressed and unstressed syllables, as in the first and second tokens. Potential implications will be discussed in the Discussion section.

Design and procedure

The 72 stimuli were randomly assigned to three lists of 24 words each. Each list was made of 12 cognates (6 with trochaic stress and 6 with iambic stress) and 12 noncognates (6 with trochaic stress and 6 with iambic stress).
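The list construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' procedure: the function name `build_lists`, the fixed random seed, and the placeholder item labels are all assumptions (the actual items are given in the Appendix). It shuffles the 18 words within each word type × stress type cell and deals 6 of them to each of the three lists.

```python
import random

WORD_TYPES = ("cognate", "noncognate")
STRESS_TYPES = ("trochaic", "iambic")


def build_lists(stimuli, n_lists=3, seed=0):
    """Split stimuli into n_lists lists, balancing every word type x stress cell.

    `stimuli` maps (word_type, stress_type) -> list of words (18 per cell here).
    Each returned list holds (word, word_type, stress_type) tuples.
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    lists = [[] for _ in range(n_lists)]
    for cell in sorted(stimuli):
        words = list(stimuli[cell])
        rng.shuffle(words)  # random assignment within the cell
        per_list = len(words) // n_lists  # 18 // 3 = 6 words per cell per list
        for i in range(n_lists):
            chunk = words[i * per_list:(i + 1) * per_list]
            lists[i].extend((word, *cell) for word in chunk)
    return lists


# Placeholder item labels (hypothetical); the real items are in the Appendix.
stimuli = {
    (wt, st): [f"{wt}_{st}_{k:02d}" for k in range(18)]
    for wt in WORD_TYPES
    for st in STRESS_TYPES
}
word_lists = build_lists(stimuli)
```

Each of the three resulting lists contains 24 items, with 6 items from each of the four cells.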

In the experiments, two out of the three lists of words were used in the study phase (i.e., 48 words). Then the 48 studied words were presented in the recognition phase: 24 words had the same lexical stress patterns as in the study phase (same-stress condition), and 24 studied words had the opposite lexical stress (opposite-stress condition). The unused list of 24 words was included in the test phase as new words (nonstudied-words condition). The three lists had half of their cognates and noncognates pronounced with trochaic stress and the other half with iambic stress. The three lists of words were rotated across the study and recognition phases, using a balanced Latin-square procedure, and each participant was assigned randomly to one of six versions of the experiment.
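One way to arrive at six versions is to rotate the three lists through the three test conditions in every possible order, since three lists can be assigned to three roles in 3! = 6 ways. The sketch below assumes that reading; the list labels `A`, `B`, `C` and the role names are hypothetical, and the text does not specify the exact counterbalancing scheme beyond "balanced Latin-square".

```python
from itertools import permutations

LISTS = ("A", "B", "C")  # hypothetical labels for the three word lists
ROLES = ("same_stress", "opposite_stress", "nonstudied")


def experiment_versions():
    """Return every assignment of the three lists to the three test conditions."""
    return [dict(zip(ROLES, order)) for order in permutations(LISTS)]


versions = experiment_versions()

for number, version in enumerate(versions, start=1):
    # Lists in the same-stress and opposite-stress roles are the studied lists.
    studied = (version["same_stress"], version["opposite_stress"])
    print(number, "study:", studied, "new at test:", version["nonstudied"])
```

Under this scheme every list serves in every condition equally often across versions, so item-specific effects are not confounded with condition.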

The experiment was programmed in E-Prime 1.2. The instructions in the study phase requested participants to memorize 48 words, one by one, and to focus on each presentation. They were also asked not to rehearse previous trials. This was to avoid subvocal rehearsal of previous words that could interfere with the encoding of the word currently being heard. Participants were not informed about the purpose of the experiment. English translations of the Spanish words were not provided.

The words were presented binaurally through Beyerdynamic DT150 headphones. The order of word presentation in both the study and recognition phases was randomized for each participant. In the study phase, each word (total 48) was repeated thrice sequentially, with 1 s between repetitions and 3 s between different words. In the recognition phase, participants heard each word once (72 in total). Upon hearing a word, a message on the screen asked the participants to indicate as quickly and accurately as possible whether they had studied that word or not. Participants used a PST Serial Response Box (Schneider, Eschman, & Zuccolotto, 2002), with the right-most button labeled YES, and the left-most button labeled NO. Participants were asked to press YES if the word was presented during the study phase and it was pronounced exactly as it was presented during the study phase. They were instructed to press NO if the word was studied but sounded different or if the word was not studied before. No information about how different the word could sound, or about lexical stress changes, was provided.

Although the standard recognition memory paradigm has yes responses for half the trials, the present design has yes responses on only one third of the trials. However, this categorization of a yes response holds under only one specific hypothesis: that participants are able to use lexical stress during discrimination and, hence, that the response for the opposite-stress condition should be no. If the competing hypothesis that they are unable to do so is correct, then yes responses would be expected for two thirds of the trials, since the expected response would be yes for the opposite-stress condition. Since the nature of the responses in the opposite-stress condition is the precise empirical question being tested, it was important to ensure that the numbers of trials in the same-stress (yes response expected under both hypotheses) and nonstudied (no response expected under both hypotheses) conditions were equal. This prevents bias in the two conditions that serve as the basis for comparison with performance in the critical opposite-stress condition.

During the debriefing in Experiment 1, many participants commented that they had expected segmental changes (such as changes in some vowels or consonants) when they were informed, in the recognition phase, that they were about to hear words presented during the study phase but some of them sounded different. In fact, all the participants reported being unaware of lexical stress changes. Experiment 2 was run to ensure that participants were aware of the importance of lexical stress when performing the recognition task. At the beginning of the recognition phase, participants were explicitly informed that the lexical stress for some of the studied words had been changed and that this change meant that the word was not a studied word. One example was given with a cognate word not used during the study phase: If the word moRAL was presented during the study phase but they heard MOral during the test phase, they had to press NO in the response box because the stress of the word was different and, thus, it did not sound exactly the same as it was studied. Information about the lexical stress changes was provided only during the recognition phase, and not during the study phase.

Results

Unless otherwise stated, for all analyses, we conducted an omnibus four-way ANOVA with Experiment (1, 2) as the between-subjects factor and word type (cognate, noncognate), stress type (trochaic, iambic), and condition (same stress, opposite stress, nonstudied) as within-subjects factors. Significant main effects are reported, but when these are qualified by higher-order interactions, the interpretations were based on the latter in terms of the implications for the expected outcomes listed in the introduction.

Proportion of yes responses

Table 3 summarizes the yes response data. A main effect of word type, F(1, 82) = 125.62, MSE = .05, p < .001, and a main effect of condition, F(2, 164) = 530.13, MSE = .05, p < .001, were found. These were qualified by three significant interactions, which we interpret below.

Table 3 Mean probability of yes responses

The word type × condition interaction, F(2, 164) = 63.38, MSE = .04, p < .001, provided evidence for the encoding of foreign lexical stress and the influence of English word-form similarity. For cognate words, planned comparisons showed more yes responses in the same-stress (M = .83) than in the opposite-stress (M = .72) condition, F(1, 83) = 21.08, MSE = .02, p < .001, which were, in turn, higher than in the nonstudied (M = .14) condition, F(1, 83) = 1,054.83, MSE = .03, p < .001. For noncognates, there were more yes responses in the same-stress (M = .59) than in the opposite-stress (M = .43) condition, F(1, 83) = 39.56, MSE = .06, p < .001, which were, in turn, higher than in the nonstudied (M = .17) condition, F(1, 83) = 357.27, MSE = .03, p < .001. These patterns of results for both cognates and noncognates suggest that foreign lexical stress was encoded in memory. If participants had used only segmental information in the recognition process, there would not be a difference between the same-stress and opposite-stress conditions.

The rest of the comparisons examined the proportion of yes responses between cognates and noncognates in each condition. Cognates had more yes responses than noncognates in the same-stress, F(1, 83) = 93.23, MSE = .02, p < .001, and opposite-stress, F(1, 83) = 114.74, MSE = .03, p < .001, conditions. However, in the nonstudied condition, the proportion of yes responses did not differ significantly between cognates and noncognates, F(1, 83) = 2.24, MSE = .01, p = .14, and this was the reason for the interaction effect, as is illustrated in the left panel of Fig. 1. These results suggest that similarity with English word forms increased yes responses to studied words.

Fig. 1 Proportion of yes responses in the different conditions for word type (left panel) and stress type (right panel)

A significant stress type × condition interaction, F(2, 164) = 3.52, MSE = .04, p = .03, as depicted in the right panel of Fig. 1, provided further evidence that lexical stress was encoded in memory. For trochaic words, planned comparisons showed more yes responses in the same-stress (M = .72) than in the opposite-stress (M = .58) condition, F(1, 83) = 27.34, MSE = .03, p < .001, which were, in turn, higher than in the nonstudied (M = .13) condition, F(1, 83) = 743.05, MSE = .03, p < .001. For iambic words, there were more yes responses in the same-stress (M = .70) than in the opposite-stress (M = .57) condition, F(1, 83) = 41.15, MSE = .03, p < .001, which were, in turn, higher than in the nonstudied (M = .18) condition, F(1, 83) = 536.20, MSE = .03, p < .001.

There was also an experiment × word type interaction, F(1, 82) = 9.85, MSE = .05, p = .002. This interaction was driven by the finding that while cognates had equal yes response rates across Experiments 1 (M = .57) and 2 (M = .55), F < 1, there were more yes responses for noncognates in Experiment 2 (M = .43) than in Experiment 1 (M = .36), F(1, 82) = 6.18, MSE = .02, p = .02. However, the cognate response rates were higher than the noncognate response rates in both Experiment 1, F(1, 41) = 111.25, MSE = .01, p < .001, and Experiment 2, F(1, 41) = 29.34, MSE = .01, p < .001. No other main effects or interactions were significant. The lack of an experiment × condition interaction and of any higher-order interactions involving these independent variables suggests that awareness of the lexical stress manipulation did not have an effect on the encoding of foreign lexical stress.

Latencies for yes responses

The analysis of response latencies was meant to study lexical access to memorized words. Hence, latencies were analyzed only for yes responses in the same-stress and opposite-stress conditions, since the words in the nonstudied condition were never memorized. There were also very few yes responses in the nonstudied condition.

Table 4 summarizes the data. Latencies exceeding 2.5 SDs from each participant’s respective means were removed (5.7 % of data).
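The trimming rule can be illustrated with a short Python sketch. The text does not say whether the mean and standard deviation were computed per participant overall or per participant and condition, nor which SD estimator was used; the sketch assumes a single pool of latencies for one participant and the sample standard deviation, and the function name `trim_latencies` is hypothetical.

```python
from statistics import mean, stdev


def trim_latencies(latencies, cutoff=2.5):
    """Drop latencies more than `cutoff` SDs from this participant's mean.

    Assumes `latencies` holds one participant's yes-response latencies (ms)
    and uses the sample SD; both choices are assumptions of this sketch.
    """
    if len(latencies) < 2:
        return list(latencies)  # SD is undefined for fewer than two values
    m = mean(latencies)
    sd = stdev(latencies)
    if sd == 0:
        return list(latencies)  # all values identical, nothing to trim
    return [x for x in latencies if abs(x - m) <= cutoff * sd]
```

Applying the function separately to each participant's latencies, and then counting the removed values across participants, gives the overall proportion of trimmed data.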

Table 4 Mean response latencies (in milliseconds) for yes responses

There were main effects of word type, F(1, 66) = 37.80, MSE = 75,724.18, p < .001, stress type, F(1, 66) = 18.93, MSE = 20,598.37, p < .001, and condition, F(1, 66) = 4.58, MSE = 51,805.87, p = .04. These were qualified by two significant interactions involving the word type factor. Both interactions provided evidence for the influence of English word-form similarity.

As is illustrated in the left panel of Fig. 2, the word type × condition interaction, F(1, 66) = 4.32, MSE = 30,551.12, p = .04, was driven by the finding that for cognates, latencies did not differ between the same-stress (M = 1,288) and opposite-stress (M = 1,299) conditions, F < 1, while for noncognates, same-stress (M = 1,402) was faster than opposite-stress (M = 1,475), F(1, 68) = 6.50, MSE = 25,745.86, p = .01. This suggests that when the word forms were not similar to English, processing was disrupted by foreign stress patterns that were the opposite of what was studied, which is evidence for the encoding of foreign lexical stress. For both same-stress, F(1, 79) = 32.60, MSE = 16,076.20, p < .001, and opposite-stress, F(1, 70) = 29.06, MSE = 36,723.92, p < .001, conditions, there were faster responses for cognates than for noncognates, suggesting that similarity to English word forms facilitated processing.

Fig. 2 Response latencies for word type in the different conditions (left panel) and the stress-type words (right panel)

As is illustrated in the right panel of Fig. 2, the word type × stress type interaction, F(1, 66) = 19.03, MSE = 38,602.68, p < .001, was driven by the finding that for noncognates, latencies did not differ between trochaic (M = 1,448) and iambic (M = 1,428) words, F < 1, while for cognates, trochaic words (M = 1,229) were responded to faster than iambic words (M = 1,356), F(1, 81) = 45.16, MSE = 13,039, p < .001. This suggests that maximal similarity to English word forms in terms of both segmental and suprasegmental features facilitated processing. For both trochaic words, F(1, 74) = 52.43, MSE = 31,302.85, p < .001, and iambic words, F(1, 73) = 6.74, MSE = 22,249.99, p = .01, there were faster responses for cognates than for noncognates, again suggesting that English word form similarity facilitated processing. No other main or interaction effects were significant.

Discrimination and criterion measures

Table 5 summarizes the d′ and C scores. The former measures how well participants were able to discriminate old from new words. The latter measures the tendency to respond old or new. The present design had one hit rate (yes responses in the same-stress condition) and two false alarm rates (yes responses in the opposite-stress and nonstudied conditions). Therefore, two d′ scores could be computed. The first, based on the hit rate and the nonstudied false alarm rate, measures how well participants could discriminate same-stress words from nonstudied words (labeled as the SN condition in the rest of the discussion). SN scores give an indication of the extent to which old words (in the form that was studied) could be discriminated from new words. The second, based on the same-stress hit rate and the opposite-stress false alarm rate, measures how well participants could discriminate same-stress words from opposite-stress words (labeled SO). SO scores give an indication of the extent to which participants could use the foreign lexical stress patterns and not just the segmental patterns during recognition. The same general logic applies to the computation of C scores from the yes response rates.
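The two pairings of hit and false alarm rates can be made concrete with a short sketch. It assumes the standard equal-variance signal detection formulas, d′ = z(H) − z(FA) and C = −[z(H) + z(FA)]/2, which the text does not spell out. The function names are hypothetical, and the rates are illustrative values borrowed from the trochaic group means reported for the yes responses. In practice, the scores would be computed from each participant's own rates.

```python
from statistics import NormalDist

# Inverse of the standard normal CDF (the z transform of a proportion).
_z = NormalDist().inv_cdf

def d_prime(hit_rate, fa_rate):
    """Discrimination: distance between the z-transformed rates."""
    return _z(hit_rate) - _z(fa_rate)

def criterion_c(hit_rate, fa_rate):
    """Response bias: negative values indicate a liberal (yes-prone) criterion."""
    return -0.5 * (_z(hit_rate) + _z(fa_rate))

# Illustrative rates only (assumption: one hit rate, two false alarm rates).
hit_same = 0.72        # yes responses to same-stress words
fa_opposite = 0.58     # yes responses to opposite-stress words
fa_nonstudied = 0.13   # yes responses to nonstudied words

d_sn = d_prime(hit_same, fa_nonstudied)      # SN: same-stress vs. nonstudied
d_so = d_prime(hit_same, fa_opposite)        # SO: same-stress vs. opposite-stress
c_sn = criterion_c(hit_same, fa_nonstudied)
c_so = criterion_c(hit_same, fa_opposite)
```

Rates of exactly 0 or 1 have no finite z transform, so inv_cdf raises an error for them. Some correction would be needed in that case, and the text does not say which one, if any, was applied.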

Table 5 Mean d′ and C values

Prior to the omnibus ANOVA, we determined whether the d′ scores in Table 5 were significantly above zero. A d′ of zero indicates that no discrimination was evident: Participants could not distinguish between old and new items. One-sample t-tests showed that all but one of the d′ scores were significantly above 0, ts(41) > 2.50, ps < .05, indicating that participants were able to discriminate same-stress from opposite-stress and nonstudied words in almost all conditions. In particular, the d′ scores in the SO conditions imply the encoding of foreign lexical stress, since that is the only way participants could discriminate same-stress from opposite-stress words. The one exception was a nonsignificant result for the cognate–trochaic SO condition in Experiment 1, t(41) = 1.32, p = .19. Recall that participants in Experiment 1 may have expected segmental changes, and so it is possible that when the cognate–trochaic foreign words were maximally similar to English words, lexical stress was not used during recognition, resulting in yes responses to the opposite-stress words. It is worth noting that all the other d′ scores for Experiment 1 suggest some level of discrimination and, hence, use of foreign lexical stress, despite the participants’ expectations.
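The test against zero can be sketched as follows. This is a minimal illustration with a hypothetical function name and made-up scores. It returns only the t statistic and its degrees of freedom, and the reported df of 41 corresponds to 42 scores per cell.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(scores, mu0=0.0):
    """Return (t, df) for a one-sample t-test of mean(scores) against mu0."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two scores are needed")
    sd = stdev(scores)  # sample standard deviation
    if sd == 0:
        raise ValueError("scores have zero variance; t is undefined")
    standard_error = sd / sqrt(n)
    return (mean(scores) - mu0) / standard_error, n - 1

# Made-up d' scores for five hypothetical participants.
t_value, df = one_sample_t([1, 2, 3, 4, 5])
```

Obtaining the p value requires the t distribution, which the Python standard library does not provide. If SciPy is available, scipy.stats.ttest_1samp returns both the statistic and the p value.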

Turning to the omnibus ANOVA, main effects of word type, F(1, 82) = 18.81, MSE = .77, p < .001, and condition, F(1, 82) = 486.31, MSE = .45, p < .001, and the word type × condition interaction, F(1, 82) = 89.32, MSE = .33, p < .001, were observed. The left panel of Fig. 3 shows the nature of the interaction. In the SN condition, cognates (M = 1.85) were better discriminated than noncognates (M = 1.14), F(1, 83) = 96.30, MSE = .22, p < .001; however, in the SO condition, there was no advantage for cognates (M = .29) over noncognates (M = .42), F(1, 83) = 2.00, MSE = .32, p = .16. The lack of a cognate advantage in the SO condition is not surprising, since the segmental information is the same between same-stress and opposite-stress words. As is shown in the right panel of Fig. 3, a similar pattern emerged in the stress type × condition interaction, F(1, 82) = 6.11, MSE = .28, p = .016. In the SN condition, there was better discrimination for trochaic (M = 1.61) than for iambic (M = 1.39) words, F(1, 83) = 7.81, MSE = .26, p = .006; however, in the SO condition, there was no advantage for trochaic (M = .37) over iambic (M = .35) words, F < 1. In addition, discrimination was better for SN than for SO for both trochaic words, F(1, 83) = 363.34, MSE = .18, p < .001, and iambic words, F(1, 83) = 249.28, MSE = .18, p < .001. No other main effects or interactions were significant.

Fig. 3 d′ values for word type (left) and stress type (right) in the different conditions

Turning to C scores, main effects of word type, F(1, 82) = 122.33, MSE = .31, p < .001, and condition, F(1, 82) = 486.31, MSE = .11, p < .001, and the word type × condition interaction, F(1, 82) = 89.32, MSE = .08, p < .001, were observed. The interaction, as is shown in the left panel of Fig. 4, was driven by the finding that the difference between cognates and noncognates was much larger in the SO condition, where C was more liberal for cognates (M = −.72) than for noncognates (M = −.03), F(1, 83) = 177.47, MSE = .11, p < .001, than in the SN condition, where the corresponding difference between cognates (M = .06) and noncognates (M = .33) was smaller, F(1, 83) = 31.33, MSE = .10, p < .001. Negative values (or a more liberal criterion) imply task difficulty (Rotello & Macmillan, 2008, p. 63) and are suggestive of an attempt to obtain as many hits as possible (resulting in more false alarms too). Clearly, the SO condition would be difficult, since the only difference is in lexical stress placement, and similarity to English segmental forms increased the recognition difficulty. As is depicted in the right panel of Fig. 4, word type also interacted with experiment, F(1, 82) = 6.57, MSE = .31, p = .012. For cognates, the C difference between Experiments 1 (M = −.35) and 2 (M = −.31) was not significant, F < 1; but for noncognates, C was less conservative for Experiment 2 (M = .06) than for Experiment 1 (M = .24), F(1, 82) = 4.95, MSE = .13, p = .03. The awareness of the lexical stress factor during recognition in Experiment 2 could have made the task less difficult for participants, who then adopted a less conservative criterion.

Fig. 4 C values for word type in the different conditions (left) and for the two experiments (right)

Discussion

We first discuss the findings with respect to the extent to which similarity between the foreign words and English word forms influenced the recognition process. The pattern of results indicates that there was a strong influence. Across both same-stress and opposite-stress conditions, word form factors that were more similar to English (i.e., cognate word type and trochaic stress type) consistently had more yes responses than did the word form factors that were less similar to English (i.e., noncognate word type and iambic stress type). There were also generally faster yes responses for cognates than for noncognates, with the fastest for cognate words with trochaic stress. The latter words matched segmentally and suprasegmentally with existing English words and were recognized faster than cognates that only matched segmentally (i.e., iambic-stress cognates), demonstrating that maximal similarity to English word forms facilitated processing.

Although word form similarity influenced performance, two outcomes in the discrimination findings suggest that lexical stress also played a role. Condition (SN vs. SO) interacted with word type (cognate vs. noncognate) and also with stress type (trochaic vs. iambic). The advantage of cognates over noncognates and of trochaic over iambic words occurred only in the SN condition, where discrimination was between same-stress words and nonstudied words. The advantage of English word form similarity was not evident in the SO condition, where discrimination was between same-stress and opposite-stress words. This suggests that in the critical same-stress versus opposite-stress conditions, where discrimination can take place only on the basis of suprasegmental differences, English word form similarity did not provide any processing advantage.

A number of other findings suggest that foreign lexical stress features, such as pitch and intensity changes in stressed and unstressed syllables, were used by the English speakers to recognize newly learned Spanish words. First, discrimination scores for all but one of the conditions were significantly above zero, suggesting that participants were able to discriminate studied Spanish words from new words, including those that had the original lexical stress patterns switched. Second, the proportion of yes responses was higher in the same-stress condition than in the opposite-stress condition, and this was true for cognates and noncognates, as well as for trochaic- and iambic-stress words. Same-stress and opposite-stress conditions shared segmental information and differed only in suprasegmental information. Therefore, performance differences between these conditions can be attributed only to the use of suprasegmental features, which, in turn, suggests that some elements of these features must have been encoded in memory and used during the recognition process. Finally, latencies increased in the opposite-stress condition, as compared with the same-stress condition, for yes responses to noncognates, suggesting that when the word forms were not similar to English, participants encoded the foreign stress pattern in memory, and processing was disrupted when the opposite-stress pattern could not match the original studied trace.

It should be acknowledged, however, that the level of discrimination between same-stress and opposite-stress words, although statistically significant, is low relative to the discrimination between same-stress and nonstudied words, where the combined segmental and suprasegmental information is maximally different. Discrimination between same- and opposite-stress words was clearly difficult, as evidenced by the liberal C scores used in that condition. Nevertheless, the pattern of results cannot support a conclusion that lexical stress was completely ignored.

A limitation of the present study is that the tokens in the opposite-stress conditions had longer vowel durations in the stressed than in the unstressed syllables, a feature that is not typical of Spanish, which has similar vowel durations for both syllables (as was the case for the tokens in the study phase and same-stress conditions). Although it is possible that performance in the opposite-stress conditions may have been driven solely by this potential confound, the obtained results suggest that this is unlikely. If participants had focused solely on these duration differences, it would have made the iambic words in the opposite-stress condition more like the English word form (e.g., from loCAL to LOcal). Likewise, it would have made the trochaic words in the opposite-stress condition less like the English word form (MANgo to manGO). As such, one might expect more yes responses for iambic words, as compared with trochaic words, in the opposite-stress condition and a corresponding faster response time for iambic versus trochaic words. However, no evidence of these patterns was obtained, and the observed means in Tables 3 and 4 corroborate this.

Another issue relates to the fact that most of our participants were highly proficient English–Mandarin bilinguals. It might be argued that the present results may not apply to English monolinguals. However, we believe that the knowledge of Mandarin did not affect the results of this study. The first reason is that the results indicated facilitation for cognates, suggesting that English lexical representations were activated automatically. The second reason is that Mandarin is a tonal language and lexical stress distinctions are not evident in tonal languages (Akker & Cutler, 2003). Therefore, it is sensible to assume that our bilinguals must have used their experience with English and suprasegmental correlates of lexical stress, such as pitch and intensity, while processing stressed and unstressed syllables within words. Finally, previous research on the effects of word frequency and lexical competition during spoken word recognition has used Singaporean samples, and the findings have been comparable to those obtained with English monolingual samples (e.g., Suárez, Tan, Yap, & Goh, 2011).

The larger probability of yes responses for cognates than for noncognates indicated that the familiarity of cognate forms influenced processing. This is consistent with Masoura and Gathercole’s (1999) findings showing that stored knowledge of the phonological structure of the language facilitates the learning of new vocabulary. Costa, Santesteban, and Caño (2005) and Strijkers et al. (2010) gathered evidence indicating that cognates have facilitatory effects during lexical access and production; we showed facilitatory effects in memory (more retention of familiar forms). It is possible that Spanish cognates could have activated meaning in the participants’ English lexicon, given that cognate words were chosen according to similarity and familiarity with existing English words. Van Donselaar et al. (2005) argued that word recognition occurs on the basis of segmental and suprasegmental matching processes between the acoustic signal and the lexical representation, which activate meaning. Although the present design does not test this directly, it is possible that cognate effects could be attributed to the deeper memory trace that would arise from activation of similar-sounding English words at the semantic level, in comparison with shallow memory traces for only phonological codes (Craik & Lockhart, 1972; Craik & Tulving, 1975). Strijkers et al. (2010) also suggested that whenever a cognate is heard, the lexical translation gets activated, and cognate words could be recognized faster because they had higher word frequencies than did noncognates. The response latency analyses conducted in our studies showed that cognates were recognized more quickly than noncognates. These findings provide more evidence for the general facilitation effects of cognate words (Costa et al., 2005; Strijkers et al., 2010).

In comparison with cognates, noncognates were nonwords to our participants, and therefore, noncognates were not susceptible to lexical effects during word recognition. Gaskell and Dumay (2003) found that although exposure to nonwords created a durable episodic memory trace, those words did not show lexical competition (used as a test of lexicalization) at word recognition. Lexicalization of nonwords required a period of consolidation of about 1 week; after a week, lexical competition effects between the newly learned words and other phonologically similar words in the lexicon arose. This could explain why, in our experiments, memorization for noncognate words was significantly lower than for cognate words overall, as well as why response latencies for nonwords did not benefit from the English trochaic pattern (Kelly & Bock, 1988). Noncognates did not have any lexical representation, and therefore recognition must have been based on acoustic-phonetic properties exclusively.

While it has been acknowledged that lexical stress affects recognition, most spoken recognition models have not yet fully implemented lexical stress (e.g., TRACE [McClelland & Elman, 1986], Shortlist B [Norris & McQueen, 2008]). However, lexical stress is being incorporated into new models of visual word recognition and reading aloud (e.g., Arciuli, Monaghan, & Seva, 2010; Pagliuca & Monaghan, 2010; Zorzi, 2010). Moreover, in languages with different lexical stress cues and distributions, such as English and Spanish, lexical stress parameters may be different, so models need to be adjusted to the characteristics of each language and to predict possible interactions between languages during bilingual spoken word recognition.

The present results are not consistent with previous studies showing that lexical stress may not be used during word recognition in English. Differences in the tasks used (lexical decision in Cooper et al., 2002, and 4AFC in Creel et al., 2006) make a direct comparison difficult. We note, however, that in Cooper et al.’s study, the words were all English words, and so there were no foreign lexical stress patterns involved in the processing. In Creel et al.’s study, although only nonwords were used, the acoustic features that were manipulated were those associated with English stress patterns, such as syllable length. In our Spanish words, vowel durations in stressed and unstressed syllables did not differ.

The present findings are, however, consistent with those in the wider literature on context effects in recognition memory (e.g., Goh, 2005; Goldinger, 1996; Lim & Goh, 2012, 2013; Peretz et al., 1998). In these studies, switching the speaker of the words and the timbre of the melodies from study to test, but keeping the identity the same, resulted in poorer recognition, as compared with the conditions in which both the central (word and melody identity) and peripheral (speaker and timbre) information were retained. These findings strongly suggest that these details are encoded in the episodic memory trace. Spanish has many words whose meaning depends on the allocation of lexical stress. English listeners seem capable of encoding foreign lexical stress patterns. Future studies could investigate whether the inhibitory effects of lexical stress during word recognition found in Spanish (see Soto-Faraco et al., 2001) would also appear throughout the course of English speakers becoming bilinguals in Spanish.

In summary, the pattern of results suggests that Spanish words’ lexical stress is encoded in memory and used during recognition by English listeners. Word form similarity between Spanish and English also influenced recognition performance, with words that matched segmentally (cognates) and words that matched suprasegmentally (trochaic-stress words) facilitating processing and recognition. However, in the conditions where discrimination was difficult (between same-stress and opposite-stress words), no processing advantage was seen for word form similarity, suggesting that discrimination could have been done only by the use of foreign stress patterns in the memory trace.