As language learners develop proficiency through experience, words are accessed with increasing speed and accuracy. According to some models, such increased processing efficiency comes from episodic learning events in which exposure leads to stronger representations (e.g., Reichle & Perfetti, 2003). Thus, with each exposure to a word, its association with its concept is strengthened. As a consequence, earlier-acquired and more-frequent words accrue more experience and become more strongly associated with their concepts, and therefore less difficult to access. Similarly, a more-experienced and more-proficient speaker will have stronger word–concept associations than will a less-proficient speaker for any given set of words, and will access them more easily (e.g., Kroll & Stewart, 1994). Thus, both the proficiency of a language user and the difficulty involved in accessing a word are largely products of experience. In the present study, we investigated whether language proficiency and item difficulty influence translation performance differently. In particular, we investigated these learning phenomena using a repetition-priming paradigm, in which the effects of experimental exposures can be studied by measuring increments in learning from exposure n to exposure n + 1, where n varies across participants and words. We report two bilingual word translation experiments that allowed for direct comparisons of the effects of participant proficiency and item difficulty and how they moderate the effect of additional exposures.

Processes in bilingual word translation

In proficient bilinguals, translation in both directions (from the first language to the second and the reverse) is generally thought to be concept-mediated (e.g., Brysbaert & Duyck, 2010; de Groot, Dannenburg, & van Hell, 1994; de Groot & Poot, 1997; Duyck & Brysbaert, 2004; Francis, Augustini, & Sáenz, 2003; Francis & Gallard, 2005; La Heij, Hooglander, Kerling, & van der Velden, 1996; but see Kroll, van Hell, Tokowicz, & Green, 2010, for a discussion of the interpretation of translation direction). In concept-mediated translation, the target word is comprehended, and on the basis of its meaning or concept, a corresponding word in the response language is produced (Potter, So, von Eckardt, & Feldman, 1984). This method of translation entails the assumption that pairs of translation equivalents access common conceptual representations (see Francis, 1999, 2005, for reviews of this evidence; but see Tokowicz & Kroll, 2007; Tokowicz, Kroll, de Groot, & van Hell, 2002, for evidence that these representations may not overlap completely). In the present study, we evaluated and compared the effects of experiential variables on translation performance and repetition priming, to better understand the learning processes that occur preexperimentally and the learning processes that occur with experimental item repetitions.

Factors affecting translation response times and error rates

Translation from the less-proficient language (L2) to the more-proficient language (L1) is typically faster and more accurate than L1–L2 translation (e.g., Cattell, 1947; Chen & Leung, 1989; Francis & Gallard, 2005; Kroll & Stewart, 1994; Potter et al., 1984; Sholl, Sankaranarayanan, & Kroll, 1995), although some studies have shown the opposite or no difference (e.g., de Groot & Poot, 1997; Francis, Corral, Jones, & Sáenz, 2008; La Heij et al., 1996). (Note that here, we use L1 to refer to the more proficient of the two languages, which does not always correspond to the first language learned.) This effect, known as the translation asymmetry, is stronger in less-balanced bilinguals, and weaker in more-balanced bilinguals.

As was pointed out by Snodgrass (1993), concept-mediated translation yields a response time (RT) advantage for L2–L1 translation when the asymmetry across languages in the time to produce a word is stronger than the asymmetry in time to comprehend a word, thus producing a net effect in RTs favoring the L2–L1 translation direction. This idea was corroborated by Francis and Gallard (2005). The idea that comprehension and production asymmetries partially cancel each other out has support from comparisons of other tasks across languages. For example, the comprehension-based asymmetry observed with semantic classification of words was larger than the translation asymmetry (Francis et al., 2011; Potter et al., 1984). Also, the production-based asymmetry observed with picture naming was larger than the translation asymmetry (e.g., Francis et al., 2008; Potter et al., 1984; Sholl et al., 1995).
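The cancellation argument can be made explicit with a simple additive sketch. The symbols below (C for the time to comprehend a word, P for the time to produce a word) and the assumption that the two stages simply add are illustrative conveniences, not part of the cited accounts.

```latex
\begin{align*}
RT_{L2 \to L1} &= C_{L2} + P_{L1}, \\
RT_{L1 \to L2} &= C_{L1} + P_{L2}, \\
RT_{L1 \to L2} - RT_{L2 \to L1} &= (P_{L2} - P_{L1}) - (C_{L2} - C_{L1}).
\end{align*}
```

Under these assumptions, L2–L1 translation is faster whenever the production asymmetry, P_{L2} − P_{L1}, exceeds the comprehension asymmetry, C_{L2} − C_{L1}. If both asymmetries are positive, the net translation asymmetry is also smaller than the production asymmetry alone.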

As proficiency in the L2 increases, translation accuracy in both directions increases, because it becomes more likely that L2 words will be comprehended for L2–L1 translation and successfully retrieved for L1–L2 translation. Translation speed and accuracy are affected by the same lexical properties that affect performance on other monolingual and bilingual verbal tasks. Specifically, word frequency and familiarity are correlated with translation RTs (de Groot et al., 1994; van Hell & de Groot, 1998). A common quality of these participant and lexical attributes is experience: Translation becomes faster as a person acquires experience using the words of a language, and it becomes faster across individuals for words that occur frequently in the language. In the present study, we used repetition priming to examine this learning process, because this procedure allowed for an examination of the increments in learning that occur following experimental exposures.

Repetition priming in word translation

Repetition produces facilitation in word translation at delays of several minutes (Francis et al., 2011; Francis & Gallard, 2005; Francis & Sáenz, 2007) and one week (Francis & Sáenz, 2007). The long-term nature of repetition priming indicates that a single experimental translation trial will produce sustained learning. Facilitation in repeated translation is based on both speeded comprehension of the stimulus word and speeded retrieval of the response word. Evidence for a word comprehension component has come from studies in which translation was facilitated by prior semantic categorization of the to-be-translated word or by drawing a picture to represent it (Francis et al., 2011), or by translating the word to a neutral third language (Francis & Gallard, 2005). Evidence for a word retrieval component has come from studies in which translation was facilitated by prior picture naming in the response language (Francis et al., 2003; Francis et al., 2011; Sholl et al., 1995) or by prior translation to the response language from a neutral third language (Francis & Gallard, 2005). Thus, experimental exposures lead to learning in both comprehension and production.

The present study

In the present experiments, we examined increments in learning for word translation using a repetition-priming methodology to explain the changes in processing that arise as a result of participant proficiency and lexical difficulty. The pre-experimental state of learning for a given word is expected to depend on both the proficiency of the participant and the difficulty of the word. Experimental exposures to words through translation are expected to lead to increments in learning, and the increments may depend on participant proficiency and item difficulty. Normally, the effects of participant proficiency and item difficulty cannot be compared directly, because they are not measured on the same scale. Here, we made such a comparison possible by using empirical error rates for both participants and individual words, which places the two variables on a common scale.

The purpose of the present study was to examine how translation performance changes with bilingual proficiency and the learning status of particular words. We conducted two translation experiments with Spanish–English bilinguals and measured translation RTs and error rates, along with the effects of repetition priming from prior translation in the same or the opposite direction. According to the principle of transfer-appropriate processing (Morris, Bransford, & Franks, 1977; Roediger & Blaxton, 1987), memory transfer depends on the degree to which cognitive processes match at encoding and test. Therefore, repetition priming should be stronger when the translation direction matches from encoding to test than when it does not.

The methodology and results are organized as follows. First, each study is described individually, and the results are compared across studies. Second, the association between bilingual proficiency and performance is assessed by pooling data from words that were common to both studies and analyzing performance as a function of participant proficiency. Third, the association between item difficulty and performance is assessed by analyzing the Experiment 2 data as a function of item difficulty. Finally, we compare and contrast the effects of bilingual proficiency and item difficulty on RTs and repetition priming, with the goal of better understanding their influences on the learning processes that give rise to more efficient lexical access with experience.

Experiment 1: Penn State study

Method

Participants

The participants were 40 bilinguals proficient in English and Spanish (18 men, 22 women) with a mean age of 24.2 years. All were students at either Pennsylvania State University or a neighboring university in central Pennsylvania, a primarily English-speaking environment. According to self-report, 58 % had learned English first, and 42 % had learned Spanish first. On average, participants were first exposed to the L2 at 22.2 years and had 2.1 years of experience. According to self-ratings of proficiency, 73 % were English-dominant and 27 % were Spanish-dominant. (Participants who indicated equal proficiency were classified as being English-dominant.) Seven additional participants were excluded from the analysis, because they had learned another language during childhood or translated fewer than 50 % of the new items correctly.

Apparatus

Words were presented on the monitor of an IBM-XT computer, with the sequence of presentation and timing being regulated by a Turbo-Pascal program. A microphone attached to a voice relay was used to record vocal RTs.

Design

The independent variables were encoding-phase translation condition (English–Spanish, Spanish–English, or none) and direction of test-phase translation (English–Spanish or Spanish–English). Thus, the experiment had a 3 (encoding-phase translation direction) × 2 (test-phase translation direction) within-subjects design. The dependent variables were the mean test-phase RTs and error rates for each condition.

Materials

The stimuli were English and Spanish names for 80 objects chosen from the Snodgrass and Vanderwart (1980) picture set. The median English word frequency was 37 per million (Kučera & Francis, 1967), and the mean word lengths were 5.0 letters (SD = 1.5) for the English words and 5.6 letters (SD = 1.7) for the corresponding Spanish words. The mean English normative age of acquisition was 40 months (based on 66 words; Morrison, Chappell, & Ellis, 1997); the mean Spanish age of acquisition was 39 months (based on 32 words; Pérez & Navalón, 2005). Words were randomly assigned to eight sets of ten words, with the sets being matched on word length and frequency. One set was assigned to each of the four repeated conditions, and two sets were assigned as new items for each translation direction. The assignment of item sets to conditions was counterbalanced across participants.

Procedure

Participants were tested individually in sessions lasting approximately 30 min. The encoding phase had two blocks of translation trials, one in each direction, consisting of 20 practice trials (filler items) and 20 experimental trials. The test phase consisted of two 40-trial blocks, each containing ten words previously translated from English to Spanish, ten words previously translated from Spanish to English, and 20 new words. The language order was counterbalanced across participants and was consistent from encoding to test. On each trial, a fixation cross appeared, and the participant pressed a button to initiate presentation of the word to be translated. The word remained on the screen for 500 ms or until a response triggered the voice key, whichever came first, and was replaced by the fixation cross. After completing the computerized experiment, participants completed a language history questionnaire.

Results

Data processing

Invalid trials were eliminated from analysis. In the test phase, an average of 14.6 % of the trials (SD = 11.7 %) were removed because of translation errors (including “don’t know” responses), 1.5 % for machine timing errors, and 4.1 % for spoiled trials. Spoiled trials were those that had correct test-phase responses with valid times, but the prime status of the word was compromised because the prime-phase response was unacceptable (3.1 %) or had a machine timing error (0.8 %), or because the answer was given as an error response to another item (0.2 %). Trials with RTs greater than 4,500 ms, less than 250 ms, or more than 2.5 SDs from the mean of the correct trials were removed as outliers (2.0 % of trials). Thus, on average, 77.8 % of the trials were retained for analysis, approximately 10.4 trials per condition.
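The outlier criteria can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name and the example RTs are hypothetical, and it assumes that the absolute limits are applied first and that the mean and SD are then computed once over the remaining correct-trial RTs (the text does not specify the order or the grouping used).

```python
from statistics import mean, stdev


def trim_outliers(rts_ms, low=250, high=4500, sd_cutoff=2.5):
    """Return the RTs that survive the absolute and SD-based cutoffs.

    Assumption: absolute limits are applied first; the mean and SD are then
    computed in a single pass over the RTs that remain.
    """
    in_range = [rt for rt in rts_ms if low <= rt <= high]
    if len(in_range) < 2:
        return in_range  # SD is undefined for fewer than two values
    m, sd = mean(in_range), stdev(in_range)
    return [rt for rt in in_range if abs(rt - m) <= sd_cutoff * sd]


# Hypothetical correct-trial RTs (ms) for one participant
example_rts = [900, 950, 1000, 1050, 1100, 980, 1020, 940, 1060, 3000, 200, 5000]
kept = trim_outliers(example_rts)
```

In this example, the 200-ms and 5,000-ms trials fall outside the absolute limits, and the 3,000-ms trial lies more than 2.5 SDs above the mean of the remaining trials, so nine trials are kept.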

Encoding phase

Mean translation RTs and error rates are shown in Table 1. Because of the known effects of language dominance on translation RTs and error rates, data were recoded according to the dominant language, which was determined for each participant on the basis of self-reported proficiency ratings. L2–L1 translation was faster than L1–L2 translation, t(39) = 2.060, p = .046, and error rates were lower for L2–L1 translation than for L1–L2 translation, t(39) = 2.257, p = .030.

Table 1 Mean translation response times (RTs), priming scores (PR), and error rates (ER) as a function of experiment, translation direction, and encoding condition with all of the original items included

Test phase

As in the encoding phase, new-item RTs were faster for L2–L1 translation than for L1–L2 translation, t(39) = 3.458, p = .001. Priming scores were obtained by subtracting the RTs of the repeated conditions from the RTs of the new-item conditions (keeping final translation direction consistent) and are illustrated in Fig. 1. Repetition priming was statistically reliable for each of the four language combinations (all ps < .01). Priming scores for the four language combinations were analyzed using a 2 (encoding match) × 2 (final translation direction) repeated measures ANOVA. Priming did not benefit significantly from having the translation direction match from the encoding phase to the test phase, F(1, 39) = 1.531, MSE = 15,104, p = .223. However, priming was stronger by 156 ms when the final translation direction was from L1 to L2, F(1, 39) = 20.650, MSE = 46,907, p < .001. Direction match and final translation direction did not interact, F(1, 39) = 1.090, MSE = 24,415, p = .303.
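The priming-score computation described above can be sketched as follows. The function name, the dictionary layout, and the condition means are hypothetical and are not values from Table 1. The sketch only shows that each repeated condition is compared with the new-item baseline for the same final translation direction.

```python
def priming_scores(mean_rts):
    """Compute priming (new minus repeated) for each encoding/test combination.

    mean_rts maps (encoding_direction, test_direction) to a mean RT in ms;
    an encoding_direction of "none" marks the new-item baseline.
    """
    scores = {}
    for (encoding, test), rt in mean_rts.items():
        if encoding == "none":
            continue
        baseline = mean_rts[("none", test)]  # same final translation direction
        match = "match" if encoding == test else "mismatch"
        scores[(match, test)] = baseline - rt
    return scores


# Hypothetical condition means (ms), for illustration only
hypothetical_means = {
    ("none", "L1-L2"): 1300,
    ("none", "L2-L1"): 1200,
    ("L1-L2", "L1-L2"): 1050,
    ("L2-L1", "L1-L2"): 1100,
    ("L2-L1", "L2-L1"): 1080,
    ("L1-L2", "L2-L1"): 1120,
}
scores = priming_scores(hypothetical_means)
```

The four resulting scores correspond to the cells of the 2 (encoding match) × 2 (final translation direction) analysis; positive values indicate facilitation.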

Fig. 1 Repetition priming as a function of direction match and final translation direction in the original analyses of Experiments 1 and 2

As in the encoding phase, error rates for new items were higher for L1–L2 translation than for L2–L1 translation, t(39) = 2.152, p = .038. Error-rate priming was restricted to items for which the translation direction changed from encoding to test (ps < .05) and was not observed among items for which the translation direction remained the same (ps > .50). The main effect of direction match was statistically reliable, F(1, 39) = 24.867, MSE = .00846, p < .001, and it interacted with the final translation direction, such that the advantage of a mismatch was stronger for final L1–L2 translation, F(1, 39) = 6.349, MSE = .00772, p = .016. The main effect of translation direction was not statistically reliable, F < 1.

To summarize, L2–L1 translation was faster and more accurate than L1–L2 translation. Repetition priming was stronger for L1–L2 translation, but priming did not depend on whether the translation direction matched from encoding to test. Repeated words had lower error rates than new words, but only when the direction changed from encoding to test. This effect most likely derives from the fact that in the mismatched-direction conditions, the eventual correct responses were seen at encoding, making it possible to retrieve more-difficult words in a participant’s receptive vocabulary for test-phase production. In contrast, in the matched-direction conditions, seeing the stimulus word at encoding did not help to retrieve the translation at test.

Experiment 2: University of Texas at El Paso (UTEP) study

Method

Participants

The participants were 48 bilinguals proficient in English and Spanish (19 men, 29 women) with a mean age of 20.1 years. All were students at the University of Texas at El Paso, recruited primarily from introductory psychology courses. The El Paso–Juarez region on the U.S.–Mexico border is a bilingual community, thus providing ample opportunity for daily exposure to both English and Spanish. All participants reported Hispanic ethnicity. According to self-report, 90 % of the participants had learned Spanish first; 6 % had learned English first; and 4 % had learned Spanish and English simultaneously from early childhood. On average, participants were first exposed to the L2 at 7.9 years of age and had 12.2 years of experience. According to self-ratings of proficiency, 46 % were English-dominant and 54 % were Spanish-dominant. (Participants indicating equal proficiency were classified as being English-dominant.) They estimated that their language usage over the preceding month had been 48 % English, 42 % Spanish, and 10 % mixed; this pattern corresponded to 56 % dominant language and 34 % nondominant language. Fifteen additional participants were replaced because of failure to follow instructions or failure to translate at least 50 % of the items correctly in both languages.

Apparatus

Words were presented on the monitor of a Macintosh computer using programs written with PsyScope software (Cohen, MacWhinney, Flatt, & Provost, 1993). A PsyScope button box (New Micros, Dallas, TX) with a high-impedance microphone was used to record vocal RTs.

Design and materials

The design was similar, but not identical, to that of Experiment 1. The words were the names of 192 pictures from the Snodgrass and Vanderwart (1980) picture set. The median English word frequency for the experimental words was 14.5 occurrences per million (Kučera & Francis, 1967), and the mean word lengths were 5.4 letters (SD = 2.0) for the English words and 6.0 letters (SD = 1.7) for the corresponding Spanish words. The mean English normative age of acquisition was 49 months (based on 171 words; Morrison et al., 1997), and the mean Spanish age of acquisition was 48 months (based on 93 words; Pérez & Navalón, 2005). The items were randomly assigned to six sets of 32 items. These sets were rotated through the six experimental conditions across participants using a Latin square to control for specific item effects.
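One way to rotate six item sets through six conditions is a cyclic Latin square, sketched below. The text does not report which square was used, so the cyclic scheme, the function name, and the ordering of conditions are assumptions made for illustration.

```python
# (encoding condition, test direction); the ordering is arbitrary
CONDITIONS = [
    ("English-Spanish", "English-Spanish"),
    ("Spanish-English", "English-Spanish"),
    ("none", "English-Spanish"),
    ("English-Spanish", "Spanish-English"),
    ("Spanish-English", "Spanish-English"),
    ("none", "Spanish-English"),
]


def assign_sets(version, n_sets=6):
    """Map each item set to a condition index for one counterbalancing version.

    Assumption: a cyclic shift, so set s is placed in condition (s + version) mod n.
    """
    return {item_set: (item_set + version) % n_sets for item_set in range(n_sets)}


# One assignment per counterbalancing version (participants cycle through versions)
all_versions = [assign_sets(v) for v in range(6)]
```

Across the six versions, every item set appears once in every condition, and within each version every condition receives exactly one set, which is the property that controls for item-specific effects.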

Procedure

The participants were tested individually in sessions lasting approximately 30 min. The encoding phase had two blocks of trials, each consisting of four practice and 64 experimental trials. The test phase consisted of two 96-trial blocks. Each block had 32 items previously translated from English to Spanish, 32 items previously translated from Spanish to English, and 32 new items, all randomly intermixed. The language order was counterbalanced across participants and was consistent from encoding to test. On each trial, the stimulus word appeared on the screen and remained until a response was given. After a 1,250-ms intertrial interval, the next word appeared. The experimenter noted unexpected responses and voice relay malfunctions on a worksheet containing the expected responses. After completing the computerized experiment, participants completed a language background questionnaire.

Results

Data processing

Invalid trials were eliminated from the analysis. In the test phase, an average of 19.3 % of the trials (SD = 6.6 %) were removed because of translation errors (including “don’t know” responses), 0.9 % for machine timing errors, and 8.8 % for spoiled trials. Here spoiled trials included those in which the prime-phase response was unacceptable (5.6 %), was acceptable but inconsistent with the test-phase response (1.4 %), or had a machine timing error (0.6 %), or in which the answer was given as an error response to another item (1.2 %). Another 3.6 % of the trials were removed as outliers, using the same criteria as in Experiment 1. Thus, on average, 67.4 % of the test-phase trials were retained for analysis, approximately 21.6 trials per condition.
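The retention figures follow directly from the reported removal percentages, as the short check below illustrates. The function and variable names are hypothetical; the percentages are those reported for the two experiments.

```python
def retained_percentage(removed_percentages):
    """Percentage of test-phase trials left after all removals."""
    return round(100 - sum(removed_percentages), 1)


# Reported removals: translation errors, timing errors, spoiled trials, outliers
exp1_retained = retained_percentage([14.6, 1.5, 4.1, 2.0])
exp2_retained = retained_percentage([19.3, 0.9, 8.8, 3.6])

# Experiment 2 presented 32 items in each of the six test conditions
exp2_trials_per_condition = round(32 * exp2_retained / 100, 1)
```

The results are 77.8 % retained in Experiment 1 and 67.4 % in Experiment 2, with about 21.6 valid trials per condition in Experiment 2.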

Encoding phase

Mean translation RTs and error rates are shown in Table 1. As in Experiment 1, the data were recoded according to the dominant language. RTs for L2–L1 translation and L1–L2 translation did not differ significantly, t(47) = 0.687, p = .496, but error rates were lower for L2–L1 translation, t(47) = 4.198, p < .001.

Test phase

As in the encoding phase, new-item RTs did not differ significantly for L2–L1 and L1–L2 translation, t(47) = .401, p = .690. Repetition priming (illustrated in Fig. 1) was statistically reliable for each of the four language combinations (all ps < .001). Priming scores for the four language combinations were analyzed using a 2 (encoding match) × 2 (final translation direction) repeated measures ANOVA. Priming was stronger by 104 ms when the translation direction matched from the encoding phase to the test phase, F(1, 47) = 35.335, MSE = 14,737, p < .001. Priming was also numerically stronger when final translation was from L1 to L2, but this effect was not statistically significant, F(1, 47) = 1.621, MSE = 78,816, p = .209. Direction match and final translation direction did not interact, F(1, 47) = 1.825, MSE = 9,395, p = .183.

Error rates for new items in the test phase were similar for L1–L2 translation and L2–L1 translation, t(47) = 1.368, p = .178. Error-rate priming was restricted to items for which the translation direction changed from encoding to test. That is, error-rate priming was significant in the reversed-direction conditions (ps < .001), but not in the same-direction conditions (ps > .05). The main effect of direction match was statistically reliable, F(1, 47) = 30.117, MSE = .00443, p < .001. In the UTEP sample, this effect did not interact with final translation direction, F(1, 47) = 1.011, MSE = .00582, p = .320, nor did we find a main effect of final translation direction on error priming, F < 1.

To summarize, RTs were equivalent for the two translation directions, but L2–L1 translation was more accurate during the encoding phase. Repetition priming was stronger when the direction matched from encoding to test, but the direction of final translation did not have a reliable effect. Error rates were lower for repeated than for new items, but only when the direction changed from encoding to test. As we indicated previously, this effect was most likely due to the fact that the eventual correct responses had been seen at encoding.

Comparison of findings across Experiments 1 and 2

Encoding-phase and test-phase performance exhibited some common and some distinct patterns across experiments. For new items in the encoding and test phases, the Penn State bilinguals translated faster from L2 to L1 than from L1 to L2, but the UTEP bilinguals exhibited no translation asymmetry in RTs. In both groups, L1–L2 translation had a higher error rate than did L2–L1 translation, and repeated words were translated faster than new words. However, the patterns of priming differed. In the Penn State experiment, priming effects were stronger for final L1–L2 translation, but this trend was not significant in the UTEP experiment. In contrast, in the UTEP experiment, having the translation direction match from encoding to test yielded more priming than did a mismatch, but in the Penn State experiment, the direction match did not matter.

The most interesting discrepancy in the initial comparison of results was that in Experiment 1, priming was determined primarily by the direction of final translation, whereas in Experiment 2, it depended primarily on the direction match between initial and final translation. We explored whether the reasons might lie in differences across the experiments in the characteristics of the participants or the properties of the words (see Table 2). The Penn State experiment included bilinguals who were L1-dominant and lived in a primarily English-speaking environment, and that experiment included an easier set of translation stimuli. The UTEP experiment included earlier and more balanced bilinguals living in a bilingual environment and included more-difficult translation stimuli. The 126 nonshared UTEP items were of lower frequency and were less accurately translated than the shared items. These critical differences between the studies were logical candidates for explaining the different patterns of results obtained, and each of the differences was examined in the analyses reported in the following sections. These differences also suggest that the discrepancies between studies may be quantitative rather than qualitative, reflecting quantitative differences in the ranges of proficiency and item difficulty included, and therefore in the power to detect effects of translation direction and direction match (Table 3).

Table 2 Participant and stimulus information for Experiments 1 and 2
Table 3 Mean translation response times (RTs), priming scores (PR), and error rates (ER) as a function of experiment, translation direction, and encoding condition for items shared or not shared across the experiments

Translation performance and translation priming as a function of bilingual proficiency

The association between bilingual proficiency and translation performance was assessed systematically in an analysis of data from the 66 words that were shared across the experiments. The shared items had a median word frequency in English of 24 per million (Kučera & Francis, 1967). The samples of bilinguals who participated in the two experiments differed in proficiency, as was indicated by the mean encoding-phase error rates of 19.7 % in the Penn State sample and 9.7 % in the UTEP sample for the shared items, F(1, 87) = 18.62, MSE = .024, p < .001. Because the distributions for the two samples overlapped substantially, a continuous measure of proficiency was derived. Rather than rely on self-reports of proficiency or proxy measures for proficiency, such as age of acquisition or experience, bilingual proficiency was operationally defined as the overall accuracy in encoding-phase translation.

The individual shared-item data from both studies were combined for an analysis of test-phase RTs as a function of bilingual proficiency. An analysis of covariance (ANCOVA) was used to test the effect of proficiency and its interactions with translation direction and encoding condition. On the basis of previous findings, we expected increased proficiency in L2 to be associated with faster RTs because of stronger links between L2 words and their concepts. Specifically, we expected proficiency to affect L1–L2 translation to a greater extent than L2–L1 translation (e.g., Kroll, Michael, Tokowicz, & Dufour, 2002). On the basis of previous results showing stronger priming effects in picture naming for less-proficient speakers (e.g., Francis et al., 2008; Gollan, Montoya, Fennema-Notestine, & Morris, 2005), we also expected less-proficient bilinguals to exhibit stronger priming.

New-item RTs in the test phase were analyzed as a function of bilingual proficiency and final translation direction. The upper panel of Fig. 2 shows regression lines for new-item RTs in the test phase as a function of participant error rate. Less-proficient participants had longer RTs than did more-proficient participants, F(1, 86) = 5.685, MSE = 106,512, p = .019. The main effect of translation direction did not approach significance, F < 1. However, the effect of translation direction was stronger for less-proficient bilinguals, F(1, 86) = 4.905, MSE = 53,459, p = .029, such that less-proficient bilinguals translated faster from L2 to L1 than from L1 to L2. Stated differently, L1–L2 translation varied more as a function of proficiency than did L2–L1 translation. The correlations of proficiency with new-item RTs were .33 for L1–L2 translation (p = .002) and .07 for L2–L1 translation (p = .510).

Fig. 2 New-item response times (RTs) and repetition priming as functions of bilingual proficiency. The plots show regression lines for the range of participant error rates observed

Priming was analyzed as a function of bilingual proficiency, direction of final translation, and language match. The lower panel of Fig. 2 shows regression lines for priming scores in each condition as a function of participant error rate. Repetition priming was stronger for the less-proficient participants, F(1, 86) = 9.951, MSE = 110,199, p = .002. Although the main effect of final translation direction did not approach significance, F(1, 86) = 1.043, MSE = 77,779, p = .310, we did find an interaction of proficiency and final translation direction on priming, F(1, 86) = 8.060, MSE = 77,779, p = .006. This interaction indicated that less-proficient bilinguals showed more priming for L1–L2 translation, but the difference was not evident in more-proficient bilinguals. That is, priming of L1–L2 translation was affected more by proficiency than was priming of L2–L1 translation. Priming was stronger when the translation direction matched from encoding to test, F(1, 86) = 7.436, MSE = 25,316, p = .008, but this effect did not interact with proficiency, F < 1. Final translation direction and direction match did not interact, nor was a three-way interaction evident, Fs < 1.

To summarize, with increased proficiency (decreased participant error rate), RTs decreased, and this effect was stronger for L1–L2 translation. For every 1 % change in participant error rate, we found a 1.7-ms change in RTs for L2–L1 translation, and an 8.2-ms change in L1–L2 translation RTs. Repetition priming also decreased as proficiency increased. Less-proficient bilinguals exhibited longer RTs and stronger priming for L1–L2 translation. These language effects disappeared for more-balanced bilinguals. When the direction of translation changed from encoding to test, priming was reduced, but this effect did not vary with bilingual proficiency. Thus, the apparently discrepant findings of the two studies seem to be due to quantitative rather than qualitative differences.
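To give a sense of scale, the reported slopes can be applied to the 10-point gap between the two samples' mean encoding-phase error rates (19.7 % vs. 9.7 %). This is a rough linear extrapolation, assuming that the regression lines are linear over this range and that a "1 % change" refers to one percentage point; the names below are hypothetical.

```python
# Reported slopes: ms of RT change per percentage point of participant error rate
SLOPE_L2_TO_L1 = 1.7
SLOPE_L1_TO_L2 = 8.2


def predicted_rt_change(error_rate_difference, slope):
    """Predicted RT difference (ms) for a difference in error rate (percentage points)."""
    return slope * error_rate_difference


# Gap between the sample mean error rates for the shared items
gap = 19.7 - 9.7
change_l2_l1 = predicted_rt_change(gap, SLOPE_L2_TO_L1)
change_l1_l2 = predicted_rt_change(gap, SLOPE_L1_TO_L2)

# How much larger the L1-L2 disadvantage is predicted to be at the higher error rate
asymmetry_change = change_l1_l2 - change_l2_l1
```

On this extrapolation, a 10-point difference in error rate corresponds to roughly 17 ms for L2–L1 translation and 82 ms for L1–L2 translation, so the translation asymmetry would differ by about 65 ms.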

Translation performance and repetition priming as a function of item difficulty

The association between item difficulty and performance was assessed systematically in the UTEP data. A continuous measure of item difficulty was derived on the basis of encoding-phase translation accuracy. The stimuli in the UTEP experiment had a wide range of difficulty, with encoding-phase error rates ranging from 0 % to 97 %. For the item difficulty analysis, items were excluded if they had encoding-phase error rates of 75 % or more, or if any of the experimental conditions represented in the test phase had no valid trials. After excluding the 21 words that met these criteria, the mean error rate for the remaining 171 items was 17.1 % (SD = 18.4 %), and the median error rate was 9.4 %. The mean encoding-phase RTs for valid items ranged from 782 to 1,895 ms (M = 1,250 ms, SD = 230).

Several lexical characteristics, including familiarity and frequency, correlated with translation RTs and error rates (de Groot, 1992; de Groot et al., 1994). Less-familiar and less-frequent words were more difficult to translate than were more-familiar and more-frequent words. We considered using one of these measures, age of acquisition, or a composite as a proxy measure for difficulty. However, we reasoned that a more direct measure of item difficulty would be one based on the performance of the very bilinguals who participated in the study, so that the measure reflects their collective experience with or knowledge of the items and their ability to access them, rather than normative knowledge of the items in monolinguals.

Thus, item difficulty was operationally defined as encoding-phase translation accuracy, averaged across translation directions. Encoding-phase translation RTs (r = .61, p < .001) and error rates (r = .65, p < .001) were positively correlated across translation directions, and RTs and error rates were correlated with each other (r = .76, p < .001). Not surprisingly, encoding-phase translation accuracy was correlated with both word frequency (Kučera & Francis, 1967) and normative age of acquisition (Morrison et al., 1997). When word frequency and age of acquisition were log-transformed, these correlations were −.42 and .50, respectively. However, translation accuracy was a better predictor of RTs (L1–L2 translation RT, R² = .41; L2–L1 translation RT, R² = .39) than were log word frequency (R²s = .14 and .18), log age of acquisition (R²s = .09 and .12), or both considered together (R²s = .18 and .24). Because the available frequency and age-of-acquisition norms are based on monolingual data, it should not be surprising that the empirically derived accuracy scores were better predictors of translation RTs. Another advantage of a difficulty measure based on error rates is that it can be mapped directly onto the participant-level error rates used to measure proficiency.
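
The comparison of single predictors described above amounts to comparing the proportion of RT variance each one explains. As a minimal sketch, assuming simple one-predictor linear regressions (for which the coefficient of determination equals the squared Pearson correlation), the comparison could be computed as follows. The function name and the toy values are hypothetical, and the printed numbers are not the values reported in the text.

```python
def r_squared(x, y):
    """R^2 of a simple linear regression of y on x, computed as the
    squared Pearson correlation between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical item-level data (one entry per item), invented for illustration.
error_rate = [0.02, 0.05, 0.10, 0.20, 0.35, 0.50]  # encoding-phase error rate
log_freq = [2.1, 1.2, 1.9, 0.8, 1.5, 0.6]          # log word frequency
rt_ms = [850, 900, 1000, 1150, 1400, 1600]         # mean translation RT

r2_error = r_squared(error_rate, rt_ms)
r2_freq = r_squared(log_freq, rt_ms)
print(f"error rate: R^2 = {r2_error:.2f}; log frequency: R^2 = {r2_freq:.2f}")
```

The case in which frequency and age of acquisition are considered together would require a multiple regression, which this sketch does not cover.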

We expected RTs to increase with item difficulty, as has been seen in past research showing word frequency effects on translation (de Groot, 1992). Furthermore, we expected that more-difficult items would show larger priming effects, as has been seen in research on word frequency and age-of-acquisition effects on repetition priming in picture naming (e.g., Barry, Hirsh, Johnston, & Williams, 2001).

The inferential analyses of the effects of item difficulty were similar to those used in the proficiency analysis, except that the random factor was items (i.e., these are F₂ tests). Test-phase RTs for new items were compared across translation directions using ANCOVAs with difficulty as a covariate. The upper panel of Fig. 3 shows regression lines for the new-item RTs in the test phase as a function of item error rate. As expected, more-difficult items had longer RTs, F(1, 169) = 167.71, MSE = 137,908, p < .001. The correlations of difficulty with RTs were .659 for L2–L1 translation and .598 for L1–L2 translation (ps < .001). RTs did not differ significantly across the two directions of translation, nor did we find an interaction of direction of translation and difficulty, Fs < 1.

Fig. 3 New-item response times (RTs) and repetition priming as functions of item difficulty. The plots show regression lines for the range of item error rates observed

Repetition priming was analyzed as a function of final translation direction and direction match in an ANCOVA with item difficulty as a covariate. The lower panel of Fig. 3 shows regression lines for priming scores in each experimental condition as a function of item error rate. Consistent with their longer RTs, more-difficult items exhibited a greater degree of facilitation than did less-difficult items, F(1, 169) = 88.887, MSE = 177,468, p < .001. Final translation direction did not affect priming, F(1, 169) = 1.257, MSE = 196,690, p = .264, and the effect of translation direction did not interact with difficulty, F < 1. The effect of direction match from encoding to test was reliable, F(1, 169) = 10.678, MSE = 58,692, p = .001, and the effect of direction match was greater for more-difficult items, as indicated by a significant interaction, F(1, 169) = 8.319, MSE = 58,692, p = .004. Final translation direction did not interact with direction match, and no three-way interaction was apparent, Fs < 1.

To summarize, as expected, with increased difficulty, RTs increased, and this effect was equivalent for the two translation directions. For every 1 % change in item error rates, we observed a 14.7-ms change in RTs for L2–L1 translation, and a 13.7-ms change for L1–L2 translation. Repetition priming also increased with increased item difficulty. Final translation direction had no reliable effect on priming at any difficulty level. When the direction of translation changed from encoding to test, priming was attenuated, with a stronger effect for more-difficult items.

General discussion

In the following sections, the effects of bilingual proficiency and item difficulty on RTs and repetition priming are compared and contrasted. We then attempt to integrate these factors and the effects of repetition into an account of learning that can accommodate the similarities and differences. Finally, more general implications for repetition priming, theories of bilingual lexical access, and methodology are discussed.

Comparing and contrasting the effects of bilingual proficiency and item difficulty

Participant proficiency and item difficulty are both considered to reflect pre-experimental learning, and the distinction between them is not addressed in models of bilingual lexical processing. However, it would be a mistake to assume that these two variables affect processing in the same manner.

Response times

A comparison of the effects of bilingual proficiency and item difficulty on performance revealed interesting similarities and differences (see Table 4). Both participant proficiency and item difficulty affected RTs to new items. As is shown in Figs. 2 and 3, less-proficient participants and more-difficult items produced slower RTs. Participants’ proficiency interacted with translation direction for RTs, such that less-proficient bilinguals showed a greater translation asymmetry, and the asymmetry disappeared for the more-proficient bilinguals. This phenomenon can be explained with reference to the steeper proficiency slope for L1–L2 than for L2–L1 translation. For less-proficient bilinguals, the difference between L1 and L2 was greater in production speed than in comprehension speed, which likely accounts for the translation asymmetry (Francis, Durán, Sáenz, & Regalado, 2013; see Hanulovà, Davidson, & Indefrey, 2011, on L2 production). With more experience, if L2 word production speed improves more as a function of practice than does L2 word comprehension, it would be logical for L1–L2 translation to improve more quickly than L2–L1 translation.

Table 4 Effects of bilingual proficiency and item difficulty on translation performance

In contrast to the effects of participant proficiency, item difficulty did not interact with translation direction. In fact, we found no evidence of a translation asymmetry in RTs at any level of item difficulty. It is possible that item difficulty impacts comprehension and production equally or that it affects L1 and L2 processes equally. Either way, it appears that the translation asymmetry is a person-based, not an item-based, phenomenon. That is, the translation asymmetry diminishes as a bilingual becomes more proficient, but for an individual, the translation asymmetry does not vary across items.

Repetition priming

In same-direction repetition conditions, as is shown in Figs. 2 and 3, less-proficient participants and more-difficult items exhibited greater facilitation, consistent with the idea that difficult tasks or processes have more room for improvement. Participant proficiency interacted with translation direction for priming, such that priming of L1–L2 translation decreased with proficiency, but priming of L2–L1 translation did not. This means that less-proficient bilinguals exhibited stronger priming in L1–L2 translation. In contrast, item difficulty did not interact with translation direction, and no translation asymmetry emerged in this item-based analysis. The pattern of priming effects obtained with identical repetition was consistent in every respect with the pattern of new-item RTs.

Reversed-direction repetitions produced priming patterns similar to those obtained with identical repetition. In comparing the identical and reversed conditions, overall, priming was stronger in both the participant proficiency and item difficulty analyses when the translation directions matched between encoding and test. This effect did not interact with participant proficiency. However, item difficulty did interact with the effect of direction match, with the advantage for identical repetition increasing with item difficulty. For easier items, the effect of direction match was smaller, consistent with the performance of the Penn State bilinguals on relatively easy items. In the proficiency and difficulty analyses, final translation direction and direction match did not interact, and no evidence of a three-way interaction was observed in either analysis.

These differing patterns of the effects of participant proficiency and item difficulty on priming parallel and help to explain the differences in the original findings of our two experiments. The Penn State participants were less proficient, and consistent with the proficiency analysis, they showed stronger effects of translation direction. The UTEP participants translated more-difficult items, and they showed stronger effects of direction match. Thus, the differences in the results of the two studies appear to be quantitative rather than qualitative.

In comparing the effects of proficiency and difficulty shown in Figs. 2 and 3, it is also of interest to compare the predictions for the participant and item error rates. For translation RTs, the slope for items (Fig. 3) is steeper than the slope for participants (Fig. 2). Although the predictions are similar at a 0 % error rate, a 25 % item error rate corresponds to a higher predicted RT than a 25 % participant error rate.
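
The difference in slopes can be made concrete with the per-1 % RT changes reported in the two summaries (1.7 and 8.2 ms for participant error rate; 14.7 and 13.7 ms for item error rate). The sketch below computes only the predicted increase in RT relative to a 0 % error rate. It assumes that the regression lines are linear over this range, and it leaves out the intercepts, which are not given in the text. The dictionary layout and function name are illustrative.

```python
# Reported RT change (ms) per 1 % change in error rate.
SLOPES_MS_PER_PCT = {
    ("participant", "L2-L1"): 1.7,
    ("participant", "L1-L2"): 8.2,
    ("item", "L2-L1"): 14.7,
    ("item", "L1-L2"): 13.7,
}

def predicted_rt_increase(slope_ms_per_pct, error_rate_pct):
    """Predicted RT increase over the 0 % prediction, assuming linearity."""
    return slope_ms_per_pct * error_rate_pct

# Evaluate every slope at a 25 % error rate.
increase_at_25 = {
    key: predicted_rt_increase(slope, 25)
    for key, slope in SLOPES_MS_PER_PCT.items()
}

for (source, direction), ms in increase_at_25.items():
    print(f"{source:11s} {direction}: +{ms:.1f} ms at a 25 % error rate")
```

Under these assumptions, a 25 % error rate corresponds to an increase of roughly 42.5 ms (L2–L1) or 205 ms (L1–L2) on the participant dimension, but of roughly 367.5 ms (L2–L1) or 342.5 ms (L1–L2) on the item dimension.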

Role of pre-experimental and experimental exposures in translation RT and priming

The results can be understood within a framework in which both pre-experimental and experimental exposures to words increase the strength of connections between concepts and the word forms that are used to express them. The number of pre-experimental exposures that a person has to a word in natural contexts depends on both the amount of exposure that the person has to the language and the frequency with which the word occurs in the language. A bilingual will generally have more exposures to words in the language that they use more, which will, as a consequence, be the language with higher lexical proficiency. A monolingual individual will have more exposures to the words of their only language than a bilingual of the same age (a point raised by Gollan, Montoya, Cera, & Sandoval, 2008). It cannot be determined how many naturalistic exposures an adult has had to a given word. Therefore, we estimated the level of pre-experimental learning empirically and then measured the increment in learning caused by an additional exposure.

In the present study, the performance of more-proficient participants (i.e., those with lower error rates) was associated with lower RTs, presumably because such participants are farther along on a learning curve than those with higher error rates. Consistent with this characterization, an experimental exposure decreased RTs less for more-proficient participants. With the increased proficiency of the participant, the translation asymmetry decreased, and for less-proficient participants, the translation asymmetry decreased even further with an additional repetition. Similarly, items with lower difficulty (i.e., those with lower error rates) were associated with lower RTs because such items are on average farther along on a learning curve than are items with higher error rates. Consistent with this characterization, an experimental exposure decreased RTs less for the less-difficult items.
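
The learning-curve reasoning above can be illustrated with a toy calculation. The sketch below assumes a power-function learning curve, which is a common choice in the skill-learning literature but is an assumption made here for illustration only; the text does not commit to a particular functional form. All parameter values are hypothetical and were not estimated from the data.

```python
def predicted_rt(n_exposures, asymptote_ms=700.0, gain_ms=1500.0, rate=0.3):
    """Hypothetical power-function learning curve: RT falls toward an
    asymptote as the number of prior exposures grows."""
    return asymptote_ms + gain_ms * n_exposures ** (-rate)

def priming_from_one_exposure(n_exposures):
    """RT reduction produced by moving from exposure n to exposure n + 1."""
    return predicted_rt(n_exposures) - predicted_rt(n_exposures + 1)

# Fewer prior exposures stands in for a less-proficient participant or a
# more-difficult item; more prior exposures for the opposite case.
for n in (10, 100, 1000):
    print(f"n = {n:4d}: RT = {predicted_rt(n):7.1f} ms, "
          f"priming = {priming_from_one_exposure(n):5.2f} ms")
```

Under any curve of this decelerating shape, words with fewer prior exposures have longer RTs and gain more from one additional exposure, which is the qualitative pattern described for both participants and items.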

Implications of the present results for studies of bilingual production

An interesting consequence of the encoding-phase practice trials is that the effect of proficiency on RTs is smaller for repeated than for new items. Similarly, the effect of difficulty of lexical access on RTs is smaller for repeated than for new items. These patterns suggest that encoding-phase exposures raise the functional proficiency of the participants for the item set studied and the functional frequency of the studied items for the participants who were exposed to them. Repetition also decreased the discrepancy between easier and more difficult items in previous research. In picture naming, repeated items show reduced effects of item factors, such as name agreement (Park & Gabrieli, 1995), word frequency (Wheeldon & Monsell, 1992), and normative age of acquisition (Barry et al., 2001), as well as of the participant factors monolingual/bilingual status (Gollan et al., 2005) and bilingual language dominance (Francis et al., 2003; Francis et al., 2008; Francis & Sáenz, 2007). The effects of word frequency are also reduced with repetition in lexical decision in both monolinguals (e.g., Forster & Davis, 1984; Kinoshita, 1995) and bilinguals (Kirsner, Smith, Lockhart, King, & Jain, 1984; Scarborough, Gerard, & Cortese, 1984). Furthermore, individuals who have greater extra-experimental experience with words because of age or using only one rather than two languages also exhibit smaller word frequency effects in picture naming (Gollan et al., 2008).

Related to these phenomena, in many studies of language production participants get training on the experimental items, practicing them several times before performing the critical experimental trials or using the same items for several conditions for the same participant (e.g., Abunuwara, 1992; Bloem & La Heij, 2003; Costa, Caramazza, & Sebastian-Galles, 2000). This procedure is used to reduce error rates and RT variability and to get RTs to a stable level before measuring the effects of the independent variables; items practiced to asymptote can be used in multiple conditions without further practice effects. However, this approach ignores other episodic consequences of training exposure. The present results, along with the other studies cited above, suggest that even a single training trial for each item can have the unintended consequence of reducing the effects of language proficiency or item characteristics, perhaps leading to underestimating their influences (see Kroll, Bobb, & Wodniecka, 2006, for a discussion of this issue).

The present analysis can account for some apparent discrepancies in the literature with respect to the translation asymmetry that have led researchers to different conclusions about whether the two directions of translation engage different processes. For bilinguals with high proficiency translating easy items, translation RTs were equivalent for the two directions, whether trained (e.g., Duyck & Brysbaert, 2004) or untrained (Potter et al., 1984, high-proficiency group). The same pattern was observed when high-proficiency bilinguals translated difficult items, whether trained (La Heij et al., 1996) or untrained (Francis & Gallard, 2005, in English and Spanish; Francis & Sáenz, 2007; but see Kroll & Stewart, 1994). These results are consistent with the present findings, in that higher-proficiency bilinguals did not exhibit a translation asymmetry and that the translation asymmetry did not change across levels of item difficulty. For bilinguals who are clearly dominant in the L1, asymmetries in RTs were observed for untrained items that were either easy (e.g., Sholl et al., 1995) or difficult (Francis & Gallard, 2005, English–French and Spanish–French combinations). Similarly, in the present study, L1-dominant bilinguals, who are more proficient in the L1 than in the L2, exhibited a translation asymmetry, but the translation asymmetry was not moderated by item difficulty. Overall, in the proficiency range tested, there may be no need to incorporate qualitatively different processes to account for the different patterns of results.

A number of studies have investigated the factors that affect vocabulary learning in training paradigms, including word concreteness (e.g., van Hell & Mahn, 1997), phonological familiarity (e.g., Kaushanskaya, 2012), and translation ambiguity (e.g., Degani & Tokowicz, 2010). Studies like the present study can supplement these training studies by helping to characterize word processing across the proficiency continuum, both for participants and for items. As such, they reveal how word-learning processes are likely to unfold over the course of acquisition. Unlike training studies, which introduce novel and limited vocabulary during the initial stages of word learning, studies of the changes that occur to existing vocabulary knowledge may provide critical information that would otherwise be unavailable.

The role of control processes in translation

The recent literature on bilingual production and its neural basis has suggested that many of the differences that have been observed for performance in the L1 and L2 can be understood as differences in the demands to regulate accurate production in the two languages (e.g., Abutalebi & Green, 2007; Green, 1998; Kroll & Gollan, in press). To what extent can differences in the cognitive control processes engaged by different types of bilingual speakers account for the findings in the present study? Less research on production has focused on translation than on picture naming, but in both domains there is a suggestion that inhibitory control must be engaged at both local and global levels (e.g., de Groot & Christoffels, 2006; Misra, Guo, Bobb, & Kroll, 2012). Evidence also indicates that proficiency and the difficulty of the processing task may determine how these cognitive control processes are manifested (e.g., Costa, Hernández, Costa-Faidella, & Sebastián-Gallés, 2009; Costa & Santesteban, 2004). The translation data in the present study have shown that bilinguals who are more dominant in their L1 and also more proficient in the L1 than in the L2 (although relatively proficient in the L2) are more likely to speak the L2 more slowly and to benefit more from priming in the L2 than in the L1. According to a cognitive control account, the priming in L2 may have the consequence of reducing both within- and across-language competitors that are particularly problematic when speakers are more proficient in the L1. More-balanced and high-proficiency bilinguals are more likely to speak each language with equal speed. The effects of direction match observed for the high-proficiency speakers may be a normal reflection of the transfer-appropriate processing effects observed when the component processes within a task are the same (e.g., Roediger & Blaxton, 1987).

A cognitive control account of the data that we have presented might also enable us to support a quantitative rather than a qualitative difference across speakers and across the two directions of translation. However, other recent studies using electrophysiological methods to investigate the time course of bilingual speech planning (e.g., Strijkers, Costa, & Thierry, 2010) have suggested that even for highly proficient and balanced bilinguals, differences occur in the earliest stages of planning for L1 and L2. Strijkers et al. tested Catalan–Spanish bilinguals who, like the bilinguals in the El Paso group, were early and balanced bilinguals. It is possible that the timing of the behavioral methods used in the present study masked differences in the ways that each language engages the production system, making it appear that the two languages were similar because the aggregate effects of processing resulted in the same final outcome. It will remain to be seen in future research whether methods that may be more sensitive to the time courses of processing will provide information that converges with the results that we have reported here.

Limitations of the study

The inclusion of more-difficult items in Experiment 2 may have affected RTs to the easier items. However, we expect that, if anything, this effect would have created a homogenization pattern, with slowed responses to easier items and speeded responses to more-difficult items, as has been seen with picture naming (e.g., Lupker, Kinoshita, Coltheart, & Taylor, 2003). The slowing of responses to easier items would tend to diminish the proficiency effects between Experiments 1 and 2. Such homogenization would tend to work against our item difficulty effects. Thus, the effects of participant proficiency and item difficulty may have been underestimated in the present study.

Conclusions

Two translation-priming experiments with very similar designs were conducted independently in different laboratories with different bilingual populations and yielded apparently discrepant findings. Analysis of the combined data highlighted the systematic effects of bilingual proficiency and item difficulty on word translation RTs and repetition priming. Low proficiency and high item difficulty were both associated with longer RTs and larger priming effects. However, only lower proficiency was associated with larger asymmetries across translation directions in RTs and priming; in contrast, only high item difficulty was associated with a stronger effect of direction match on repetition priming. Additional research will be required in order to determine the reasons why bilingual proficiency and item difficulty exhibit such different patterns of effects on translation RTs and repetition priming. The effects of proficiency, difficulty, and repetition priming were consistent with a model in which the strengths of associations between concepts and words in a person’s vocabulary exhibit incremental learning in semantic memory with each episodic exposure.