Introduction

A central issue addressed in bilingual psycholinguistic research is how words of two languages are represented and accessed. To investigate this fundamental issue with respect to visual word recognition, numerous studies have utilized the masked translation priming paradigm with a lexical decision task. Since the first translation priming effects were reported from the first (L1) to the second language (L2) in de Groot and Nas’s (1991) seminal study, follow-up studies have used the non-cognate masked translation priming paradigm to investigate both L1–L2 and L2–L1 priming in unbalanced bilinguals. These early studies consistently reported significant L1–L2 translation priming effects but failed to find L2–L1 translation priming effects (e.g., Finkbeiner, Forster, Nicol, & Nakamura, 2004; Gollan, Forster, & Frost, 1997; Jiang, 1999; Kim & Davis, 2003).

To explain this translation priming asymmetry found in early studies, two theoretical frameworks have been proposed in the literature. According to the episodic L2 hypothesis (Jiang & Forster, 2001; Witzel & Forster, 2012), L1 words are represented in lexical memory whereas L2 words are represented in episodic memory. This predicts that it is impossible for L2 primes to impact L1 targets. In contrast, the Sense Model (Finkbeiner et al., 2004) proposes that the translation priming asymmetry arises from a representational asymmetry between L1 and L2 words, namely, L1 words are associated with more semantic senses than L2 words. As a consequence, semantic senses activated by L2 primes are insufficient to facilitate the recognition of L1 targets, whereas L1 primes can facilitate L2 targets. Both of these theoretical accounts predict that there is no L2–L1 translation priming. However, these models cannot account for some of the recent studies that found significant translation priming effects in both directions (e.g., Greek–English: Dimitropoulou, Duñabeitia, & Carreiras, 2011; Dutch–English: Duyck & Warlop, 2009; Japanese–English: Nakayama, Ida, & Lupker, 2016). In light of these recent findings, Schoonbaert, Duyck, Brysbaert, and Hartsuiker (2009) proposed a refined Distributed Representation Model (DRM), which was based on the distributed model of de Groot (1992). According to the DRM, L1 words are connected to more semantic nodes than L2 words, so L1 primes activate a larger proportion of semantic nodes of L2 targets than vice versa. Critically, the DRM assumes that L1–L2 and L2–L1 translation priming differ quantitatively rather than qualitatively, in contrast to what other models imply (Jiang & Forster, 2001; Kroll & Stewart, 1994; Witzel & Forster, 2012). Thus, the model predicts significant L2–L1 translation priming effects which are smaller than L1–L2 translation priming effects.

Several studies have provided narrative reviews of the asymmetric translation priming effects reported with unbalanced bilinguals (e.g., Altarriba & Basnight-Brown, 2007; Dimitropoulou et al., 2011; Nakayama et al., 2016; Xia & Andrews, 2015). As summarized in Dimitropoulou et al. (2011), only 8 out of 21 experiments reported significant L2–L1 translation priming (mean priming effect: 9 ms, ranging from –6 to 26 ms). Although robust L1–L2 translation priming was reported in all experiments (cf. Davis et al., 2010), the priming effects varied greatly, from 16 to 100 ms (mean: 41 ms). A range of factors that might modulate translation priming effects have been discussed in the narrative reviews, such as factors related to statistical power (i.e., number of participants), the prime-target presentation procedure (i.e., prime duration, inter-stimulus interval, stimulus onset asynchrony), the stimuli (i.e., number of experimental items, whether the languages involved use the same or different scripts) and general processing speed (i.e., response speed of participants). Because existing empirical studies differ dramatically in all these factors and no study has so far considered some or all of these potential moderators systematically, the conclusions of the narrative reviews remain tentative. Importantly, these insightful reviews have so far focused mainly on the magnitude of the priming effects (in milliseconds), which are unstandardized estimates of the effect sizes and thus cannot be compared across studies. Surprisingly, as far as we are aware, no meta-analytic review has quantitatively assessed the standardized effect sizes of L1–L2 and L2–L1 translation priming and the impact of potential experimental moderators on translation priming effects.

To fill this important gap in the literature, we present here a meta-analysis that investigated masked translation priming effects of non-cognate word pairs in lexical decision tasks. A meta-analysis uses the standardized effect sizes observed in studies, together with their variances, and tests statistically whether the overall effect size provides evidence of the experimental effect (Borenstein, Hedges, Higgins, & Rothstein, 2009). Another unique advantage of a meta-analysis is that potential moderators can be tested statistically. These moderators may explain inconsistent findings in experiments reported in the literature. Therefore, the aim of the meta-analysis in the present study was twofold. The primary goal was to determine the overall effect size of L1–L2 and L2–L1 translation priming and to statistically compare the effect sizes between the two translation directions. The second aim was to statistically test whether effect sizes of translation priming are influenced by moderators previously suggested in the literature. The following seven potential moderators were considered: the number of participants, the prime duration, the SOA (Stimulus Onset Asynchrony, i.e., the interval between the onset of the prime and the onset of the target), the ISI (Inter-Stimulus Interval, i.e., the interval between the offset of the prime and the onset of the target), script type, number of items per cell and response speed.

Method

Literature search and study selection

A literature search was conducted using “masked translation priming” as the search string in PsycINFO, Web of Science and PubMed (up to 31 March 2016) to identify possible studies to be included in the meta-analysis. To find additional studies, recent studies and reviews of masked translation priming were consulted (Dimitropoulou et al., 2011; Duñabeitia, Perea, & Carreiras, 2010; Nakayama et al., 2016; Schoonbaert et al., 2009; Xia & Andrews, 2015). The following criteria were used to select the final set of studies and experiments for the meta-analysis: (1) prime duration ≤ 100 ms, (2) primes were masked, (3) a lexical decision task was used, (4) prime-target pairs were non-cognates, (5) L1/L2 of bilinguals were clearly specified, and (6) either the F or t value of the translation priming effect was reported. Using these selection criteria, we found 24 published articles. For the L1–L2 translation priming direction, 31 experimental observations were extracted from 20 studies. For the L2–L1 translation priming, 33 experimental observations were extracted from 18 studies. A detailed description of these studies is provided in the Supplementary Material.

Meta-analysis

Effect sizes

The effect sizes (d) were calculated using t values or F values, the number of participants (n) and the formula proposed by Rosenthal (1991): \( d = \frac{t}{\sqrt{n}} \) or \( d = \sqrt{\frac{F}{n}} \). Previous meta-analyses (e.g., Van den Bussche, Van den Noortgate, & Reynvoet, 2009) have also used this formula to estimate effect sizes in within-subject experiments. In line with other meta-analyses for masked priming effects in monolingual studies (Lucas, 2000; Van den Bussche et al., 2009), t values or F values were taken from the subject analyses. To indicate the direction of the priming effects, the effect size was specified as positive or negative based on the means of the translation priming and unrelated (control) conditions. Thus, a positive effect size indicates a facilitatory translation priming effect. Sampling variance of the effect sizes was calculated using the formula provided by Morris and DeShon (2002):

$$ \text{Sampling variance} = \left(\frac{1}{n}\right)\left(\frac{n-1}{n-3}\right)\left(1+n d^{2}\right)-\frac{d^{2}}{\left[c(n-1)\right]^{2}} $$

in which d is the effect size, n is the number of participants and c(n − 1) is defined as (Hedges, 1981, 1982):

$$ c(n-1) = 1 - \frac{3}{4(n-1)-1}. $$
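
As an illustration only, the two formulas above can be written out in a few lines of Python. The function names are hypothetical and are not taken from the original analysis, which was carried out in R; the `facilitatory` flag is an assumption that stands in for the sign assignment described above (F values are always non-negative, so the direction has to come from the condition means).

```python
import math


def bias_correction(df):
    """Correction factor c(df) = 1 - 3 / (4 * df - 1) (Hedges, 1981, 1982)."""
    return 1.0 - 3.0 / (4.0 * df - 1.0)


def effect_size_from_t(t, n):
    """Within-subject effect size d = t / sqrt(n) (Rosenthal, 1991)."""
    return t / math.sqrt(n)


def effect_size_from_f(f, n, facilitatory=True):
    """Effect size d = sqrt(F / n); the sign is set from the condition means.

    `facilitatory` is a hypothetical flag: True when the translation
    condition was faster than the unrelated (control) condition.
    """
    d = math.sqrt(f / n)
    return d if facilitatory else -d


def sampling_variance(d, n):
    """Sampling variance of d as given by Morris and DeShon (2002)."""
    c = bias_correction(n - 1)
    return (1.0 / n) * ((n - 1.0) / (n - 3.0)) * (1.0 + n * d * d) - (d * d) / (c * c)


# Made-up example: t = 4.0 from a subject analysis with n = 16 participants.
d_example = effect_size_from_t(4.0, 16)
var_example = sampling_variance(d_example, 16)
```

Because F = t² for a two-condition within-subject contrast, `effect_size_from_f(16.0, 16)` gives the same magnitude as `effect_size_from_t(4.0, 16)`.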

Moderator coding

The seven factors mentioned in the introduction were included as moderators in the present meta-analysis. Six factors were included as continuous moderators: the number of participants, number of items per cell, the prime duration (in ms), the ISI (in ms), the SOA (in ms) and overall response speed as measured by the mean reaction time in the unrelated (control) condition (in ms). Script type was coded categorically, as either same-script languages (e.g., Dutch and English) or different-script languages (e.g., Chinese and English).

Data analyses

The meta-analysis was conducted using the metafor package (Viechtbauer, 2010) in R v.3.2.4 (R Core Team, 2016). For both translation directions, a random-effects model without any moderators was first fitted to estimate the effect sizes of L1–L2 and L2–L1 translation priming. A z test was conducted to compare the overall effect sizes of the two translation directions (Borenstein et al., 2009). A significant z test would suggest that the effect sizes of the two translation directions are different and separate analyses are warranted. Next, for both translation directions, Q tests of variance were conducted to investigate the heterogeneity of the observed effect sizes. A significant Q test would indicate that the observed effect sizes are heterogeneous, and that potential moderators are likely to exist. To investigate the influence of the potential moderators, we used a meta-regression approach similar to Van den Bussche et al. (2009). First, each of the seven moderators was separately entered into a random-effects model. Next, when more than one of the moderators was significant, we included these significant moderators in the initial model of the meta-regression. In order to address the issue of collinearity between moderators, we orthogonalised moderators that were significantly correlated by fitting a linear model to obtain the residuals (see, for example, Siyanova-Chanturia, Conklin, & van Heuven, 2011, for a similar approach). The residuals of the model were then included in the meta-regression. Finally, a backward model selection procedure was used in which non-significant moderators were eliminated from the model step by step.
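
To make the first steps of this procedure concrete, the sketch below pools a set of effect sizes under a random-effects model and computes the Q statistic. It is not the metafor code used for the analysis. The text does not say which heterogeneity estimator was used, so the sketch assumes the closed-form DerSimonian-Laird estimator purely for simplicity, and its results would generally differ somewhat from those of other estimators. The function name and the input values are hypothetical.

```python
import math


def random_effects_dl(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator.

    effects   -- observed effect sizes (d), one per experiment
    variances -- their sampling variances
    Returns the pooled estimate, its standard error, the Wald z value,
    the between-study variance tau^2, and the Q statistic with its df.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]            # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Method-of-moments estimate of tau^2, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to every sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {"estimate": pooled, "se": se, "z": pooled / se,
            "tau2": tau2, "Q": q, "df": df}


# Made-up inputs for illustration; these are not data from the meta-analysis.
example = random_effects_dl([0.2, 0.4, 0.6], [0.04, 0.04, 0.04])
```

Under the null hypothesis of homogeneity, Q is referred to a chi-square distribution with k − 1 degrees of freedom; the p value is left out here to keep the sketch free of third-party dependencies.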

Results

The overall effect size (d) for the L1–L2 translation priming direction was 0.86, z = 12.869, p < 0.0001, whereas the overall effect size for L2–L1 translation priming was 0.31, z = 6.3481, p < 0.0001. The difference of 0.55 between the overall effect sizes of the L1–L2 and L2–L1 translation priming directions was significant, z = 6.61, p < 0.0001. Figure 1 illustrates the effect sizes of translation priming for the two directions with 95 % CIs. Because the effect sizes differ significantly between the two translation directions, the subsequent analyses were conducted for each translation direction separately.
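
The comparison of the two directions can be approximately reconstructed from the figures reported above. The sketch assumes that each reported z value is a Wald statistic (estimate divided by its standard error), so the standard errors can be recovered by division, and that the two estimates are treated as independent. Both assumptions are ours; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def se_from_wald(estimate, z):
    """Recover a standard error, assuming z = estimate / SE."""
    return estimate / z


def z_for_difference(est_a, se_a, est_b, se_b):
    """z test for the difference between two independent estimates."""
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Values as reported in the text (rounded to two decimals there).
se_l1_l2 = se_from_wald(0.86, 12.869)
se_l2_l1 = se_from_wald(0.31, 6.3481)

z_diff = z_for_difference(0.86, se_l1_l2, 0.31, se_l2_l1)
p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z_diff)))
```

With these rounded inputs, `z_diff` comes out at about 6.6, close to the reported z = 6.61; a small discrepancy is expected because the effect sizes are given to only two decimal places.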

Fig. 1 Overall effect sizes for L1–L2 and L2–L1 non-cognate masked translation priming (with their 95 % confidence intervals)

L1–L2 translation priming

Figure 2 presents an overview of the observed effect sizes for L1–L2 translation priming. A Q test of variance revealed that the effect sizes across experiments were heterogeneous, Q = 82.00, df = 30, p < 0.001. The separate random-effects model analyses, each of which included a different moderator, revealed that ISI and SOA were the only significant moderators (Table 1). Because ISI and SOA are highly correlated, r = 0.98, p < 0.001, the collinearity between them was reduced by using the residuals from a linear model in which SOA was predicted by ISI. When ISI and the residuals of SOA were both entered into a random-effects model, SOA was no longer significant, β = 0.0015, SE = 0.0056, z = 0.262, p = 0.793, whereas ISI was still significant, β = 0.0022, SE = 0.0011, z = 2.072, p = 0.0382. The final model included only ISI and explained 16.00 % of the variance between studies. The AIC (Akaike’s Information Criterion) of this model was 35.925, which is smaller than the 38.177 for the model without any moderators, indicating that the model with ISI was the better model.
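
The orthogonalisation step can be illustrated with a small ordinary least squares fit. The sketch residualizes SOA on ISI, which is our reading of "the residuals of SOA" in the paragraph above; the ISI and SOA values are made up for illustration and are not the values coded for the included experiments. All function names are hypothetical.

```python
import math


def ols_fit(x, y):
    """Intercept and slope of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residualize(y, x):
    """Part of y that is not linearly predictable from x."""
    intercept, slope = ols_fit(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def pearson_r(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical moderator values in ms (not taken from the meta-analysis).
isi_ms = [0, 0, 50, 100, 150, 200]
soa_ms = [50, 60, 100, 150, 210, 250]

soa_residuals = residualize(soa_ms, isi_ms)
r_before = pearson_r(isi_ms, soa_ms)         # strongly correlated
r_after = pearson_r(isi_ms, soa_residuals)   # zero up to rounding error
```

By construction the residuals are uncorrelated with the predictor, so entering ISI together with the SOA residuals lets the model test whether SOA carries information beyond what ISI already explains.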

Fig. 2 Observed effect sizes for L1–L2 non-cognate masked translation priming ordered by magnitude of the effect size and the overall effect sizes (with their 95 % confidence intervals)

Table 1 Meta-regression analysis with one moderator for L1–L2 translation priming

L2–L1 translation priming

Figure 3 presents an overview of the observed effect sizes for L2–L1 translation priming. A Q test of variance showed that the effect sizes across studies were again heterogeneous, Q = 60.93, df = 32, p = 0.0015. Separate random-effects models with each moderator revealed that number of items per cell was the only significant moderator (Table 2). The final model included number of items per cell as the moderator, which explained 74.60 % of the heterogeneity between studies. The AIC of this model was 7.784, which is smaller than the 17.219 for the model without any moderators, suggesting that the model with the number of items per cell was the better model.

Fig. 3 Observed effect sizes for L2–L1 masked non-cognate translation priming ordered by magnitude of the effect size and the overall effect sizes (with their 95 % confidence intervals)

Table 2 Meta-regression analysis with one moderator for L2–L1 translation priming

General discussion

A meta-analysis of 64 experimental observations across 24 studies was conducted to quantitatively assess the overall effect sizes of masked translation priming effects from L1 to L2 and vice versa. The results revealed significant translation priming effects for both directions, with L1–L2 translation priming significantly larger than L2–L1 translation priming (i.e., overall effect sizes: 0.86 vs. 0.31). This finding supports the view that the translation priming asymmetry between L1–L2 and L2–L1 is quantitative rather than qualitative (Schoonbaert et al., 2009).

The meta-analysis further investigated the influence of seven potential moderators (number of participants, prime duration, ISI, SOA, number of items per cell, script type and general response speed). The results revealed that the effect sizes of L1–L2 translation priming were moderated by ISI, and the effect sizes of L2–L1 translation priming were moderated by the number of items per cell. These findings have two important implications. For L1–L2 translation priming, it is very likely that a longer ISI increases the time available to process the prime, resulting in a stronger priming effect. Therefore, it is crucial for future studies to systematically investigate how considerable variations in ISI influence L1–L2 translation priming. For L2–L1 translation priming, our results confirm earlier concerns in the literature about the large variation in the number of items per cell in masked translation priming experiments (Dimitropoulou et al., 2011; Nakayama et al., 2016). A possible explanation for the impact of the number of items on L2–L1 priming is that using more items may increase priming effects because it reduces the noise in the data (Van den Bussche et al., 2009). Although the number of items per cell varied from 12 to 80 in the studies included here, researchers rarely provided a justification for the choice of the number of items per cell and no studies, as far as we know, have investigated this systematically. Therefore, future studies should use a sufficient number of items per cell to investigate L2–L1 translation priming. It is beyond the scope of the meta-analysis to provide a recommendation about the number of items per cell, but it is important to note that it is possible to calculate a priori the number of items needed for a high-powered experiment (see Stevens, Mandera, Keuleers, & Brysbaert, 2015). Selecting more translation pairs could easily be accomplished by using large-scale databases with translation norms (e.g., Prior, MacWhinney, & Kroll, 2007; Tokowicz, Kroll, de Groot, & van Hell, 2002; Wen & van Heuven, 2016). Taken together, the impact of ISI and the number of experimental items should be considered when judging the mixed findings.

Our meta-analysis is also useful for researchers because the overall effect sizes estimated here can be used as a benchmark to calculate the number of participants needed for a study to detect an effect. For example, in a one-tailed paired t test with an effect size of 0.86, only 10 participants are required to obtain a power of 0.8 for L1–L2 translation priming studies. In contrast, for an L2–L1 translation priming effect size of 0.31, 66 participants are necessary to obtain a power of 0.8 in a one-tailed paired t test. As can be seen in Fig. 4, because the effect sizes of L1–L2 and L2–L1 translation priming differ, there is a large difference in power between the two directions when the same number of participants is tested.
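
A rough version of this sample-size calculation can be done with the Python standard library alone. The sketch is not the procedure used for the figures above, which presumably relied on the exact noncentral t distribution. Instead it assumes a normal approximation with a commonly used small-sample correction term (z_alpha² / 2) and an alpha level of .05, which the text does not state. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def participants_needed(d, power=0.80, alpha=0.05):
    """Approximate n for a one-tailed paired t test with effect size d.

    Normal approximation plus a small-sample correction term; this is an
    approximation, not an exact noncentral-t power calculation.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1.0 - alpha)   # one-tailed criterion
    z_power = standard_normal.inv_cdf(power)
    n = ((z_alpha + z_power) / d) ** 2 + (z_alpha ** 2) / 2.0
    return math.ceil(n)


n_l1_l2 = participants_needed(0.86)   # overall L1-L2 effect size
n_l2_l1 = participants_needed(0.31)   # overall L2-L1 effect size
```

Under these assumptions the approximation gives 10 participants for d = 0.86 and 66 for d = 0.31, matching the numbers quoted in the text.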

Fig. 4 The relationship between power and number of participants estimated with the overall effect sizes for L1–L2 and L2–L1 masked non-cognate translation priming

The findings of the current meta-analysis are clear. However, there are some limitations. First of all, because the literature search did not include any unpublished data, it is not feasible to reliably estimate the publication bias in the field. Secondly, although studies have found that the L2 proficiency of bilinguals (Dimitropoulou et al., 2011; Nakayama et al., 2016) and the age of L2 acquisition (Sabourin, Brien, & Burkholder, 2014) modulated translation priming effects, we unfortunately could not consider the second language profile of the bilingual participants (e.g., L2 proficiency, language dominance, age of L2 exposure/acquisition) in the present study, because the majority of the studies included in the present meta-analysis failed to assess or provide detailed descriptions of the participants’ second language profile. Across the 24 studies, only 12 used self-assessed proficiency ratings as an estimate of the bilinguals’ L2 proficiency, and only 10 studies provided information about the age of first L2 exposure/acquisition. Critically, studies have suggested that self-assessment is less reliable than objective language proficiency measures such as those obtained with LexTALE (Khare, Verma, Kar, Srinivasan, & Brysbaert, 2013; Lemhöfer & Broersma, 2012). Surprisingly, only two studies provided detailed (objective) information about L2 proficiency (Dimitropoulou et al., 2011; Nakayama et al., 2016). To move the field forward, future studies are strongly encouraged to include objective L2 proficiency information so that future meta-analyses can shed more light on whether or not L2 proficiency moderates translation priming.

In addition, it is important to note that studies in the field seldom reported standardized effect sizes, which is not in line with the guidelines of the American Psychological Association (2010). Future meta-analyses would benefit if standardized effect sizes of translation priming were reported (for more discussion about effect sizes, see Judd, Westfall, & Kenny, 2016; Lakens, 2013).

To summarize, we conducted the first meta-analysis of L1–L2 and L2–L1 masked translation priming in the literature, which quantitatively assessed the effect sizes of translation priming. The results not only revealed significant translation priming effects for both directions with larger L1–L2 than L2–L1 translation priming but also revealed that the effect sizes of L1–L2 were moderated by ISI and those of L2–L1 translation priming were moderated by the number of items used per cell. These findings contribute to the discussion about the mixed findings for the existing translation priming studies and provide methodological recommendations for future research.