Introduction

In the Stroop task (Stroop, 1935), participants are required to name the ink color of written words which can be color-words (e.g., RED), regular words (e.g. WINDOW), or same-letter-strings (e.g., XXXX). For color-words, two conditions are presented to participants: a congruent condition in which the written word and the ink color are the same (e.g., RED written in red) and an incongruent condition in which the word and the ink color are different (e.g., RED written in blue). Commonly, reaction time (RT) is longer for incongruent compared to non-word neutral trials (also known as the “interference effect”) while RT for congruent trials is faster than for neutral trials (the “facilitation effect”, which is commonly smaller and more fragile; Bugg et al., 2011a). Nevertheless, neuroimaging studies have shown that the anterior cingulate cortex, a brain area that is highly involved in conflict monitoring (e.g., Kerns et al., 2004), is activated in both incongruent and congruent conditions, compared with neutrals (Aarts et al., 2009; Bench et al., 1993; Carter et al., 1995; Milham et al., 2002; for similar results with pupil dilation see Hershman & Henik, 2019). Recent investigations suggested that this contradiction between the behavioral data (indicating a facilitation effect) and the imaging data (indicating a conflict in congruent trials but not in neutrals) can be settled by considering that two, rather than only one, types of conflicts are reflected in the Stroop task—information conflict, between the incongruent word and ink color, which exists only in incongruent trials and affects the interference effect; and task conflict, between the relevant color-naming task and the competing, irrelevant, automatic word reading task, which exists in both incongruent and congruent trials (e.g., Augustinova et al., 2019; Hershman & Henik, 2019; Kalanthroff et al., 2018; Parris, 2014; for a review and computational model, see Kalanthroff et al., 2018; Littman et al., 2019). 
In several studies, it has been suggested that task conflict is triggered not only in congruent and incongruent trials but also when a neutral word is presented (e.g., TABLE) but not when a letter-string is presented (e.g., Goldfarb & Henik, 2007; Kalanthroff et al., 2013). However, despite the extensive literature on task conflict in the Stroop task, it is not yet known at what stage the executive control system determines that the presented stimulus is a real-word, and hence, task conflict is triggered (and task control should be recruited). The current study aims to compare different types of Stroop neutrals, in both manual and vocal tasks, in order to determine the boundary conditions of task conflict and the level of processing in which task conflict is triggered.

The notion that task conflict is triggered in Stroop congruent and word neutral trials is based on the classic psychological idea that stimuli have the ability to trigger a task that is strongly associated with them (Gibson, 2014; Koch & Allport, 2006; Monsell, 2003; Waszak et al., 2003) and that, accordingly, words trigger the automatic reading task (Rogers & Monsell, 1995). Thus, the irrelevant reading task is a stimulus-driven task that needs to be suppressed to perform efficiently. Accounting for the discrepancies between brain findings showing a conflict in the congruent condition and behavioral findings showing a facilitation effect, it has been suggested that a specific control mechanism, namely task control, is required to suppress the stimulus-driven behavior of reading (Goldfarb & Henik, 2007; Kalanthroff et al., 2013, 2017; Steinhauser & Hübner, 2009). In healthy adults, task control is very efficient and thus a Stroop facilitation effect is commonly evident (congruent RT < neutral RT), and task conflict is hardly observed (Goldfarb & Henik, 2007; La Heij & Boelens, 2011). Notably, several studies were able to behaviorally demonstrate a reverse facilitation effect (i.e., slower RTs for congruent trials compared to neutral trials) either in specific populations (e.g., young children or patients with OCD or GAD) which are characterized by low proactive control (e.g., Kalanthroff et al., 2016, 2017; La Heij et al., 2010) or by manipulating cognitive control levels in healthy adults (Kalanthroff et al., 2015; For a review see: Littman et al., 2019). Perhaps the most common method to reduce control levels in the Stroop task is by increasing the proportion of neutral trials to 75% thereby “relaxing” the control system (Goldfarb & Henik, 2007; Kalanthroff & Henik, 2013; Kalanthroff et al., 2013). 
Several studies and computational models have shown that when task control levels are low (relaxed), ignoring the irrelevant stimulus-driven task of reading becomes more difficult and a reverse facilitation effect is demonstrated, as congruent trials tend to be slower than non-word same-letter strings trials (e.g., XXXX; Goldfarb & Henik, 2007; Kalanthroff et al., 2013). However, it is still not clear which stimuli and which conditions exactly lead to the activation of the reading task, and hence to task conflict, and which do not.

Recently, it has been shown that the question of “what triggers task conflict”, and more specifically “what triggers the task of reading”, in different neutral conditions of the Stroop task, depends on the response-type (Augustinova et al., 2019; Kinoshita et al., 2017). Although different response-type methods have been used in the literature on the Stroop task (for oculomotor see Hasshim & Parris, 2015; for mouse tracking see Incera & McLennan, 2016), the two most common response-types are the manual and vocal responses (MacLeod, 1991). While the manual task requires categorization of color names into defined categories using a keypress, the vocal task simply requires color-naming. In most cases, the Stroop interference effect tends to be larger in magnitude in the vocal task compared to the manual task (Augustinova et al., 2019; MacLeod, 1991). In addition, other processes, such as semantic processing, response conflict (Sharma & McKenna, 1998), and phoneme-to-grapheme sub-lexical processing (Parris et al., 2019), have been shown to be stronger in the vocal version. Studies that tested the differences between different types of neutrals in the manual task consistently showed that real-words trigger the task of reading and hence task conflict, while non-word neutrals, such as same-letter strings (e.g., XXXX) or symbol strings (e.g., $%^&), do not (or do to a much lesser degree; Entel et al., 2015, 2018; Goldfarb & Henik, 2007; for a review, see Kalanthroff et al., 2018). This is evidenced by the existence of a reverse facilitation effect in a low control Stroop task when same-letter strings or neutral symbols are used, and no facilitation (Goldfarb & Henik, 2007), or sometimes even regular facilitation (Kalanthroff et al., 2013), when real-word neutrals are used.
In addition, a recent study by Hershman et al. (2021a) found that real-words triggered more task conflict than same-letter strings, which in turn triggered more task conflict than abstract forms, as indicated by larger pupil dilation. However, no manual Stroop studies have found evidence of task conflict for other types of neutrals on the continuum between symbols and real-words, such as pseudo-words or illegal-letter strings that violate language rules or consist of a very rare combination of letters. It is important to note that some studies reported no differences in task conflict between different types of neutrals in the manual task (including between real-words, pseudo-words, same-letter strings, and symbols; Augustinova et al., 2019; Hershman et al., 2021b; Keele, 1972; Kinoshita et al., 2017; Sharma & McKenna, 1998; Zahedi et al., 2019). However, these studies all used within-subject, within-block designs (in which the different types of neutrals were mixed within a block). While this raises some interesting questions (for example, about carry-over effects), in the current study we focus on a between-subject paradigm.

In the vocal Stroop task, it was suggested that different processes are activated as compared with the manual task. Roelofs (2003), in his ‘verbal action account’, argued that color-naming activates the verbal speech production system, which in turn activates the phonological system that maps letters to their corresponding phonemes, without being mediated by the retrieval of the word concept. Hence, according to Roelofs’s theory, in the vocal task, each and every orthographic stimulus (that consists of letters) is expected to trigger grapheme-to-phoneme mapping and thus larger task conflict. More recent findings agree with this prediction. Levin and Tzelgov (2016) asked participants to name the color of different neutral stimuli, including orthographic (real-words and same-letter strings) and non-orthographic stimuli (geometric shapes). These researchers found that all neutral orthographic stimuli that consisted of letters triggered task conflict to the same extent, while non-orthographic geometric shapes triggered less (or no) task conflict (Footnote 1). McWilliam et al. (2009) found a similar pattern: only orthographic conditions (real-words, pseudo-words, and non-pronounceable heterogeneous strings, e.g., HDK) triggered task conflict (Footnote 2), while neutral symbols did not (or did to a significantly lesser degree). However, other researchers have argued that in addition to the activation of task conflict by the presence of letters, there is another factor that triggers task conflict to a greater extent in the vocal task, namely the pronounceability of the stimuli. For example, similar to the studies mentioned above, Monsell et al. (2001) also showed that in the vocal task, orthographic stimuli trigger more conflict than non-orthographic stimuli; however, they also found that pronounceable stimuli (e.g., real-words, pseudo-words) trigger more task conflict in the vocal task than non-pronounceable orthographic stimuli (e.g., same-letter strings, non-pronounceable heterogeneous strings). Finally, Kinoshita et al. (2017) found a different pattern of results. Similar to Monsell et al., Kinoshita et al. found that pronounceable stimuli (e.g., real-words, pseudo-words) trigger more task conflict in the vocal task than non-pronounceable stimuli (e.g., same-letter strings, non-pronounceable heterogeneous strings, and symbols), but they did not find a consistent effect for orthography, as both same-letter strings and symbols triggered little (or no) and comparable task conflict (Footnote 3).
Taken together, some studies found differences only between orthographic and non-orthographic stimuli but not between pronounceable and non-pronounceable stimuli (Levin & Tzelgov, 2016; McWilliam et al., 2009), some found differences both between orthographic and non-orthographic stimuli and between pronounceable and non-pronounceable stimuli (e.g., Monsell et al., 2001), and some found no differences between orthographic and non-orthographic stimuli but did find differences between pronounceable and non-pronounceable stimuli (Kinoshita et al., 2017). As noted above, one potential source for these inconsistencies is the fact that all studies mentioned above employed different sets of stimuli in within-subject designs, and the carry-over effects of the different stimuli are not entirely clear (we further discuss the potential differences between the two designs in the Discussion section). In the present study, we employed a between-subject design, which is less prone to carry-over effects.

The present study is the first to introduce a comprehensive between-subject low-control Stroop task to investigate the existence of task conflict in different types of neutrals in both manual and vocal tasks. To that aim, we administered the low-control Stroop task to 10 groups of participants, which differed in response-type (manual vs. vocal) and the type of neutral used (see Fig. 1). First, based on the literature that shows a stronger manifestation of conflicts in the vocal task (MacLeod, 1991), we expect a more robust manifestation of task conflict (i.e., larger reverse facilitation) in the vocal task compared to the manual task. Most importantly, in the manual task we expect to find a ‘wordlikeness gradient’, such that real-words would elicit the strongest task conflict (and, therefore, the smallest reverse facilitation or even regular facilitation), symbols would not trigger the reading task at all (and, therefore, would show the largest reverse facilitation), and the rest of the conditions would display varying amounts of reverse facilitation based on their resemblance to real-words: pseudo-words would trigger more task conflict than same-letter strings and illegal-letter strings (Fig. 1). In the vocal task, based on the ‘verbal action account’ (Roelofs, 2003), we expect that the orthographic stimuli (i.e., real-words, pseudo-words, illegal-letter strings, and same-letter strings) will trigger the strongest task conflict (smallest or no reverse facilitation) and that symbols will trigger no task conflict (largest reverse facilitation). Given the contradictions in the literature reviewed above, we tested, at an exploratory level, whether pronounceable conditions (real-words and pseudo-words) would trigger more task conflict than non-pronounceable neutrals (same-letter strings and illegal-letter strings).

Fig. 1

Neutral conditions in the current experiment. The neutral conditions that appeared in the experiment are ordered according to the ‘wordlikeness’ gradient. On the left side of the continuum, a string of symbols is expected to trigger minimal activation of the reading task and hence minimal task conflict. On the right side of the continuum, real-words are expected to trigger reading and thus to activate task conflict to its largest extent. The illegal-letter string appears in Hebrew letters; note that the red letter is a final-form letter that breaks language rules (as it is allowed only at the end of the string).

Method

Participants

Two hundred and eighty-nine university students took part in the experiment in return for course credit or small monetary compensation (~ 6 USD). All participants had normal or corrected-to-normal vision, reported no history of attention deficit disorder or dyslexia, were native Hebrew speakers, and were naïve to the purpose of the study. Participants were recruited using the university’s experiments system and were randomly assigned to one of the ten groups: 2 tasks (manual vs. vocal) X 5 types of neutral stimuli. Seven participants were excluded from the analysis due to an extremely low accuracy rate (< 85%) and four participants were excluded due to extremely slow RTs (> 2.5 SD). The analyzed sample included 278 participants (46 males, 232 females, mean age: 24.6, range = 20–44, SD = 6.3). A power analysis using G*Power 3.1.9.7 (Faul et al., 2007) indicated that the current sample allowed for the examination of the three-way interaction (congruency X neutral-type X response-type) at a power > 99% to test for small effect sizes with a Type I error (α < 0.05). The parameters that were used were as follows: Cohen's f effect size of 0.20, 10 groups, and 3 measurements.
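The two exclusion rules described above can be illustrated with a minimal sketch. The text does not specify whether the 2.5 SD cutoff was computed over the whole sample or within each group, nor the order in which the two rules were applied; the version below assumes a single sample-wide cutoff applied after the accuracy rule. The function name and the dictionary keys (`accuracy`, `mean_rt`) are hypothetical.

```python
from statistics import mean, stdev


def exclude_participants(participants, min_accuracy=0.85, sd_cutoff=2.5):
    """Return the participants retained after both exclusion rules.

    participants: list of dicts with keys 'id', 'accuracy' (0-1) and
    'mean_rt' (ms). These keys are illustrative assumptions.
    """
    # Rule 1: drop participants with an accuracy rate below 85%.
    accurate = [p for p in participants if p["accuracy"] >= min_accuracy]

    # Rule 2 (assumed sample-wide): drop participants whose mean RT is more
    # than 2.5 SD above the mean of the remaining participants.
    rts = [p["mean_rt"] for p in accurate]
    upper_bound = mean(rts) + sd_cutoff * stdev(rts)
    return [p for p in accurate if p["mean_rt"] <= upper_bound]


# Sample accounting reported in the text: 289 recruited, 7 + 4 excluded.
ANALYZED_N = 289 - 7 - 4  # 278
```

Applying the rules in a different order, or per group, could change which participants are flagged, so the sketch should be read as one possible operationalization only.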

Stimuli

Congruent and incongruent stimuli were four color words that could appear in four different colors each (green, blue, yellow, and red). This yielded 12 incongruent stimuli (e.g., ‘RED’ written in blue) and 4 congruent stimuli (e.g., ‘RED’ written in red). Neutral stimuli were divided into five different conditions, with 3 exemplars in each condition (which were chosen based on a pilot study of a lexical decision task in which the non-words were the stimuli of interest; see supplementary materials): (1) symbols, i.e., strings of non-letter symbols (e.g., %$#@); (2) meaningless same-letter strings (e.g., XXXX); (3) illegal-letter strings that were created by using the Hebrew ‘final forms’ of letters in the middle of the word (Footnote 4); (4) pseudo-words, which follow the structure of regular words and are readable but meaningless; and (5) real-words, all with the same frequency of use (i.e., ‘Building’, ‘Form’, ‘Window’; Frost & Plaut, 2005). The first letter of each of the lexical neutral stimuli differed from the first letter of all four color-words. Each of the neutral exemplars (3 for each neutral condition) could appear in all 4 colors and therefore, 12 different neutral stimuli were created for each condition (Footnote 5). All stimuli appeared at the center of a black screen in bold Courier New font. Each word measured approximately 0.5 cm in height and 1.5 cm in width. Data collection and stimuli presentation were controlled by E-Prime 3 software (Psychology Software Tools, Pittsburgh, PA, USA). For the vocal task, we used the E-Prime Chronos microphone device.
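The stimulus counts given above follow directly from crossing words with ink colors. The sketch below illustrates this; the neutral exemplars are hypothetical placeholders (the actual Hebrew items are not reproduced here), and the function name is an assumption.

```python
from itertools import product

COLORS = ["green", "blue", "yellow", "red"]


def build_stimuli(neutral_exemplars):
    """Cross words with ink colors and split the pairs by congruency.

    Each stimulus is a (word, ink_color) pair. neutral_exemplars is the
    list of three exemplars for one neutral condition.
    """
    color_word_pairs = list(product(COLORS, COLORS))
    congruent = [(w, c) for w, c in color_word_pairs if w == c]
    incongruent = [(w, c) for w, c in color_word_pairs if w != c]
    neutral = list(product(neutral_exemplars, COLORS))
    return congruent, incongruent, neutral


# Placeholder exemplars standing in for one neutral condition.
congruent, incongruent, neutral = build_stimuli(["XXXX", "SSSS", "KKKK"])
print(len(congruent), len(incongruent), len(neutral))  # 4 12 12
```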

Procedure

Participants sat approximately 60 cm in front of the screen in a quiet experiment room and were tested separately. They were asked to respond as quickly and as accurately as possible to the ink color of each stimulus. In the manual task, the response keys were the numbers 1–4 on the numeric keypad and were marked with colored stickers. Participants were asked to respond using four fingers (two fingers on each hand) and received feedback throughout the whole experiment only in cases of an incorrect response. In the vocal task, participants were asked to respond to a microphone that was placed in front of them. An experimenter recorded the actual responses of the participants.

The manual task started with 48 trials of a key-mapping practice block (participants were asked to manually respond to the ink color of a colored asterisk) and then moved on to a Stroop practice block, whilst the vocal task started directly with the Stroop training. As part of the practice block, participants completed 48 practice trials, which were identical to the experimental trials but were not analyzed. Each trial began with a 1000 ms fixation point followed by a task-cue that appeared for 1000 ms. After a 500 ms interval, the target Stroop stimulus appeared for 2000 ms or until response. Each trial ended with a 500 ms interval. Following the high neutral proportion procedure (Bugg et al., 2011b; Goldfarb & Henik, 2007; Tzelgov et al., 1992), the proportion of the neutral trials was 75% and the proportion of congruent and incongruent trials was 12.5% each. The experimental block consisted of 192 trials: 144 neutrals (each of the 12 color X neutral exemplars appeared 12 times), 24 incongruent (each of the 12 color X color name exemplars appeared twice), and 24 congruent (each of the 4 color X color name exemplars appeared 6 times), presented in random order. In each task (manual or vocal), the procedure was identical for all neutral-type conditions, except for the type of neutral that was presented. At the end of the experiment, participants were debriefed and thanked for their participation.
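As a check on the block composition described above, the following sketch assembles one experimental block and computes the resulting proportions and the maximal trial duration. Stimulus labels are placeholders, and the function and constant names are hypothetical; only the counts, repetitions, and durations are taken from the text.

```python
import random
from collections import Counter

# Event durations within a trial, in ms (the target is shown until response,
# up to 2000 ms, so the sum is the maximal trial duration).
TRIAL_TIMELINE_MS = {
    "fixation": 1000,
    "task_cue": 1000,
    "pre_target_interval": 500,
    "target_max": 2000,
    "end_interval": 500,
}


def build_block(seed=None):
    """Return a shuffled list of (condition, stimulus_index) trials."""
    trials = (
        [("neutral", i) for i in range(12)] * 12        # 12 stimuli x 12 = 144
        + [("incongruent", i) for i in range(12)] * 2   # 12 stimuli x 2 = 24
        + [("congruent", i) for i in range(4)] * 6      # 4 stimuli x 6 = 24
    )
    random.Random(seed).shuffle(trials)
    return trials


block = build_block(seed=1)
counts = Counter(condition for condition, _ in block)
proportions = {cond: n / len(block) for cond, n in counts.items()}
max_trial_ms = sum(TRIAL_TIMELINE_MS.values())
```

With these numbers, neutral trials make up 144 of 192 trials (75%), and congruent and incongruent trials 24 each (12.5%), matching the proportions stated in the procedure.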

Results

We carried out two identical analyses, for reaction time (RT) and accuracy rates, separately. Overall accuracy rates were very high (0.957) and thus we focus our analyses on the more sensitive and traditional RT analyses. Accuracy analyses are reported below.

RT analysis

Mean RT was calculated for correct responses of each participant in each condition. Trials in which the RTs were shorter than 200 ms were excluded (an average of < 1 trial per participant). A 5X3X2 mixed model analysis of variance (ANOVA) was carried out on RT data in the Stroop task with congruency (congruent vs. neutral vs. incongruent) as a within-subject factor, and neutral-type (symbols vs. same-letter strings vs. illegal-letter strings vs. pseudo-words vs. real-words) and response-type (manual vs. vocal) as between-subject factors (see Table 1). A significant main effect was found for congruency, F(2, 536) = 884.78, p < 0.001, η2p = 0.77, indicating a significant interference effect [incongruent RT > neutral RT; t(268) = 30.19, p < 0.001], but no facilitation effect [neutral RT − congruent RT; t(268) = 0.69, p = 0.999], after a Bonferroni correction. In addition, we observed a main effect for response-type, such that the vocal group was faster than the manual group, F(1, 268) = 30.81, p < 0.001, η2p = 0.10, and no main effect for neutral-type condition, F(4, 268) = 0.071, p = 0.586. The two-way interaction between response-type and congruency was significant, F(2, 536) = 20.71, p < 0.001, η2p = 0.07, such that the simple main effect for congruency that was obtained in the vocal task, F(2, 268) = 357.90, p < 0.001, η2p = 0.73, was stronger than the simple main effect for congruency that was obtained in the manual task, F(1, 268) = 254.23, p < 0.001, η2p = 0.67. Most importantly, there was a significant neutral-type X congruency X response-type three-way interaction, F(8, 536) = 3.38, p < 0.001, η2p = 0.05. To further investigate this three-way interaction, we carried out two two-way ANOVAs with congruency as a within-subject factor and neutral-type as a between-subject factor, for each response-type separately.
The simple interactions of congruency X neutral-type were significant in both the manual, F(8, 268) = 3.46, p = 0.008, η2p = 0.09, and the vocal tasks, F(8, 268) = 7.78, p < 0.001, η2p = 0.19 (see Table 1).
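For readers who wish to relate the reported F statistics to the reported effect sizes, partial eta squared can be recovered from an F value and its degrees of freedom through the standard identity η2p = (F × df_effect) / (F × df_effect + df_error). The sketch below applies this identity to three of the values reported above; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of congruency, F(2, 536) = 884.78
print(round(partial_eta_squared(884.78, 2, 536), 2))  # 0.77
# Main effect of response-type, F(1, 268) = 30.81
print(round(partial_eta_squared(30.81, 1, 268), 2))   # 0.1
# Three-way interaction, F(8, 536) = 3.38
print(round(partial_eta_squared(3.38, 8, 536), 2))    # 0.05
```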

Table 1 Summary of results in the vocal and manual tasks across congruency and neutral-types

To further investigate the data according to our a priori hypotheses, we calculated the interference (RTincongruent − RTneutral) and facilitation (RTneutral − RTcongruent) effects for each participant in each condition. We then tested whether the interference and facilitation effects were affected by the different neutral-types and response-types, by carrying out two two-way ANOVAs, on interference and facilitation data separately, with neutral-type and response-type as between-subject variables (see Table 1). For the interference effect, we found a significant main effect for neutral-type, F(4, 268) = 5.38, p < 0.001, η2p = 0.07. Post-hoc comparisons with Bonferroni correction revealed that the interference for symbols was larger than the interference for illegal-letter strings, t(268) = 2.86, p = 0.045, d = 0.35, and real-words, t(268) = 4.53, p < 0.001, d = 0.55. There was no main effect for response-type, F(1, 268) < 0.001, p = 0.99, and no response-type X neutral-type interaction, F(4, 268) = 2.00, p = 0.095, indicating that this pattern of differences between the interference effects of the different neutral-type conditions did not differ between manual and vocal tasks. For the facilitation effect, we carried out the exact same analysis and observed significant main effects for response-type, F(1, 268) = 84.33, p < 0.001, η2p = 0.24, and for neutral-type, F(4, 268) = 12.79, p < 0.001, η2p = 0.16, and most importantly, a significant two-way response-type X neutral-type interaction, F(4, 268) = 4.80, p < 0.001, η2p = 0.07 (Fig. 2).
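The per-participant effect scores defined above can be sketched as follows, using the trimming rules stated in the RT analysis (correct responses only, RTs of at least 200 ms). The trial representation (dictionaries with `condition`, `rt`, and `correct` keys) and the function name are assumptions made for illustration.

```python
from statistics import mean


def stroop_effects(trials, min_rt=200):
    """Compute interference and facilitation (ms) for one participant.

    trials: iterable of dicts with 'condition' ('congruent', 'neutral' or
    'incongruent'), 'rt' in ms, and 'correct' (bool).
    """
    rts = {"congruent": [], "neutral": [], "incongruent": []}
    for trial in trials:
        # Keep correct responses only and drop anticipations (< 200 ms).
        if trial["correct"] and trial["rt"] >= min_rt:
            rts[trial["condition"]].append(trial["rt"])
    means = {condition: mean(values) for condition, values in rts.items()}
    return {
        "interference": means["incongruent"] - means["neutral"],
        "facilitation": means["neutral"] - means["congruent"],
    }
```

Under this definition, a negative facilitation score corresponds to reverse facilitation (congruent trials slower than neutral trials).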

Fig. 2

Means of the facilitation effect for the different neutral groups in the experiment across manual and vocal tasks. Error bars represent ± one standard deviation around the mean of each condition. The three effects in accordance with our hypothesis are represented by two black lines, each representing the neutral-type groups’ means that were used in the comparison. *significant at 0.05 level. **significant at 0.001 level. n.s. not significant at 0.05 level

Next, we further analyzed this two-way interaction on facilitation by comparing the different neutral-type conditions for the manual and vocal tasks separately, using a Bonferroni correction. For the manual task, we found a main effect for neutral-type, F(4, 268) = 2.96, p = 0.020, η2p = 0.04, with a significant reverse facilitation effect (RTcongruent > RTneutral) in all non-word conditions (same-letter strings: t(268) = 3.39, p = 0.004, d = 0.21; illegal-letter strings: t(268) = 2.82, p = 0.026, d = 0.17; pseudo-words: t(268) = 5.44, p < 0.001, d = 0.33; and symbols: t(268) = 3.72, p > 0.001, d = 0.23). However, in the real-words condition, a non-significant reverse facilitation was exhibited, t(268) = 0.84, p = 0.99. This non-significant reverse facilitation revealed a lexical effect, since it was statistically different from the significant reverse facilitation of the non-lexical conditions, t(268) = 1.69, p = 0.046, d = 0.1. However, we did not observe either an orthographic effect (there was no difference between symbols and non-lexical neutrals, t(268) = 0.08, p = 0.475) or a pronounceability effect (there was no difference between pseudo-words and non-pronounceable letter strings (same-letter strings + illegal-letter strings), t(268) = 1.20, p = 0.115). For the vocal task, we found a significant main effect for neutral-type, indicating a reverse facilitation effect only in the symbols condition, t(268) = 3.22, p = 0.007, d = 0.20, whilst there was a significant regular facilitation effect (RTneutral > RTcongruent) in all other neutral conditions (same-letter strings: t(268) = 2.90, p = 0.002, d = 0.18; illegal-letter strings: t(268) = 3.13, p = 0.004, d = 0.19; pseudo-words: t(268) = 2.67, p = 0.042, d = 0.16; and real-words: t(268) = 7.39, p < 0.001, d = 0.45; Fig. 2).
As in the manual task, we obtained a lexical effect in the vocal task, such that the facilitation that was obtained in the real-words condition differed from the facilitation in the other non-lexical conditions, t(268) = 3.22, p < 0.001, d = 0.39. In addition, we observed an orthographic effect, such that the non-word orthographic conditions (pseudo-words, illegal-letter strings, and same-letter strings) differed from symbols, t(268) = 3.22, p < 0.001, d = 0.39, but there was no pronounceability effect, since the non-pronounceable lexical conditions did not significantly differ from pseudo-words, t(268) = 0.17, p = 0.433.

ACC analysis

A 5X3X2 mixed model ANOVA, identical to the previous RT analysis, was carried out for ACC data (see Table 1). A significant main effect was found for congruency, F(2, 536) = 96.53, p < 0.001, η2p = 0.26, indicating lower accuracy for incongruent trials compared to congruent trials, t(268) = 10.11, p < 0.001, and neutral trials, t(268) = 3.00, p = 0.008, after a Bonferroni correction. In addition, a significant main effect was found for response-type group, with significantly lower accuracy in the vocal group compared to the manual group, F(1, 268) = 22.67, p < 0.001, η2p = 0.07, but no significant main effect for neutral-type condition, F(4, 268) = 0.82, p = 0.515. Finally, the three-way interaction was not significant, F(8, 536) = 0.92, p = 0.502, but a significant interaction between response-type and condition was observed, F(2, 536) = 37.29, p < 0.001, such that the differences between incongruent trials and neutral and congruent trials were larger in the vocal group (t(268) = 2.50, p = 0.004, and t(268) = 3.25, p = 0.004, for the neutral and congruent conditions, respectively) compared to the manual group (t(268) = 11.43, p < 0.001, and t(268) = 11.45, p < 0.001, for the neutral and congruent conditions, respectively).

Discussion

The goal of the current study was to investigate the boundary conditions in which the cognitive system triggers the stimulus-driven task of reading and thus task conflict is evoked. To that aim, we set up a low control (75% neutrals) between-subject design with a different neutral stimulus for each condition (symbols, same-letter strings, illegal-letter strings, pseudo-words, and real-words) using manual and vocal Stroop tasks. Contrary to our hypothesis, in the manual task, a comparable reverse-facilitation effect was observed for all neutral conditions except for the real-words condition. This indicates that all non-word neutral conditions triggered similar and very low (if any) task conflict, whilst the real-words condition triggered stronger task conflict compared to the other neutral conditions. For the vocal task, our results indicated a regular facilitation effect in all conditions except for the symbols condition, which exhibited a reverse facilitation effect. This indicates that all neutral conditions, except the symbols condition, triggered task conflict in the vocal task. Furthermore, a stronger facilitation effect was obtained in the real-words condition compared to the other non-word orthographic conditions (same-letter strings, illegal-letter strings, and pseudo-words), which did not differ from each other in the magnitude of the facilitation. In sum, in both the manual and the vocal tasks we found that the reading task, and hence task conflict, is most strongly triggered in the real-words condition and that symbols trigger very minimal (or no) task conflict. However, while in the manual task we found that all non-word orthographic neutrals that are composed of letters trigger minimal (or no) task conflict (comparable to symbols), in the vocal task we found that these non-word orthographic neutrals trigger significantly more task conflict (in between the symbols and real-words conditions).
In both tasks, there were no differences between these non-word orthographic neutral-types. Finally, for the interference effect, we found no significant differences between the different response-types and neutral-types, indicating that our results are restricted to task conflict.

As mentioned in the Introduction, competing models argue about the processes that trigger the task of reading in the vocal task. While all models agree that the phonological system is activated in the vocal task, some researchers suggested that orthographic stimuli will trigger more task conflict than non-orthographic stimuli (Levin & Tzelgov, 2016; McWilliam et al., 2009), and others have suggested that orthographic and pronounceable stimuli will trigger more task conflict than non-pronounceable stimuli (Kinoshita et al., 2017; Monsell et al., 2001). Our results from a between-subject low-control design indicate that real-words trigger the strongest task conflict (largest regular facilitation), all other non-word orthographic conditions trigger lower and similar task conflict (regular facilitation), and symbols trigger the smallest (or no) task conflict (a reverse facilitation). In other words, our results mostly agree with the former models, which suggest that all orthographic stimuli trigger the task of reading in the vocal task, but do not support the latter models, as we did not find an effect for pronounceability. In addition, and similarly to the results from the manual task, we did find an indication of a ‘word superiority’ effect, manifested in significant differences between real-words and the rest of the non-word orthographic stimuli. While in the manual task this word superiority effect seems to be the only process that took place, in the vocal task the word superiority effect complemented the effect of the orthographic process. Nevertheless, the data from both tasks indicate a specific and additional process that is restricted to real-words.

That real-words trigger the task of reading and task conflict more than all other non-word conditions is supported by brain studies that demonstrate a stronger left hemisphere activation for real-words compared to pseudo-words (Binder et al., 2003; Xiao et al., 2005), same-letter strings (Menard et al., 1996), or non-pronounceable combinations of letters (e.g., XPTBN; Joubert et al., 2004). In addition, EEG studies have demonstrated the superiority of real-words compared to pseudo-words at 200–300 ms after stimulus onset in different brain regions (Gansonre et al., 2018; Martin et al., 2006; Proverbio & Adorni, 2009). These results indicate that the system is able to differentiate real-words from non-words at a very early stage of processing. The unique effect of real-words triggering the stimulus-driven behavior of reading can also be discussed in light of the dual-route cascade model (Coltheart et al., 1993; Forster & Chambers, 1973; Pritchard et al., 2012; Sadoski & Paivio, 2004). According to this model, the visual recognition of real-words is carried out in two routes: (1) a lexical route, which can recognize known words through a lexicon lookup procedure, and (2) an orthographic (sub-lexical) route, which can be used to read non-words through letter recognition and grapheme-phoneme correspondence rules. That symbols did not differ from other non-word orthographic conditions in the manual task indicates that in this modality, only the lexical route activated task conflict, whereas the orthographic route did not trigger the task of reading. In the vocal task, on the other hand, we found evidence that the stimulus-driven reading task, and task conflict, were triggered by both the lexical and the orthographic routes.
The larger task conflict in all non-word orthographic neutrals (consisting of letters) than in symbols indicates activation of task conflict via the orthographic route, while the strongest task conflict in the real-words condition indicates activation of task conflict via both the orthographic and the lexical routes.

The results of the present study revealed qualitative and quantitative differences between the manual and vocal tasks. Consistent with Kinoshita et al. (2017), our results agree with the notion that triggering task conflict in different neutral conditions depends on response-type. By contrast, our results are not in accordance with response competition models, which predict no interaction between response-type and neutral conditions (Morton & Chambers, 1973; Roelofs, 2021). These latter models posit that the differences between the manual and vocal tasks occur only in the response selection or response execution stages; thus, the task of reading itself, which occurs at an early stage, is not expected to be contingent upon response-type. Our results suggest that the differences between response-types occurred prior to the response selection stage, such that the two modalities activated the lexical and the orthographic processes in the task of reading in different ways. Furthermore, the differences in task conflict activation between manual and vocal tasks are consistent with the Bayesian reader model (Norris, 2006). According to this model, the ideal reader combines perceptual information with prior data on the probabilities of the input, in order to maximize performance. In other words, the goal of the reader, or more specifically the requirements of the task, can affect the processing of the input in an endeavor to achieve better performance in the task at hand. In the case of the present study, the reading task was activated differently such that in the vocal task, the system considered the lexicality and orthography of the input to be equally important, while in the manual task only the lexicality of the input mattered. Therefore, the task of reading changes according to the tasks or goals of the reader.
Thus, the results of the present study are in accordance with Bayesian models of cognitive control (Jiang et al., 2014), which suggest a more flexible and context-dependent control process rather than a dual-control mechanism. Finally, the results are also not in accordance with Parris et al. (2019), who demonstrated grapheme-to-phoneme processing in the manual version of the Stroop task. The lack of differences between the non-word orthographic conditions and the symbols condition in the manual task rules out sub-lexical processing in this response-type. Therefore, even though we found evidence for a semantic effect in the manual task, there was no evidence for sub-lexical processes. However, in accordance with the literature, both semantic (Brown & Besner, 2001) and sub-lexical processes (Parris et al., 2019) were found in the vocal task.
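The combination of perceptual information with prior probabilities that the Bayesian reader account describes can be written, in its most generic form, as Bayes' rule. This is only an illustrative sketch: the symbols $W$ (a candidate word), $I$ (the perceptual input), and $W'$ (any candidate in the lexicon) are labels assumed here, and the expression is not the specific parameterization used by Norris (2006).

\[
P(W \mid I) = \frac{P(I \mid W)\, P(W)}{\sum_{W'} P(I \mid W')\, P(W')}
\]

Under this reading, the prior $P(W)$ carries what the reader already knows about how probable each input is, and the likelihood $P(I \mid W)$ carries the perceptual evidence; how the resulting posterior is used may then depend on the goal of the task.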

As mentioned above, the literature on the existence of task conflict in different types of neutrals is inconsistent (Goldfarb & Henik, 2007; Kalanthroff et al., 2018; Keele, 1972; Kinoshita et al., 2017; Levin & Tzelgov, 2016; McWilliam et al., 2009; Monsell et al., 2001). We suggest that one of the major sources of variance that has been widely overlooked is study design: within- vs. between-subject designs. It has been shown that priming real-words in a Stroop block could elicit the processing of non-word conditions (Duscherer & Holender, 2002). Moreover, it has been demonstrated that the mere existence of real-word neutrals increases the tendency to read when non-word neutrals are presented. For example, Mills (2017) showed that words in trial N-1 reduced task conflict in trial N as compared with symbols in trial N-1. These findings might account for the lack of a pronounceability effect in both the manual and the vocal conditions of the current study, which contradicts previous findings from vocal within-subject designs (Kinoshita et al., 2017; Monsell et al., 2001). It is possible that the existence of real-word neutrals in the within-subject designs facilitated the reading task in the entire block, and thereby elicited task conflict in non-word neutrals as well. In the between-subject design, on the other hand, pseudo-words never appeared alongside real-word neutrals. Moreover, it is important to note that even in some vocal within-subject experiments, pronounceability effects were not present (McWilliam et al., 2009), and thus the question of whether and when pronounceable stimuli trigger task conflict awaits future investigation. Taken together, the utilization of a between-subject design ruled out the possible carry-over effects that characterize within-subject designs and might explain the discrepancy in findings between different within-subject studies that tested different neutral conditions and task conflict activation.
In addition, the within-subject designs mentioned above (Kinoshita et al., 2017; Levin & Tzelgov, 2016) included blocks in which the majority of the trials were different lexical conditions (e.g., high or low-frequency words, color-related words, etc.). Therefore, an alternative explanation for the different patterns in the present between-subject design and the above-mentioned within-subject designs is the frequency of lexical trials in the block. This suggestion awaits future research that will focus on the effect of control adaptation due to different lexical and non-lexical conditions. Finally, it is important to consider that although a between-subject design is not vulnerable to carry-over effects, it might be vulnerable to a selection threat to internal validity caused by potential pre-existing individual differences between groups. The relatively large sample of university students, the random allocation, and the focus on effects (interference/facilitation) rather than on RTs in the different conditions minimize this threat, although it should still be considered.
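To make the effect scores referred to above concrete, the following is a minimal Python sketch of how interference and facilitation can be derived from mean RTs, following the definitions given in the Introduction (interference: incongruent minus neutral; facilitation: neutral minus congruent, with a negative value corresponding to a reverse facilitation). The function name `stroop_effects`, the dictionary layout, and the example RTs are hypothetical and chosen for illustration only; they are not the analysis code or the data of the present study.

```python
from statistics import mean


def stroop_effects(rts_by_condition):
    """Compute interference and facilitation from per-condition RTs (ms).

    rts_by_condition maps "congruent", "neutral" and "incongruent"
    to lists of reaction times (hypothetical input format).
    """
    means = {cond: mean(rts) for cond, rts in rts_by_condition.items()}
    interference = means["incongruent"] - means["neutral"]
    facilitation = means["neutral"] - means["congruent"]
    return {
        "interference": interference,
        "facilitation": facilitation,
        # Congruent slower than neutral: a reverse facilitation.
        "reverse_facilitation": facilitation < 0,
    }


# Hypothetical RTs for illustration only; not data from the study.
example = {
    "congruent": [600, 620],
    "neutral": [640, 660],
    "incongruent": [700, 720],
}
effects = stroop_effects(example)
```

Because both scores are differences relative to a participant's own neutral baseline, overall differences in response speed between groups cancel out of the comparison, which is one reason effect scores are less exposed to selection threats than raw RTs.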

The current study is the first to conduct a large-scale, low-control, between-subject investigation of the effect of different neutral stimuli on task conflict in both the manual and vocal versions of the Stroop task. The results of the present study offer a possible solution to the question ‘what triggers the stimulus-driven reading task, and thus task conflict, in the Stroop task?’ In both tasks, we found that real-word neutrals trigger the task of reading significantly more than any other stimuli, indicating a ‘word superiority’ effect. This finding supports the notion that in both tasks the system is able to identify real-words very early in the process and that the task of reading, and task conflict, are triggered via the lexical route. While this is the only process evident in the manual modality (there was little or no task conflict in all other neutrals), in the vocal task additional processes took place. Based on the finding that all stimuli consisting of letters triggered task conflict, we conclude that in the vocal task, conflict was also triggered via the orthographic (sub-lexical) route, which is responsible for letter recognition and grapheme-phoneme correspondence. We believe that this pattern indicates that in the vocal task, the phonological route is more activated in general. Our data also imply that the answer to what triggers the stimulus-driven reading task, and thus task conflict, should consider not just the response-type and the level of control in a specific trial, but also the type of design (within- vs. between-subject designs), as within-subject designs might be affected by carry-over effects between the different neutral types. The latter requires future investigation.