Introduction

Participants in the Stroop task (Stroop 1935) are presented with a color-word printed in color and must respond to the color and ignore the word. They are faster, on average, to name the print color of a congruent color-word stimulus (e.g., the word RED printed in red) than an incongruent stimulus (e.g., GREEN in red). The Stroop effect is calculated as the difference in response time (RT) between congruent trials and incongruent trials, and demonstrates the unintended influence of the word. It is one of the most replicated experimental effects in cognitive psychology, yet despite years of research, there is no agreed theoretical resolution as to the cause of the effect (MacLeod 1991; Eidels et al. 2010; Eidels 2012).

Theoretical accounts of the Stroop effect (e.g., Palef & Olson, 1975; Logan, 1980; Cohen, Dunbar, & McClelland, 1990; Melara & Algom, 2003) must assume that participants process the meaning of the printed words despite instructions to ignore them and focus on the print color; otherwise the time to respond ‘red’ should be the same for any word printed in that color, regardless of whether it is congruent or incongruent—and hence, there would be no behavioral Stroop effect.

RTs have been the preferred dependent variable in many psychological experiments (see Luce, 1986), including the Stroop task. Researchers used RTs to determine that the Stroop effect is contingent on attentional resources (Kahneman and Chajczyk 1983), practice (MacLeod and Dunbar 1988), dimensional discriminability and experimental correlation (Dishon-Berkovits and Algom 2000), target set size (La Heij and Vermeij 1987), and the number of colored letters in the stimulus word (Besner et al. 1997).

Despite their benefits, RTs provide only a single estimate of processing duration at the end of each trial, meaning there are limitations to what RTs can tell us about the time course of an experimental effect. For example, an RT of 500 ms on a given trial of the Stroop task suggests that it took 500 ms to perceptually encode a stimulus, process and decide on the color of the stimulus, and execute a behavioral response. However, we do not know how long each of these sub-processes takes.

There are statistical methods that provide insight into the time course of experimental effects. Parametric studies can fit sequential sampling models to RT distributions and estimate perceptual encoding time and rate of processing (e.g., Ratcliff & McKoon, 2008; S.D. Brown & Heathcote, 2008), but such models require strong assumptions and are therefore less general. Alternatively, graphical exploration methods of RT distributions, such as the delta plot, inform researchers about the time course of experimental effects with fewer assumptions (Jong et al. 1994).

Delta plots of Stroop data

Delta plots display graphically how an experimental effect changes across different points of two RT distributions. For instance, the left panel of Fig. 1 shows congruent and incongruent RT distributions of a hypothetical Stroop task. For delta plots, instead of calculating the Stroop effect as the difference between mean RTs of incongruent and congruent conditions, a researcher calculates the effect at a desired number of percentiles (e.g., at each decile of the two distributions). They could then plot the effect at each decile against the mean RT of the two distributions at each decile (see Footnote 1). The resulting delta plot is shown in the right panel of Fig. 1. This function is always above 0, meaning that RTs in the incongruent distribution are slower than RTs in the congruent distribution for every decile. The positive slope suggests that the difference between the incongruent and congruent RTs is bigger for slower RTs than faster RTs.
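
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code of any cited study: the function name `delta_plot`, the simulated RT values, and the 50-ms shift are assumptions made for the example.

```python
import random
import statistics

def delta_plot(congruent_rts, incongruent_rts, n_quantiles=10):
    """Return (mean RT, effect) pairs at each quantile cut point.

    With n_quantiles=10 the cut points are the nine deciles.
    """
    q_con = statistics.quantiles(congruent_rts, n=n_quantiles)
    q_inc = statistics.quantiles(incongruent_rts, n=n_quantiles)
    return [((c + i) / 2, i - c) for c, i in zip(q_con, q_inc)]

# Hypothetical data: incongruent RTs are the congruent RTs shifted by 50 ms,
# so the effect is the same at every decile and the delta plot is flat.
rng = random.Random(1)
congruent = [rng.gauss(500, 60) for _ in range(1000)]
incongruent = [rt + 50 for rt in congruent]

points = delta_plot(congruent, incongruent)
for mean_rt, effect in points:
    print(f"mean RT = {mean_rt:6.1f} ms, effect = {effect:5.1f} ms")
```

A pure shift between the two distributions yields a flat delta plot; a sloped plot therefore indicates that the distributions differ in more than location.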

Fig. 1. The left panel depicts RT distributions for the congruent and incongruent conditions of a Stroop task. The right panel depicts the resulting delta plot.

Pratte et al. (2010) used delta plots to investigate the distributional properties of the Stroop and Simon effects, and found delta plots with different slopes. Specifically, the slope for the Stroop effect delta function was positive, with small values for fast responses and larger values for slower responses. In contrast, the slope of the Simon effect delta function was negative, with large values for the fast responses and smaller values for slower responses. However, the delta plot slope depends on the exact nature of the task (cf. Proctor, Vu, & Nicoletti, 2003; Proctor & Shao, 2010; Dittrich, Kellen, & Stahl, 2014). The negative slope for the Simon effect delta function suggests that the Simon effect results from a conflict at the motor response stage, which decays over time. The positive slope for the Stroop effect delta function suggests the effect results from a conflict at the processing stage, which grows in magnitude as the participant processes the stimulus for a longer duration.

A potential limitation of the delta plot method is its sensitivity to the difference in variance between the distributions in question. This point is illustrated in Fig. 2, where we show delta plots that compare gamma distributions with different means and standard deviations (SD) to a gamma distribution with fixed arbitrary parameters, mean = 12 and SD = 3 (see Footnote 2). The middle panel serves as a benchmark and shows the delta plot of two identical gamma distributions with mean = 12 and SD = 3, resulting in a flat line at 0. Each column represents distributions with a different variance and each row represents distributions with a different mean. The effect of changes in variance on the slope of the delta plot, while the mean is held fixed, can be observed by moving along the columns within any given row. Critically, if one RT distribution has the same mean as the other but a larger variance, then the slope of the delta plot will be positive, regardless of the mean RT. Note that for empirical RT distributions, the standard deviation of RT typically increases linearly with the mean (Wagenmakers and Brown 2007), although there are cases where this trend does not hold, such as the Simon task (Pratte et al. 2010).
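
The variance sensitivity can be reproduced with a short simulation. The sketch below is an illustration under stated assumptions, not a reconstruction of Fig. 2: the comparison SD of 5, the sample size, the seed, and the helper names `gamma_sample` and `decile_differences` are all hypothetical. It uses the standard conversion from a mean and SD to gamma shape and scale parameters (shape = mean²/SD², scale = SD²/mean).

```python
import random
import statistics

def gamma_sample(mean, sd, n, rng):
    """Draw n gamma variates with the requested mean and SD."""
    shape = (mean / sd) ** 2      # k = mean^2 / sd^2
    scale = sd ** 2 / mean        # theta = sd^2 / mean
    return [rng.gammavariate(shape, scale) for _ in range(n)]

def decile_differences(reference, comparison):
    """Comparison-minus-reference difference at each of the nine deciles."""
    q_ref = statistics.quantiles(reference, n=10)
    q_cmp = statistics.quantiles(comparison, n=10)
    return [c - r for r, c in zip(q_ref, q_cmp)]

rng = random.Random(2024)
n = 20000  # chosen only to keep sampling noise small

reference = gamma_sample(12, 3, n, rng)        # benchmark from the text
same_mean_wider = gamma_sample(12, 5, n, rng)  # same mean, larger SD (assumed value)

diffs = decile_differences(reference, same_mean_wider)
print([round(d, 2) for d in diffs])
```

With equal means and a larger SD in the comparison distribution, the difference is negative at the fastest decile and positive at the slowest, which produces a positively sloped delta plot even though the mean effect is zero.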

Fig. 2. Simulated delta plots. Each delta plot is calculated by comparing gamma distributions with different means and SDs to a gamma distribution with mean = 12 and SD = 3. Each column shows a distribution with different SD and each row shows a distribution with different mean.

There are limitations to investigating the time course of the Stroop effect using mean RTs, as they rely on a single measurement of latency at the end of each trial and ignore distributional information. The delta plot method makes use of the entire RT distribution, but effects of mean and variance are hard to discern (see Fig. 2). Moreover, RT distributions can have different shapes and be shifted in time. Ideally, researchers would like a measure that produces identically shaped distributions so that they can compare responses across the two distributions at the same points in time. We offer an alternative to the delta plot method, a method that allows researchers to look at experimental effects across identical distributions at the same points in time.

Reach-to-touch paradigm

A promising method in cognitive science is the reach-to-touch paradigm (Finkbeiner et al. 2014). For instance, in the Simon task literature, the reach-to-touch paradigm has already been used to investigate temporal properties of the effect (Porcu et al. 2016; Buetti and Kerzel 2010; Finkbeiner and Heathcote 2016). In a typical design, participants may be presented with a cognitive task that requires a speeded choice between two or more response alternatives. Participants execute their response by reaching out to designated spatial locations, say, left for color green and right for color red. The arm-movement trajectories are recorded and serve as the dependent measure.

There are two key components to the reach-to-touch paradigm. First, it is a continuous response measure that can reveal experimental effects as they emerge over time. Arm movements in the reach-to-touch paradigm have been considered a window into cognitive processes (Song and Nakayama 2009; Spivey et al. 2005). More recently, Finkbeiner and colleagues (Finkbeiner et al. 2014; Quek and Finkbeiner 2013; 2014) pointed out that this continuous response measure should be used with the second key component, the response signal procedure (Reed 1973; 1976).

The current study instructs participants to initiate their movement within 300 ms of an imperative ‘go’ signal. This go signal is the final beep in a sequence of three beeps. Importantly, on each trial, the three beeps occur randomly, so that the final ‘go’ beep appears at different points in time relative to the onset of the target stimulus. The time at which the participant begins moving relative to stimulus onset is the movement initiation time (MIT). For example, MIT = 0 means that the subject started to move their finger at the same time as the stimulus was presented. Similarly, MIT = 300 indicates that the subject lifted their finger from the start point 300 ms after the stimulus onset. A negative MIT means that the subject started moving their finger before having seen the stimulus. MITs represent movements that commence at a range of different stimulus processing times. In the Stroop milieu, we can examine the magnitude of the Stroop effect for various processing times (e.g., is the observed effect larger on late lift-off trials, which presumably allow more time for processing?).

The forced-reading Stroop task

As well as the statistical methods discussed, recent experimental methods have shed light on the nature of the Stroop effect. Eidels et al. (2014) employed a novel forced-reading Stroop task and found the standard Stroop effect is only a proportion of the Stroop effect that could be observed. In the standard task, participants are asked to classify the print color of color-words irrespective of the content of the word. In the forced-reading task, participants were asked to classify the print color of color-words (e.g., RED, GREEN), but withhold their response when presented with non-color-words (BED, GREED). To conform with the instructions, participants were forced to read every word presented. Consequently, the forced-reading Stroop task yielded a Stroop effect derived from fully processed words on every trial. Eidels et al. found a larger Stroop effect in the benchmark forced-reading task compared to the standard Stroop task, and suggested that the nature of reading occurring in the two tasks is not comparable.

The current study

In the present study, we use both the standard and forced-reading Stroop tasks in conjunction with measurements of arm-reaching trajectories to understand the time course of the Stroop effect (see Footnote 3). The forced-reading task is a useful benchmark, as it yields a Stroop effect from fully processed words.

The key aspect of our study is that distributions of MITs do not differ across conditions in our experiment (see Fig. 3). There were no differences in the means or the SDs of how long subjects viewed, and presumably processed, the stimulus before initiating their movement. Therefore, we compared arm-reaching trajectories across two identically shaped Stroop distributions to see if the Stroop effect unfolds at a different rate, for the standard and forced task, at the same points in time. Our analysis is not compromised by differences in variances or shapes between the congruent and incongruent distributions; thus, our study addresses concerns with the delta plot.

Fig. 3. Distribution of MITs for congruent and incongruent conditions in the standard and forced Stroop tasks. The four MIT distributions of interest do not differ in location or scale. Zero value on the x-axis means that the participant initiated movement at the same time as the stimulus onset. The figure shows that the majority of responses were initiated after stimulus onset. In the standard task, the mean MIT was 172 ms in the congruent condition and 169 ms in the incongruent condition. In the forced task, the mean MIT was 166 ms in both the congruent and incongruent conditions.

We address four key research questions in our study. First, we expect that participants will be more informed of the correct response with additional processing time. So, do participants get a better idea of how to respond at later MITs? Second, there is an increased task demand in the forced-reading Stroop task because participants are required to read each and every word, but does this task demand result in the decision process unfolding faster in the standard Stroop task compared to the forced-reading Stroop task? Third, researchers have inferred from delta plots that Stroop interference grows over time (Pratte et al. 2010). This conclusion is also in line with extant theories of the Stroop effect (Cohen et al. 1990; Melara and Algom 2003). However, given the limitations of the delta plot, we investigate whether the Stroop effect (when it exists) grows over time in the reach-to-touch paradigm. Finally, the standard Stroop effect has previously been found to be a proportion of the benchmark forced-reading Stroop effect (Eidels et al. 2014). With our method we look at whether the Stroop effect grows in magnitude in the forced-reading task more than the standard Stroop task as stimulus-processing/viewing time increases.

Method

Participants

Twenty psychology students from Macquarie University participated in the study in return for course credit. All participants were native English speakers with normal or corrected-to-normal vision and intact color vision, and all reported being right-handed. All participants took part in both the standard and forced-reading Stroop tasks.

Apparatus

A schematic of the experimental apparatus is presented in Fig. 4, with important materials labeled with numbers. Participants sat in front of a table and placed their right index finger on a small Velcro square (marked ‘0’ in Fig. 4), which marks the starting position and the return position for every trial. Stimuli were presented on a 27” Samsung LCD/LED monitor using the software ‘Presentation’. The monitor was situated 1 m away from the participants and centered with their body mid-line. Lateral response boards (30 cm × 9 cm) were placed to the left (1) and right (2) of the monitor, 75 cm apart and 50 cm from the front of the desk. A third response location (3) was marked on the desk between the participant and the monitor, 50 cm away from the front edge of the desk. A small motion-tracking sensor was taped to the tip of the right index finger of each participant. A Polhemus Liberty (240 Hz) electromagnetic motion tracking system was used to record the participants’ arm trajectories during the experiment. Participants wore headphones adjusted to a comfortable volume level, which were used to present a sequence of beeps.

Fig. 4. A front-facing view of the apparatus used for the current experiment. Subjects placed their index finger on position 0 to start the trial. On each trial, participants reached toward the color response options, denoted by 1 and 2. In the forced-reading task, participants could also reach towards a neutral response option, denoted by 3.

Stimuli

The standard Stroop task and the forced-reading Stroop task used the same stimuli. The stimuli were the color-words RED and GREEN and the non-color-words ROD, BED, RENT, QUEEN, GRAIN, and GREED. These non-color stimuli were specifically selected to ensure that participants would not base their responses on local cues. The non-color stimuli were the orthographic neighbors of the color-words with the closest frequency, such that each non-color-word shared all but one or two letters with a color-word (see Eidels et al., 2014). All words were printed in either the color red or green (with RGB values of 220/0/0 and 0/170/0, respectively) and were written in uppercase Garamond font, which at a viewing distance of 1 m allowed for a visual angle of 4 degrees. Each of the color-words could be congruent to the font color (e.g., RED printed in the color red) or incongruent (e.g., RED printed in the color green). All non-color-words can be considered neutral to the font color, whether they were printed in red or green (but see T. L. Brown, 2011).

Design and procedure

Each participant attended two experimental sessions: the standard Stroop task and the forced-reading Stroop task. Sessions were separated by a minimum of 1 day and a maximum of 7 days. The order of task administration was counterbalanced across participants so that half of the participants performed the standard Stroop task first, and the remaining half performed the forced-reading Stroop task first. The order of word presentation was random for each participant. In each session, the participant performed 840 trials. These trials were partitioned into seven blocks of 120 trials each, with 2-min breaks between blocks. In each block, color-words were presented 15 times per combination of color × word (RED in red, RED in green, GREEN in red, and GREEN in green), which made for 60 color-word trials. The six non-color-words were presented five times per combination of word and print color, making for 60 non-color-word trials within the same block.
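
The block composition described above can be checked with a short sketch. The names `build_block`, `COLOR_WORDS`, and the seed are illustrative assumptions; the original experiment was run in the ‘Presentation’ software, and this Python fragment only reproduces the trial counts, not the stimulus timing or the repetition of error trials.

```python
import random
from itertools import product

COLOR_WORDS = ["RED", "GREEN"]
NON_COLOR_WORDS = ["ROD", "BED", "RENT", "QUEEN", "GRAIN", "GREED"]
PRINT_COLORS = ["red", "green"]

def build_block(rng):
    """Return one shuffled block of 120 (word, print_color) trials."""
    trials = []
    for word, color in product(COLOR_WORDS, PRINT_COLORS):
        trials += [(word, color)] * 15   # 4 combinations x 15 = 60
    for word, color in product(NON_COLOR_WORDS, PRINT_COLORS):
        trials += [(word, color)] * 5    # 12 combinations x 5 = 60
    rng.shuffle(trials)
    return trials

rng = random.Random(0)
session = [build_block(rng) for _ in range(7)]   # seven blocks per session
total_trials = sum(len(block) for block in session)
print(total_trials)  # 840
```

Within each block, half of the trials are color-words and half are non-color-words, and half of the color-word trials are congruent.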

In the standard Stroop task, the participant classified the color of all the words presented by reaching out to the left or right lateral response boards (‘1’ and ‘2’ in Fig. 4). The left and right response boards corresponded to a red or a green color and were counterbalanced across participants. In the forced-reading Stroop task, participants classified the color of color-words but did not classify the color of non-color-words. For non-color-words, participants responded by reaching towards a neutral response location (‘3’ in Fig. 4).

On each trial, a single word in color was presented at the center of a black screen. The timing of stimulus presentation was relative to the sound of three auditory beeps that were played through the participant’s headphones. The stimulus was randomly presented, with equal probability, at one of five different times prior to the third beep (300, 230, 150, 70, or 0 ms before the third beep). In four of the five timing conditions (i.e., 80% of trials), the stimulus was presented before the onset of the third beep, whereas in the 0-ms condition (20% of trials) the stimulus and the third beep were presented simultaneously. This procedure controls for participants’ anticipation of stimulus display. In both tasks, participants had to initiate their movement between 100 ms before and 200 ms after the third beep, meaning all movements began within a 300-ms window around the third beep. Two example trial sequences are presented in Fig. 5. If participants failed to initiate movement within the allotted time window, they would receive a loud buzzing sound and visual feedback to indicate they had responded ‘Too Early!’ or ‘Too Late!’. Once a movement was initiated, participants were required to maintain a continuous forward motion. Failing to do so terminated the trial and participants were provided with a buzz and appropriate visual feedback. Trials that were terminated via movement errors were repeated at a later stage of the block. The presentation of the trial terminated when the participant responded via the response points. The next trial followed after the sensor was returned to the start point.
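
The timing rules above imply a range of possible MITs for each stimulus-to-beep interval. The sketch below works through that arithmetic; the function names, the constant names, and the treatment of the window boundaries as inclusive are assumptions made for illustration.

```python
SOAS_MS = [300, 230, 150, 70, 0]   # stimulus onset before the third beep
WINDOW_START_MS = -100             # earliest allowed lift-off, relative to the third beep
WINDOW_END_MS = 200                # latest allowed lift-off, relative to the third beep

def classify_liftoff(liftoff_re_beep_ms):
    """Feedback category for a lift-off time measured from the third beep."""
    if liftoff_re_beep_ms < WINDOW_START_MS:
        return "Too Early!"
    if liftoff_re_beep_ms > WINDOW_END_MS:
        return "Too Late!"
    return "valid"

def movement_initiation_time(soa_ms, liftoff_re_beep_ms):
    """MIT: lift-off time measured from stimulus onset."""
    return soa_ms + liftoff_re_beep_ms

# Range of MITs that a valid lift-off can produce in each timing condition.
mit_range = {
    soa: (movement_initiation_time(soa, WINDOW_START_MS),
          movement_initiation_time(soa, WINDOW_END_MS))
    for soa in SOAS_MS
}
for soa, (lo, hi) in mit_range.items():
    print(f"SOA {soa:3d} ms: valid MITs from {lo} to {hi} ms")
```

Under these assumptions, valid MITs can range from −100 ms (0-ms condition, earliest lift-off) to 500 ms (300-ms condition, latest lift-off), which is how a fixed response window yields movements that begin after very different amounts of stimulus viewing time.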

Fig. 5. Example trial sequences for trials in which stimuli were presented simultaneously with the third beep (top panel; 0-ms gap between the onset of the stimulus and the third beep) and 300 ms before the third beep (bottom). The red vertical bars below the time line indicate the onset of the three auditory beeps, the green bar above the time line indicates stimulus onset, and the blue box shows the 300-ms window in which participants began their movements. In addition to the 0- and 300-ms trial types, there were also trials in which stimulus onset preceded the third beep by 70, 150, or 230 ms (not shown in the figure).

Data analysis

From the trajectories (Fig. 6), we calculated the velocity along the x-axis (x-velocity), which serves as our dependent measure. X-velocity quantifies how fast a participant is moving in the correct direction at any time during the trial. X-velocity is positive for movements towards the correct direction and negative for movements toward the incorrect direction. Thus, x-velocity provides data that ranges between fast movement in the correct direction (large positive values) and fast movement in the incorrect direction (large negative values). It is a more informative measure compared to nominal accuracy rates (correct/incorrect) or RTs, which range from ‘slow’ to ‘fast’ in only a positive direction.

Fig. 6. Arm trajectories and mean arm trajectories of a single participant. The four panels include arm trajectories related to the four possible conditions obtained from crossing Task by Congruence. X and Y labels refer to the movement planes presented in Fig. 4. The Y-axis denotes forward motion and the X-axis denotes lateral motion. Trajectories only include correct responses. Thus, any differences between the left and right tracks are natural deviations in how the hand moves to a target situated to the left versus right of mid-line.

Before calculating x-velocity, the positional data taken from the Polhemus Liberty device were filtered with a two-way low-pass Butterworth filter at 7 Hz, which reduced noise in the data. Then, x-velocity was derived from the numerical differentiation of the filtered positional data. The onset of movement was identified as the first of 20 consecutive samples in which the tangential velocity exceeded 10 cm/s. The offset of movement was identified as the first of 20 consecutive samples of tangential velocity that occurred after peak velocity and that were less than 10 cm/s.
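
The differentiation and the onset/offset criteria can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the positional data have already been low-pass filtered (the Butterworth step is omitted), uses simple first differences as the numerical derivative, and all function names and the synthetic speed profile are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 240      # Polhemus Liberty sampling rate reported in the text
THRESHOLD_CM_S = 10.0     # tangential-velocity criterion
RUN_LENGTH = 20           # number of consecutive samples required

def differentiate(values, rate_hz=SAMPLE_RATE_HZ):
    """First-difference velocity (units per second); one sample shorter than the input."""
    return [(b - a) * rate_hz for a, b in zip(values, values[1:])]

def tangential_velocity(xs, ys, zs, rate_hz=SAMPLE_RATE_HZ):
    """Speed along the path, combining the three axis velocities."""
    vx, vy, vz = (differentiate(axis, rate_hz) for axis in (xs, ys, zs))
    return [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(vx, vy, vz)]

def first_run(values, condition, start=0, run_length=RUN_LENGTH):
    """Index of the first sample that begins run_length consecutive samples meeting condition."""
    for i in range(start, len(values) - run_length + 1):
        if all(condition(v) for v in values[i:i + run_length]):
            return i
    return None

def movement_onset_offset(speed):
    """Return (onset, offset) sample indices, or None where no qualifying run exists."""
    onset = first_run(speed, lambda v: v > THRESHOLD_CM_S)
    peak = max(range(len(speed)), key=speed.__getitem__)
    offset = first_run(speed, lambda v: v < THRESHOLD_CM_S, start=peak)
    return onset, offset

# Hypothetical speed profile (cm/s): rest, a 100-sample movement, rest.
speed = [0.0] * 30 + [50.0] * 100 + [0.0] * 40
onset, offset = movement_onset_offset(speed)
print(onset, offset)  # 30 130
```

Requiring 20 consecutive samples (about 83 ms at 240 Hz) guards against brief noise spikes being mistaken for the start or end of a reach.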

For our analysis, we first improved the signal-to-noise ratio of the trajectories with a modified version of orthogonal polynomial trend analysis (OPTA). The OPTA procedure used here has been described in detail in Finkbeiner et al. (2014) and Finkbeiner and Heathcote (2016). In summary, OPTA uses a regression model with x-velocity as the dependent variable and MIT (with polynomial terms up to the 15th order) as the predictor variable. Terms that did not explain significant variance were removed from the model, leaving only significant coefficients to predict x-velocity for each trial. After the OPTA analysis, we calculated the mean predicted x-velocity values from the first 350 ms of the reaching movement (initial x-velocity; Finkbeiner et al., 2014). We limit our dependent measure to the first 350 ms because the initial part of the trajectory represents the motor plan participants had formulated just prior to initiating their movement. The MIT latencies were used to group the initial x-velocity profiles into 20 equal bins (i.e., semi-deciles). Finally, the mean predicted initial x-velocity values were subjected to a linear mixed-effects model with MIT semi-decile included as a fixed effect.
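
The grouping of trials into MIT semi-deciles can be sketched as below. This covers only the binning step, not the OPTA regression itself; it assumes the 20 bins contain equal numbers of trials, and the function name `semidecile_means` and the simulated trial values are hypothetical.

```python
import random
import statistics

def semidecile_means(mits, x_velocities, n_bins=20):
    """Sort trials by MIT, split them into n_bins equal-count bins, and
    return (mean MIT, mean x-velocity) for each bin."""
    trials = sorted(zip(mits, x_velocities))
    n = len(trials)
    summary = []
    for b in range(n_bins):
        chunk = trials[b * n // n_bins:(b + 1) * n // n_bins]
        summary.append((statistics.fmean([m for m, _ in chunk]),
                        statistics.fmean([v for _, v in chunk])))
    return summary

# Hypothetical trials: x-velocity rises with MIT, plus noise.
rng = random.Random(7)
mits = [rng.uniform(-100, 400) for _ in range(400)]
x_vel = [0.1 * m + rng.gauss(0, 5) for m in mits]

bins = semidecile_means(mits, x_vel)
for mean_mit, mean_vel in bins:
    print(f"mean MIT = {mean_mit:7.1f} ms, mean x-velocity = {mean_vel:6.2f}")
```

Each bin summarizes movements that began after a similar amount of stimulus viewing time, so conditions can be compared bin by bin.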

Results

Accuracy

Overall, across all participants, 91% of the responses were correct and valid. Mean error rate amounted to a negligible 1%. Invalid responses consisted of responding too early (3%), responding too late (5%), and not moving fast enough (2%). None of the participants were excluded from analysis due to accuracy.

Linear mixed-effects analysis

The linear mixed-effects analysis on predicted initial x-velocity (x-velocity hereafter) was conducted only for correct responses. We used a model comparison approach with the Bayesian information criterion (BIC; Schwarz, 1978), which selects the best-fitting model while penalizing for complexity (i.e., number of parameters). The best-fitting model included task (forced, standard), condition (congruent, incongruent), and MIT (semidecile) as fixed effects. The model also included subjects as a random effect. The relationship between x-velocity and MIT was curvilinear and so the model included up to 3rd order terms for MIT. Here we report the coefficients (b), standard errors, and t-values of the best-fitting model. The criterion for significance is a coefficient magnitude of at least twice the corresponding standard error. For the ‘condition’ factor, the congruent condition was used as a baseline, meaning that negative coefficients represent smaller x-velocities relative to the congruent condition. For the ‘task’ factor, the standard task was used as a baseline, meaning that negative coefficients represent smaller x-velocities relative to the standard task.
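
For reference, the standard definition of the BIC is given below. The symbols are generic notation rather than the authors': k is the number of estimated parameters, n is the number of observations, and L-hat is the maximized likelihood of the model. Lower values indicate the preferred model.

```latex
\mathrm{BIC} = k \ln(n) - 2 \ln(\hat{L})
```

The first term is the complexity penalty, which grows with the number of parameters; the second term rewards goodness of fit.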

X-velocity was smaller in the forced Stroop task compared to the standard task (b = −34.80, SE = 0.32, t = −109.32). There was also a smaller x-velocity in the incongruent condition compared to the congruent condition (b = −4.07, SE = 0.32, t = −12.87). X-velocity increased as a function of MIT semidecile (b = 1606.03, SE = 21.07, t = 76.21). There was a significant interaction between task and condition, where the difference in x-velocity between congruent and incongruent trials was bigger in the forced task than the standard task (b = −2.76, SE = 0.46, t = −6.05). There was an interaction between task and MIT semidecile (b = −825.75, SE = 30.10, t = −27.43), but no interaction between condition and MIT semidecile (b = 59.50, SE = 29.85, t = 1.99). Finally, there was a three-way interaction between task, condition, and MIT semidecile (b = −665.88, SE = 43.08, t = −15.46).
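
The significance criterion stated earlier (a coefficient magnitude of at least twice its standard error) can be applied directly to the reported values. In the sketch below the coefficients and standard errors are taken from the text, while the effect labels and the function name `is_significant` are illustrative.

```python
# (label, b, SE) as reported in the text; the labels are shorthand for this example
COEFFICIENTS = [
    ("task (forced)", -34.80, 0.32),
    ("condition (incongruent)", -4.07, 0.32),
    ("MIT semidecile", 1606.03, 21.07),
    ("task x condition", -2.76, 0.46),
    ("task x MIT", -825.75, 30.10),
    ("condition x MIT", 59.50, 29.85),
    ("task x condition x MIT", -665.88, 43.08),
]

def is_significant(b, se):
    """Criterion from the text: |b| is at least twice the standard error."""
    return abs(b) >= 2 * se

results = {label: is_significant(b, se) for label, b, se in COEFFICIENTS}
for label, b, se in COEFFICIENTS:
    print(f"{label:26s} b/SE = {b / se:8.2f}  significant: {results[label]}")
```

Only the condition × MIT interaction falls short of the criterion (59.50 is just under 2 × 29.85 = 59.70). Ratios computed from the rounded values can differ slightly from the reported t-values.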

To understand the nature of the three-way interaction we ran paired t-tests (congruent vs. incongruent) at each of the 20 MIT semideciles for both the standard Stroop task and the forced Stroop task (Fig. 7). We corrected for an inflated type I error rate with Bonferroni-corrected p values. This analysis showed the Stroop effect unfolding over time. In the standard task, the Stroop effect was not significant for any of the 20 MIT semideciles (see Footnote 4). However, in the forced task, the Stroop effect was significant for movements that commenced at the 7th MIT semidecile (∼133 ms) through to the 20th and final MIT semidecile (∼338 ms).
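
A Bonferroni correction of this kind can be sketched as follows. The p values are invented for illustration and are not the study's data; treating the 20 semidecile tests within one task as the family of comparisons and using a threshold of .05 are likewise assumptions, since the text does not state them.

```python
def bonferroni(p_values):
    """Multiply each p value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical uncorrected p values for 20 semideciles (not the study's data).
raw_p = [0.6, 0.4, 0.2, 0.08, 0.03, 0.004] + [0.001] * 14
adjusted = bonferroni(raw_p)
significant = [p < 0.05 for p in adjusted]

# Position (1-based) of the first semidecile that survives the correction.
print(significant.index(True) + 1)  # 7
```

In this made-up series, the fifth and sixth semideciles would pass an uncorrected .05 threshold but do not survive the correction, which illustrates how the procedure trades sensitivity for control of the type I error rate.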

Discussion

Participants performed both a standard and a forced-reading Stroop task. The reaching trajectories served as the dependent measure for both tasks. Using arm-reaching trajectories coupled with a signal-to-respond procedure allowed us to compare Stroop effects that are calculated from two identically shaped distributions. This way, we could compare Stroop effects at the same points in time and presumably equivalent processing times. At each point in time we observed how fast the participant initially moved towards the correct response, that is, the initial x-velocity.

First, we wanted to know if participants get a better idea of how to respond with increased stimulus processing/viewing time (processing time for brevity). Initial x-velocity significantly increased as a function of MIT. Thus, the participant moved faster toward the correct response when they had more processing time. This finding might not be surprising, as the participant would be more informed of the correct response with additional time. Nonetheless, this finding supports our claim that the impact of increased processing time can manifest in our dependent measure, initial x-velocity.

Second, we looked at whether there was a difference in overall performance in the forced-reading Stroop task compared to the standard Stroop task, as the forced task had a greater task demand. We found that initial x-velocity increased more quickly as a function of MIT for the standard task than the forced task. This suggests that the participant’s decision process unfolded at a faster rate over time in the standard task compared to the forced task.

Finally, we wanted to know if the Stroop magnitude emerged with more processing time and if the effect grew in the forced-reading task more than the standard Stroop task. We found that the Stroop effect was evident in neither the standard nor the forced task prior to approximately 133 ms of processing time. After 133 ms, the Stroop effect was evident only in the forced task and not the standard task. In the forced task, the Stroop effect continued to grow in magnitude after 133 ms. The lack of effect in the standard task suggests the standard Stroop effect is only a proportion of the benchmark forced-reading Stroop effect. Crucially, this finding does not depend on the amount of processing time, although some processing time, namely 133 ms, is needed for significant differences between the standard and forced-reading Stroop task to emerge.

Validating findings from delta plots and forced-reading

Pratte et al. (2010) advocated the delta plot as a method for examining the time course of experimental effects, such as the Stroop effect. In their application of the delta function, they found that the Stroop effect was small for fast responses and large for slow responses. Their finding suggested that the effect grows in magnitude as processing time increases, but the slope of the delta plot is sensitive to the variance of the distributions in question, limiting its applicability. We showed that when the Stroop effect is observed, it grows in magnitude as the processing time increases, even when assessed without the confounds of delta plots.

However, a significant Stroop effect only emerged in the forced Stroop task. The lack of a Stroop effect in the standard task is not a surprising result. Despite the reputation of the Stroop effect as a robust phenomenon, it has been shown to depend on design as well as other contextual factors. The effect appears only when certain conditions are met, but can be very small and even reversed given particular contextual factors (e.g., Kahneman & Chajczyk, 1983; MacLeod & Dunbar, 1988; Dishon-Berkovits & Algom, 2000; Besner et al., 1997; La Heij & Vermeij, 1987). In his comprehensive review of Stroop research, MacLeod (1991) listed set-size, mode of response, and relative speed of processing (among other factors) as factors that determine the magnitude of the effect. Since MacLeod, a substantial number of empirical papers have shown the malleable nature of the Stroop effect and how, with small set size and manual responses, it can be quite small and even vanish (see, e.g., Melara & Mounts, 1993; Dishon-Berkovits & Algom, 2000; Sabri, Melara, & Algom, 2001; Melara & Algom, 2003).

Our experimental design was limited to only two colors and to a manual (rather than vocal) mode of response, both known to limit the magnitude of the Stroop effect (see also Eidels et al., 2010). Nonetheless, a marked Stroop effect was registered in the forced-reading task of the current study, suggesting that the effect can emerge even with two colors and a manual mode of responding. Its absence in the standard task does not merely reflect sensitivity to set size or to the mode of responding, but rather suggests that words in the standard Stroop task may not be fully processed, at least not to the same extent they are processed in the forced task.

The asymmetry in Stroop effects across the standard and forced tasks could potentially be explained by the complexity of the forced-reading task. Specifically, Eidels et al. (2014) documented longer response times in the forced task, with the additional time allowing for the irrelevant word to interfere with color naming more (e.g., Melara and Algom, 2003).

The present study offers another way to expand on the findings of Eidels et al. (2014) by providing the means to directly examine the magnitude of the Stroop effect at the same points in processing time across the two tasks. Participants in the present study initiated their reaching responses in synchrony with an imperative go signal, as opposed to the target stimulus. Thus, we were able to equate the movement initiation times across the two tasks, despite the differences in task difficulty/complexity. When we compared the magnitude of the Stroop effect across tasks at similar points in stimulus-processing time, we observed a clear Stroop effect in the forced-reading version of the task at all time points greater than 133 ms. In contrast, the magnitude of the effect was reduced at the corresponding time points in the standard version of the task. Expanding on Eidels et al. (2014), we show that the larger Stroop effect under forced-reading instructions is not an artifact due to longer processing time, but a genuine effect.

Theoretical implications

A central result of the current study is that, in the forced-reading task, the difference between the incongruent and congruent conditions (i.e., the Stroop effect) was larger at longer movement initiation times (see Fig. 7). Existing theories of the Stroop effect may differ in their predictions concerning the magnitude of the effect as processing time increases. We briefly survey three popular models and discuss whether they can predict this observed result.

The horse race model of the Stroop effect (Palef and Olson 1975) suggests that activation of the word and color information accumulates in parallel. Word and color information accumulate toward a response channel, where task-irrelevant word information arrives first. Because the word channel finishes first, our cognitive system needs to wait for a response activated by the slower color information, which manifests as Stroop interference. This model has been criticized because it cannot account for data where the word information is delayed (e.g., Glaser & Glaser, 1982). With regard to our study, the horse race account cannot accommodate a Stroop effect that grows over time, which we observed in the forced-reading task.

A current and popular account of the Stroop task is the parallel distributed processing model (Cohen et al. 1990). This model suggests that our system receives information (input) from different dimensions that travel down specific pathways to response mechanisms (output). Some of these pathways have stronger activation than others, and the strength of this activation, not the speed, determines the output. In the Stroop task, the word pathway is considered stronger than the color pathway. Because word processing is more likely to reach the output node before color processing, additional activation needs to be recruited from task-specific nodes, which causes the system to run for many more processing cycles (see Footnote 5). This account is in line with our results, as longer processing times produce greater Stroop interference.

Similarly, our results are in line with the tectonic theory of selective attention (Melara and Algom 2003). In this model, evidence from target-relevant information leads to the response required on the trial, whereas values of the non-presented target lead to an incorrect response. A ratio of this evidence is calculated, and once the ratio reaches 1, a response is made. When there is more evidence for the non-presented target than for the presented target (i.e., when the word has been processed for longer), more processing steps are required to exceed the response threshold.

The fact that we found a Stroop effect in the forced task, but not in the standard task, sheds light on the nature of reading in the Stroop task. For instance, on any particular trial of the standard task, a participant might be processing the word to some extent or not reading the word at all. Eidels et al. (2014) posit a simple probability-mixture model to account for these results. Under this model, the empirical congruent and incongruent distributions we observe are binary mixtures of two unobserved distributions. A given trial is a sample drawn from the distribution associated with reading (with probability p) or the distribution free of word reading (with probability 1 - p). The forced-reading task increases the probability of reading to p = 1. This should lead to an inflated Stroop effect compared with the standard task, which is what we observe in our data.
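The logic of the probability-mixture account can be illustrated with a small simulation. The sketch below is not the model fitted by Eidels et al. (2014), and none of its numbers come from the present data: the function names, the 500 ms baseline, the 50 ms standard deviation, and the facilitation and interference shifts are all hypothetical values chosen only to show how the expected Stroop effect scales with the reading probability p.

```python
import random

# Hypothetical, illustrative parameters (in ms); not estimates from the study.
BASE_MEAN = 500.0      # mean response time when the word is not read
BASE_SD = 50.0         # trial-to-trial variability
FACILITATION = -20.0   # shift on congruent trials when the word is read
INTERFERENCE = 40.0    # shift on incongruent trials when the word is read


def simulate_mean_rt(p_read, congruent, n_trials=20000, seed=0):
    """Mean response time under a binary mixture of 'read' and 'no-read' trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        rt = rng.gauss(BASE_MEAN, BASE_SD)
        # With probability p_read the word is read and shifts the response time;
        # otherwise the trial comes from the word-free distribution.
        if rng.random() < p_read:
            rt += FACILITATION if congruent else INTERFERENCE
        total += rt
    return total / n_trials


def stroop_effect(p_read, n_trials=20000, seed=0):
    """Incongruent minus congruent mean response time for a given p_read."""
    incongruent = simulate_mean_rt(p_read, False, n_trials, seed)
    congruent = simulate_mean_rt(p_read, True, n_trials, seed)
    return incongruent - congruent


if __name__ == "__main__":
    for p in (0.0, 0.3, 1.0):
        print(f"p = {p:.1f}: simulated Stroop effect = {stroop_effect(p):.1f} ms")
```

Under these assumptions the expected effect is p multiplied by the difference between the interference and facilitation shifts, so it vanishes when the word is never read and is largest when p = 1, as in the forced-reading task.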

Conclusions

Our study has methodological and theoretical implications. The arm-reaching paradigm can potentially reveal how experimental effects emerge over time. We found that when the Stroop effect is observed, it grows in magnitude with additional processing time, and we demonstrated this without the confounds of delta plots. We also showed that the nature of reading in the standard Stroop task is not comparable to that in a task in which we know the participant reads on every trial.