Introduction

The Stroop interference effect, that is, longer color-identification times for color-incongruent words (e.g., “BLUE” displayed in yellow) than for color-neutral words (e.g., “DEAL” displayed in yellow), is generally larger in healthy older adults than in their younger counterparts (see Comalli et al., 1962, for the first empirical demonstration). Importantly, this age effect in the Stroop task (Stroop, 1935) persists even after controlling for differences in processing speed (e.g., Aschenbrenner & Balota, 2015; Aschenbrenner et al., 2017; Bugg et al., 2007; Jackson & Balota, 2013; Nicosia & Balota, 2020; Spieler et al., 1996). It is therefore thought to reflect an inhibition deficit (e.g., Hasher & Zacks, 1988): older adults are less efficient at suppressing the word-dimension of color-incongruent Stroop words, which leads them to experience greater competition at the response output stage (Spieler et al., 1996).

Indeed, according to dominant single-stage response competition models (e.g., Roelofs, 2003), incidental semantic processing of the irrelevant word-dimension of color-incongruent Stroop items generates a single type of conflict: response conflict. According to this view, the Stroop interference effect is considered a unitary phenomenon due solely to competition between two alternative responses indicated by the two dimensions of the Stroop stimulus.

In contrast, multi-stage models anticipate this incidental processing to generate an additional level of conflict at the level of semantics: semantic conflict (e.g., Zhang et al., 1999; Zhang & Kornblum, 1998). They therefore view the Stroop interference effect as a composite phenomenon comprising both response and semantic conflict.

Taking this idea as their starting point, several studies have set out to investigate the level of processing (e.g., response and/or semantic) at which the age-related differences in the Stroop task take their effects and, more specifically, whether semantic conflict is or is not affected by healthy aging. Indeed, the idea proffered by Spieler and colleagues that older adults are less efficient in suppressing the word-dimension of Stroop stimuli leads to the somewhat straightforward prediction that they should (also) experience a greater amount of semantic conflict. This is not what studies have found.

Li and Bosman (1996) and, later, Augustinova et al. (2018) reported greater magnitudes of standard Stroop interference (e.g., BLUE in yellow vs. DEAL or **** in yellow) in healthy older adults, but neither study reported age-related differences in the magnitude of semantic-associative Stroop interference (e.g., SKY in yellow vs. DEAL or **** in yellow). Augustinova et al. (2018) subsequently claimed that the locus of the age effect in the Stroop task is at the level of response conflict rather than the level of semantic conflict or a combination of the two. Contrary to past conceptualizations (e.g., Spieler et al., 1996), these results imply that both older and younger participants are actually equally (in)efficient at suppressing the word-dimension of Stroop stimuli. In line with the most recent contributions to the literature on the above-mentioned inhibition deficit (e.g., Rey-Mermet & Gade, 2018), they further imply that older participants are instead less efficient at inhibiting the irrelevant response that is primed by the (irrelevant) word-dimension. This in turn reinforces the idea that the age-related deficit in inhibition (e.g., Andrés et al., 2008), or, more broadly, the age-related deficit in cognitive control, is not general (e.g., Bugg, 2014).

However, single-stage response competition models argue that the semantic-associative interference (SKY in yellow vs. DEAL in yellow) measured in these prior studies results entirely from response conflict (e.g., Roelofs, 2003). According to this position, semantic associates elicit incorrect response activity (e.g., say “blue”/press blue for SKY in yellow) indirectly – through their association with the response-set colors (blue in this case) – which in turn explains the smaller magnitude of semantic-associative interference (SKY in yellow vs. DEAL in yellow) compared to its standard counterpart (BLUE in yellow vs. DEAL in yellow; but see, e.g., Neely & Kahan, 2001; Schmidt & Cheesman, 2005). Under this account, neither Li and Bosman’s (1996) nor Augustinova et al.’s (2018) study satisfactorily demonstrated that the type of conflict that is spared by healthy aging is semantic, that is, due specifically to a slowdown that occurs whenever two distinct yet closely related semantic representations are simultaneously activated in an amodal semantic network (see, e.g., Seymour, 1977, for discussion).

To address this issue directly, the present study replaced semantic-associative items with items that induce semantic conflict in a way that cannot be accounted for by single-stage response competition models. Specifically, the study employed the two-to-one Stroop paradigm (De Houwer, 2003; hereafter 2:1). In this paradigm, all the distractors are part of the response set (e.g., BLUE, RED, GREEN, YELLOW), while responses for paired target colors are mapped to only one response-key (e.g., ‘F’ for blue and red and ‘J’ for green and yellow). As a result of this response-mapping, standard incongruent Stroop trials like BLUE in yellow provide evidence toward two different responses (they are therefore termed different-response trials). Indeed, the relevant color-dimension (yellow) prompts correct response activity toward the ‘J’ key, whilst the irrelevant word-dimension (BLUE) prompts incorrect response activity toward the ‘F’ key. There is no such (response) conflict on trials like BLUE in red, since both dimensions of the Stroop stimulus provide evidence toward the same response. Consequently, significant interference generated by these so-called same-response trials is interpreted as representing the independent contribution of semantic conflict to overall Stroop interference (De Houwer, 2003; see, e.g., Hershman & Henik, 2020, for the most recent example).
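The trial typology of the 2:1 paradigm can be sketched in a few lines. This is a minimal illustration only: it assumes the example key mapping given above (‘F’ for blue and red, ‘J’ for green and yellow), and the names `KEY_FOR_COLOR` and `classify_trial` are hypothetical rather than part of any published implementation.

```python
# Assumed example mapping from the text: two ink colors share each response key.
KEY_FOR_COLOR = {"blue": "F", "red": "F", "green": "J", "yellow": "J"}


def classify_trial(word, ink):
    """Label a Stroop stimulus under a 2:1 color-to-key mapping."""
    word, ink = word.lower(), ink.lower()
    if word not in KEY_FOR_COLOR:
        return "neutral"  # non-color word such as DEAL
    if word == ink:
        return "congruent"  # word and ink name the same color
    if KEY_FOR_COLOR[word] == KEY_FOR_COLOR[ink]:
        return "same-response"  # different colors, same key
    return "different-response"  # different colors, different keys


labels = [classify_trial(w, i) for w, i in
          [("BLUE", "yellow"), ("BLUE", "red"), ("BLUE", "blue"), ("DEAL", "yellow")]]
```

Under the logic of the paradigm, only the last branch involves two competing keys, so any slowdown on same-response trials cannot be attributed to competition between response keys.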

However, with the exception of a few notable studies (see below), all studies employing this measure of semantic conflict – including De Houwer (2003) – have used color-congruent trials as the baseline against which semantic conflict is measured. Problematically, the difference between same-response and color-congruent trials could be entirely driven by facilitation on color-congruent trials and thus not involve any semantic conflict (Hasshim & Parris, 2014, 2015) – as unitary models of Stroop interference (Roelofs, 2003) would predict. In line with this interpretation, Hasshim and Parris consistently reported significantly longer response times (RTs) for same-response trials than for color-congruent trials, but no difference between same-response trials and trials that were free of facilitation (i.e., color-neutral word trials; see, e.g., Brown, 2011, and MacLeod, 1991, for discussion).

In contrast to Hasshim and Parris, Burca and colleagues’ study (accepted for publication) reported a significant difference between same-response and color-neutral trials. This suggests that the difference between same-response and color-congruent trials (i.e., when no color-neutral baseline is included) simply confounds the (semantic) conflict produced by same-response trials and facilitation produced by color-congruent trials (MacLeod, 1991). However, the extent to which this is actually the case remains uncertain, since Burca et al.’s study did not include color-congruent trials. As a result, no study has so far demonstrated that semantic conflict contributes to overall Stroop interference in the 2:1 Stroop paradigm independently of both response conflict and facilitation. Considering this as a necessary prerequisite for any empirical demonstration of the specific age effect (or lack thereof) on semantic versus response conflict in the Stroop task, the present study aimed to address this more fundamental issue.

To this end, items that are traditionally included in the 2:1 Stroop paradigm (De Houwer, 2003) were supplemented by color-neutral word trials (Hasshim & Parris, 2014). This addition enabled us to test adequately for the presence of semantic conflict predicted by the multi-stage models of Stroop interference (e.g., Zhang & Kornblum, 1998) that were favored a priori in the current study over the still-dominant single-stage response competition models (e.g., Roelofs, 2003). With this design, the study was able to measure age-related differences in response and semantic conflict less ambiguously. Consequently, if, as reported by past studies (Augustinova et al., 2018; Li & Bosman, 1996), semantic conflict (same-response trials – color-neutral trials) is indeed spared in healthy aging, its magnitude will not differ between young and old adults. In contrast, response conflict (different-response – same-response trials) will be greater in healthy older adults than in their younger counterparts.

Method

Participants and design

Fifty-one older (i.e., over 65 years of age) and 50 younger (i.e., below 35 years of age) native French-speakers reporting normal or corrected-to-normal vision and presenting no impairment in color discrimination initially volunteered to participate in the study, which was approved by the local ethics committee. One older participant presented a medical history that included a head injury and one other was undergoing medical treatment for depression. In the six months prior to inclusion in the study, none of the other participants had suffered from psychiatric and/or neurological disorders. None of them declared taking any drug and/or following any medical treatment that is known to impact the nervous system during the 48 h prior to inclusion. To ensure that the remaining participants fitted the inclusion criteria, they completed a psychometric evaluation battery. To this end, the older adults completed the Mini Mental State Examination (Folstein, 1975). The scores of two participants were lower than the cutoff score of 25 points. The older adults also completed the Frontal Assessment Battery (Dubois et al., 2000). None of them presented with a cutoff score of 16 (or 15, depending on the participant's sociocultural level). A depression scale was then administered to both the older and the younger adults. No older adults reached the cutoff score of 7 on the short version (15 items) of the Geriatric Depression Scale (Sheikh & Yesavage, 1986). In addition, none of the younger adults reached the cutoff score of 8 on Beck’s Depression Inventory (Beck, 1988). In both groups, working memory was assessed with the forward and backward digit span (WAIS, Wechsler et al., 2008). All participants had scores within the norm, recalling seven plus or minus two items. Finally, to further assess differences in processing speed, the French equivalent (Bugaiska et al., 2007) of the letter-comparison test (Salthouse, 1990) was administered in both age groups.

After the exclusion of five participants in total (one was unable to perform the manual 2:1 Stroop task due to reduced hand mobility), the Stroop data of 46 healthy older (36 females and 10 males; Mage = 74.04 years) and 50 younger adults (41 females and nine males; Mage = 21.48 years) were analyzed in a 4 (Stimulus-Type: different-response vs. same-response vs. neutral vs. congruent) × 2 (Age-Group: older vs. younger) ANOVA, with the former factor as a within-participants factor.

Apparatus, stimuli, and procedure

After the psychometric evaluation presented above, the participants completed a computerized version of the Stroop task run using E-Prime 2.0 software (Schneider et al., 2002). The participants were seated 70 cm in front of a 13-in. portable computer and instructed to identify the color of the stimulus presented on the screen, as quickly and accurately as possible, by pressing the appropriate color-button and to ignore everything else in the display. To this end, they were instructed to concentrate on the fixation cross (‘+’) that appeared for 2,000 ms in the center of the screen at the beginning of each trial. The stimulus remained on the screen until the participant responded or until 3,500 ms had elapsed.

All stimuli were presented in lowercase Courier font, size 18, on a black background and subtended an average visual angle of 0.9° high × 3.0° wide. The participants responded manually using a modified SRBox® consisting of two handles, each of which had a single response button at the top flanked by two color-stickers (blue and red on one handle, yellow and green on the other). The participants pushed these response buttons with their thumbs. This allowed them to hold each handle comfortably in their palms with the remaining four fingers. The placement of the handles in the right or left hand, respectively, was counterbalanced across participants.

To familiarize themselves with the color-button correspondence before completing the experimental block, the participants first completed 96 practice trials consisting of asterisks. Because of low accuracy, eight older participants had to repeat this practice block (three of them were later excluded from further analyses) before proceeding to the experimental trials. As in Hasshim and Parris (2014, Exp. 2A), these consisted of 96 different-response, 48 same-response, 48 color-neutral, and 48 color-congruent trials. The trials were randomly intermixed in a single block. To this end, four (French) color-words – rouge [red], jaune [yellow], bleu [blue], and vert [green] – presented in both congruent and incongruent colors, and four non-color words – plomb [lead], liste [list], page [page], and cave [basement] – presented in all the colors, were used. The color and non-color words were matched on length and frequency via Lexique 3.38 (New et al., 2004).
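The composition of the experimental block can be reconstructed as follows. This sketch rests on two assumptions that the text does not state: that every unique word-color pairing within a trial type was repeated equally often, and that trial order was a simple shuffle. The button pairing (blue and red vs. yellow and green) follows the handle description above; all function and variable names are hypothetical.

```python
import random
from collections import Counter
from itertools import product

COLORS = ["rouge", "jaune", "bleu", "vert"]
NEUTRAL_WORDS = ["plomb", "liste", "page", "cave"]
# Blue and red share one button, yellow and green the other.
BUTTON = {"bleu": "A", "rouge": "A", "jaune": "B", "vert": "B"}
TARGET_COUNTS = {"different-response": 96, "same-response": 48,
                 "neutral": 48, "congruent": 48}


def stimulus_type(word, ink):
    if word not in BUTTON:
        return "neutral"
    if word == ink:
        return "congruent"
    return "same-response" if BUTTON[word] == BUTTON[ink] else "different-response"


def build_block(seed=0):
    """Repeat each unique pairing equally often (assumption), then shuffle."""
    unique = {name: [] for name in TARGET_COUNTS}
    for word, ink in product(COLORS + NEUTRAL_WORDS, COLORS):
        unique[stimulus_type(word, ink)].append((word, ink))
    trials = []
    for name, items in unique.items():
        repeats, remainder = divmod(TARGET_COUNTS[name], len(items))
        if remainder:
            raise ValueError(f"{name}: target count is not a multiple of {len(items)}")
        trials.extend(items * repeats)
    random.Random(seed).shuffle(trials)
    return trials


block = build_block()
counts = Counter(stimulus_type(word, ink) for word, ink in block)
```

With four color-words and four colors there are 8 unique different-response, 4 same-response, and 4 congruent pairings, plus 16 neutral pairings, so the reported trial numbers correspond to 12, 12, 12, and 3 repetitions respectively under the equal-repetition assumption.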

Results and discussion

Five older participants were excluded from further analyses: one due to faulty recording, and the four others because more than 33% of their data were removed from the analysis after the 3 SD correction and the exclusion of incorrect responses (see Table S1 in the OSM for demographic and psychometric data of the remaining participants). RTs more than 3 SDs above or below each participant’s mean latency for each condition were excluded from the analysis (i.e., less than 2% of the total data, corresponding to 0.9% of younger adults’ data and 1.5% of older adults’ data). Consequently, RTs and errors of the remaining 91 participants (41 older and 50 younger) were first analyzed in an omnibus 4 (Stimulus-Type: different-response vs. same-response vs. neutral vs. congruent) × 2 (Age-Group: older vs. younger) standard and Bayesian ANOVA. The values for this latter ANOVA were calculated with JASP (JASP Team, 2020) and interpreted according to Lee and Wagenmakers (2013, adjusted from Jeffreys, 1961). All priors were equal. Note that, in what follows, BF10 is the Bayes factor giving the evidence for H1 over the null hypothesis (H0), whereas BF01 is the evidence for H0 over H1.
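The 3 SD correction can be illustrated with a short sketch. It assumes that the criterion was applied once (not iteratively) to the correct-trial RTs of each participant-by-condition cell and that the sample standard deviation was used; neither detail is specified in the text, and `trim_rts` is a hypothetical helper.

```python
from statistics import mean, stdev


def trim_rts(rts, n_sd=3.0):
    """Drop RTs lying more than n_sd SDs from the mean of one cell.

    `rts` holds the correct-trial RTs (ms) of a single participant in a
    single condition. Uses the sample SD (assumption).
    """
    if len(rts) < 2:
        return list(rts)  # SD is undefined for fewer than two observations
    cell_mean, cell_sd = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - cell_mean) <= n_sd * cell_sd]


# Illustrative (made-up) cell: nineteen typical RTs and one very slow response.
cell = [600] * 19 + [3000]
trimmed = trim_rts(cell)
```

In this made-up cell the mean is 720 ms and the SD is about 537 ms, so the 3,000 ms response lies beyond the 3 SD bound and is removed while the other nineteen RTs are kept.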

For errors (see Table 1), these analyses revealed a main effect of Stimulus-Type, F(3,267) = 19.03; p < .001, ηp² = 0.176; BF10 = 4.450e+7, but not of Age-Group, F(1,89) = 0.018; p = .894, ηp² < .001; BF10 = 0.227/BF01 = 4.396. The Stimulus-Type × Age-Group interaction was also significant, F(3,267) = 3.11; p = .041, Greenhouse-Geisser corrected, ηp² = 0.034; BF10 = 1.130/BF01 = 0.884. However, the BF evidence in favor of an interaction was only anecdotal.
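As a consistency check on the reported effect sizes, partial eta squared can be recovered from an F ratio and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The function name below is hypothetical; the input values are those reported above.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


stimulus_type_effect = partial_eta_squared(19.03, 3, 267)  # main effect of Stimulus-Type
interaction_effect = partial_eta_squared(3.11, 3, 267)     # Stimulus-Type x Age-Group
age_group_effect = partial_eta_squared(0.018, 1, 89)       # main effect of Age-Group
```

Rounded to three decimals, the first two values reproduce the reported 0.176 and 0.034, and the third falls below .001.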

Table 1 Color-identification performance (mean (M), standard error (SE), 95% confidence interval (CI), and percent error (PE)) observed as a function of stimulus and age

Given that the analysis of RTs showed a considerable but expected (see Table S2 in the OSM) general slowing in older adults (i.e., the significant Stimulus-Type × Age-Group interaction, F(3,267) = 14.78; p < .001, ηp² = 0.142; BF10 = 1.378e+6), which was qualified by a significant simple main effect of Age-Group for each type of stimulus (all ps < .001, see Table S2 in the OSM), these RTs were z-scored (e.g., Jackson & Balota, 2013). The same omnibus ANOVA then revealed a main effect of Stimulus-Type, F(3,267) = 128.59; p < .001, ηp² = 0.591; BF10 = 1.459e+59, which was qualified by a significant Stimulus-Type × Age-Group interaction, F(3,267) = 10.36; p < .001, ηp² = 0.104; BF10 = 706286.31, thus indicating that age-related differences persist even after controlling for generalized slowing (see Table 1).
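The logic of the z-score transformation can be sketched as follows. The sketch assumes that each participant's RTs were standardized against that participant's own mean and SD computed over all retained trials, and then averaged per condition; the exact procedure is not detailed in the text. The RT values are made up for illustration.

```python
from math import isclose
from statistics import mean, stdev


def zscore_condition_means(trials):
    """Mean z-scored RT per condition for ONE participant.

    `trials` is a list of (condition, rt) pairs. Each RT is standardized
    against the participant's overall mean and SD (assumption).
    """
    rts = [rt for _, rt in trials]
    overall_mean, overall_sd = mean(rts), stdev(rts)
    if overall_sd == 0:
        raise ValueError("all RTs are identical; z-scores are undefined")
    by_condition = {}
    for condition, rt in trials:
        by_condition.setdefault(condition, []).append((rt - overall_mean) / overall_sd)
    return {condition: mean(zs) for condition, zs in by_condition.items()}


# Made-up data: the "older" participant is exactly 1.5 times slower throughout.
young_trials = [("neutral", 500), ("neutral", 520),
                ("different-response", 600), ("different-response", 640)]
older_trials = [(condition, rt * 1.5) for condition, rt in young_trials]

z_young = zscore_condition_means(young_trials)
z_older = zscore_condition_means(older_trials)
```

Because the transformation is invariant to multiplying all RTs by a constant, a purely proportional slowdown leaves the condition means in z units unchanged, so any age difference that survives the transformation is not attributable to this kind of uniform slowing.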

Is there any semantic conflict in the two-to-one Stroop paradigm?

To answer this key question, we first analyzed the aforementioned main effect of Stimulus-Type. This analysis revealed that, as in De Houwer’s original study, the total Stroop effect (M(different-response) – M(congruent); p < .001; BF10 = 1.814e+24) resulted from a significant contribution of both response conflict (M(different-response) – M(same-response); p < .001; BF10 = 4.134e+11) and the difference between same-response and congruent trials (p < .001; BF10 = 2.880e+10) – taken in previous studies as evidence for semantic conflict. However, the crucial addition of color-neutral trials enabled us to show that, overall, this latter difference did indeed confound the contribution of semantic conflict (M(same-response) – M(neutral); p < .001; BF10 = 27038.729) and that of Stroop facilitation (M(neutral) – M(congruent); p = .016), for which the Bayesian evidence was moderate (BF10 = 7.835). This finding is consistent with MacLeod’s reasoning (1991) that in the absence of color-neutral trials, the total Stroop effect (M(different-response) – M(congruent)) is likely to confound two qualitatively distinct phenomena: the Stroop interference (M(different-response) – M(neutral)) and facilitation (M(neutral) – M(congruent)) effects.
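The subtractions used in this decomposition can be made explicit in a short sketch. The condition means below are made-up values chosen for easy arithmetic, not the values in Table 1, and the function name is hypothetical.

```python
def decompose_stroop(means):
    """Derive the component effects from the four condition means."""
    return {
        "response_conflict": means["different-response"] - means["same-response"],
        "semantic_conflict": means["same-response"] - means["neutral"],
        "facilitation": means["neutral"] - means["congruent"],
        "interference": means["different-response"] - means["neutral"],
        "total_stroop": means["different-response"] - means["congruent"],
    }


# Illustrative condition means in ms (NOT the study's data).
example_means = {"different-response": 700, "same-response": 660,
                 "neutral": 640, "congruent": 625}
effects = decompose_stroop(example_means)
```

By construction, interference equals response conflict plus semantic conflict, and the total Stroop effect equals interference plus facilitation. Without the neutral condition, only the sum of semantic conflict and facilitation (same-response minus congruent) is observable.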

The decomposition of the Stimulus-Type × Age-Group interaction further revealed that the simple main effect of Stimulus-Type was significant in both older, F(3,87) = 76.86; p < .001, ηp² = 0.726; BF10 = 1.876e+33, and younger, F(3,87) = 35.65; p < .001, ηp² = 0.551; BF10 = 3.019e+26, participants. Further pairwise comparisons conducted in both age groups revealed that the significant total Stroop effect had the same structure, with the exception of Stroop facilitation (see Table 1 for descriptive statistics and magnitudes), which was no longer significant in younger adults (p = .114, BF10 = 0.621/BF01 = 1.610). The Stroop interference effect – which was significant in both age groups (young group: p < .001; BF10 = 5.746e+10; older group: p < .001; BF10 = 3.195e+10) – again resulted from the significant contribution of semantic (M(same-response) – M(neutral)) and response (M(different-response) – M(same-response)) conflicts (see Table 1).

Taken together, these results are therefore consistent with the idea that both semantic conflict and response conflict contribute to Stroop interference. This prerequisite being satisfied (see Introduction), we can now go on to investigate the extent to which these independent components of Stroop interference are influenced by healthy aging.

How does healthy aging influence semantic versus response conflict in the Stroop task?

To address this issue, the magnitudes of semantic and response conflicts (see Table 1) were analyzed in a 2 (Conflict-Type: semantic vs. response) × 2 (Age-Group: older vs. younger) ANOVA. This revealed a non-significant main effect of Conflict-Type, F(1,89) = 1.13; p = .292, ηp² = 0.012; BF10 = 0.481/BF01 = 2.078, as well as a main effect of Age-Group that was significant, F(1,89) = 11.94; p = .001, ηp² = 0.118, although the Bayesian evidence for it was only anecdotal, BF10 = 1.529/BF01 = 0.654. It also revealed a marginally significant Conflict-Type × Age-Group interaction, F(1,89) = 3.38; p = .069, ηp² = 0.037, for which the Bayesian evidence was again anecdotal, BF10 = 2.330/BF01 = 0.429. Even though evidence for this interaction was only anecdotal, we decomposed it further by testing the simple main effect of Age-Group at each level of Conflict-Type. Contrary to our expectations, this effect was significant for semantic conflict, F(1,89) = 9.288; p = .003, ηp² = 0.094; BF10 = 11.683/BF01 = 0.086, with older adults presenting a much greater magnitude of semantic conflict than young adults. Additionally, and also contrary to our expectations, the simple main effect of Age-Group remained non-significant for response conflict, F(1,89) = 0.010; p = .922, ηp² = 0.000, with evidence for the null effect of aging, BF10 = 0.222/BF01 = 4.512 (see Table 1). Thus, the present study clearly extends the dissociative nature of the age effect to the 2:1 Stroop paradigm. However, completely unlike past studies using the semantic Stroop paradigm (Augustinova et al., 2018; Li & Bosman, 1996), it points to a greater magnitude of semantic conflict in older adults.
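The verbal labels attached to the Bayes factors can be reproduced with a small helper. It assumes the commonly cited category bounds of the Lee and Wagenmakers (2013) scheme (1 to 3 anecdotal, 3 to 10 moderate, 10 to 30 strong, 30 to 100 very strong, above 100 extreme); how values falling exactly on a bound are treated is an arbitrary choice made here. The function name is hypothetical.

```python
# Upper bounds of the evidence categories (assumed from Lee & Wagenmakers, 2013).
LABELS = [(3, "anecdotal"), (10, "moderate"), (30, "strong"), (100, "very strong")]


def describe_bf10(bf10):
    """Return (favored hypothesis, evidence label) for a given BF10."""
    if bf10 <= 0:
        raise ValueError("Bayes factors are strictly positive")
    if bf10 == 1:
        return ("neither", "no evidence")
    favored = "H1" if bf10 > 1 else "H0"
    strength = bf10 if bf10 > 1 else 1 / bf10  # BF01 is the reciprocal of BF10
    for upper_bound, label in LABELS:
        if strength < upper_bound:
            return (favored, label)
    return (favored, "extreme")


interaction = describe_bf10(2.330)          # Conflict-Type x Age-Group
age_on_semantic = describe_bf10(11.683)     # Age-Group effect on semantic conflict
age_on_response = describe_bf10(0.222)      # Age-Group effect on response conflict
```

Applied to the values reported above, this labels the interaction as anecdotal evidence for H1 and the response-conflict result as moderate evidence for H0.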

General discussion and conclusion

Given that in all past Stroop studies, semantic conflict was potentially confounded with either response conflict (e.g., when semantic-associative items [SKY in blue] are used to induce semantic conflict) or with facilitation (when color-congruent items [BLUE in blue] are used as a baseline to derive a magnitude for semantic conflict), its contribution to the Stroop interference effect has so far been uncertain. Using the 2:1 Stroop paradigm (De Houwer, 2003) with a color-neutral baseline, the present study clearly demonstrated that the contribution of semantic conflict is independent of both response conflict and Stroop facilitation. Therefore, the present study provides an unambiguous empirical basis for the composite nature of Stroop interference – as originally claimed by De Houwer (2003) based on the multi-stage models of Stroop interference (Zhang et al., 1999; Zhang & Kornblum, 1998).

Given that no such basis was available in past studies of age-related differences in the Stroop task (Augustinova et al., 2018; Li & Bosman, 1996), the present study also investigated the extent to which healthy aging influences these independent constituents of Stroop interference. The reported results suggest a dissociative pattern opposite to that reported in past studies: whilst response conflict was not affected by healthy aging, greater semantic conflict was found in older adults.

It remains possible that this reverse pattern is due to the fact that the present study mobilized different processes from those at work in past studies. Indeed, both Augustinova et al. (2018) and Li and Bosman (1996) employed a vocal response, which is known to induce greater phonological processing of the irrelevant word than a manual one (Kinoshita et al., 2017; Parris et al., 2019). Therefore, the pattern that these studies report could be due to less efficient control of this phonological processing in older adults. Such an effect would not have been observed in the present study due to the use of manual responses. Despite this, the issue surrounding the use of semantic-associative Stroop trials remains.

If, according to single-stage models of the Stroop task, the semantic associative Stroop trials used in these previous studies induce only indirect response conflict (e.g., Roelofs, 2003), then the only conclusion that can be drawn from the studies by Augustinova et al. (2018) and Li and Bosman (1996) is that overall response conflict is greater in older adults but its indirect portion is unaffected by healthy aging. However, since the present study unequivocally documented the existence of semantic conflict for the first time, it now seems reasonable to assume that both semantic-associative and same-response trials actually induce semantic conflict (but in unknown quantities for the former).

If we thus assume that the present and past studies mobilized the same processes (i.e., induced comparable levels of semantic conflict; Augustinova & Ferrand, 2014), the absence of an age effect on semantic associative interference could be potentially linked to the method used to control for age-related general slowing. Indeed, proportional transformation – applied first by Li and Bosman (1996) and later by Augustinova et al. (2018) – might actually (and counterintuitively) create an advantage for older adults in the presence of slower RTs (Hedge et al., 2018). This spurious advantage is no longer present when general slowing is controlled by means of a more suitable transformation (i.e., z-scores; Faust et al., 1999; Hedge et al., 2018) applied in the present study. To address this possibility directly, the data from Augustinova et al. (2018) were z-scored and re-analyzed in the same way as the 2:1 data reported above (see OSM for a full description and results of these analyses, pp.4-9). In line with Hedge et al.’s reasoning about proportional transformation, not only did the originally significant Conflict-Type × Aging interaction become non-significant, but the additional Bayesian analyses actually provided moderate evidence against this interaction. This suggests that the magnitudes of both semantic and response conflict in Augustinova et al.’s z-scored data tended to be greater in older adults than in their younger counterparts (see Table S3, OSM).
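The concern about proportional transformation can be illustrated with simple arithmetic. The sketch assumes a proportional score of the form (conflict RT minus baseline RT) divided by baseline RT, which may differ from the exact formula used in the earlier studies, and all RT values are made up.

```python
def proportional_interference(conflict_rt, baseline_rt):
    """Interference expressed as a proportion of the baseline RT (assumed form)."""
    return (conflict_rt - baseline_rt) / baseline_rt


# Made-up means (ms): both groups show the same 100 ms raw interference,
# but the older group has a slower baseline.
young = proportional_interference(conflict_rt=700, baseline_rt=600)
older = proportional_interference(conflict_rt=1000, baseline_rt=900)

# For the older group to match the young proportional score, its raw
# interference would have to grow by the full slowing factor (900 / 600 = 1.5).
older_scaled = proportional_interference(conflict_rt=1050, baseline_rt=900)
```

With identical raw interference, the slower group obtains the smaller proportional score, which is the kind of apparent advantage for older adults described above.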

While the results regarding semantic conflict are in line with those reported above, discrepancies remain regarding the effect of healthy aging on response conflict. Although these differences could be accounted for by the response mode difference highlighted above, we also conducted cross-study analyses on the merged data sets (see OSM for a full description and results of the analyses, pp.9-11). Again, Bayesian analyses provided moderate evidence against a Conflict-Type × Aging interaction, suggesting that across two studies, healthy aging affected both the semantic and the response conflicts. It should, however, be noted that a Bayesian independent-samples t-test conducted for exploratory purposes actually revealed anecdotal evidence against the age effect on response conflict (see Table S4, OSM), a finding that appears consistent with the results obtained using the 2:1 paradigm reported above. Alternatively, it also remains plausible that response conflict is unaffected in the 2:1 Stroop paradigm, not because of its specific nature but simply because its magnitude (i.e., smaller in the manual task than in the vocal tasks used in past studies) is too small to be affected.

Although not our favored a priori hypothesis, the fact that the present study could have mobilized different processes compared to past studies emphasizes the importance of choosing the correct critical and control trials for measuring the variable under test. Of course, no measure is perfect and we must therefore consider a limitation of the 2:1 paradigm that could provide an alternative explanation for the apparently greater semantic conflict in older adults. Because both dimensions of same-response trials provide evidence towards the same response, they cannot (unlike semantic associates) generate response conflict. However, they can still produce response facilitation. This opens up the possibility that the larger difference between same-response and color-neutral trials observed in older adults in the present study could actually be driven by greater response facilitation in younger adults, and not greater semantic conflict in older adults. Nevertheless, while this account would directly predict greater Stroop facilitation (which involves both response and semantic facilitation) in younger adults, the present study actually reports the opposite – rendering this latter account unlikely.

To sum up, the present study has provided the clearest evidence yet of a contribution of semantic conflict to overall Stroop interference (see also Parris et al., 2021, for a thorough discussion of this issue). Moreover, this has enabled us to investigate the effect of healthy aging on the independent constituents of the composite Stroop interference effect. In contrast to previous studies, the present study showed that semantic conflict is affected by healthy aging. This finding prompted a re-analysis of the data from a previous study (Augustinova et al., 2018) using a more suitable method of controlling for the effect of general slowing in healthy aging (the same method as that employed in the present study). This re-analysis revealed that, as indicated by the present study, there is evidence of modified semantic conflict in healthy aging. Whilst the two studies diverge on the issue of the effect of aging on response conflict, the difference might be explained by the fact that a vocal response mode was used in both Augustinova et al. (2018) and Li and Bosman (1996), giving rise to the possibility that the control of phonological processing is reduced in healthy aging. Although both studies converged on the issue of semantic conflict, we would still recommend that future studies use the 2:1 paradigm rather than the semantic-associates method given that only the results from the present study show an unambiguous effect of aging on semantic conflict. However, to address the still-open issue of the characteristics shared (or otherwise) between same-response and semantically associated trials, future studies could combine the two (Schmidt & Cheesman, 2005) and measure the interference they generate against a color-neutral word baseline with more response-sensitive measures (e.g., EMG, mouse-tracking). 
Given that these latter measures are also more sensitive to the actual time course of interference, they are particularly suitable for further addressing the age-related differences in the Stroop task. Indeed, the issue of the extent to which a greater magnitude of a given conflict is due specifically to its greater activation (i.e., lower attentional selectivity, also implying an age-related deficit in proactive control) or to its less efficient resolution (i.e., less efficient inhibitory control, also implying an age-related deficit in reactive control) as yet remains unresolved (see, e.g., Coderre et al., 2011, for this type of distinction). In the light of past research demonstrating an age-related deficit in proactive (e.g., Braver et al., 2001) as opposed to reactive (e.g., Bugg, 2014) cognitive control, the first possibility seems more plausible than the second. This reasoning is reinforced by the fact that healthy aging might actually amplify task conflict (i.e., a more general conflict that – for all readable Stroop items including color-neutral ones – derives from the simultaneous preparation of two task sets: word-reading vs. color-naming, e.g., Goldfarb & Henik, 2007; Kalanthroff et al., 2018). Although the significant age effect on z-scored color-neutral stimuli observed in the present study is consistent with this idea, future studies – which should include more appropriate measures of task conflict – will need to address these possibilities directly.

The significant magnitudes of both semantic and response conflict observed in both younger and older adults clearly suggest that the historically favored single-stage response accounts of the Stroop interference effect are likely to be obsolete (e.g., Augustinova et al., 2018; De Houwer, 2003; Risko et al., 2006). Also, and importantly, so too are the customary implementations of Stroop interference/effect (BLUE in green vs. DEAL in green or BLUE in blue) that are rooted in these unitary models and from which the involvement of response and semantic processes and their modulation are merely inferred. Thus, in conclusion, the present study strongly encourages both the development of new integrative models of the Stroop interference effect (i.e., models that make room for relatively new types of conflict; see, e.g., Parris et al., 2021, for discussion) and further empirical work addressing the processes underlying age-related differences in the Stroop task based on such integrative models.

Open Practices Statement

The data are available on the Open Science Framework at https://osf.io/t6cxr/