In Colore Veritas? Color effects on the speed and accuracy of true/false responses

Nadarevic, Lena; Symeonidou, Nikoletta; Kias, Alina

doi:10.1007/s00426-021-01528-z

Original Article
Open access
Published: 29 May 2021

Volume 86, pages 919–936, (2022)

Psychological Research

Abstract

In addition to their perceptual or aesthetic function, colors often carry conceptual meaning. In quizzes, for instance, true and false answers are typically marked in green and red. In three experiments, we used a Stroop task to investigate automatic green-true associations and red-false associations, respectively. In Experiments 1 and 2, stimuli were true statements (e.g., “tables are furniture”) and false statements (e.g., “bananas are buildings”) that were displayed in different combinations of green, red, and gray depending on the experimental condition. In Experiment 3, we used true-related and false-related words shown in green, red, or gray. Participants had to indicate the validity (or semantic meaning) of each statement (or word) as fast and as accurately as possible. We expected that participants would perform best when they had to categorize green stimuli as “true” and red stimuli as “false”. The prediction was only confirmed when green and red stimuli were presented within the same context (i.e., same experimental condition). This finding supports the dimension-specificity hypothesis, which states that cross-modal associations (here: associations between color and validity) depend on the context (here: the color-context). Moreover, the observed color-validity effects were stronger when participants had to categorize single words instead of sentences and when they had to provide speeded responses. Taken together, these results suggest that controlled processing counteracts the influence of automatic color associations on true/false responses.

Colors are omnipresent in everyday life and shape the perception of our environment. However, colors do not only have a perceptual or aesthetic function; they also communicate information and can thus influence emotions, cognitions, and behaviors (Elliot and Maier 2007; Elliot et al. 2007). According to Elliot et al.’s color-in-context theory, color associations are evolutionarily prepared (e.g., a brown fruit means “rotten”) or learned (e.g., a red traffic light means “stop”). Moreover, the theory proposes that colors can have different meanings depending on the context. The color red, for example, is associated with attractiveness in the context of sexual relations (e.g., Elliot and Niesta 2008; Elliot et al. 2010), but associated with failure in achievement contexts (e.g., Elliot et al. 2007; Maier et al. 2008). Elliot et al. (2007) even found that red induces an avoidance tendency in the latter context. Participants who were exposed to the color red on the front page of an IQ test chose easier tasks in the test than those who were exposed to other colors on the front page (e.g., gray or green).

Interestingly, although green is the complement to red on the perceptual level (Choudhury 2015), this does not necessarily hold for the conceptual level. Whereas the majority of studies clearly support a red–failure association in achievement contexts (Elliot and Maier 2007; Maier et al. 2008; but see Mehta and Zhu 2009; Moller et al. 2009), findings are mixed regarding an association between green and success in such contexts (Elliot and Maier 2007; Elliot et al. 2007; Moller et al. 2009). Similarly, although there is strong empirical evidence that red signals danger, the results are less clear with regard to a green–safety association (Pravossoudovitch et al. 2014). Besides such studies that have examined color associations with the concepts failure/success and danger/safety, there is little research on the meaning of the complementary colors red and green in other semantic contexts. Thus, possibly, there might be other contexts in which red and green clearly carry opposite meanings. For example, in a study on memory for truth and falsity, Pantazi et al. (2018) claimed that “Green and red are generally associated with concepts of truthfulness versus falsity […]” (p. 179). In the present work, we aimed at testing these proposed color–validity associations.

To our knowledge, this is the first study to empirically investigate the proposed associations between the color green and the semantic attribute true as well as between the color red and the semantic attribute false. This is surprising, as there are several real-world examples that suggest such color–validity associations. At school, teachers mark false answers in red, in soccer the red card signals false behavior, and in quiz shows the true (vs. false) answer is highlighted in green (vs. red). Moreover, when entering the words “true” and “false” into a Google picture search, a large number of images appear that display the word true in green and the word false in red or that show a green tick mark and a red cross mark. Taken together, these examples speak in favor of green–true associations and red–false associations, respectively. However, it is unclear whether these associations are automatic in nature. By the term automatic we mean that these associations are triggered unintentionally and, once activated, are difficult to suppress (for various features of automaticity, see Moors 2016). In three experiments, we used a Stroop-like paradigm to test whether we would find evidence for automatic green–true and red–false associations.

In the classical Stroop task (Stroop 1935), participants are presented with individual color-words (e.g., green, red, blue, yellow) that either appear in the color they denote (e.g., the word green displayed in green) or in a different color (e.g., the word green displayed in blue). Participants have to indicate the color in which the word appears and to do so as fast and accurately as possible. Because reading is typically much more automatized than color naming, word reading tends to interfere with color naming whenever word meaning and word color are incongruent, thus resulting in slower response times (RTs) and more errors. In contrast, congruency between word meaning and color can facilitate responding, thus leading to faster RTs and fewer errors. In a similar vein, a Stroop-like paradigm can be used to measure automatic color-meaning associations (e.g., Goodhew and Kidd 2020; Hong et al. 2020; Lorentz et al. 2016; Moller et al. 2009; Pravossoudovitch et al. 2014; Sherman and Clore 2009). For instance, Pravossoudovitch et al. (2014) asked participants to categorize words as danger words (e.g., emergency, threat) or safety words (e.g., shelter, home). Importantly, the words were displayed in red, green, and gray. The authors found a significant word type by color interaction on participants’ RTs. For the danger words, participants responded fastest when the words appeared in red, whereas for the safety words they responded fastest when they appeared in green. Interestingly, the red-effect for the danger words was much larger than the green-effect for the safety words, suggesting stronger red–danger associations than green–safety associations.

Following Pravossoudovitch et al. (2014), we used a Stroop task to investigate whether people associate red with the attribute false and green with the attribute true, respectively. In three experiments, participants had to provide true/false responses to stimuli presented in green, red, and gray. The chromatic colors only differed in hue but not chroma. Moreover, all colors were matched on lightness. Keeping lightness constant is important because differences in lightness lead to differences in readability against a given background. This in turn affects RTs and may even result in biased true/false judgments (Reber and Schwarz 1999). Moreover, there is empirical evidence that people associate darkness with negativity and lightness with positivity (Lakens et al. 2012; Meier et al. 2004). These lightness–valence associations would potentially confound the results for the color–validity associations, if the colors differed in lightness.

In a series of Stroop tasks that investigated lightness–valence associations, Meier et al. (2004) varied whether the tasks emphasized accuracy (e.g., by means of accuracy feedback) or speeded responses (e.g., by means of instructions and RT feedback or by means of a response deadline). When accuracy was emphasized, lightness–valence associations showed up in participants’ RTs, whereas when speed was emphasized, the associations showed up in participants’ response accuracies. As we aimed to test whether the predicted color–validity associations were reflected in participants’ RTs and accuracies alike, we implemented a similar procedure. The Stroop task in our experiments consisted of two test blocks. In block 1, participants were instructed to focus on speed and accuracy alike. However, because participants could take up to 5 s to respond, block 1 settings did not prompt participants to respond extremely fast. In contrast, this was the case in block 2, which involved a considerably shorter response deadline. In order to account for individual differences in response speed, this deadline varied across participants depending on their RTs in Stroop block 1.

In addition to manipulating stimulus color, stimulus validity, and response deadlines, we also implemented different color contexts within and between our experiments. This manipulation served to test whether green–true and red–false associations (if present) are inherently stable or depend on the color context. Lakens et al. (2012), for example, showed that black is associated with negativity regardless of the context, whereas white is only associated with positivity in the context of black. The authors found empirical evidence for both types of associations when black and white stimuli appeared within the same experimental task (i.e., were manipulated within participants), but not when they appeared in different contexts (i.e., were manipulated between participants). Similarly, when investigating various color associations by means of the implicit association test (IAT, Greenwald et al. 1998), Schietecat et al. (2018b) observed red–negative associations in the context of green, but not in the context of blue. In the latter context, red was associated not only with aggression, but also with enthusiasm, depending on the targets of the IAT. The authors interpreted this finding as evidence for their dimension-specificity hypothesis (Schietecat et al. 2018a, 2018b). This hypothesis states that cross-modal associations (e.g., between color and meaning) depend on the dimension of meaning that is most salient (e.g., evaluation, activity, or potency) in a given context and on the relative conceptual distance of opposing target concepts in this context. Importantly, the dimension-specificity hypothesis predicts that cross-modal associations should only become activated if both target dimensions (e.g., color and meaning) are characterized by clear plus and minus polarities.

In the following experiments, the task of the participants was to categorize stimuli presented in different colors (green, red, or gray) as “true” or “false”. On the conceptual level, true and false form polar opposites on the evaluation dimension. Because green and red should also form a plus and a minus pole on this dimension (see Schietecat et al. 2018b), we expected to find evidence for the predicted green–true and red–false associations if both colors appear within the same context. According to the dimension-specificity hypothesis, however, no color–validity associations should emerge in a color context lacking clear polar opposites. For example, as gray appears to be a neutral color when combined with red and green (e.g., Pravossoudovitch et al. 2014), color–validity associations should not be evident when the color context consists of green and gray stimuli or red and gray stimuli, respectively. In order to test the context-dependency of color–validity associations, Experiment 1 implemented three different color conditions between participants (green–red, green–gray, red–gray). In contrast, Experiments 2 and 3 manipulated all colors (i.e., green, red, and gray) within participants, thus enhancing the complexity of the color context. For all experiments, we will describe how we determined our sample sizes, and we will report all data exclusions (if any), all manipulations, and all measures. The materials and the data of all experiments are publicly available online (Nadarevic et al. 2020).

Experiment 1

In order to investigate the hypothesized associations between the colors green and red with the attributes true and false, we conducted a Stroop task in which participants had to indicate the validity of short statements. Depending on the experimental condition, the statements appeared either in green and gray (green–gray condition), red and gray (red–gray condition), or green and red (green–red condition). These color-context conditions were manipulated between participants. We predicted that if the proposed green–true and red–false associations are context-independent, a color by validity interaction should emerge in each of the three conditions. In particular, Stroop performance should be higher when true statements are displayed in green than when they are displayed in red (green–red condition) or gray (green–gray condition). Likewise, Stroop performance should be higher when false statements are displayed in red than when they are displayed in green (green–red condition) or gray (red–gray condition). However, if the assumed color–validity associations require reciprocal activation by the opposite color (as predicted by the dimension-specificity hypothesis), the expected Stroop effects should only appear in the green–red condition. Importantly, Stroop performance was measured by the speed and accuracy of participants’ true/false responses. We expected that in a first Stroop block, which did not require particularly fast responses, the effects should appear primarily in participants’ RTs. In contrast, under speeded conditions, which were implemented in a second Stroop block, the effects should primarily appear in the accuracy data.

Methods

Power analysis

We calculated the required sample size for the expected interaction of the within-subject factors color and validity by means of G*Power (Faul et al. 2007). Although G*Power does not have a built-in module to directly calculate power for interactions between repeated-measures factors, this can be accomplished with the program’s Generic F-test module by means of an iterative procedure.^{Footnote 1} The required input parameters for this procedure are α, the degrees of freedom (df) for the F-test, and the non-centrality parameter λ. For within-subject effects, λ is a function of the sample size n, the number of repeated measures m, the effect size f, and the repeated-measures correlation ρ (see Faul et al. 2007). Because λ and the df for the error term depend on the sample size, the power analysis requires increasing n in a step-wise fashion until the target power is reached. For Experiment 1, we assumed a medium-sized color by validity interaction effect of f = 0.25 and a repeated-measures correlation of ρ = 0.50. The number of repeated measures for the tested 2 \(\times\) 2 interaction was m = 4. Moreover, we set the type-I error probability to α = 0.05 and our target power to 1-β = 0.95. The power analysis indicated that this target power would be reached with a minimum sample size of n = 28 per condition (λ = 14, df_effect = 1, df_error = 27), i.e., a sample size of N = 84 in total.
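The reported values can be retraced with a short Python sketch. It is an illustration under stated assumptions, not the authors' G*Power workflow: it assumes the G*Power convention λ = f² · n · m / (1 − ρ) for within-subject effects (sphericity correction set to 1) and df_error = n − 1 for a 2 × 2 within-subject interaction, and it approximates power by Monte Carlo simulation rather than by an exact noncentral F computation. The function names are hypothetical.

```python
import math
import random


def noncentrality(f, n, m, rho):
    """Non-centrality parameter for a within-subject effect (assumed convention)."""
    return f ** 2 * n * m / (1.0 - rho)


def simulated_power(lam, df_error, alpha=0.05, draws=200_000, seed=1):
    """Monte Carlo estimate of power for an F-test with df_effect = 1."""
    rng = random.Random(seed)

    def f_draw(ncp):
        # A noncentral chi-square with 1 df is (Z + sqrt(ncp))^2;
        # a chi-square with k df is Gamma(shape=k/2, scale=2).
        numerator = (rng.gauss(0.0, 1.0) + math.sqrt(ncp)) ** 2
        denominator = rng.gammavariate(df_error / 2.0, 2.0) / df_error
        return numerator / denominator

    # Critical value: empirical (1 - alpha) quantile of the central F distribution.
    central = sorted(f_draw(0.0) for _ in range(draws))
    critical = central[int((1.0 - alpha) * draws)]
    # Power: share of noncentral draws that exceed the critical value.
    hits = sum(f_draw(lam) > critical for _ in range(draws))
    return hits / draws


n = 28
lam = noncentrality(f=0.25, n=n, m=4, rho=0.50)  # 0.0625 * 28 * 4 / 0.5
df_error = n - 1
power = simulated_power(lam, df_error)
print(f"n = {n}, lambda = {lam}, df_error = {df_error}, power ~ {power:.3f}")
```

Wrapping the last lines in a loop that increases n until the estimate reaches the target mirrors the step-wise logic. Because simulation noise is not negligible close to the 0.95 threshold, an exact noncentral F routine is preferable for the final decision.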

Participants

Eighty-three participants were recruited at the University of Mannheim (61 females, 22 males). Participants had a mean age of M = 22.9 (SD = 5.9) years. Three participants were non-native German speakers; two of them reported having very good German skills, and the third reported having only intermediate German language skills. Because reading speed was important in the experiment, we decided to exclude the latter participant from all analyses. Moreover, we excluded five participants based on their poor performance in Ishihara’s color vision test. Thus, the final sample comprised 77 participants (green–gray condition: n₁ = 24, red–gray condition: n₂ = 26, green–red condition: n₃ = 27).^{Footnote 2}

Materials

Sentences containing exemplar-category assignments of the form “X are Y” (e.g., “bananas are fruits”, “towers are buildings”) served as stimulus material. We started by creating 20 true target statements based on 20 exemplars and 10 categories. We then created an equal number of false statements by exchanging the categories in pairs between the statements (e.g., “bananas are buildings”, “towers are fruits”). In the same way, we also created 12 statements for practice trials (i.e., 6 true and 6 false ones). The statements were always phrased as affirmatives to keep the material consistent. This is important because research suggests that comprehending affirmative sentences and negated sentences involves different cognitive processes (e.g., Beltrán et al. 2019; Tettamanti et al. 2008). For a complete list of statements, see our materials on the Open Science Framework (OSF; https://osf.io/b8wux/). Colors were selected to differ in hue, but not in lightness or chroma (green: LCh[55.187/82.195/136.016], red: LCh[55.187/82.195/40], gray: LCh[55.187/–/–]).
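The construction of false statements can be sketched as follows. This is a minimal illustration, assuming that statements are stored as (exemplar, category) tuples and that consecutive list entries form the pairs whose categories are exchanged; the actual pairing used for the materials is documented on the OSF page, and the function names are hypothetical.

```python
def swap_categories_in_pairs(true_statements):
    """Build false statements by exchanging categories between consecutive pairs."""
    false_statements = []
    for i in range(0, len(true_statements) - 1, 2):
        (ex_a, cat_a), (ex_b, cat_b) = true_statements[i], true_statements[i + 1]
        if cat_a == cat_b:
            # Swapping identical categories would leave both statements true.
            raise ValueError("paired statements must come from different categories")
        false_statements.append((ex_a, cat_b))
        false_statements.append((ex_b, cat_a))
    return false_statements


def as_sentence(exemplar, category):
    """Render an exemplar-category pair in the 'X are Y' form."""
    return f"{exemplar} are {category}"


# Two of the true statements quoted in the text.
true_items = [("bananas", "fruits"), ("towers", "buildings")]
false_items = swap_categories_in_pairs(true_items)
false_sentences = [as_sentence(*item) for item in false_items]
print(false_sentences)
```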

Design

Participants were randomly assigned to one of three experimental groups, manipulating the set of colors in which statements could appear to form different color-contexts between participants. All participants completed two blocks of the Stroop task that were characterized by different response deadlines. In each block, participants categorized true and false statements that appeared in different colors. Thus, the design was a 3 (color context: green–gray vs. red–gray vs. green–red) \(\times\) 2 (block: 1 vs. 2) \(\times\) 2 (validity: true vs. false) \(\times\) 2 (color: green vs. gray, red vs. gray, green vs. red, depending on the color context) design. All factors except for the color context varied within subjects. Participants’ RTs and their accuracy in the categorization task served as dependent variables.

Procedure

The experiment consisted of two blocks. In each block, true and false statements were presented on a black computer screen in one of two possible colors, which depended on the experimental group. Participants’ task was to judge a statement’s validity as fast and as accurately as possible by pressing the d or k key. The mapping of responses (true vs. false) to these keys was counterbalanced across participants. For each trial, a fixation cross appeared for 500 ms in the center of the screen followed by the statement. The statement disappeared as soon as participants provided their response or after 5000 ms. In the latter case the message “too slow” was displayed for 1000 ms. The intertrial interval was also 1000 ms. Participants familiarized themselves with the task in a practice phase consisting of 24 trials (i.e., twelve practice statements presented in each of the two colors of the respective color condition). In the following test block 1, 40 test statements were presented in each of the two colors, resulting in 80 trials in this block. The statements appeared in random order and it was randomly determined in which color a statement appeared first. Upon completion of test block 1, participants had a 30-s break, which was followed by a second block of the Stroop task. Block 2 was identical to block 1, except that it involved an adaptive response deadline to prompt speeded responses. For each participant, the deadline was computed as the 60th percentile of their RT distribution for correct responses in test block 1 (see Rinkenauer et al. 2004, for a similar procedure). Participants had another 24 practice trials to get used to the new deadline before the actual test block started, which again comprised 80 trials in total.
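The adaptive deadline can be illustrated with a short sketch. The text does not say which percentile definition was used, so the linear interpolation between order statistics below is an assumption, as are the function names and the made-up data.

```python
def percentile(values, pct):
    """Percentile with linear interpolation between order statistics (assumed definition)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    rank = pct * (len(ordered) - 1) / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


def adaptive_deadline(rts_ms, correct, pct=60):
    """Block-2 deadline: the pct-th percentile of the correct block-1 RTs."""
    correct_rts = [rt for rt, ok in zip(rts_ms, correct) if ok]
    return percentile(correct_rts, pct)


# Made-up block-1 data for one participant (RTs in ms); the last trial was an error
# and therefore does not enter the deadline computation.
block1_rts = [500, 600, 700, 800, 900, 1000, 4000]
block1_correct = [True, True, True, True, True, True, False]
deadline_ms = adaptive_deadline(block1_rts, block1_correct)
print(deadline_ms)
```

With a 60th-percentile deadline, roughly 40% of a participant's block-1 response speeds would be too slow in block 2, which is what pushes participants toward faster responding.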

After another 30-s break, participants completed a perceptual fluency test. This test served as a manipulation check to make sure that all colors were equally discriminable on the black screen. Participants’ task was to detect the target letter O within a series of the letter X. Each string consisted of exactly five characters, irrespective of whether the target was present (e.g., XXXOX) or absent (XXXXX). If the target was present, it was displayed as 2nd, 3rd, or 4th character within the string. The overall procedure of the fluency test was similar to Stroop block 1, except that the statements were replaced by strings and participants provided yes/no instead of true/false responses, again by pressing the d or k key. “Yes” and “true” as well as “no” and “false” always shared the same response key. The fluency test comprised 36 trials in total. Half of the trials consisted of target-present strings and the other half of target-absent strings. The strings appeared equally often in each of the two colors of the respective color-context condition and were presented in random order.

Afterwards, participants saw five plates of Ishihara’s Test for Color Deficiency (Ishihara 2003) that were displayed one after the other on the computer screen. For each color plate, participants’ task was to type in the number displayed. Finally, participants were asked to write down their explicit color associations for truth and falsity, if they had any.

Results

Ishihara color vision test

On average, participants identified M = 4.2 (SD = 1.1) of the five Ishihara plates correctly.^{Footnote 3} Participants with more than two incorrect responses (n = 5) were excluded from analyses (see the Participants section).

Perceptual fluency test

We analyzed participants’ mean RTs for correct responses in the perceptual fluency test (96% of the responses) by means of a 2 (target: present vs. absent) by 2 (color: green vs. gray, red vs. gray, green vs. red) ANOVA in each color condition. Importantly, string color did not influence participants’ RTs, Fs < 1, indicating that the selected colors did not differ in perceptual fluency. RTs were also unaffected by target presence, Fs ≤ 1.82, ps ≥ 0.190, \(\eta_{p}^{2}\)s ≤ 0.07, 90% CIs [0.00, 0.28], [0.00, 0.06], and [0.00, 0.20].^{Footnote 4} Moreover, there was no significant color by target interaction in any condition, Fs ≤ 1.62, ps ≥ 0.214, \(\eta_{p}^{2}\)s ≤ 0.06, CIs [0.00, 0.20], [0.00, 0.15], and [0.00, 0.25].

Stroop task

Response times

Before analyzing participants’ RTs in the Stroop task, we excluded all incorrect responses, which were 2.8% of the responses in test block 1 and 15.5% of the responses in test block 2. We then excluded the smallest and the largest RT of each participant in each block to reduce the impact of RT-outliers. According to a simulation study by Bush et al. (1993), this trimming procedure is superior to other outlier exclusion procedures. Due to the different response deadlines in test block 1 (fixed deadline: 5000 ms) and test block 2 (adaptive deadline: M = 1048 ms, SD = 237 ms), mean RTs and the variability of RTs differed considerably between blocks. To increase the comparability of RT data between blocks, we z-standardized RTs in each block as recommended by Bush et al. (1993). That is, RTs were centered around the mean of each participant per block and divided by the participant’s standard deviation in the respective block. We then analyzed these z-standardized RTs by means of a 2 \(\times\) 2 \(\times\) 2 repeated-measures ANOVA with the factors block, color, and validity. We ran the analysis separately for each color-context condition because the different levels of the color factor within each condition (green–gray, red–gray, and green–red) did not allow us to include condition as a between-subject factor. The descriptive results are illustrated in Fig. 1. Mean unstandardized RTs per condition are listed in Table 1.
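The trimming and standardization steps for one participant in one block can be sketched as follows. The sketch assumes that the mean and standard deviation are computed on the trimmed correct-response RTs and that the sample standard deviation is used; neither detail is specified in the text, and the function names and data are made up.

```python
from statistics import mean, stdev


def trim_extremes(rts):
    """Drop one smallest and one largest RT while keeping the trial order."""
    first_min = rts.index(min(rts))
    last_max = len(rts) - 1 - rts[::-1].index(max(rts))
    drop = {first_min, last_max}
    return [rt for i, rt in enumerate(rts) if i not in drop]


def z_standardize(rts):
    """Center RTs on their mean and divide by their (sample) standard deviation."""
    m, s = mean(rts), stdev(rts)
    return [(rt - m) / s for rt in rts]


# Made-up correct-response RTs (ms) of one participant in one block.
block_rts = [450, 620, 580, 700, 1500, 640]
trimmed = trim_extremes(block_rts)
z_scores = z_standardize(trimmed)
print(trimmed, [round(z, 2) for z in z_scores])
```

Standardizing within participant and block puts the fixed-deadline and adaptive-deadline blocks on a common scale, so that block can enter the ANOVA as a factor despite the large difference in raw RTs.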

Table 1 Mean (SD) unstandardized RTs and error rates for each condition of Experiment 1

In all color-context conditions, RTs were faster for true statements than for false statements, Fs ≥ 35.83, ps < 0.001, \(\eta_{p}^{2}\)s ≥ 0.59, CIs [0.61, 0.85], [0.36, 0.72], and [0.43, 0.75]. Irrespective of the color context, this validity effect was qualified by test block, Fs ≥ 6.04, ps < 0.021, \(\eta_{p}^{2}\)s ≥ 0.19, CIs [0.31, 0.70], [0.17, 0.60], and [0.02, 0.40]. Simple main effect analyses showed that the validity effect was stronger in test block 2, Fs ≥ 35.66, ps < 0.001, \(\eta_{p}^{2}\)s ≥ 0.59, CIs [0.59, 0.84], [0.36, 0.72], and [0.40, 0.74], than in test block 1, Fs ≥ 11.22, ps ≤ 0.002, \(\eta_{p}^{2}\)s ≥ 0.30, CIs [0.23, 0.66], [0.16, 0.59], and [0.08, 0.50]. There was no color main effect on participants’ RTs, Fs < 2.50, ps ≥ 0.126, \(\eta_{p}^{2}\)s ≤ 0.09, CIs [0.00, 0.28], [0.00, 0.08], and [0.00, 0.29]. There was also no color by validity interaction effect in the green–gray condition or the red–gray condition, Fs < 1. In the green–red condition, in contrast, the predicted color by validity interaction emerged, F(1, 26) = 9.36, p = 0.005, \(\eta_{p}^{2}\) = 0.26, CI [0.06, 0.47]. Participants in this condition responded significantly faster to true statements displayed in green compared to red (see Fig. 1), F(1, 26) = 7.79, p = 0.010, \(\eta_{p}^{2}\) = 0.23, CI [0.04, 0.44]. However, they did not show any RT differences for false statements in green and red, F(1, 26) = 2.04, p = 0.165, \(\eta_{p}^{2}\) = 0.07, CI [0.00, 0.27].
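The reported partial eta squared values follow from the F statistics and their degrees of freedom via the standard identity η_p² = F · df_effect / (F · df_effect + df_error). The sketch below recomputes the three effect sizes reported for the green–red condition; the function and variable names are illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# F values reported for the green-red condition, each with df = (1, 26).
eta_interaction = partial_eta_squared(9.36, 1, 26)  # color by validity interaction
eta_true = partial_eta_squared(7.79, 1, 26)         # color effect, true statements
eta_false = partial_eta_squared(2.04, 1, 26)        # color effect, false statements
print(round(eta_interaction, 2), round(eta_true, 2), round(eta_false, 2))
```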

Accuracy

In order to examine the effect of the experimental factors block, color, and validity on accuracy, we ran a 2 \(\times\) 2 \(\times\) 2 repeated-measures ANOVA with mean error rate as the dependent variable. All types of errors (i.e., incorrect responses as well as omission errors) were considered for this analysis. Similar to the RT data, we conducted this ANOVA separately for each color-context condition. The descriptive results are illustrated in Fig. 2.
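A minimal sketch of the error-rate measure, assuming that a missed deadline is coded as `None` and that responses and statement validity are coded as booleans (both coding choices are hypothetical):

```python
def error_rate(responses, validity):
    """Share of trials answered incorrectly or not at all (omission errors)."""
    errors = sum(
        resp is None or resp != truth
        for resp, truth in zip(responses, validity)
    )
    return errors / len(validity)


# Made-up trials of one design cell: None marks a missed response deadline.
responses = [True, False, None, True, False]
validity = [True, False, True, False, False]
rate = error_rate(responses, validity)
print(rate)
```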

Not surprisingly, irrespective of the color context, error rates were considerably higher in test block 2 than in test block 1, Fs ≥ 57.32, ps < 0.001, \(\eta_{p}^{2}\)s ≥ 0.69, CIs [0.66, 0.87], [0.65, 0.86], and [0.50, 0.79]. There was also a block by validity interaction in the green–gray condition, F(1, 23) = 7.26, p = 0.013, \(\eta_{p}^{2}\) = 0.24, CI [0.03, 0.46]. A similar interaction pattern emerged in the red–gray condition, F(1, 25) = 4.13, p = 0.053, \(\eta_{p}^{2}\) = 0.14, CI [0.00, 0.36], and the green–red condition, F(1, 26) = 4.23, p = 0.050, \(\eta_{p}^{2}\) = 0.14, CI [0.00, 0.35], but it did not reach statistical significance in these two conditions. Simple main effect analyses showed that error rates were slightly higher for true statements than for false statements in test block 1, Fs ≥ 6.56, ps ≤ 0.017, \(\eta_{p}^{2}\)s ≥ 0.21, CIs [0.03, 0.46], [0.02, 0.42], and [0.08, 0.50], but not in test block 2, Fs ≤ 2.63, ps ≥ 0.119, \(\eta_{p}^{2}\)s ≤ 0.10, CIs [0.00, 0.32], [0.00, 0.18], and [0.00, 0.17]. Similar to the RT data, error rates were unaffected by the color in which a statement appeared, Fs ≤ 1.25, ps ≥ 0.274, \(\eta_{p}^{2}\)s ≤ 0.05, CIs [0.00, 0.16], [0.00, 0.14], and [0.00, 0.23]. There was also no color by validity interaction effect in any condition, Fs ≤ 2.93, ps ≥ 0.099, \(\eta_{p}^{2}\)s ≤ 0.10, CIs [0.00, 0.12], [0.00, 0.31], and [0.00, 0.25]. In the green–red condition, however, there was a significant three-way interaction of block, color, and validity, F(1, 26) = 5.53, p = 0.027, \(\eta_{p}^{2}\) = 0.18, CI [0.01, 0.39], which did not appear in the other conditions, Fs < 1. Separate analyses for each test block in the green–red condition revealed a significant color by validity interaction in test block 2, F(1, 26) = 4.36, p = 0.047, \(\eta_{p}^{2}\) = 0.14, CI [0.00, 0.35], but not in block 1, F < 1. The interaction in block 2 emerged because participants made fewer errors when categorizing true statements displayed in green compared to red (see Fig. 2; for exact values of mean percent errors per condition, see Table 1), F(1, 26) = 6.70, p = 0.016, \(\eta_{p}^{2}\) = 0.20, CI [0.02, 0.42], but showed no differences for false statements in green and red, F < 1.

Explicit color–validity associations

Two participants failed to fill out a questionnaire about explicit color–validity associations. Of the remaining 75 participants, 60% indicated that they associated green with truth and even more (76%) indicated that they associated red with falsity, when asked about their color–validity associations in the questionnaire. A complete list of participants’ explicit color associations is displayed in Appendix A.

Discussion

The results of Experiment 1 suggest that automatic green–true and red–false associations are highly context-dependent. We only found empirical evidence for the hypothesized color–validity interaction in the green–red condition. That is, statement color only influenced participants’ RTs in the Stroop task when green and red statements were presented in the same experimental context. For the accuracy data, this interaction additionally depended on test block. Specifically, a color by validity interaction on participants’ error rates in the green–red condition was only evident in block 2, in which participants had to respond very fast. Hence, under speeded conditions, the Stroop effect was evident in the RT data as well as the accuracy data. The observed Stroop effect was characterized by faster responses and fewer errors for true statements in green compared to red. In contrast, no Stroop effect was evident for the false statements. At first glance, this pattern speaks in favor of a green–true association, but not a red–false association. More general color effects irrespective of validity (e.g., simple green–go and red–stop associations), on the other hand, can be ruled out as an explanation. If participants had associated green with “go” and red with “stop” in the context of the experiment, we should have observed a color main effect on RTs, both in the Stroop task and in the perceptual fluency task. However, neither was the case. In contrast, other interpretations of the data seem plausible, which we outline below.

Overall, participants responded significantly faster to true than to false statements. Although we had not predicted this validity effect, it converges with findings of classical semantic-memory studies (e.g., Collins and Quillian 1969). Furthermore, there is reason to believe that statement verification and falsification rely on qualitatively different processes. For example, Marques et al. (2009) observed different patterns of brain activation during sentence verification and falsification. Verification corresponded with activation in brain regions presumed to be involved in search processes and matching processes with stored information, whereas falsification corresponded with activation in brain regions that are engaged in reasoning processes. Possibly, these more elaborate reasoning processes not only accounted for the slower responses to false statements, but also counteracted the effect of automatic color–validity associations for such statements. This might explain why we only found a color effect for true statements, but not for false statements in the green–red condition. What is more, prior research suggests that the contribution of interference effects to the Stroop effect is typically much stronger than the contribution of facilitation effects (Chen and Johnson 1991). Hence, possibly, the Stroop effect for the true statements reflects an interference effect of the color red rather than a facilitation effect of the color green. Because the two effects can only be separated by means of a reference color, we added gray as the reference color in the following experiments.

Experiment 2

Experiment 2 was similar to Experiment 1, but this time all participants were presented with statements displayed in green, red, and gray within the same context (i.e., within the same experimental condition). We implemented this change for the following reasons. First, we wanted to test whether the findings of the green–red condition would also replicate in the presence of gray. Considering the assumptions of the dimension-specificity hypothesis, this is by no means trivial. For example, Schietecat et al. (2018b) reasoned that a task with three instead of two colors “reduces the presence of a clear bipolar opposition, and, therefore, the strength of crossmodal associations in such a task might be much smaller, or maybe even zero” (p. 8). Second, in case of a successful replication, gray would serve as a reference condition that would help us to assess the relative contributions of the presumed green–true and red–false associations to the Stroop effect.