Working memory is the ability to temporarily maintain and manipulate acquired information (e.g., visual input) independently of continuous sensory stimulation and supports ongoing behavior by providing an interface between perception and action (Baddeley, 2003; Myers, Stokes, & Nobre, 2017). Working memory plays an important role when humans search for an object. They store a mental representation (i.e., a “search template” or “attentional template”) of the object in visual working memory (VWM). Some theories have proposed that this VWM representation guides attention toward memory-matching stimuli in a top-down manner (Bundesen, Habekost, & Kyllingsbaek, 2005; Desimone & Duncan, 1995). Furthermore, even when the content stored in VWM is irrelevant to the current visual-search task, participants direct attention toward a stimulus matching the content of VWM (Dalvit & Eimer, 2011; Olivers, Meijer, & Theeuwes, 2006; Olivers, Peters, Houtkamp, & Roelfsema, 2011; Soto, Heinke, Humphreys, & Blanco, 2005; Soto, Hodsoll, Rotshtein, & Humphreys, 2008; Soto & Humphreys, 2008). Typically, studies of this phenomenon require participants to remember a certain external item of visual input and perform a visual-search task. The item, referred to as an “accessory memory item,” is usually irrelevant to the search task yet relevant to a subsequent memory task. Several studies have demonstrated that participants can rapidly detect the target in visual-search tasks when it matches the accessory memory item. Moreover, automatic attentional guidance toward VWM-matching information is observed when the mental representation of the target is deprioritized during visual-search tasks (Gunseli, Olivers, & Meeter, 2016; Olivers, 2009; Olivers et al., 2011), as when the target remains unchanged and participants do not need to actively remember it (Gunseli et al., 2016).
When the representation of the item held in VWM is more active or dominant than that of any other item, participants direct attention toward the VWM-matching item regardless of whether the information is task relevant or task irrelevant.

Attentional guidance toward VWM-matching information can be explained by the prioritization of neural representations for specific pieces of information (Desimone & Duncan, 1995). When an item is retained in VWM, the memory representation biases neural activity in regions such as the primary visual cortex, thereby promoting the perception and selection of VWM-matching items during visual-search tasks. Several studies have reported that alterations in neural activity are associated with perceptual biases toward VWM-matching information (Soto, Humphreys & Rotshtein, 2007; Soto, Llewelyn, & Silvanto, 2012; Soto et al., 2011). Such studies have indicated that increased neural activity in the early visual cortices guides attention toward VWM-matching items. In Soto et al. (2012), participants were required to hold the color of a prime stimulus in VWM, after which they were asked to detect a target over which this color had been superimposed in a visual-search task. During the search task, a transcranial magnetic stimulation (TMS) pulse was delivered to the early visual cortex. The authors reported that occipital TMS modulated the perception of and enhanced sensitivity to a VWM-matching item, suggesting that occipital TMS enhances representation of the memory item in the early visual cortex, thereby directing attention toward the VWM-matching item.

Enhanced perception is observed not only when holding external information in VWM but also when imagining a stimulus without the influence of external information. Mental imagery refers to the internal representations and accompanying experience of sensory information in the absence of a direct external stimulus (Kosslyn, 2005; Pearson, Naselaris, Holmes, & Kosslyn, 2015). Perception activates internal representations through sensory organs, whereas mental imagery reactivates and evaluates internal representations stored in long-term memory in the absence of bottom-up processes (Ganis, 2013; Kosslyn, 2005). Previous studies that investigated the attentional guidance from VWM premised that external visual input was stored in VWM, whereas mental imagery is actually the internal generation of images and is also generated by mentally transforming and manipulating the stimulus (Ganis, 2013; Kosslyn, 2005; Olivers et al., 2006; Soto et al., 2005). Mental imagery allows humans to visualize objects, scenes, or themselves without external input. Previous behavioral studies have demonstrated that visual mental imagery enhances sensory perception (Mohr, Linder, Dennis, & Sireteanu, 2011; Pearson, Clifford, & Tong, 2008; Winawer, Huk, & Boroditsky, 2010). Pearson et al. (2008) examined the effects of mental imagery on sensory perception using binocular rivalry experiments in which a different pattern was presented to each eye. While visual perception of one of the rival patterns prior to the binocular rivalry experiment frequently resulted in enhanced awareness of that pattern, such enhancements were also observed when the pattern was merely imagined rather than perceived. Such findings indicate that mental imagery can lead to the formation of a short-term sensory trace in the absence of visual input, ultimately biasing future perception.

Several studies have utilized neuroimaging methods to investigate the enhancement of specific neural representations following mental imagery. Such studies have revealed that activation of early visual areas leads to enhanced sensory perception during visual mental imagery (Cui, Jeter, Yang, Montague, & Eagleman, 2007; Ganis, 2013; Kosslyn, 2005). Moreover, during mental imagery, the patterns of activation observed in the early visual areas are similar to those observed during actual visual perception (Cichy, Heinzle, & Haynes, 2012; Lee, Kravitz, & Baker, 2012). Additional studies have reported that the more vividly one can imagine a stimulus, the more strongly the early visual cortex is activated (Cui et al., 2007). Generating visual mental imagery without external stimuli enhances the neural representation of imagined items, similar to effects observed for actual perceived items stored in VWM.

Recent studies have indicated that VWM resembles visual mental imagery (Pearson et al., 2015; Tong, 2013). Visual mental imagery and VWM share several features (Bergmann, Genç, Kohler, Singer, & Pearson, 2016; Keogh & Pearson, 2011, 2014). Keogh and Pearson (2011, 2014) investigated VWM capacity and measured the individual strength of representations produced by mental imagery with the same binocular rivalry task used by Pearson et al. (2008). The authors reported positive correlations between the strength of mental imagery and VWM capacity, observing that participants with good mental imagery ability typically use this ability during working-memory tasks. In a subsequent study, the authors reported that the capacity of visual mental imagery to simultaneously generate representations is limited, similar to that of VWM (Keogh & Pearson, 2017). Moreover, both visual mental imagery and VWM have been associated with activation of the primary visual cortex (Albers, Kok, Toni, Dijkerman, & De Lange, 2013). Considering the cognitive and neural mechanisms underlying attentional guidance based on VWM, these findings suggest that visual mental imagery also influences attentional allocation. When the representation of an imagery item is more active than that of any other item, the enhanced neural representation may direct attention toward the imagery-matching item, even when the imagery information is task irrelevant. However, to my knowledge, no studies to date have investigated attentional guidance provided by imagery-matching information.

Previous studies have used the attentional-blink and training paradigms to investigate the effects of mental imagery on visual attention. In the attentional-blink paradigm, participants were required to imagine and maintain the thought of a particular object (e.g., elephant) while viewing a rapidly changing sequence of eight pictures with the goal of detecting and identifying a target digit (Pashler & Shiu, 1999). In some trials, an imagery-matching picture was presented before the target digit, whereas in other trials, the imagery-matching picture was presented after the digit. The authors found that target detection was impaired when the imagery-matching picture preceded the target compared with when the picture followed the target. These findings indicate that a task-irrelevant imagery-matching picture attracts attention, leading to an attentional blink. However, it remains unclear whether such attentional guidance is also observed for spatial, rather than temporal, attention in a visual-search task.

In the training paradigm, Reinhart, McClenahan, and Woodman (2015) reported that mental imagery for a target in a visual-search task improves attentional selection of the target. In the study by Reinhart and colleagues, a target was presented during the first few trials, after which participants were required to imagine searching for the target in a search array. Participants in the imagery-training condition located the target more quickly during the visual-search tasks than did those in the perceptual-training condition. The search efficiency following the imagery training, however, disappeared when distractor items were removed in the visual-search task. When the distractors were not presented, participants did not need to imagine any stimuli except for a target, which led to a precise mental representation of a target in the perceptual training. Removing a salient distractor item in the search task also eliminated the search efficiency of imagery training (Clarke, Barr, & Hunt, 2016). Taken together, precise mental imagery of a target stimulus enhances attentional allocation to the target in a visual-search task. However, in these tasks, the imagined stimulus was a target in the visual-search task. That is, the imagined stimulus was task-relevant information and may have been stored as a search template. Thus, it remains unclear whether mental imagery enhances attentional allocation to the imagery-matching stimulus when the imagined stimulus is an accessory item (or is task irrelevant) in a visual-search task.

The main purpose of the present study was to investigate attentional guidance from visual mental imagery of task-irrelevant information. The present experiment was modeled after those used in typical studies regarding attentional guidance from VWM (Soto et al., 2005), except that participants were required to store the representation of an imagined item rather than an external stimulus in VWM during the visual-search task. In the experiment, participants were asked to imagine a color (Experiments 1 and 3) or object (Experiments 4 and 5) based on the prime word. They were then asked to locate the target in a visual-search task in which both the target and distractors were presented. The color of the target was matched to the imagined color (Experiments 1 and 3) or object (Experiments 4 and 5) in valid trials, while that of the distractors was matched to the imagined color or object in invalid trials. It was hypothesized that participants would rapidly direct attention toward the imagery-matching target even when the imagined information was task irrelevant. Therefore, it was expected that reaction times (RTs) in valid trials would be shorter than those in invalid trials. Moreover, it was hypothesized that attentional guidance would not occur when participants viewed the prime word only, without imagining color (Experiment 2); that is, RTs in valid trials would be comparable with those in invalid trials.

Experiment 1

In Experiment 1, participants were instructed to imagine a color when presented with a color-word cue (e.g., red). Previous studies have demonstrated that color imagery following the presentation of a color word enhances subsequent perception of color (Chang, Lewis, & Pearson, 2013; Wantz, Mast, & Lobmaier, 2015). In Wantz et al. (2015), participants visualized color based on a cue word that included the first two letters of a color word (e.g., “ye” for yellow), following which participants were asked to determine the color of a presented square. When the imagined color matched the color of the square, participants were able to determine the color of the square more quickly. Thus, it was hypothesized that color imagery following presentation of a color word would also modulate spatial attention, and that participants would direct attention toward the imagery-matching color in a visual-search task.

Method

Participants

Experiment 1 included 30 undergraduate participants (22 women, eight men; age range: 18–24 years, Mage = 19.2 ± 1.3 years). All participants had normal or corrected-to-normal vision and received monetary compensation for participation. Written informed consent was obtained from all participants prior to inclusion in the study. An a priori power analysis for a repeated-measures analysis of variance (ANOVA) was conducted using G*Power 3.1 software (Faul, Erdfelder, Lang, & Buchner, 2007). To detect a medium effect size (f = .25) with α = .05 and power (1 − β) = .80, the analysis suggested that at least 29 participants would be required. All experiments were conducted within the guidelines prescribed by the Declaration of Helsinki and were approved by the ethics committee of Kansai University.

Stimuli and procedure

Figure 1 illustrates the protocol for Experiment 1. All stimuli were presented on a 14-inch monitor (resolution: 1,920 × 1,080 pixels; refresh rate: 60 Hz) at a viewing distance of approximately 60 cm. Each trial began with a white fixation cross at the center of a gray screen, which was presented for 500 ms, following which the name of a color was presented for 1,000 ms (i.e., mental-image cue). The terms red, green, or blue were printed in kanji or kana (see Footnote 1). Participants were required to imagine the color of the printed word for a later imagery task. They were instructed as follows: “Mental imagery of color might differ among people. Please visualize the color that comes to mind from the name of a color as precisely as possible in each trial. You do not have to remember the color that you imagined in previous trials.” The word presented in each task was selected at random, and each word was presented an equal number of times.

Fig. 1 Example of the visual displays used in Experiment 1

A fixation cross was then presented for 4,000 ms, after which two stimuli were presented in a search task. The stimuli were Landolt-C-like squares (64 × 64 pixels) with an 18-pixel gap on the top, bottom, left side, or right side. One square had a gap on the top or bottom (i.e., target), whereas the other had a gap on the left or right side (i.e., distractor). The color of the stimuli was randomly selected during each trial from among red (RGB: 255, 0, 0), green (RGB: 0, 255, 0), or blue (RGB: 0, 0, 255), and each color was presented an equal number of times. The color of the target and the distractor always differed. The squares were located 350 pixels to the left and right of the fixation cross. Participants were asked to report whether the gap in the target was on the top or bottom by pressing appropriate keys as accurately and as quickly as possible. They were instructed to press an up-arrow key for the gap on the top and a down-arrow key for the gap on the bottom with the right hand. The arrow keys were located on the right side of the keyboard.

Following the participant’s response and a 500-ms blank interval with a fixation cross, two squares of different colors were presented 350 pixels to the left and right side of the cross in an imagery task. The colors of the stimuli belonged to the same category as the color indicated by the mental-image cue and were randomly selected from among four different colors of that category (i.e., basic, pigment, Natural Color System [NCS], and Munsell colors; see Footnote 2). Participants were asked to select the color that more closely matched the imagined color. They were also instructed as follows: “There is no correct answer. Even if the two presented colors are different from the imagined color, please select the one that more closely matches the imagined color.”

In the first phase of the experiment, participants performed eight practice trials, after which they completed 72 experimental trials. There were three types of trials: valid, invalid, and neutral. For valid trials, the color of the target in the search task was within the same category as that indicated by the mental-image cue. For invalid trials, the color of the distractor was within the same category as that indicated by the mental-image cue. For neutral trials, the color of the stimuli in the search task differed from that indicated by the mental-image cue. Each trial type appeared an equal number of times, and the trial type was chosen at random.
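The trial structure described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' experiment code: the function names, the dictionary keys, and the decision to fully cross the three cue words with the three trial types (eight repetitions per cell for 72 trials) are assumptions made here for clarity. The sketch balances cue words and trial types but does not separately balance the stimulus colors.

```python
import random

COLORS = ["red", "green", "blue"]
TRIAL_TYPES = ["valid", "neutral", "invalid"]


def make_trial(cue, trial_type, rng):
    """Build one search trial for a cued color and a trial type (hypothetical layout)."""
    others = [c for c in COLORS if c != cue]
    rng.shuffle(others)
    if trial_type == "valid":
        # The target carries the cued (imagined) color.
        target, distractor = cue, others[0]
    elif trial_type == "invalid":
        # The distractor carries the cued color.
        target, distractor = others[0], cue
    else:
        # Neutral: neither stimulus carries the cued color.
        target, distractor = others[0], others[1]
    return {
        "cue": cue,
        "type": trial_type,
        "target_color": target,
        "distractor_color": distractor,
        "target_gap": rng.choice(["top", "bottom"]),
        "distractor_gap": rng.choice(["left", "right"]),
        "target_side": rng.choice(["left", "right"]),
    }


def make_trial_list(n_trials=72, seed=None):
    """Return a shuffled list with equal numbers of each cue word and trial type."""
    rng = random.Random(seed)
    cells = [(cue, t) for cue in COLORS for t in TRIAL_TYPES]
    if n_trials % len(cells) != 0:
        raise ValueError("n_trials must be a multiple of 9 for full balancing")
    reps = n_trials // len(cells)
    trials = [make_trial(cue, t, rng) for cue, t in cells for _ in range(reps)]
    rng.shuffle(trials)
    return trials
```

With three colors and two search stimuli, a neutral trial necessarily uses the two non-cued colors, which is why the neutral branch needs no further random choice.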

Results

RTs in the search task after excluding data from incorrect trials were analyzed. Based on visual inspection of box plots, RTs not included between the whiskers (length: 3.0 interquartile range) were excluded for each participant. Thereafter, trials in which RTs deviated more than three standard deviations from the individual mean for each participant were excluded. The excluded outliers represented 4.4% of all trials (see also the Appendix for an alternative method of removing outliers).
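The two-step exclusion rule can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' analysis script: the function name is hypothetical, the quartile definition (the `inclusive` method of Python's `statistics.quantiles`) may differ from the one used by the original box plots, and the mean and standard deviation in the second step are assumed to be computed on the trials that survive the first step.

```python
import statistics


def exclude_outliers(rts, iqr_factor=3.0, sd_factor=3.0):
    """Apply a box-plot rule, then a standard-deviation rule, to one participant's RTs."""
    # Step 1: drop RTs outside the whiskers (quartiles -/+ 3.0 x interquartile range).
    q1, _, q3 = statistics.quantiles(rts, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - iqr_factor * iqr, q3 + iqr_factor * iqr
    kept = [rt for rt in rts if lower <= rt <= upper]

    # Step 2: drop RTs more than 3 SDs from the participant's mean
    # (assumption: mean and SD are recomputed on the trials kept in step 1).
    mean = statistics.mean(kept)
    sd = statistics.stdev(kept)
    return [rt for rt in kept if abs(rt - mean) <= sd_factor * sd]


# Made-up RTs (ms) for one participant, used only to show the call.
example_rts = [600, 620, 640, 660, 680, 700, 720, 740, 5000]
cleaned = exclude_outliers(example_rts)
```

In this made-up example the 5,000-ms trial falls outside the upper whisker and is removed in the first step, and the remaining eight trials all lie within three standard deviations of their mean.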

One-way ANOVA for RTs in the three types of trials (valid, neutral, invalid) revealed a significant main effect, F(2, 58) = 28.83, p < .001, ηp2 = .50 (see Fig. 2). Bonferroni-corrected comparisons showed that RTs in invalid trials (M = 747.6 ms, SE = 31.4) were significantly longer than those in valid (M = 673.1 ms, SE = 28.8) and neutral trials (M = 708.6 ms, SE = 30.5) (all ps < .001). This analysis also revealed that RTs in neutral trials were significantly longer than those in valid trials (p < .001).
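As a consistency check on the reported statistics, partial eta squared can be recovered from an F value and its degrees of freedom through the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below applies it; the function name is chosen here for illustration.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of trial type in Experiment 1: F(2, 58) = 28.83.
eta_exp1 = partial_eta_squared(28.83, 2, 58)
print(round(eta_exp1, 2))  # 0.5, matching the reported value of .50
```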

Fig. 2 Mean reaction times for each trial in Experiment 1. Error bars represent standard errors

The error rates were 2.0%, 2.4%, and 2.6% in valid, neutral, and invalid trials, respectively. A one-way ANOVA for error rates revealed no significant main effect, F(2, 58) = 0.25, p = .78, ηp2 = .01. These results suggested that the RTs could not be explained by a speed–accuracy trade-off.

Discussion

Consistent with the hypothesis, these results indicated that the targets with the imagery-matching color attracted attention. Detection of the target was significantly enhanced when the target had the imagery-matching color: RTs in valid trials of the visual-search task were significantly shorter than those in invalid trials.

It is possible that attentional attraction toward imagery-matching stimuli was due to a bottom-up priming effect rather than mental imagery. Considering that mental representations, rather than priming, are known to influence attentional guidance from VWM (Olivers et al., 2006; Soto et al., 2005; Soto & Humphreys, 2008), attentional guidance in Experiment 1 might also derive from mental imagery rather than from a bottom-up priming effect. In accordance with this hypothesis, a previous study reported that mental imagery—but not priming—enhanced color perception (Wantz et al., 2015). Although a color word may automatically activate color concepts, as observed in the Stroop task (Stroop, 1935), Wantz et al. (2015) demonstrated that priming with a color name does not enhance subsequent color perception and that mentally visualizing a color enhances processing of targets with an imagery-matching color. Wolfe, Horowitz, Kenner, Hyle, and Vasan (2004) also indicated that the prime of a pictorial color cue did not affect the efficiency of the search for a color target in the following visual-search task when the cue contained task-irrelevant information.

It is also possible that participants used the precueing name of a color to search for a target in the visual-search task in a top-down manner. That is, they set themselves up for searching for the stimulus that matched the name of the color. In this case, previous studies have shown enhanced visual-search performance (Müller, Reimann, & Krummenacher, 2003). When the color feature of the precue was likely defining the target in the following search task, the feature was attentionally weighted, and visual search for the target was enhanced. In the present experiment, however, the probabilities of valid, neutral, and invalid trials were equal. Therefore, participants may not have used the precueing color word as top-down tuning of processing to a likely target feature.

Experiment 2

To test the hypothesis that visual mental imagery rather than visual priming guides attention toward an imagery-matching color target, participants in Experiment 2 were required to view the prime word without the instruction to imagine the color. The search task was the same as that utilized in Experiment 1. If visual priming indeed influenced attentional allocation, participants would be expected to attend to the target whose color matched the meaning of the prime word. It was also possible that participants voluntarily set themselves up to search for the stimulus that matched the meaning of the prime word, although there were no such instructions. In this case, too, participants would be expected to attend to the target whose color matched the meaning of the prime word. In contrast, if attentional guidance toward an imagery-matching color stimulus is attributable to mental imagery rather than to these factors, attentional guidance would not be expected to occur.

Method

Participants

Experiment 2 included 31 undergraduate participants (29 women, two men; age range: 18–21 years, Mage = 19.1 ± 1.1 years). All participants had normal or corrected-to-normal vision and received monetary compensation for participation. Written informed consent was obtained from all participants prior to inclusion in the study.

Stimuli and procedure

The stimuli and procedure were identical to those of Experiment 1, with the exception of the following modification. Participants were only instructed that the name of a color was presented before the colored Landolt-C-like squares were presented. They were not instructed to imagine the color. Each trial finished with the participant’s response in the search task. Experiment 2 also included three types of trials: valid, invalid, and neutral. For valid trials, the color of the target in the search task was within the same category as the color indicated by the prime word. For invalid trials, the color of the distractor was within the same category as the color indicated by the prime word. For neutral trials, the color of stimuli in the search task differed from that of the prime word. There were no imagery tasks.

Results

RTs were analyzed in the same manner as described for Experiment 1, following the exclusion of outliers. The excluded outliers represented 5.5% of all trials. A one-way ANOVA for RTs in the three types of trials (valid, neutral, invalid trials) revealed no significant main effect, F(2, 60) = 1.39, p = .26, ηp2 = .04 (valid: M = 623.6 ms, SE = 14.9; neutral: M = 620.0 ms, SE = 13.8; invalid: M = 633.6 ms, SE = 14.9) (see Fig. 3). A Bayesian one-way ANOVA was performed using JASP (JASP Team, 2018; Wagenmakers et al., 2017). The analysis provided a Bayes factor of BF01 = 3.51, which indicated that the data were 3.51 times more likely under the null hypothesis than under the alternative hypothesis.
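To make the size of this Bayes factor concrete, the sketch below converts BF01 into a posterior probability for the null hypothesis. The function name is hypothetical, and the equal prior odds used as the default are an assumption introduced here for illustration; the source does not state which prior odds, if any, were considered.

```python
def posterior_prob_null(bf01, prior_prob_null=0.5):
    """Posterior probability of the null, given BF01 and a prior probability of the null."""
    prior_odds = prior_prob_null / (1.0 - prior_prob_null)
    # Bayes' rule in odds form: posterior odds = Bayes factor x prior odds.
    posterior_odds = bf01 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Assumption: equal prior odds for the null and alternative models.
p_null = posterior_prob_null(3.51)
```

Under equal prior odds, BF01 = 3.51 corresponds to a posterior probability of roughly .78 for the null model, which is usually read as moderate rather than strong evidence.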

Fig. 3 Mean reaction times for each trial in Experiment 2. Error bars represent standard errors

The error rates were 2.5%, 3.4%, and 4.7% for valid, neutral, and invalid trials, respectively. A one-way ANOVA for error rates revealed no significant main effect, F(2, 60) = 1.70, p = .19, ηp2 = .05.

Discussion

Experiment 2 was conducted to investigate whether priming or a voluntary attentional set influenced attentional allocation to prime-matching color stimuli. As the valid target did not attract attention, the results of Experiment 2 indicated that priming had no effect on the attentional allocation observed in Experiment 1 and that participants may not have adopted a voluntary attentional set. That is, the prime stimuli did not guide attention in the search task when the semantic information conveyed by the prime was task irrelevant and served as an unpredictive cue for the search task that followed.

Another possible priming effect is priming of pop-out (Maljkovic & Nakayama, 1994), in which observers respond faster when a target feature (e.g., color) in a visual-search task is the same as that of the preceding target. Because the color of the search stimuli was randomly selected in each trial in Experiment 2, this priming effect may not have affected the present results. To confirm this point, a 3 (validity: valid, neutral, invalid) × 2 (repetition: repeated, switched) ANOVA for RTs was conducted. In the repeated trials, the color of the target was the same as that of the previous one, whereas in the switched trials, the color of the target was different from that of the previous one. The ANOVA revealed no significant main effect of validity, F(2, 60) = 1.12, p = .33, ηp2 = .04, or repetition, F(1, 30) = 3.75, p = .06, ηp2 = .11, and no significant interaction, F(2, 60) = 1.23, p = .30, ηp2 = .04, although the effect size of repetition was not small. This result was consistent with those of previous studies (Kristjánsson, 2006; Michal, Lleras, & Beck, 2014), which indicated that the repetition priming effect was weak but was observed even when the target feature was task irrelevant. Importantly, the interaction was not significant, which indicated that priming of pop-out did not modulate the effect of validity in the present experiment.

The results of Experiment 1 indicated that attention was attracted toward imagery-matching stimuli. However, the RTs analyzed in Experiment 1 were insufficient to draw conclusions regarding attentional deployment. RTs reflect not only search efficiency but also the time required for target recognition. As Wantz et al. (2015) indicated, mental imagery of a color enhanced target recognition when the target had an imagery-matching color, and RTs were shorter than when the target did not have an imagery-matching color. It is therefore possible that the slowed RTs in invalid trials in Experiment 1 derived from slowed recognition of the target, caused by a conflict between imagery and perception, and not from attentional attraction toward imagery-matching stimuli. To address this problem, a set-size manipulation is necessary. When the number of stimuli (i.e., set size) increases, RTs increase, and the resulting RT × Set Size function yields a search slope (i.e., search efficiency) and an intercept (Wolfe, 2016). The slope refers to the search time per item, whereas the intercept refers to processes that occur before the beginning of the search or after the termination of the search, for example, the amount of time needed to recognize the target and reject distractors (Wolfe, 2016). In Experiment 3, the effects of set-size manipulation on search efficiency and target recognition were investigated.

Experiment 3

In Experiment 3, set size was manipulated (two, four, or six items) to investigate the slopes and intercepts in the visual-search task. If the imagery-matching color attracted attention, then the slopes in the valid trials would be shallower than those in the neutral and invalid trials. In addition, the intercepts would not differ between valid, neutral, and invalid trials. On the other hand, if the imagery-matching color did not affect spatial attention but had an effect on postprocessing of the search stimuli, the intercepts, and not the slopes, would differ between valid, neutral, and invalid trials. Specifically, because Wantz et al. (2015) showed that mental imagery enhanced recognition of a target with an imagery-matching color, the intercept in the valid trials would be lower than that in the invalid trials.

Method

Participants

Experiment 3 included 51 undergraduate participants (35 women, 16 men; age range: 18–26 years, Mage = 19.9 ± 2.0 years). All participants had normal or corrected-to-normal vision and received monetary compensation for participation. Written informed consent was obtained from all participants prior to inclusion in the study.

Stimuli, procedure, and analysis

The stimuli and procedure were identical to those of Experiment 1, with the exception of the following modification. A search array appeared with two, four, or six Landolt-C-like squares. The search stimuli were positioned on an imaginary circle around fixation with a radius of 350 pixels. There were six possible locations, which were positioned at 45, 90, 135, 225, 270, and 315 degrees from vertical. When the set size was two, the search stimuli were located at 90 and 270 degrees from vertical. When the set size was four, the search stimuli were located at 45, 135, 225, and 315 degrees from vertical. The color of the search stimuli was randomly selected during each trial from among red, green, blue, yellow (RGB: 255, 255, 0), purple (RGB: 128, 0, 128), brown (RGB: 153, 51, 0), or orange (RGB: 255, 127, 0). The set size varied randomly across trials. Participants completed 216 experimental trials, with 72 trials per set size.
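The stimulus layout can be made concrete by converting each angle into a pixel offset from fixation. The sketch below is illustrative only: the names are hypothetical, angles are assumed to run clockwise from vertical, the y-axis is assumed to point upward (many display libraries use a downward y-axis, in which case the sign of y flips), and a set size of six is assumed to fill all six listed locations.

```python
import math

RADIUS_PX = 350
# Angles in degrees from vertical for each set size, as listed in the text.
ANGLES_BY_SET_SIZE = {
    2: [90, 270],
    4: [45, 135, 225, 315],
    6: [45, 90, 135, 225, 270, 315],  # assumption: all six locations are filled
}


def stimulus_offsets(set_size, radius=RADIUS_PX):
    """Return (x, y) pixel offsets from fixation for each stimulus location."""
    offsets = []
    for angle in ANGLES_BY_SET_SIZE[set_size]:
        theta = math.radians(angle)
        # Measured clockwise from vertical: x uses the sine, y uses the cosine.
        x = radius * math.sin(theta)
        y = radius * math.cos(theta)
        offsets.append((round(x), round(y)))
    return offsets
```

Under these assumptions, a set size of two places the stimuli 350 pixels to the right and left of fixation, which matches the layout used in Experiment 1.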

Individual estimates of the slope and the intercept were computed for valid, neutral, and invalid trials respectively. Linear regressions were conducted, separately for each participant, with RTs as the dependent variable and set size as the independent variable.
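The slope and intercept estimation can be sketched with an ordinary least-squares fit of RT on set size. The code below is a minimal illustration, not the authors' analysis: the function and variable names are hypothetical, and for lack of individual data it fits the group mean RTs reported in the Results of Experiment 3 rather than each participant's RTs, so its output should not be read as the values in Table 1.

```python
def fit_line(set_sizes, rts):
    """Ordinary least-squares fit of RT on set size; returns (slope, intercept)."""
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(rts) / n
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, rts))
    slope = sxy / sxx                      # search time per item (ms/item)
    intercept = mean_y - slope * mean_x    # non-search processing time (ms)
    return slope, intercept


SET_SIZES = [2, 4, 6]
# Group mean RTs (ms) reported for Experiment 3, used here only as an illustration.
GROUP_MEAN_RTS = {
    "valid": [737.0, 892.8, 1000.9],
    "neutral": [783.7, 1006.3, 1209.3],
    "invalid": [803.2, 1097.8, 1252.5],
}

fits = {cond: fit_line(SET_SIZES, rts) for cond, rts in GROUP_MEAN_RTS.items()}
```

In the procedure described above, the same fit would be run once per participant and trial type, and the resulting slopes and intercepts would then be entered into the ANOVAs.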

Results

RTs were analyzed in the same manner as described for Experiment 1, following the exclusion of outliers. The excluded outliers represented 4.1% of all trials. A 3 (validity: valid, neutral, invalid) × 3 (set size: two, four, six) ANOVA for RTs was conducted (see Fig. 4). The ANOVA revealed significant main effects of validity, F(2, 100) = 85.29, p < .001, ηp2 = .63, and set size, F(2, 100) = 483.97, p < .001, ηp2 = .91. The interaction was also significant, F(4, 200) = 28.05, p < .001, ηp2 = .36, which indicated that in all valid, neutral, and invalid trials, RTs on Set-Size 6 (valid: M = 1,000.9 ms, SE = 30.9; neutral: M = 1,209.3 ms, SE = 38.8; invalid: M = 1,252.5 ms, SE = 41.4) were longer than those on Set-Size 2 (valid: M = 737.0 ms, SE = 23.1; neutral: M = 783.7 ms, SE = 24.4; invalid: M = 803.2 ms, SE = 28.7) and Set-Size 4 (valid: M = 892.8 ms, SE = 30.7; neutral: M = 1,006.3 ms, SE = 33.4; invalid: M = 1,097.8 ms, SE = 35.2), and RTs on Set-Size 4 were longer than those on Set-Size 2 (all ps < .001). Moreover, except for Set-Size 2, RTs in invalid trials were significantly longer than those in valid and neutral trials, and RTs in neutral trials were significantly longer than those in valid trials (all ps < .001, except for difference between neutral and invalid trials on Set-Size 6, in which p < .05). Based on the results in Experiment 1, on Set-Size 2, a one-way ANOVA for RTs in the three types of trials (valid, neutral, invalid trials) was conducted. The main effect was significant, F(2, 100) = 17.74, p < .001, ηp2 = .26, which indicated that RTs in neutral and invalid trials were significantly longer than those in valid trials (all ps < .001).

Fig. 4 Mean reaction times for each trial and set size in Experiment 3. Error bars represent standard errors

A one-way ANOVA for slopes in the three types of trials (valid, neutral, invalid) was conducted (see Table 1). The ANOVA revealed a main effect, F(2, 100) = 40.74, p < .001, ηp2 = .45. Bonferroni-corrected comparisons showed that the slopes in neutral and invalid trials were significantly steeper than those in valid trials (all ps < .001). A one-way ANOVA for intercepts revealed no significant main effect, F(2, 100) = 2.33, p = .10, ηp2 = .05. A Bayesian one-way ANOVA using JASP indicated that the data were 2.18 times more likely under the null hypothesis than under the alternative hypothesis.

Table 1 Mean search slopes and intercepts (standard errors in parentheses) in Experiment 3

A 3 (validity) × 3 (set size) ANOVA for error rates revealed no significant main effect of validity, F(2, 100) = 1.46, p = .24, ηp2 = .03, or set size, F(2, 100) = 0.55, p = .58, ηp2 = .01 (see Table 2). The interaction was not significant either, F(4, 200) = 2.24, p = .07, ηp2 = .05.

Table 2 Mean percentage of error rates (standard errors in parentheses) for each trial and set size in Experiment 3

Discussion

Experiment 3 was conducted to investigate whether the short RTs in the valid trials derived from spatial attention toward the imagery-matching stimuli or from fast recognition of the target. The results indicated that the slope in the valid trials was shallower than that in the neutral and invalid trials, whereas the intercepts did not differ among valid, neutral, and invalid trials. That is, spatial attention, and not target recognition, accounted for the short RTs in the valid trials.
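The slope and intercept logic can be illustrated with a minimal sketch. It fits an ordinary least-squares line to the group mean RTs reported in the Results for Set-Sizes 2, 4, and 6. The function name `fit_search_function` is hypothetical, and the analysis in the text was run on per-participant estimates with their own standard errors, so this group-level fit is only an illustration and not the authors' procedure.

```python
from statistics import mean


def fit_search_function(set_sizes, mean_rts):
    """Ordinary least-squares fit of RT (ms) on set size.

    Returns (slope in ms per item, intercept in ms).
    """
    x_bar = mean(set_sizes)
    y_bar = mean(mean_rts)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(set_sizes, mean_rts))
    sxx = sum((x - x_bar) ** 2 for x in set_sizes)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Group mean RTs (ms) for Set-Sizes 2, 4, and 6, as reported for Experiment 3.
SET_SIZES = [2, 4, 6]
GROUP_MEAN_RTS = {
    "valid": [737.0, 892.8, 1000.9],
    "neutral": [783.7, 1006.3, 1209.3],
    "invalid": [803.2, 1097.8, 1252.5],
}

fits = {
    name: fit_search_function(SET_SIZES, rts)
    for name, rts in GROUP_MEAN_RTS.items()
}

for name, (slope, intercept) in fits.items():
    print(f"{name}: slope = {slope:.1f} ms/item, intercept = {intercept:.1f} ms")
```

With these inputs the valid slope is roughly 66 ms per item, against roughly 106 and 112 ms per item for neutral and invalid trials, while the three intercepts fall within about 40 ms of one another. A difference confined to the slope is the pattern expected if imagery affects how attention is distributed across items rather than how quickly the target is recognized once found.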

The slope in the invalid trials was comparable with that in the neutral trials. This was partially consistent with the results of a previous study that investigated attentional guidance from VWM (Soto et al., 2005). Although Soto et al. reported that the slope in the invalid trials, in which a VWM-matching stimulus was a distractor, was steeper than that in the neutral and valid trials, they also showed that RT cost scores (invalid-trial RTs minus neutral-trial RTs) did not differ across set sizes. The latter result suggested that RTs per item did not differ between neutral and invalid trials. Moreover, they also examined the slopes for RTs from the fast end of the distribution to clarify the effects of the fastest responses. Those results indicated that the slope in the invalid trials was comparable with that in the neutral trials and was steeper than that in valid trials. These results were consistent with the present result.

In Experiment 1, color imagery was directly induced by a color word, although color plays an important role in various other processes such as object recognition (Bramao, Reis, Petersson, & Faisca, 2011). Wantz et al. (2015) demonstrated that color imagery induced by both color words and object cues results in comparable enhancements in subsequent color perception. Moreover, the object cue was strongly associated with a specific color (e.g., the image of a lemon for yellow). These findings suggest that visualizing an object in mental imagery also influences attentional allocation to imagery-matching stimuli. Therefore, in Experiment 4, the effects of mental imagery associated with objects on attentional allocation were investigated.

Experiment 4

In Experiment 4, participants imagined an object related to a specific color, rather than a color itself following the presentation of a color word cue. Several studies have investigated color-imagery ability during the visualization of common objects (Manning, 2000; Shuren, Brott, Schefft, & Houston, 1996). Although the finding remains controversial, some studies have reported that mental representations of an object’s color information elicit activation in the same neural areas that are activated when participants perceive color (Hsu, Frankland, & Thompson-Schill, 2012; Rich et al., 2006; Simmons et al., 2007). In these studies, color imagery was induced by presenting the name of the object or grayscale photographs. Considering the finding that enhanced neural representations guide attention (Desimone & Duncan, 1995), mental imagery of objects would also guide attention toward the color of an imagery-matching object. If the connection between an object and its color was strong, mental imagery of the object would lead to color imagery. Therefore, attentional guidance from mental imagery would occur even when participants were not directly instructed to imagine the color of the object.

Method

Participants

Experiment 4 included 45 undergraduate participants (25 women, 20 men; age range: 18–24 years, Mage = 20.0 ± 1.3 years). All participants had normal or corrected-to-normal vision and received monetary compensation for participation. Written informed consent was obtained from all participants prior to inclusion in the study.

Stimuli and procedure

Figure 5 illustrates the task protocol for Experiment 4. Each trial began with a white fixation cross, which was presented at the center of a gray screen for 500 ms, after which the name of a vegetable or fruit was presented for 1,000 ms as a mental-image cue. The vegetables and fruits used as cues were each associated with a specific color. The words grape, radish, burdock, and orange were presented in practice trials (i.e., words unassociated with the colors red, green, and yellow). In the main trials, the name of each food was associated with the color red, green, or yellow. Foods associated with the color red included apple, strawberry, cherry, and tomato. Foods associated with the color green included melon, broccoli, cucumber, and green pepper. Foods associated with the color yellow included banana, lemon, pineapple, and corn (see Footnote 3). Each word was randomly selected and presented an equal number of times. Participants were asked to imagine the objects indicated by the mental-image cue for a subsequent imagery task. They were required to consider the form, cultivar, and number of objects to ensure concrete representation, although no instructions regarding color were provided. They were instructed as follows: “Please visualize the food that you imagine from the name of the food as precisely and as concretely as possible in each trial. For example, visualize how many foods, what types of the food, and what forms of food are in your imagination, and also specify the angles in which you see the foods in your imagination. You do not have to remember the foods that you imagined in previous trials.”

Fig. 5 Example of the visual displays used in Experiment 4

A fixation cross was presented for 4,000 ms, after which two stimuli were presented in a search task. The stimuli were Landolt-C-like squares, which were the same as those used in Experiment 1, except for the following modification. The color of the stimuli was randomly selected in each trial from among red (RGB: 255, 0, 0), green (RGB: 0, 255, 0), or yellow (RGB: 255, 255, 0). Participants were asked to report if the gap in the target was on the top or bottom by pressing appropriate keys as accurately and as quickly as possible.

Following the participant’s response and a 500-ms blank interval with a fixation cross, two color-food pictures were presented 350 pixels to the left and right of the cross in an imagery task. The picture size was 280 × 280 pixels. The pictures presented in the imagery task were of the same food indicated by the mental-image cue and were randomly selected from among four different figures of the same category. Participants were asked to select the picture that more closely matched the imagined food. They were also instructed as follows: “There is no correct answer. Even if the presented two foods are different from the imagined food, please select the one that is more closely matched the imagined one.”

Following eight practice trials, participants completed 72 experimental trials. There were three types of trials: valid, invalid, and neutral trials. For valid trials, the color of the target in the search task was within the same category as that of the imagined food/mental-image cue (e.g., the target was red when the mental-image cue was a tomato). For invalid trials, the color of the distractor was within the same category as that of the imagined food/mental-image cue. For neutral trials, the color of stimuli in the search task differed from that of the imagined food/mental-image cue. Each trial type appeared an equal number of times, and the trial type was chosen at random.
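The three trial types can be expressed as a small classification rule. The sketch below is only an illustration: the function name `classify_trial` and the `FOOD_COLOR` mapping are hypothetical, only three of the twelve foods are listed, and the code assumes that the target and the distractor never share a color, which the text does not state explicitly.

```python
def classify_trial(cue_color, target_color, distractor_color):
    """Label a two-item search trial relative to the color of the imagined food."""
    if target_color == cue_color:
        return "valid"      # the target carries the imagery-matching color
    if distractor_color == cue_color:
        return "invalid"    # the distractor carries the imagery-matching color
    return "neutral"        # neither search item matches the imagined color


# Hypothetical excerpt of the cue-to-color mapping described in the Method.
FOOD_COLOR = {"tomato": "red", "melon": "green", "banana": "yellow"}

# 72 experimental trials split equally across the three trial types.
TRIALS_PER_TYPE = 72 // 3

print(classify_trial(FOOD_COLOR["tomato"], "red", "green"))     # valid
print(classify_trial(FOOD_COLOR["tomato"], "yellow", "red"))    # invalid
print(classify_trial(FOOD_COLOR["tomato"], "green", "yellow"))  # neutral
print(TRIALS_PER_TYPE)                                          # 24
```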

Results

RTs were analyzed in the same manner as described for Experiment 1, following the exclusion of outliers. The excluded outliers represented 3.1% of all trials. A one-way ANOVA for RTs in the three types of trials (valid, neutral, invalid) revealed a significant main effect, F(2, 88) = 5.01, p < .05, ηp2 = .10 (see Fig. 6). Bonferroni-corrected comparisons revealed that RTs in invalid trials (M = 760.8 ms, SE = 37.1) were significantly longer than those in valid trials (M = 722.1 ms, SE = 25.5, p < .05). This analysis also revealed that RTs in neutral trials (M = 748.8 ms, SE = 29.9) were longer than those in valid trials (p < .05).

Fig. 6 Mean reaction times for each trial in Experiment 4. Error bars represent standard errors

The error rates were 2.2%, 2.0%, and 2.6% for valid, neutral, and invalid trials, respectively. A one-way ANOVA for error rates revealed no significant main effect, F(2, 88) = 0.39, p = .68, ηp2 = .01. These results suggest that the difference in RTs could not be explained by a speed–accuracy trade-off.
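As a reading aid for the effect sizes reported here, partial eta squared can be recovered from an F value and its degrees of freedom, because ηp2 = SS_effect / (SS_effect + SS_error) and F = (SS_effect / df_effect) / (SS_error / df_error). The sketch below applies that identity to values reported in the text; the function name `partial_eta_squared` is hypothetical, and because the inputs are rounded F values, the result can differ from a reported effect size in the second decimal place.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# RTs in Experiment 4: F(2, 88) = 5.01
print(f"{partial_eta_squared(5.01, 2, 88):.2f}")    # 0.10

# Error rates in Experiment 4: F(2, 88) = 0.39
print(f"{partial_eta_squared(0.39, 2, 88):.2f}")    # 0.01

# Main effect of validity on RTs in Experiment 3: F(2, 100) = 85.29
print(f"{partial_eta_squared(85.29, 2, 100):.2f}")  # 0.63
```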

Discussion

The results of Experiment 4 indicated that mental imagery for an object with strong associations to a specific color also guided attention toward the color of the imagery-matching object. Previous studies have reported that mental representations of an object elicit activation in the same neural area activated during color perception (Hsu et al., 2012; Rich et al., 2006; Simmons et al., 2007). Even though color imagery was indirectly induced by visualization of an object and participants were not instructed regarding color during visualization, these findings indicate that mental imagery of the object may enhance the mental representation of a specific color, thereby guiding attention toward the imagery-matching color.

Experiment 5

Colored photographs were used for the imagery task of Experiment 4. These photographs may have helped participants to voluntarily direct internal attention toward the color of the objects represented by the mental image. Object recognition is strongly associated with color processing when using color diagnostic objects clearly associated with a specific color (Bramao et al., 2011). Rich et al. (2006) demonstrated that even when grayscale photographs of objects were presented, participants were able to visualize colored objects, resulting in activation of color-selective cortical areas. Therefore, Experiment 5 was conducted to investigate whether visualization of colored objects following the presentation of grayscale pictures guides attention toward the color of the imagery-matching object.

Method

Participants

Experiment 5 included 44 undergraduate participants (21 women, 23 men; age range: 19–23 years, Mage = 20.4 ± 1.1 years). All participants had normal or corrected-to-normal vision and received monetary compensation for participation. Written informed consent was obtained from all participants prior to inclusion in the study.

Stimuli and procedure

The stimuli and procedure were identical to those utilized in Experiment 4, except that the food pictures presented in the imagery task were in grayscale.

Results

RTs were analyzed in the same manner as described for Experiment 1, following the exclusion of outliers. The excluded outliers represented 4.1% of all trials. A one-way ANOVA for RTs in the three types of trials (valid, neutral, invalid) revealed a significant main effect, F(2, 86) = 3.73, p < .05, ηp2 = .08 (see Fig. 7). Bonferroni-corrected comparisons revealed that RTs in invalid trials (M = 756.1 ms, SE = 21.4) were significantly longer than those in valid trials (M = 739.4 ms, SE = 19.7, p < .01). RTs in neutral trials (M = 747.7 ms, SE = 22.7) did not differ significantly from those in valid or invalid trials.

Fig. 7 Mean reaction times for each trial in Experiment 5. Error bars represent standard errors

The error rates were 1.0%, 0.9%, and 0.6% for valid, neutral, and invalid trials, respectively. A one-way ANOVA for error rates revealed no significant main effect, F(2, 86) = 0.56, p = .57, ηp2 = .01. These findings suggest that the difference in RTs could not be explained by a speed–accuracy trade-off.

Discussion

The results of Experiment 5 demonstrated that mental imagery for a color diagnostic object guides attention toward the color of the imagery-matching object even when grayscale photographs are presented. Grayscale photographs may prevent participants from voluntarily directing internal attention toward the color of the object. In the imagery task, participants were asked to compare the imagined object with presented pictures. Because the presented pictures were in grayscale, color information may not be useful in the task. However, because each color diagnostic object was strongly associated with a specific color, color imagery may have been induced during visualization of the object, as Rich et al. (2006) demonstrated. Thus, color imagery may guide attention toward the imagery-matching color.

General discussion

The present study aimed to investigate whether visual mental imagery guided attention toward an imagery-matching stimulus. Participants were instructed to perform task-irrelevant color imagery during visual-search tasks. The findings of Experiment 1 demonstrated that RTs in valid trials, in which the color of the target in the visual-search task matched the imagined color, were shorter than those in invalid trials, in which the color of the distractor matched the imagined color. That is, attention was allocated to the imagery-matching stimulus. These effects were not observed when participants did not imagine color in Experiment 2. When the set size in the visual-search task was manipulated in Experiment 3, the shallower search slope in valid trials than in invalid trials supported attentional allocation to the imagery-matching stimulus. Furthermore, when participants were instructed to imagine an object associated with a specific color, attentional guidance toward the imagery-matching stimulus was observed in Experiments 4 and 5. RTs in valid trials, in which the color of the target in the visual-search task matched the color of the imagined object, were shorter than those in invalid trials, in which the color of the distractor matched that of the imagined object. These results suggest that attention is allocated to imagery-matching information.

Previous studies have reported that VWM guides attention toward memory-matching stimuli (Dalvit & Eimer, 2011; Olivers et al., 2006; Soto et al., 2005; Soto et al., 2008). Some researchers have theorized that attentional guidance from VWM is due to enhanced mental or neural representation of the items in VWM (Olivers et al., 2011). In previous studies, however, the mental representation stored in VWM derived from external visual input, and it was unclear whether internally generated mental representations also attracted attention. The findings of the present study indicate that attentional guidance by stored mental representations may apply not only to VWM items but also to imagined items. Both VWM and mental imagery refer to internal representations, although mental imagery is induced in the absence of external stimuli (Ganis, 2013; Kosslyn, 2005). In previous studies of attentional guidance from VWM, participants were observed to hold information internally even when external stimuli were not presented during a visual-search task (Dalvit & Eimer, 2011; Olivers et al., 2006; Soto et al., 2005; Soto et al., 2008). Additional studies have indicated that mental imagery activates neural representations, inducing patterns of activity similar to those observed during actual visual perception (Cichy et al., 2012; Cui et al., 2007; Lee et al., 2012). Taken together, these findings indicate that the same mechanisms may underlie attentional guidance from visual mental imagery and that from VWM.

Nonetheless, visual mental imagery and VWM representations may differ in some respects. Participants rely on two strategies during VWM tasks: mental imagery and phonological recoding (Berger & Gaunitz, 1979; Gur & Hilgard, 1975). During phonological recoding, participants encode stimuli using language rather than visual representations. Soto and Humphreys (2008) demonstrated that attentional guidance from VWM was modulated by verbal processing. The authors used an articulatory-suppression task, in which participants repeated task-irrelevant speech sounds (e.g., “ba”), and demonstrated that this manipulation prevented phonological recoding of the VWM stimuli. During the articulatory-suppression task, the effect of attentional guidance on VWM-matching stimuli was reduced. Importantly, the authors noted that the articulatory-suppression task could not completely eliminate the effects of VWM on attentional allocation. That is, mental representations in VWM affect attentional guidance from VWM at least to some extent. Therefore, mental representations may be critical for attentional guidance due to VWM and visual mental imagery.

Mental representations associated with mental imagery also differ in strength from those associated with VWM (Pearson et al., 2015). In this respect, mental imagery resembles a weak form of perception. Mental imagery in the absence of external stimuli is associated with weak mental representations, which may reduce the effects of attentional guidance from mental imagery relative to those from VWM. Because participants were directly instructed to visualize color following presentation of a word cue in Experiments 1 and 3, mental representations of color may have been relatively strong, a finding further supported by the large effect size observed. In Experiments 4 and 5, the visualized object was strongly associated with a specific color. However, participants were not instructed to visualize color directly, but to imagine form, cultivar, and the number of objects. Thus, internal attention may not have been primarily allocated to color features. The effect sizes revealed by ANOVAs were moderate for Experiments 4 and 5, indicating that mental representations of color imagery in these experiments were relatively weak when compared with those induced in Experiments 1 and 3. In the present study, the target used in the visual-search task was constant (i.e., outlined square with a gap on the top or bottom). Because trials in which the target remains unchanged weaken the representation of the target in the search template (Gunseli et al., 2016), the representation of mental imagery might be strong compared with that of the target. Therefore, the findings of the present study suggest that even weak mental representations, such as those observed in Experiments 4 and 5, induce attentional guidance.

Unlike VWM, participants can internally generate and manipulate mental imagery. Therefore, visualization associated with mental imagery may take longer than that associated with VWM. Previous studies have reported that longer durations of image generation result in significantly greater enhancements in sensory perception (Pearson et al., 2008; Pearson et al., 2015). The duration allowed for image generation in the present study (i.e., approximately 5,000 ms) was likely sufficient for inducing mental imagery, as this duration was chosen based on previous studies of mental imagery (Keogh & Pearson, 2014, 2017; Wantz et al., 2015), in which mental imagery was observed to enhance subsequent perception. However, in these previous studies, participants directly imagined a color stimulus following presentation of a word or object. Thus, the time required to imagine the color of an object without direct instructions regarding the visualization of color remains unknown. If time is required to generate color imagery, attentional guidance toward imagery-matching stimuli may have been observed more clearly by increasing image-generation times in Experiments 4 and 5.

Visual mental imagery occurs both voluntarily and involuntarily (Pearson & Westbrook, 2015). Because a color word automatically activates color concepts, the Stroop effect can be observed when a conflict occurs between a written word and the color in which it is presented (Stroop, 1935). However, as indicated by the results of Experiment 2 in the present study, involuntary mental imagery may not induce attentional guidance toward imagery-matching stimuli. Mental representations induced by involuntary imagery may not be as strong as those induced by a target in the search template. As attention involves distinct top-down and bottom-up mechanisms (Connor, Egeth, & Yantis, 2004), voluntary and involuntary mental imagery may differentially affect attentional allocation to imagery-matching stimuli. Future studies should aim to manipulate voluntary and involuntary imagery in order to elucidate whether these factors exert interactive effects on attentional guidance.

A critical issue in the present study is how imagery performance was measured. Although it is difficult to verify that participants imagined color or an object during each trial, analyzing the responses in the imagery task might help address this issue. According to Mannaert, Dijkstra, and Zwaan (2017), people create mental simulations vividly and stably. In their experiments, participants read a sentence in which they stably imagined a precise color of an object. In the present experiments, if participants stored stable mental imagery during each trial, then the selection of color in the imagery task in Experiments 1 and 3 may have been biased toward a certain color. In the imagery task, four different colors in each color category (i.e., red, green, and blue) were selected, and each color was presented six times in Experiment 1 and 18 times in Experiment 3. A chi-square test was conducted for the selection responses of each participant. Nonsignificant results, which indicate unbiased responses, were observed in six of 30 participants in Experiment 1 and four of 51 participants in Experiment 3. It is possible that these participants responded at random in the imagery task. Even excluding these participants, however, the results remained the same in that RTs in invalid trials were significantly longer than those in valid and neutral trials in Experiment 1 (valid: M = 670.2 ms, SE = 26.5; neutral: M = 707.3 ms, SE = 30.2; invalid: M = 746.2 ms, SE = 29.9) and Experiment 3 (valid: M = 868.3 ms, SE = 28.0; neutral: M = 995.5 ms, SE = 33.2; invalid: M = 1,045.0 ms, SE = 35.9). In addition, in Experiment 3, slopes in neutral and invalid trials were significantly steeper than those in valid trials (valid: M = 66.0 ms, SE = 4.4; neutral: M = 108.2 ms, SE = 5.6; invalid: M = 112.9 ms, SE = 5.8). In Experiments 4 and 5, biased responses for selection were also observed. In these tasks, four different figures of each food (e.g., apple) were selected in the imagery task, and each figure was presented three times. If participants had stable mental imagery, they would always select the same figure for each food. That is, the same figure would be selected three times. Such biased responses were counted for each food in each participant, and the average proportion of such responses was 86.1% (SE = 2.2) in Experiment 4 and 85.0% (SE = 1.8) in Experiment 5. High proportions reflect biased responses, which are consistent with stable mental imagery in the experiments. In order to enhance the validity of the present methods, neural activity for the imagined stimuli should be measured so that the contents of mental imagery can be decoded (Reddy, Tsuchiya, & Serre, 2010).
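The per-participant chi-square check on color selections can be sketched as a goodness-of-fit test against uniform responding. This is an assumption-laden illustration: the text does not state how the selection counts were tabulated, the function name `chi_square_uniform` and both count vectors are hypothetical, and the sketch treats the four colors of one category as four cells with equal expected counts. The constant 7.815 is the standard critical value of the chi-square distribution with 3 degrees of freedom at α = .05.

```python
def chi_square_uniform(counts):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


# Critical value for df = 3 (four cells minus one) at alpha = .05.
CRITICAL_DF3_ALPHA_05 = 7.815

# Hypothetical selection counts for the four colors of one category.
stable_imager = [10, 1, 1, 0]   # selections concentrated on one color
random_responder = [3, 3, 4, 2]  # selections spread evenly

for label, counts in [("stable", stable_imager), ("random", random_responder)]:
    statistic = chi_square_uniform(counts)
    biased = statistic > CRITICAL_DF3_ALPHA_05
    print(f"{label}: chi-square = {statistic:.2f}, biased = {biased}")
```

Under this reading, a significant statistic marks a participant whose selections cluster on particular colors, whereas a nonsignificant statistic is compatible with random responding in the imagery task.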

The present study possesses some limitations. First, individual differences in mental-imagery ability were not assessed. Similar to findings observed regarding VWM capacity, individuals vary greatly in their ability to perform mental imagery, and the strategies used by “good” imagers and “poor” imagers may differ (Keogh & Pearson, 2011). Moreover, recent studies have revealed that some individuals have difficulty visualizing certain images or exhibit deficits in voluntary imagery—a phenomenon known as aphantasia (Zeman et al., 2010; Zeman, Dewar, & Della Sala, 2015). Because individuals with aphantasia exhibit difficulty generating mental imagery, their attention may not be directed to imagery-matching stimuli. Thus, future studies should aim to determine the association between attentional guidance from mental imagery and mental-imagery ability.

Second, the present study investigated the effects of color imagery only, and it remains unclear whether mental imagery for other features also guides attention. For example, considering that mental imagery of stimulus orientation activates early visual areas (Albers et al., 2013), mental imagery of orientation may also guide attention toward imagery-matching stimuli. Additional studies have revealed that VWM of emotional information also affects attentional allocation (Moriya, Koster, & De Raedt, 2014b). Future studies should investigate whether mental imagery of both low-level visual features and high-level cognitive/emotional features influences attentional guidance.

Third, the present study could not rule out the possibility that long-term memory, rather than mental imagery, affected attentional allocation. In Experiments 4 and 5, for example, it is possible that participants retrieved the images from long-term memory. Mental imagery activates and evaluates internal representations stored in long-term memory (Ganis, 2013; Kosslyn, 2005). Previous studies have indicated that mental imagery accompanies long-term memory (Greenberg & Knowlton, 2014; Rubin, 2006; Vannucci, Pelagatti, Chiorri, & Mazzoni, 2016). Individuals with high mental imagery generated more autobiographical memories than others (Vannucci et al., 2016), whereas individuals without visual imagery showed low reliving of autobiographical memories (Greenberg & Knowlton, 2014). Long-term memory also affects attentional allocation (Reinhart & Woodman, 2015; Woodman, Carlisle, & Reinhart, 2013; Woodman & Luck, 2007). When the target in the visual-search task was stored in long-term memory in the constant-target condition, participants could efficiently search for the target even under working-memory load (Woodman & Luck, 2007). Moreover, stimulation of the medial-frontal cortex using transcranial direct-current stimulation enhanced the long-term memory representations indexed by the anterior P1, which also facilitated the search for the target in the visual-search task (Reinhart & Woodman, 2015). Future work should discriminate the effects of mental imagery and long-term memory on attentional allocation by requiring participants to imagine, for example, bizarre scenes, which would decrease the amount of information available in long-term memory (Baddeley & Andrade, 2000).

Fourth, it is still unclear whether the attentional allocation to an imagery-matching stimulus was an automatic effect or the result of strategic control. Because the imagined color was sometimes used as the color of a target in the search array, participants could voluntarily attend toward the imagery-matching stimulus to facilitate the maintenance of imagined representations. For example, attentional guidance toward VWM-matching information was observed, although this attentional shifting was not observed when the VWM-matching stimulus was always a distractor in a visual-search task (Woodman et al., 2013; Woodman & Luck, 2007). In addition, an increased probability that a target in a visual-search task matched the content of VWM enhanced attentional guidance from VWM (Carlisle & Woodman, 2011; Kiyonaga, Egner, & Soto, 2012; Moriya, Koster, & De Raedt, 2014a). Future studies should manipulate the probability of valid and invalid trials and investigate the effects of strategic control on attentional allocation.

In summary, the findings of the present study demonstrate that mental imagery of a color guides attention toward imagery-matching information. Moreover, the results of the present study indicate that, even when participants indirectly visualize a color, imagery of a color diagnostic object also guides attention toward imagery-matching information. Such attentional allocation did not occur without mental imagery.