Introduction

How our perceptual systems encode information is of central importance to the understanding of human cognition. A vast amount of research has examined the roles of attention and reinforcement in learning and memory formation. While it is clear that multiple factors guide how we encode information (Seitz & Watanabe, 2005, 2009), the interplay between these factors remains unclear. Recently, a number of labs have investigated a new learning paradigm that shows great promise in dissociating factors that contribute to the encoding of information. These studies have found that processing the target of a rapid serial detection task can facilitate the encoding of information paired with the task targets (Dewald, Sinnett, & Doumas, 2011; Lin, Pype, Murray, & Boynton, 2010; Seitz & Dinse, 2007; Seitz & Watanabe, 2005, 2009; Swallow & Jiang, 2010, 2011; Watanabe, Nanez, & Sasaki, 2001). In these studies, the target-paired stimuli were irrelevant to the serial detection task that the subjects were asked to conduct, and we therefore call this task-irrelevant learning (TIL).

The phenomenon of TIL has been studied in the most detail in the case of perceptual learning. Research into task-irrelevant perceptual learning (TIPL; Seitz & Watanabe, 2009) has demonstrated that subjects become better at detecting or discriminating task-irrelevant stimuli when these stimuli are consistently presented at behaviorally relevant times (Seitz & Watanabe, 2005), such as when task targets (Seitz, Lefebvre, Watanabe, & Jolicoeur, 2005; Seitz & Watanabe, 2003) or rewards (Seitz & Watanabe, 2005) are presented. TIPL has been found for motion processing (Watanabe et al., 2002), orientation processing (Nishina, Seitz, Kawato, & Watanabe, 2007), critical flicker fusion thresholds (Seitz, Nanez, Holloway, & Watanabe, 2005, 2006), contour integration (Rosenthal & Humphreys, 2010), auditory formant processing (Seitz et al., 2010), and phonetic processing (Vlahou, Protopapas, & Seitz, 2011; Vlahou, Seitz, & Protopapas, 2009). It is thus arguably a basic mechanism of learning in the brain that spans multiple levels of processing and sensory modalities.

While the goal of initial studies of TIPL was to examine whether perceptual learning could occur in the absence of attention (Seitz & Watanabe, 2003; Watanabe et al., 2001), more recent studies have demonstrated a more complex interplay between attention and reinforcement in TIPL (Choi, Seitz, & Watanabe, 2009; Nishina et al., 2007; Tsushima, Seitz, & Watanabe, 2008). Furthermore, a number of studies have reported that learning does not occur for stimulus features that are irrelevant to a subject’s task (Ahissar & Hochstein, 1993; Schoups, Vogels, Qian, & Orban, 2001; Shiu & Pashler, 1992). Accordingly, it has been speculated that attention and reinforcement play complementary roles in learning (Roelfsema, van Ooyen, & Watanabe, 2010; Seitz et al., 2009) and that multiple factors interact to produce observed learning effects (Seitz & Dinse, 2007). Indeed, TIPL has been observed in some studies but not in others, and the role of attention in TIPL may explain this discrepancy.

Tsushima et al. (2008) conducted a TIPL experiment in which the irrelevant information could be around threshold (5% and 15% coherent motion) or suprathreshold (50% coherent motion). TIPL occurred only for the near-threshold coherent motion stimuli (5% and 15%), not for the 50% coherent motion. Thus, TIPL was observed for a weak task-irrelevant signal, but not for a strong one. One hypothesis is that weak task-irrelevant signals fail to be “noticed” and suppressed by the attentional system and, thus, are learned, while stronger signals are detected and suppressed and are therefore not learned (Roelfsema et al., 2010; Tsushima, Sasaki, & Watanabe, 2006).

Similar conclusions were drawn by Choi et al. (2009), who examined how directed exogenous attention impacted the formation of TIPL. In this study, arrows were employed as an exogenous orienting cue to manipulate a subject’s attention to a task-irrelevant stimulus (previous studies showed that arrows could trigger attentional orienting such that a subject’s attention was automatically directed to the place where the arrowhead pointed; e.g., Ristic & Kingstone, 2009; Tipples, 2002). During training sessions, subjects were asked to report the orientation (left or right) of the arrows presented in the center of the screen. The task-irrelevant stimuli were selectively presented according to where the arrowhead pointed: A patch with a specific motion direction was consistently presented on the side where the arrowhead pointed (the attended side), and another patch with a different motion direction was presented on the other side (the unattended side). The results indicated learning for irrelevant stimuli that were unattended but no performance improvement for irrelevant stimuli to which exogenous attention was directed. The authors concluded that these findings were at odds with the hypothesis that TIPL results from attention being directed to task-irrelevant features whenever attentional resources remain available. Instead, the findings were in accordance with the hypothesis that attention inhibits, rather than facilitates, the learning of task-irrelevant stimuli, especially when these stimuli are salient (Tsushima et al., 2008).

A study by Nishina et al. (2007) investigated the spatial profile of TIPL. In this study, subjects were trained on an attentionally demanding letter detection task at one location while subthreshold, static Gabor patches, which were masked in noise, were presented at different locations in the visual field. The results showed that the largest improvement in discriminating Gabors at the trained orientation occurred at the location closest to the task. These data indicate that learning of the task-irrelevant visual feature depends on the task location, attenuating gradually with spatial distance from it. While these results are consistent with a spatial fall-off of the learning signal that promotes TIPL, the authors speculated that they could instead reflect an interaction between a broad learning signal and a spatially restricted attentional mechanism, in which case learning would be inhibited for stimuli presented outside a limited spatial window. This latter possibility would agree with the results of Tsushima et al. (2008) and Choi et al. (2009), which showed that attentional suppression can restrict TIPL, albeit in the case of Nishina et al. the suppression would operate outside the areas to which attention was directed.

These TIPL studies demonstrate an important interplay between attention and learning, but it remains unclear when attention will serve to allow TIPL (e.g., Nishina et al., 2007) and when it will serve to restrict TIPL (e.g., Choi et al., 2009; Tsushima et al., 2008). While the TIPL paradigm has been effective in detailing the factors that guide encoding, it is limited by the fact that these perceptual learning studies typically require thousands of training trials spread across multiple days. Practically speaking, such studies are time consuming and expensive, which limits the number of experimental conditions that can be run. As a result, it has not always been feasible to examine in detail why some conditions produce TIPL and others do not.

Recent progress in studies of TIPL has been made by a number of labs with the demonstration of a fast form of TIPL (fast-TIPL; Dewald et al., 2011; Lin et al., 2010; Swallow & Jiang, 2010, 2011). In this fast-TIPL paradigm, subjects conducted a rapid serial visual presentation (RSVP) target detection task (looking for a target letter, color, or word among a series of distractors), while other stimuli (images, pictures) were consistently paired with the stimuli of the RSVP task. Similar to TIPL, visual memory was enhanced for salient stimuli that were paired with the targets of the RSVP task (Lin et al., 2010; Swallow & Jiang, 2010, 2011). These studies of fast-TIPL yield several findings regarding the processes of learning. First, they show that TIPL can occur on the time scale of a single trial, rather than requiring the many days of exposure typically needed to observe TIPL. Second, they show that TIPL can enhance the processing not only of irrelevant stimuli but also of stimuli that are relevant to the subject (although not to the RSVP task). Third, they show that TIPL can occur for salient stimuli. As such, this paradigm is very attractive as a method for understanding the processes involved in TIPL.

Furthermore, existing studies of fast-TIPL raise questions regarding the roles of attention and reinforcement in this effect. Swallow and Jiang (2010) suggested that detecting a target in one task may induce an “attentional boost” at the moment the target appears, which facilitates the processing and encoding of information into memory. To study the role of attention in this effect, they conducted an experiment in which subjects were instructed to inhibit processing of the images. In this condition, no enhancement was observed for the target-paired images, as compared with the distractor-paired images (Swallow & Jiang, 2011, Experiment 4). Dewald et al. (2011) conducted a slightly different experiment in which words and pictures were superimposed and subjects were instructed to detect immediate repetitions of either words or pictures, depending on the experimental design. At the end of the experiment, surprise recognition tests were given for both pictures and words. For the unattended stimuli, performance was worse for the stimuli paired with the target than for the other stimuli. Accordingly, Swallow and Jiang (2011) suggested that attention to the images is necessary to observe enhanced memorization of target-paired images. However, an alternative explanation is that attention can suppress TIPL of salient, distracting stimuli (e.g., Tsushima et al., 2006; Tsushima et al., 2008). On this account, because the images presented with the RSVP task were irrelevant and not weak, attention would inhibit their processing, and these images would therefore not be learned. According to this framework, when subjects are asked to perform the RSVP task while memorizing the images, the images are not fully irrelevant, and there is no reason to inhibit them. In this situation, traces of visual scenes may be automatically encoded into memory at behaviorally relevant points in time (e.g., times of reinforcement), regardless of the spatial focus of attention (see also Lin et al., 2010; Seitz & Watanabe, 2005, 2009).

The objective of the present study is to clarify how attention affects fast-TIPL. To accomplish this, we conducted a series of experiments in which subjects’ exogenous attention was manipulated by using arrows as the task target. To study the time course of the effects, we compared recognition rates for images presented with the target and for images presented before and after the target. Swallow and Jiang (2010) made similar comparisons and observed better performance for target-paired images than for both pre-target-paired and post-target-paired images, but no difference between pre- and post-target-paired images. Novel to our design, however, is the use of arrows as targets, which serve as exogenous attentional cues.

Experiments 1a and 1b

In our first experiment, we set out to test how using an arrow as a target would impact the fast-TIPL procedure. We used our fast-TIPL paradigm (Leclercq & Seitz, in press) with a response for an arrow target and a test for image recognition after each trial. In Experiment 1a, subjects were required to make an immediate response to the arrow target, whereas in Experiment 1b, the response to the arrow target was delayed to the end of each trial. If attention facilitates fast-TIPL, the arrow, which would exogenously direct attention to the image stream, should facilitate memorization of the target-paired image. However, if attention disrupts fast-TIPL (e.g., Choi et al., 2009), memorization of the target-paired image should be disrupted in these two experiments.

In addition to addressing the impact of exogenous attention, these manipulations allowed us to address an existing controversy in the field. Swallow and Jiang (2010) observed that the advantage for target-paired images can be eliminated when the target task requires an arbitrary stimulus–response mapping (in their case, discriminating red vs. green squares). They argued that accessing the response mapping in working memory and selecting an arbitrary response demand attention and may therefore eliminate the advantage for target-paired images. However, Lin et al. (2010) successfully used a delayed response in a discrimination task.

Method

Forty subjects gave informed consent to participate in the experiment, which was approved by the University of California, Riverside. All subjects reported normal or corrected-to-normal visual acuity and received course credit and financial compensation for the 1-h session. Twenty subjects (19 years of age ± 8 months; 14 females, 6 males) participated in Experiment 1a, and 20 (20 years of age ± 5 months; 13 females, 7 males) participated in Experiment 1b. Prior to testing, subjects were familiarized with the 192 images used in the experiment by viewing each image once, for 2 s. They then performed a practice block of 12 trials. Each subject was then tested on a total of 240 trials, in 10 blocks of 24 trials separated by brief breaks.

Apparatus and stimuli

An Apple Mac Mini running MATLAB (MathWorks, Natick, MA) and Psychtoolbox Version 3 (Brainard, 1997; Pelli, 1997) was used for stimulus generation and experiment control. Stimuli were presented on a 22-in. LCD monitor with a resolution of 1,680 × 1,050 pixels and a refresh rate of 60 Hz or on a 22-in. CRT monitor with a resolution of 1,600 × 1,200 pixels and a refresh rate of 100 Hz (results did not differ across monitor type). Subjects sat with their eyes approximately 60 cm from the screen. The backgrounds of all displays were mid-gray. Display items consisted of 192 photographs, each 700 × 700 pixels (18.3° of visual angle), depicting natural or urban scenes from eight distinct categories (e.g., mountains, cityscapes). Images were obtained from the LabelMe Natural and Urban Scenes database (Oliva & Torralba, 2001) at a resolution of 250 × 250 pixels and then up-sampled to 700 × 700 pixels. The average luminance of all images was 79 cd/m² (standard deviation of 29 cd/m²).
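As a rough check on the 18.3° figure, the visual angle of a stimulus follows from its physical size s and the viewing distance D as θ = 2·arctan(s / 2D). The sketch below assumes the nominal width of a 22-in. 16:10 panel and exactly 60 cm viewing distance; neither the panel's viewable width nor the precise distance is reported, so the helper names and inputs are illustrative only.

```python
import math


def width_from_diagonal_cm(diagonal_in: float, aspect_w: float, aspect_h: float) -> float:
    """Nominal screen width (cm) from its diagonal (inches) and aspect ratio."""
    return diagonal_in * 2.54 * aspect_w / math.hypot(aspect_w, aspect_h)


def visual_angle_deg(size_px: float, screen_width_cm: float,
                     screen_width_px: int, distance_cm: float) -> float:
    """Full visual angle (deg) of a centered stimulus: 2 * atan(size / (2 * distance))."""
    size_cm = size_px * screen_width_cm / screen_width_px  # physical size on screen
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


# Assumed: nominal 22-in. 16:10 LCD, 1,680 px wide, viewed from 60 cm.
lcd_width_cm = width_from_diagonal_cm(22, 16, 10)
angle = visual_angle_deg(700, lcd_width_cm, 1680, 60)
print(f"{lcd_width_cm:.1f} cm wide -> {angle:.1f} deg")
```

Under these assumptions the 700-pixel images subtend roughly 18.7°, close to the reported 18.3°; the small gap is plausibly due to the approximate viewing distance and the actual viewable screen size.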

Procedure

Each trial began with the presentation of a fixation cross for 450 ms. This presentation was followed by a rapid sequence of 16 full-field images. Each image was presented for 133 ms, followed by a blank interstimulus interval of 367 ms for a stimulus onset asynchrony of 500 ms (Fig. 1).
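Because displays update only on video frames, the nominal 133-ms/367-ms timing is necessarily quantized. The sketch below assumes durations were rounded to the nearest whole frame (the rounding rule is not reported) and shows how the 500-ms SOA maps onto the 60-Hz and 100-Hz monitors, along with the image onset times within a trial. The function names are hypothetical.

```python
def frames_for(duration_ms: float, refresh_hz: float) -> int:
    """Nearest whole number of video frames for a requested duration (assumed rounding)."""
    return round(duration_ms * refresh_hz / 1000)


def rsvp_onsets_ms(n_images: int = 16, fixation_ms: float = 450,
                   soa_ms: float = 500) -> list:
    """Onset of each image (ms from trial start): fixation cross, then one image per SOA."""
    return [fixation_ms + i * soa_ms for i in range(n_images)]


timing = {}
for hz in (60, 100):
    on, off = frames_for(133, hz), frames_for(367, hz)
    # Durations actually achieved once the request is quantized to frames.
    timing[hz] = (on, off, on * 1000 / hz, off * 1000 / hz)
    print(hz, "Hz:", timing[hz])

onsets = rsvp_onsets_ms()
print("first onset:", onsets[0], "last onset:", onsets[-1])
```

At 60 Hz this gives 8 + 22 frames (about 133 + 367 ms) and at 100 Hz 13 + 37 frames (130 + 370 ms); in both cases the SOA remains exactly 500 ms, and the last image appears 7,950 ms after trial onset.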

Fig. 1

Design of Experiment 1. On each trial, subjects had to rapidly press the correct key when the arrow appeared, while also memorizing 16 images presented in RSVP. At the end of each trial, subjects had to respond to the image recognition task.

Arrow task

A gray aperture (1° of visual angle and luminance of 92 cd/m²) was presented in the center of each image, and thus in the middle of the screen. On each trial, a black square (luminance of 0.25 cd/m²) was presented at central fixation, in the middle of the gray aperture, with 15 of the images, and a black arrow (0.75° of visual angle and luminance of 0.25 cd/m²) was presented in the middle of the gray aperture with the remaining image. The arrow could point to the left or to the right. The squares and the arrow had the same onset and offset times as the image with which they were paired. The arrow could appear only with images presented in serial positions 9–16. This restriction prevented the target from appearing at the onset of the RSVP stream and made it more likely that subjects were engaged in the task when the critical images were presented (Lin et al., 2010). For Experiment 1a, subjects were instructed to keep their eyes fixated on the center of the screen throughout the experiment and, as soon as the arrow appeared, to press the arrow key (left or right) corresponding to its direction. They were also instructed to memorize the 16 images presented on each trial and were tested on image recognition after each trial. For Experiment 1b, subjects were instructed to withhold their response to the direction of the arrow until the end of each trial, just before the image recognition task; as in Experiment 1a, they were also instructed to memorize the 16 images and were tested on image recognition after each trial.
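The stream structure described above can be sketched as follows: 15 square distractors and one arrow, with the arrow confined to serial positions 9–16 and pointing left or right. How target positions and directions were actually assigned is not specified, so drawing them uniformly at random is an assumption, and `make_trial` is a hypothetical helper.

```python
import random


def make_trial(rng: random.Random, n_images: int = 16,
               first_target_pos: int = 9) -> dict:
    """Build one RSVP trial: squares everywhere except one arrow target.

    Serial positions are 1-based; the target falls in first_target_pos..n_images
    (9-16 in the text). Uniform sampling is an assumption.
    """
    target_pos = rng.randint(first_target_pos, n_images)  # inclusive bounds
    direction = rng.choice(["left", "right"])
    stream = ["square"] * n_images
    stream[target_pos - 1] = "arrow_" + direction
    return {"target_pos": target_pos, "direction": direction, "stream": stream}


rng = random.Random(0)  # fixed seed for reproducibility
trials = [make_trial(rng) for _ in range(240)]  # 10 blocks x 24 trials
```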

Image recognition task

Following each trial, subjects were presented with a test image and asked to report (by pressing the up-arrow or down-arrow key) whether the test image had appeared on that trial. To facilitate comparison with previous studies, we used the same procedure as Lin et al. (2010). The test image was presented for 3,000 ms or until the subject responded. On 50% of the trials, the test image was one of the images presented in positions 9–16 of the current RSVP sequence. For each experiment, target-paired images were tested on 16 trials; images presented in the position just before (pre-Target 1) or just after (post-Target 1) the target were tested on 14 trials each; images in positions pre-Target 2 and post-Target 2 on 12 trials each; and images in positions pre-Target 3 and post-Target 3 on 10 trials each. The other positions were tested on the remaining trials. On the other 50% of the trials, the test image was drawn from the set of images not presented on that trial. Of note, the target in the arrow detection task did not predict which image would be tested in the image recognition task; thus, any benefit in processing of the image was task irrelevant with regard to the detection task.
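The test-trial counts above can be turned into a schedule. The sketch below is one hypothetical way to do it: it fixes the reported counts for relative positions −3 to +3 (0 being the target-paired image), fills the remaining old-image trials with farther positions (an assumption, since the text only says "the other positions"), and then picks, for each old-image trial, a target position that keeps the tested image inside positions 9–16. The function and constant names are illustrative.

```python
import random
from collections import Counter

# Reported old-image test counts per position relative to the target (0 = target-paired).
OLD_COUNTS = {0: 16, -1: 14, 1: 14, -2: 12, 2: 12, -3: 10, 3: 10}
N_TRIALS = 240
TESTED_POSITIONS = range(9, 17)  # only serial positions 9-16 are tested


def build_test_schedule(rng: random.Random) -> list:
    """Return one (target_pos, tested_pos) pair per trial; tested_pos is None on 'new' trials."""
    n_old = N_TRIALS // 2  # 50% old, 50% new test images
    rels = [rel for rel, n in OLD_COUNTS.items() for _ in range(n)]
    # Assumption: leftover old trials test farther positions (|rel| = 4..7).
    far = [r for r in range(-7, 8) if abs(r) >= 4]
    rels += [rng.choice(far) for _ in range(n_old - len(rels))]

    schedule = []
    for rel in rels:
        # Target positions for which the tested image also lies in 9-16.
        valid = [t for t in TESTED_POSITIONS if t + rel in TESTED_POSITIONS]
        target = rng.choice(valid)
        schedule.append((target, target + rel))
    # New-image trials: the target can be anywhere in 9-16.
    schedule += [(rng.choice(TESTED_POSITIONS), None) for _ in range(N_TRIALS - n_old)]
    rng.shuffle(schedule)
    return schedule


schedule = build_test_schedule(random.Random(1))
rel_counts = Counter(p - t for t, p in schedule if p is not None)
```

One side effect of this construction is that target positions are not uniform across trials (for example, relative position −3 forces the target into positions 12–16), which a real design might want to balance.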

Results

Mean performance on the arrow discrimination task was 95.0% ± 1.0% (± standard error) for Experiment 1a and 98.4% ± 0.6% for Experiment 1b, suggesting that memorizing the images did not impair performance on the central task.

To examine the temporal dynamics of performance within trials, we compared performance for target-paired images with performance for pre-target-paired images (up to 7 possible positions) and post-target-paired images (up to 7 possible positions). A mixed-design analysis of variance (ANOVA), with experiment (Experiment 1a, Experiment 1b) as a between-subjects factor and relative image position (pre-target-paired, target-paired, post-target-paired) as a within-subjects factor, revealed no main effect of experiment and no interaction between experiment and relative image position. This suggests that whether the response to the arrow was immediate or delayed did not affect the results, in line with the results obtained by Lin et al. (2010). Since there were no significant differences between Experiments 1a and 1b, we pooled the data across these experiments in the analyses presented below (see Fig. 2a for data from each experiment).

Fig. 2

a Performance (in percentages) for pre-target-paired (white), target-paired (light gray), and post-target-paired (black) images for Experiments 1a and 1b, without control for the recency effect (left panel) and with control for the recency effect (right panel). Error bars represent standard errors of the means. b Performance (in percentages) for pre-target-paired (white), target-paired (light gray), and post-target-paired (black) images for Experiments 1a and 1b pooled together, without control for the recency effect. Error bars represent standard errors of the means.

Results for the image recognition task are shown in Fig. 2b. The hit rate for target-paired images (68.8% ± 1.6%) was higher than the false alarm (FA) rate (33.2% ± 2.0%), t(39) = 15.27, p < .001; the hit rates for pre-target-paired images (60.2% ± 1.0%) and post-target-paired images (73.4% ± 0.8%) were also both higher than the FA rate, t(39) = 17.70, p < .001, and t(39) = 24.49, p < .001, respectively. An ANOVA on percent correct (hits), with data pooled across experiments and relative image position (pre-target-paired, target-paired, post-target-paired) as a within-subjects factor, revealed a significant effect of relative image position, F(2, 78) = 25.15, p < .001. Planned comparisons showed better recognition performance for target-paired images than for pre-target-paired images, F(1, 39) = 15.91, p < .001, better recognition performance for post-target-paired images than for target-paired images, F(1, 39) = 6.14, p = .018, and, finally, better recognition for post-target-paired images than for pre-target-paired images, F(1, 39) = 63.93, p < .001. We also calculated dʹ values for each subject and conducted a second ANOVA with relative image position as a within-subjects factor (to achieve normality for the statistical analyses, we converted dʹ to unbiased percent correct values, i.e., the percent correct corresponding to each dʹ score under the assumption of an unbiased criterion; see Macmillan & Creelman, 1991). The same results as in the previous ANOVA were obtained, with better recognition performance for target-paired images (.69 ± .05) than for pre-target-paired images (.64 ± .03), F(1, 39) = 13.23, p < .001, better recognition performance for post-target-paired images (.71 ± .02) than for target-paired images, F(1, 39) = 6.99, p = .012, and better recognition for post-target-paired images than for pre-target-paired images, F(1, 39) = 68.72, p < .001.
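For readers reproducing the dʹ analysis, the standard yes/no formulas are dʹ = z(H) − z(FA) and, for an observer with an unbiased criterion, proportion correct p(c) = Φ(dʹ/2) (Macmillan & Creelman, 1991). The sketch below uses only Python's standard library. Two choices are assumptions not stated in the text: a log-linear correction (adding 0.5 to counts and 1 to totals) to avoid infinite z-scores when a rate is 0 or 1, and pairing each position-specific hit rate with the subject's single FA rate.

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal: mean 0, SD 1


def d_prime(hits: int, n_old: int, false_alarms: int, n_new: int) -> float:
    """d' = z(hit rate) - z(false-alarm rate), with an assumed log-linear correction."""
    hit_rate = (hits + 0.5) / (n_old + 1)  # never exactly 0 or 1
    fa_rate = (false_alarms + 0.5) / (n_new + 1)
    return _Z.inv_cdf(hit_rate) - _Z.inv_cdf(fa_rate)


def unbiased_pc(dp: float) -> float:
    """Proportion correct an unbiased yes/no observer would achieve at this d'."""
    return _Z.cdf(dp / 2)


# Hypothetical subject: 11 of 16 target-paired images recognized, 40 FAs on 120 new trials.
dp = d_prime(11, 16, 40, 120)
pc = unbiased_pc(dp)
```

The `unbiased_pc` step corresponds to the dʹ-to-percent-correct conversion described in the text, which maps dʹ onto the same 0 to 1 scale as the values reported above.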

The results of these experiments confirmed earlier findings of better recognition for target-paired images than for pre-target-paired images; however, they also revealed a novel result: Recognition was best for post-target-paired images, as compared with both target- and pre-target-paired images. As we show below, this advantage for post-target-paired images appears to be explained partly by a recency effect and partly by exogenous attention being directed toward the images.

We noted that post-target-paired images are, on average, presented later in a trial than are pre-target-paired and target-paired images. To control for a possible recency effect, we analyzed only pre-target-paired, target-paired, and post-target-paired images that were matched in sequence position. Specifically, we first removed images in positions 9 and 16 from the analysis, because there were no post-target-paired images in position 9 and no pre-target-paired images in position 16. We then averaged the results for pre-target-paired, target-paired, and post-target-paired images across the remaining positions (10–15). This yielded a new percent correct (hit rate) for pre-target-, target-, and post-target-paired images that controls for recency effects. Results for the image recognition task with control for the recency effect are shown in Fig. 3. A new one-way ANOVA was conducted on these values, with relative image position (pre-target-paired, target-paired, post-target-paired) as the within-subjects factor. The results indicated a main effect of this factor, F(2, 78) = 5.59, p < .01. Planned comparisons showed better recognition performance for target-paired images (68.3% ± 1.9%) than for pre-target-paired images (62.1% ± 1.0%), F(1, 39) = 6.52, p = .015. No significant difference was found between post-target-paired images (68.8% ± 0.9%) and target-paired images, F(1, 39) = 0.08, p = .78. However, recognition was still significantly better for post-target-paired images than for pre-target-paired images, F(1, 39) = 14.12, p < .001. We calculated dʹ values for each subject, and an ANOVA with relative image position as a within-subjects factor also showed a significant effect of relative image position, F(2, 78) = 6.23, p < .005. Planned comparisons indicated significantly better recognition for target-paired images (.93 ± .05) than for pre-target-paired images (.77 ± .03), F(1, 39) = 7.24, p = .010, and significantly better recognition for post-target-paired images (.97 ± .03) than for pre-target-paired images, F(1, 39) = 16.41, p < .001.
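The recency control amounts to computing a hit rate for each image category at each shared serial position and then averaging across positions with equal weight, so that no category benefits from appearing later in the stream. A minimal sketch, assuming per-trial records of (tested serial position, target position, hit) for old-image trials; the record format and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def category(tested_pos: int, target_pos: int) -> str:
    """Classify a tested image relative to the target."""
    if tested_pos < target_pos:
        return "pre"
    if tested_pos > target_pos:
        return "post"
    return "target"


def recency_controlled_hit_rates(records, positions=range(10, 16)):
    """Equal-weight average of per-position hit rates, over positions 10-15 only.

    records: iterable of (tested_pos, target_pos, hit) with hit in {0, 1}.
    Positions 9 and 16 are dropped because they cannot host post- and
    pre-target-paired images, respectively.
    """
    cells = defaultdict(list)
    for tested_pos, target_pos, hit in records:
        if tested_pos in positions:
            cells[(category(tested_pos, target_pos), tested_pos)].append(hit)
    result = {}
    for cat in ("pre", "target", "post"):
        # Hit rate at each position first, then an unweighted mean across positions.
        rates = [mean(cells[(cat, p)]) for p in positions if cells[(cat, p)]]
        result[cat] = mean(rates) if rates else float("nan")
    return result


# Sanity check with a pure recency effect: positions 13-16 always recognized, earlier never.
demo = [(p, t, int(p >= 13)) for t in range(9, 17) for p in range(9, 17)]
controlled = recency_controlled_hit_rates(demo)
print(controlled)
```

In the demo, hits depend only on serial position, so after the control all three categories come out at 0.5, even though uncontrolled rates would favor the post-target category.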

Fig. 3

Performance (in percentages) for pre-target-paired (white), target-paired (light gray), and post-target-paired (black) images for each experiment, with control for the recency effect. Error bars represent standard errors of the means.

The results of these analyses, which control for recency, indicated better recognition performance for target-paired images than for pre-target-paired images and no difference between target-paired and post-target-paired images. These results are consistent with fast-TIPL occurring in these experiments. However, the finding of better recognition performance for post-target-paired images than for pre-target-paired images is novel and requires additional explanation (see Experiment 2).

Discussion of Experiments 1a and 1b

The results of these experiments indicated that recognition rates were superior for target-paired than for pre-target-paired images, but also for post-target-paired than for pre-target-paired images, even when a control for the recency effect was conducted. How can these results be reconciled with Swallow and Jiang’s (2010) finding of no differences between pre- and post-target-paired images?

One hypothesis is that fast-TIPL occurred for the target-paired image, leading to better performance for target-paired images than for pre-target-paired images. However, in our experiment, the target was an arrow pointing to the left or the right. This arrow likely oriented attention exogenously toward the images (Ristic & Kingstone, 2009; Tipples, 2002). Moreover, once the target had been presented, the images were the only remaining stimuli subjects had to attend to, so this exogenous orienting could have helped subjects direct attention endogenously to the subsequent images, which would also improve their memorization. Consequently, performance for images presented after the target would have been enhanced, partially obscuring the effects of fast-TIPL for target-paired images. While this account implies an impact of attention opposite to that found by Choi et al. (2009), it would be consistent with the idea that attention facilitates the processing of stimuli that are relevant to subjects. Alternatively, the difference between pre- and post-target-paired images could reflect an increase in arousal when subjects process the target, with this arousal persisting for images presented after the target. To distinguish between these possibilities, we ran a new experiment.

Experiment 2

Experiment 2 was conducted to test whether the comparable performance for post-target-paired and target-paired images in Experiment 1 could be due to attention being oriented toward the images. The method was similar to that of Experiment 1, but a white square target replaced the arrow target. If the lack of a difference between target-paired and post-target-paired images in Experiment 1 was due to the arrow target orienting attention, then in Experiment 2, performance for target-paired images should be better than performance for both pre- and post-target-paired images, and no significant difference would be expected between pre- and post-target-paired images.

Method

Twenty new subjects (19 years of age ± 5 months; 14 females, 6 males) participated in this experiment. Procedure, apparatus, and stimuli were the same as those described in Experiment 1, with the exception that a white square (0.75° of visual angle and a luminance of 251 cd/m²) was used as the target instead of an arrow. Subjects were instructed to press the right arrow key when they detected the white square in the RSVP task (an immediate response, as in Experiment 1a).

To control for recency effects, the results were analyzed only for image positions 10–15, as discussed for the recency controls in Experiment 1.

Results

Mean performance on the white square detection task was 95.7% ± 0.8%, suggesting that memorizing the images did not impair performance on the central task.

The results for the image recognition task are shown in Fig. 3. The hit rate for target-paired images (70.8% ± 2.9%) was higher than the FA rate (35.0% ± 3.3%), t(19) = 9.03, p < .001; the hit rates for pre-target-paired images (59.0% ± 1.9%) and post-target-paired images (63.3% ± 1.2%) were also both higher than the FA rate, t(19) = 8.62, p < .001, and t(19) = 8.34, p < .001, respectively. An ANOVA on percent correct, with the recency control described in Experiment 1, was conducted with relative image position (pre-target-paired, target-paired, post-target-paired) as a within-subjects factor. The results indicated a significant effect of relative image position, F(2, 38) = 5.68, p < .01. Planned comparisons showed better recognition performance for target-paired images (70.8% ± 2.9%) than for pre-target-paired images (59.0% ± 1.9%), F(1, 19) = 7.08, p = .015. In contrast to Experiment 1, performance was significantly better for target-paired images than for post-target-paired images (63.3% ± 1.2%), F(1, 19) = 6.56, p = .019, and there was no significant difference between post-target-paired and pre-target-paired images, F(1, 19) = 2.01, p = .17. For each subject, dʹ values were calculated and entered into a second ANOVA with relative image position as a within-subjects factor. The same pattern was obtained: a significant effect of relative image position, F(2, 38) = 6.26, p < .01, with significantly better recognition performance for target-paired images (1.00 ± .09) than for pre-target-paired images (.65 ± .05), F(1, 19) = 7.41, p = .014, and post-target-paired images (.77 ± .03), F(1, 19) = 8.45, p = .009, but no difference in performance between pre- and post-target-paired images, F(1, 19) = 1.75, p = .20. Thus, as expected, performance was better for target-paired images than for pre- and post-target-paired images, with no difference between pre- and post-target-paired images, indicating a clear fast-TIPL effect for target-paired images.
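The within-subjects ANOVA and planned comparisons reported above can be sketched with NumPy. Two points are assumptions about the analysis rather than statements from the text: no sphericity correction is applied, and each planned comparison uses its own error term, in which case F(1, n − 1) equals the square of a paired t statistic (consistent with the reported F(1, 19) and F(1, 39) degrees of freedom). The data below are simulated for illustration, not the study's data.

```python
import numpy as np


def rm_anova_oneway(data: np.ndarray):
    """One-way repeated-measures ANOVA on an (n_subjects, k_conditions) array.

    Returns (F, df_effect, df_error); no sphericity correction is applied.
    """
    n, k = data.shape
    grand = data.mean()
    ss_cond = n * ((data.mean(axis=0) - grand) ** 2).sum()  # between conditions
    ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()  # between subjects
    ss_total = ((data - grand) ** 2).sum()
    ss_error = ss_total - ss_cond - ss_subj  # condition x subject residual
    df_cond, df_err = k - 1, (k - 1) * (n - 1)
    F = (ss_cond / df_cond) / (ss_error / df_err)
    return F, df_cond, df_err


def planned_contrast_F(a: np.ndarray, b: np.ndarray):
    """Pairwise within-subject comparison with its own error term: F(1, n-1) = paired t squared."""
    diff = a - b
    n = diff.size
    t = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))
    return t ** 2, 1, n - 1


# Simulated scores for 20 subjects x (pre, target, post); not the study's data.
rng = np.random.default_rng(0)
scores = rng.normal(loc=[0.60, 0.70, 0.63], scale=0.08, size=(20, 3))
F, df1, df2 = rm_anova_oneway(scores)
F_tp, c1, c2 = planned_contrast_F(scores[:, 1], scores[:, 0])  # target vs. pre
```

With only two conditions, the one-way repeated-measures F and the paired-t-based contrast F coincide, which is a convenient check on both functions.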

The results of Experiment 2 support the hypothesis that the arrow target orients attention to the background images presented after it, and they are inconsistent with the arousal hypothesis. Figure 4 shows a more detailed time course of image recognition as a function of position in the stream relative to the target. Of note, these data include all presentations of images in each relative position and, thus, do not control for the recency effect, as was done in the prior analysis. This control is not necessary here, since we are comparing data across two experiments that should be equally affected by recency. We observed that differences between Experiments 1 and 2 occurred only for post-target-paired images. A t-test revealed significantly better performance for post-target-paired images in Experiment 1 (70.3% ± 1.0%) than in Experiment 2 (64.8% ± 1.1%), t(58) = 2.10, p = .040. The same result was obtained for dʹ, with better performance for post-target-paired images in Experiment 1 (1.08 ± 0.04) than in Experiment 2 (0.86 ± 0.05), t(58) = 2.75, p = .008. This difference in performance for post-target-paired images between the two experiments is probably related to attention being oriented toward the background images in Experiment 1. By contrast, when the target does not orient attention to the images (as with the square), memorization is enhanced specifically for the target-paired image.
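The between-experiment comparison pits 40 subjects (Experiments 1a and 1b) against 20 (Experiment 2), so a pooled-variance independent-samples t-test has a single df = 40 + 20 − 2 = 58, which is why the statistic is written t(58). A minimal standard-library sketch, assuming equal variances (the text does not say whether a Welch correction was considered); the scores are made up purely to exercise the function.

```python
import math
from statistics import mean, variance


def student_t(x, y):
    """Independent-samples Student t with pooled variance; returns (t, df)."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # SE of the mean difference
    return (mean(x) - mean(y)) / se, n1 + n2 - 2


# Illustrative only: made-up scores with the group sizes used in the text (40 vs. 20).
exp1 = [70 + (i % 5) for i in range(40)]
exp2 = [65 + (i % 5) for i in range(20)]
t, df = student_t(exp1, exp2)
```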

Fig. 4

Performance (in percentages) for images according to their position relative to the target (from -3 to +3) for Experiments 1a, 1b, and 2. Error bars represent standard errors of the means

Experiment 3

To further test the conclusion from Experiments 1 and 2 that attention facilitates fast-TIPL, we ran a new experiment that allowed us to better isolate the effect of the spatial direction of attention. This experiment was more analogous to that of Choi et al. (2009), which showed that exogenously directed attention impaired slow-TIPL. In Experiment 3, we employed a method similar to that in Experiment 1, but instead of one image at a time, two images were displayed simultaneously at each presentation, one to the left and one to the right of the fixation point (see Fig. 5 for a task schematic). In this way, we could more directly study how recognition of the images was affected by the direction in which the arrow exogenously directed attention. Furthermore, as compared with Experiments 1a and 1b, the use of two images reduced the likelihood that subjects would deliberately orient their attention to the post-target images indicated by the arrow in order to memorize them better. Indeed, in this experiment, the image tested at the end of each trial could be either the image presented on the side indicated by the arrow or the one on the opposite side, in equal proportions. Consequently, after the presentation of the arrow target, subjects gained no advantage from attending to one side. This allowed us to examine the impact of exogenous attention on TIPL more cleanly: If attention facilitates fast-TIPL, memorization should be better for target-paired images presented on the side indicated by the arrow (congruent condition; e.g., the right image when the arrow pointed right) than for target-paired images presented on the opposite side (incongruent condition; e.g., the left image when the arrow pointed right).

Fig. 5

Design of Experiment 3. On each trial, subjects had to memorize the direction of the arrow and 16 pairs of images presented in RSVP. At the end of each trial, subjects first indicated the direction of the arrow and then performed the image recognition task

Method

Twenty new subjects (19 years of age ± 4 months; 10 females, 10 males) participated in this experiment. The procedure, apparatus, and stimuli were the same as in Experiment 1b, except that two images were presented at a time, one to the left and one to the right of the fixation point. Thus, 32 images instead of 16 were presented on each trial. Each image measured 512 × 512 pixels (13.6° of visual angle), and the two images of each pair were separated by 3° of visual angle, for a total display width of 29.6° (Fig. 5). Each subject was tested on a total of 256 trials presented in 11 blocks, separated by brief breaks.

For the image recognition task, as in the previous experiments, target-paired images were tested on 16 trials in each of the congruent and incongruent conditions. However, the number of trials testing each of the other positions was divided by two: half for the congruent condition and half for the incongruent condition. Thus, for each of the congruent and incongruent conditions, images presented in the pre-Target 1 or post-Target 1 position were tested 7 times, images in the pre-Target 2 or post-Target 2 position 6 times, and images in the pre-Target 3 or post-Target 3 position 5 times. Consequently, performance estimates were less reliable for positions temporally distant from the target.
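The loss of reliability follows from the small number of tests per position. As a rough sketch, assuming each test is an independent yes/no trial with a fixed underlying hit rate p, a single subject's observed hit rate has a binomial standard error of sqrt(p(1 − p)/n). The Python snippet below applies this to the test counts stated above, taking them at face value as tests per position and condition; p = 0.6 is an arbitrary illustrative value, not a result from the study.

```python
from math import sqrt


def binomial_se(p: float, n: int) -> float:
    """Standard error of a proportion estimated from n independent yes/no tests."""
    return sqrt(p * (1 - p) / n)


# Tests per subject in each congruency condition, by position relative to the target.
tests_per_position = {
    "target-paired": 16,
    "pre/post-Target 1": 7,
    "pre/post-Target 2": 6,
    "pre/post-Target 3": 5,
}

ILLUSTRATIVE_P = 0.6  # assumed hit rate, chosen only for illustration
for position, n_tests in tests_per_position.items():
    print(f"{position}: n = {n_tests}, SE = {binomial_se(ILLUSTRATIVE_P, n_tests):.3f}")
```

With p = 0.6, the per-subject standard error rises from about 0.12 with 16 tests to about 0.22 with 5 tests.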

To control for recency effects, the results were analyzed only for image positions 10–15, as discussed for the recency controls in Experiment 1.

Results

Mean performance on the arrow discrimination task was 95.2% ± 1.1%, indicating that memorizing the images did not impair performance on the central task.

The results for the image recognition task are shown in Fig. 3. Overall, the hit rates for target-paired images (67.9% ± 2.6%), pre-target-paired images (58.4% ± 1.3%), and post-target-paired images (68.0% ± 1.1%) were all larger than the FA rate (51.4% ± 3.0%), t(19) = 5.80, p < .001, t(19) = 3.28, p < .01, and t(19) = 10.86, p < .001, respectively. We conducted an ANOVA on percent correct, controlling for the recency effect, with relative image position (pre-target-paired, target-paired, post-target-paired) and congruency (congruent vs. incongruent) as within-subjects factors. There was a significant effect of relative image position, F(2, 38) = 5.89, p = .005, with better performance for target-paired images than for pre-target-paired images, F(1, 19) = 8.24, p = .009, and better performance for post-target-paired images than for pre-target-paired images, F(1, 19) = 10.36, p = .005. However, no significant main effect of congruency was observed, F(2, 38) = 0.50, p = .61, and, contrary to our hypothesis, there was no significant difference between target-paired images in the congruent and incongruent conditions, F(1, 19) = 0.36, p = .55.

We next examined performance separately in the congruent and incongruent conditions. In the congruent condition, planned comparisons indicated better recognition for target-paired images (69.2% ± 3.1%) than for pre-target-paired images (60.4% ± 1.6%), F(1, 19) = 5.65, p = .028, but no significant difference between target-paired and post-target-paired images (65.5% ± 1.5%), F(1, 19) = 0.98, p = .33, or between post- and pre-target-paired images, F(1, 19) = 2.80, p = .11. By contrast, in the incongruent condition, planned comparisons indicated only a trend toward better recognition for target-paired images (66.7% ± 3.9%) than for pre-target-paired images (56.4% ± 1.9%), F(1, 19) = 4.00, p = .060, no difference between target-paired and post-target-paired images (66.4% ± 1.5%), F(1, 19) < 0.01, p = .99, and better recognition for post-target-paired than for pre-target-paired images, F(1, 19) = 9.68, p = .006.
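Each planned comparison above contrasts two within-subjects conditions and has 1 numerator degree of freedom. Assuming each comparison is tested against its own error term (a common choice that the article does not spell out), F(1, n − 1) equals the square of a paired-samples t computed on the per-subject differences. The Python sketch below illustrates that identity; the function name and the per-subject scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x: list[float], y: list[float]) -> tuple[float, int]:
    """Paired-samples t on per-subject differences x[i] - y[i]; returns (t, df)."""
    if len(x) != len(y):
        raise ValueError("x and y must contain one score per subject, in the same order")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # t = mean difference divided by its standard error, with df = n - 1.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical per-subject proportions correct (not data from this study).
target_paired = [0.70, 0.65, 0.80, 0.60, 0.75]
pre_target_paired = [0.60, 0.62, 0.70, 0.55, 0.66]

t, df = paired_t(target_paired, pre_target_paired)
f_stat = t ** 2  # equivalent single-df contrast F, on (1, df) degrees of freedom
print(f"t({df}) = {t:.2f}, F(1, {df}) = {f_stat:.2f}")
```

If the comparisons were instead tested against a pooled error term from the full ANOVA, the denominator degrees of freedom and the F values would differ from this paired-t equivalent.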

For each subject, dʹ values were calculated and entered into a second ANOVA with relative image position and congruency as within-subjects factors. This ANOVA replicated the previous one, with a significant effect of relative image position, F(2, 38) = 4.91, p = .013, reflecting better performance for target-paired images than for pre-target-paired images, F(1, 19) = 6.70, p = .018, and better performance for post-target-paired images than for pre-target-paired images, F(1, 19) = 11.16, p = .003. Again, no significant main effect of congruency was observed, F(2, 38) = 0.38, p = .68, and there was no difference between target-paired images in the congruent and incongruent conditions, F(1, 19) = 0.24, p = .63. In the congruent condition, planned comparisons also showed better recognition for target-paired images (.48 ± .10) than for pre-target-paired images (.23 ± .04), F(1, 19) = 5.01, p = .037, no significant difference between target-paired and post-target-paired images (.39 ± .04), F(1, 19) = 0.88, p = .36, and only a trend toward better recognition for post- than for pre-target-paired images, F(1, 19) = 3.55, p = .075. In the incongruent condition, planned comparisons indicated a trend toward better performance for target-paired images (.43 ± .12) than for pre-target-paired images (.13 ± .05), F(1, 19) = 4.30, p = .052, no difference between post-target-paired images (.40 ± .04) and target-paired images, F(1, 19) = 0.10, p = .75, and, finally, better recognition for post-target-paired than for pre-target-paired images, F(1, 19) = 10.15, p = .004.

We also examined the fine-grained time course of memorization within the trial. Figure 6 shows recognition performance for target-paired images and for pre- and post-target-paired images (one to three positions from the target) in the congruent and incongruent conditions. Of note, these data include all presentations of images in each relative position and thus, unlike the prior analysis, do not control for the recency effect. This control is unnecessary here, because the comparison is between two conditions that are equally affected by recency. Descriptively, performance for target-paired images appears better in the congruent than in the incongruent condition. A t-test confirmed this observation, showing significantly better performance for target-paired images in the congruent condition (70.0% ± 3.0%) than in the incongruent condition (61.9% ± 2.9%), t(19) = 2.14, p = .045. Similar results were obtained for dʹ, with better performance for target-paired images in the congruent condition (0.46 ± 0.09) than in the incongruent condition (0.26 ± 0.08), t(19) = 2.10, p = .049. No other comparisons were significant, although performance at pre-Target 3 in the incongruent condition was surprisingly low; this was due to 6 subjects who responded positively to at most one of the five tested images in this condition. This response rate was below the FA rate and most likely reflects that this analysis goes beyond the original design of the experiment, which was to combine the three pre-target positions in the analyses.
Note also that the congruency advantage for target-paired images was significant in this analysis but not in the previous ones. The present analysis included all target-paired presentations, whereas the previous ones included only a subset (because recency had to be controlled for in the comparisons with pre- and post-target-paired images).

Fig. 6

Performance (in percentages) for images according to their position relative to the target (from -3 to +3) for Experiment 3. Error bars represent standard errors of the means

In both the congruent and incongruent conditions, performance for target-paired images exceeded that for pre-target-paired images (significantly in the congruent condition and as a trend in the incongruent condition). The difference between pre-target-paired and target-paired images in the incongruent condition supports the hypothesis that fast-TIPL occurs independently of the direction in which attention is allocated. We also found better performance for target-paired images in the congruent than in the incongruent condition, although this depended on which data were included in the analyses; these data suggest that attention can play a facilitative role in the memorization of target-paired images. More definitively, these data show that attention is not suppressive in fast-TIPL when the target-paired images have some relevance to the subjects, in contrast to the findings of Choi et al. (2009), where suppression was found for completely irrelevant stimuli.

General discussion

Our objective was to study the role of attention in fast-TIPL and, in particular, to resolve discrepancies in the impact of attention across studies of TIPL and fast-TIPL. The results of Experiments 1a and 1b showed that fast-TIPL occurs for target-paired images but that performance is also enhanced for images presented after the target. The results of Experiment 2 corroborated our hypothesis that this enhancement for post-target-paired images in Experiments 1a and 1b was related to the orienting of attention by the arrow target: When a square target was used instead of an arrow, performance was better for target-paired images than for both pre- and post-target-paired images. Experiment 3 demonstrated that fast-TIPL can occur regardless of the direction in which attention is allocated but that directed attention can improve memorization of target-paired images.

At first glance, these findings of fast-TIPL seem to directly contradict previous findings of slow-TIPL. First, in our fast-TIPL experiments, the images paired with the RSVP task were highly salient (i.e., not weak), whereas studies of slow-TIPL showed that when information presented with the RSVP task is salient, attention inhibits it and it is not learned (Tsushima et al., 2008). Second, we found that orienting attention enhanced memorization of these images, whereas Choi et al. (2009) found that attentional orienting interfered with slow-TIPL. Key to reconciling these disparate findings is the observation that the target-paired stimuli in previous studies of slow-TIPL had no importance to the subjects and typically distracted from the subjects’ tasks. In the present fast-TIPL experiments, by contrast, the target-paired images were important to the subjects, who had to memorize them. Thus, while in both slow-TIPL and fast-TIPL the target-paired stimuli are irrelevant to the RSVP task (see Seitz & Watanabe, 2008), in fast-TIPL the target-paired stimuli are relevant to the subject. Consequently, in previous slow-TIPL studies, attentional inhibition of the target-paired stimuli was advantageous, because those stimuli could draw resources away from the subjects’ task (Tsushima et al., 2006), whereas in the present fast-TIPL experiments, attentional enhancement of the target-paired stimuli would be advantageous.

Supporting this view, other studies using the fast-TIPL paradigm in which subjects were not told in advance that the images were important (or that they should memorize them) did not show fast-TIPL (Dewald et al., 2011; Swallow & Jiang, 2011). For example, Swallow and Jiang (2011) observed no enhanced memorization of target-paired images when subjects were not informed of the subsequent image memory test, whereas under similar conditions, enhancement for target-paired images was found when subjects were aware of the test. On the other hand, Seitz et al. (2009) found, in a study of slow-TIPL in which subjects had no task to perform (instead, stimuli were reinforced with liquid rewards), that learning occurred even for suprathreshold orientation stimuli (in Experiment 1 of that article, subjects were close to 100% accurate in discriminating the conditioned orientation stimuli). We suggest that in this case, because subjects had no task to perform, attentional inhibition did not occur, and TIPL could occur even for salient stimuli. Altogether, these results indicate that attention can either enhance or inhibit TIPL, depending on the relevance or importance of the target-paired information to the subjects.

This observation is in line with previous accounts of TIPL that have discussed how attention and reinforcement play complementary roles in learning (Roelfsema et al., 2010; Seitz & Watanabe, 2005, 2009). Although initial accounts of TIPL aimed to establish that reinforcement in the absence of attention could lead to TIPL (Seitz & Watanabe, 2003, 2005; Watanabe et al., 2001, 2002), these studies suggested that behaviorally relevant events, such as target recognition (Seitz & Watanabe, 2003) or the delivery of rewards (Seitz et al., 2009), lead to a release of diffuse neuromodulatory signals that gate plasticity. More recent accounts of TIPL (Roelfsema et al., 2010; Seitz & Watanabe, 2009) have discussed a more complex interplay between attention and reinforcement, whereby attentional signals guide learning by suppressing distracting features while permitting the learning of important features. In TIPL with weak stimuli, only reinforcers would play a role, because these stimuli are below the threshold of attention (Tsushima et al., 2006, 2008). However, when the paired stimuli are not weak, both attention and reinforcers will play a role. If the paired stimuli are irrelevant, unimportant, and/or distracting, attention will suppress processing of the target-paired stimuli, and reinforcement signals will have little or no effect (for a review, see Seitz & Watanabe, 2009). The present results demonstrate a complementary case: When the paired stimuli are important to the subject, attention will enhance processing of those stimuli, and reinforcement will lead to learning. This is reflected in the enhanced memorization of congruent, as compared with incongruent, target-paired images in Experiment 3.
Furthermore, the significant enhancement for target-paired images relative to pre-target-paired images in all the experiments, and the trend toward enhanced memorization of incongruent target-paired images relative to pre-target-paired images in Experiment 3, support the view that reinforcement in the absence of attention can also lead to fast-TIPL.

An important caveat to the discussion above is that, to date, fast-TIPL and slow-TIPL have not been studied using identical methodologies or with the same stimuli. While there are strong parallels between techniques and findings of fast-TIPL and slow-TIPL, perceptual learning and visual memory have different time courses of acquisition and involve different brain processes. It is thus possible that the underlying mechanisms for fast- and slow-TIPL are, in fact, different and that the similarities observed above are only superficial. Further research will be required to gain a more detailed understanding of the processes involved in TIPL and the relations between fast-TIPL and slow-TIPL.

While we have discussed attention and reinforcement as separate processes, this distinction may be overly simplistic (e.g., Seitz & Watanabe, 2005, 2009). For example, the orienting of attention in the direction of the target arrow has been linked with the acetylcholine neuromodulatory system (Davidson & Marrocco, 2000). The same neuromodulatory system has been suggested to play an important role in learning: Some studies indicate that a reduction of cholinergic input reduces cortical plasticity (Juliano, Ma, & Eslin, 1991) and impairs learning (Easton, Ridley, Baker, & Gaffan, 2002; Warburton et al., 2003; Winkler, Suhr, Gage, Thal, & Fisher, 1995). However, other neuromodulatory systems, such as dopamine and norepinephrine, have also been linked both to attention (Fan, McCandliss, Sommer, Raz, & Posner, 2002; Posner & Petersen, 1990) and to learning (Bao, Chan, & Merzenich, 2001; Dalley et al., 2001). Indeed, these three neuromodulators (acetylcholine, norepinephrine, and dopamine) have been linked to the three attentional systems described by Posner and Petersen (1990): the alerting network, which involves temporal cuing and the maintenance of an alert state (norepinephrine; Coull, Frith, Frackowiak, & Grasby, 1996; Marrocco, Witte, & Davidson, 1994; Witte & Marrocco, 1997); the orienting network, which spatially selects information from sensory input (acetylcholine; Davidson & Marrocco, 2000); and the executive control network, which resolves conflict among responses (dopamine; Fossella et al., 2002). These studies indicate that attention and reinforcement may not be distinct from each other but that there are dissociable types of attention/reinforcement that play distinct roles in learning. Future research will be necessary to clarify how these different systems interact in the formation of TIPL.

Finally, the recency effect, whereby the last items of a memorized list are recalled more accurately than earlier ones (Ebbinghaus, 1885), was observed in our experiments. However, this effect cannot by itself explain all the results, because performance remained better for target-paired images when a control analysis for the recency effect was conducted. Lin et al. (2010) also observed a recency effect in their experiments, whereas Swallow and Jiang (2010, 2011) did not; this is likely related to the fact that in the Swallow and Jiang experiments, the recognition test was performed at the end of the experiment rather than after each trial.

Conclusion

Our results show that fast-TIPL can occur regardless of the direction in which attention is allocated but that directed attention can enhance memorization of both target-paired and distractor-paired images. This extends to the domain of memory previous findings that slow-TIPL also occurs outside the focus of directed attention, and here we show for the first time that spatially directed attention can enhance TIPL. Together with previous findings, our results show that TIPL can be both enhanced and suppressed by attention, depending on whether the target-paired stimuli have some relevance to the subject’s main task.