Introduction

When faced with an overabundance of information, the visual system uses attention to select important stimuli to be perceived. Without this ability to focus on specific stimuli, the brain would be incapable of taking the prompt, decisive action that is required for navigating a complex environment in real time. Visual attention is a complex phenomenon that almost certainly results from a variety of tightly coupled mechanisms within the visual system. To discover these mechanisms, there has been much interest in charting the temporal and spatial fluctuations of attention in response to a task-relevant stimulus. One of the more effective techniques for doing this has been to present two stimuli in sequence, where the second stimulus acts as a probe of the state of attention at varying spatial or temporal offsets from the first (Broadbent & Broadbent, 1987). This method has been used to discover and characterize such phenomena as the attentional blink, in which two targets (T1 and T2) are presented in a rapid serial visual presentation (RSVP) at one location, resulting in a dramatic reduction in the ability to report the T2 when it occurs about 300 ms after the T1 (Raymond, Shapiro, & Arnell, 1992). There are other findings of inter-target interference in the literature, including the attentional dwell time (Duncan, Ward, & Shapiro, 1994), competitive interference (Potter, Staub, & O’Connor, 2002), temporal order errors (Bowman & Wyble, 2007; Chun & Potter, 1995; Hommel & Akyürek, 2005; Olivers, Hilkenmeier, & Scharlau, 2011) and localized attentional interference (Mounts, 2000). Each of these effects describes a particular change in the ability to accurately report two targets as a function of their temporal or spatial separation.

However, it is often unclear whether these effects are distinct manifestations of a common interference process or whether they reflect distinct mechanisms. This uncertainty stems from the fact that different phenomena are studied using different methods. For example, localized attentional interference typically varies spatial offset while holding temporal separation constant, and studies of the attentional blink typically use converse methods. While such highly precise experimental control is an essential tool for studying the human visual system, the use of distinct paradigms to study distinct effects impedes our progress towards a comprehensive understanding of attention for two reasons. First, the visual attention system is known to be highly configurable according to task demands, such as attentional set when using blocked designs (Belopolsky, Schreij, & Theeuwes, 2010; Folk, Remington, & Johnston, 1992), and thus the visual system may be adapting to the tasks that are being used to study it. Therefore, when distinct effects are observed through distinct behavioral paradigms, one cannot be sure if the differences in methods are causing the distinct effects. The second reason is that measuring distinct effects with distinct paradigms makes it harder to directly compare the spatial and temporal extent of those effects since each method uses particular experimental parameters.

The goal of this paper is to review the various interference effects that might exist and to measure them using a single experimental procedure that varies the spatial and temporal separation of two targets relatively unpredictably within a single block of trials. This within-block design ensures that each trial type is performed using a common attentional set. The discussion will conclude with a theoretical depiction of how these distinct forms of interference might arise through distinct processing mechanisms within the visual system. One aspect of the data that will be particularly informative in determining the existence of different levels of interference will be differences in the spatial gradients that occur at different temporal separations. Furthermore, it will be important to note whether effects occur in a forward-only (i.e., T1 affects T2) or retroactive direction (i.e., T2 affects T1).

We will begin by reviewing a variety of effects that are commonly found when participants are asked to make an unspeeded report of two targets. This is not an exhaustive list of all of the observed effects from responding to two targets. For example, we have excluded effects related to speeded responses to either or both of the two targets, which give rise to interference effects in reaction time (Jolicoeur, 1999). For each effect we will include a brief review of the major theories, although it should be noted that it is beyond the scope of this paper to test these theories.

The attentional blink and the attentional dwell time

The attentional blink (Broadbent & Broadbent, 1987; Raymond et al., 1992) and the attentional dwell time (Duncan et al., 1994) are two effects that are generally thought to reflect the same underlying tendency for the visual attention system to experience a momentary “blink” while processing one stimulus. The attentional blink (AB) is observed when participants have to report two targets presented in the same location at varying temporal offsets (Broadbent & Broadbent, 1987; Raymond et al., 1992; Weichselgartner & Sperling, 1987). In such a case, the likelihood of reporting T2 is dramatically reduced when it occurs within 200–400 ms of T1, especially if both targets are masked. Even more interesting is the finding of lag-1 sparing (discussed in greater detail below), which refers to the fact that when the two targets onset within about 100 ms of each other and at the same location, both of them can be seen more often than when the targets are separated by one to four distractors. The AB is generally found to onset abruptly and then recede over several hundred milliseconds. Its observation requires that the T2 is presented briefly and backward masked (Brehaut, Enns, & Di Lollo, 1999), although it is not necessary that the first target be masked to observe the effect (Lagroix et al., 2012; Nieuwenstein, Potter, & Theeuwes, 2009; Visser, 2007).

Several studies have explored the AB when targets are in different locations. Duncan et al. (1994) presented two masked targets at different spatiotemporal offsets and observed what they termed an attentional dwell time, which resembles in many ways an attentional blink, except that the targets were presented approximately 2–4 degrees apart and without RSVP. Many other experiments have demonstrated variations of the attentional blink across different locations for targets presented either in isolation, or in ongoing RSVP streams with spatial separations ranging from .5 to 9 degrees of visual angle (Bay & Wyble, 2014; Du, Abrams, & Zhang, 2011; Jefferies & Di Lollo, 2009; Jefferies, Enns, & Di Lollo, 2014; Kawahara & Yamada, 2006; Jefferies, Ghorashi, Kawahara, & Di Lollo, 2007; Kristjánsson & Nakayama, 2002; Shih, 2000; Visser, Bischof, & Di Lollo, 1999).

Theories.

There is an ongoing debate as to the underlying cause of the attentional blink, with the major theories invoking either a bottleneck of encoding (Chun & Potter, 1995; Jolicoeur, 1999), a closure of the attentional gate by processing of the post-T1 distractor (Raymond et al., 1992; Olivers & Meeter, 2008; see also, Di Lollo, Kawahara, Ghorashi & Enns, 2005), or a suppression of attention triggered by encoding of the T1 (Wyble, Bowman, & Nieuwenstein, 2009). For a review of these theories, see Dux and Marois (2009) and Martens and Wyble (2010).

Sparing

One of the most intriguing aspects of the attentional blink is the finding that when two targets are presented closely in time (e.g. about 100 ms apart) and at the same location, it is relatively easy to see both of them (Potter, Chun, Banks, & Muckenhoupt, 1998; Raymond et al., 1992). The effect is known as “sparing,” since the second target is spared from the attentional blink. This is counterintuitive because one would generally expect that the AB induced by a T1 would be largest in amplitude when the targets were presented closely in time. Sparing is generally dependent on the two targets having the same location, with one finding showing that even as small as half a degree of spatial separation near the fovea is enough to eliminate the effect (Visser, Zuvic, Bischof, & Di Lollo, 1999). However, there is some disagreement on this point. Shih (2000) found some lag-1 sparing when targets appeared in separate RSVP streams 3.5 degrees apart, and in opposite hemifields, although it was reduced in magnitude relative to when T1 and T2 were in exactly the same location. Kristjánsson and Nakayama (2002) found that lag-1 performance was worst at the location of the T1, and improved as stimuli were presented at farther separations (see Footnote 1). Bay and Wyble (2014) found that two targets presented simultaneously at different locations were easier to see than two targets presented at onset asynchronies of 100–400 ms, which was termed lag-0 sparing. Furthermore, expectation of location on the part of the experimental participant has been shown to play a role in lag-1 sparing, such that knowing exactly which location will contain the T1 reduces lag-1 sparing across different locations (Jefferies et al., 2007).

Theories.

Some theories attribute sparing to a rapid burst of attention in response to the detection of T1 (Bowman & Wyble, 2007; Chun & Potter, 1995; Olivers & Meeter, 2008; Wyble et al., 2009) that may be a form of transient attention (Müller & Rabbitt, 1989; Nakayama & Mackeben, 1989). Another explanation for this effect is that when two items are presented at the same location and in close temporal proximity, they become integrated into a single perceptual representation (Akyürek et al., 2012).

Competitive interference

Another effect that is frequently observed when two targets need to be perceived in close succession is a form of competitive interference, wherein there is a tradeoff between the report of two targets. This effect is thought to be distinct from the AB, in that it is retroactive (i.e. T2 can affect T1) and also that it typically occurs over a shorter interval, such as approximately 100 ms onset asynchrony. This effect is most clearly exemplified in Potter et al. (2002), although it can be observed in many AB findings, such as in Chun and Potter (1995). Note that a retroactive form of interference can be observed over a longer temporal interval when participants have to make speeded selections of the second target (Nieuwenstein & Wyble, 2014). There is also some evidence that competitive interference is not present when the targets are presented at different spatial locations (Shih, 2000).

Competitive interference effects are typically observed when T1 and T2 are quite difficult to perceive, for example because their durations are extremely short. When the targets are visible for longer than 150 ms, it is typically possible to see both of them.

Theories.

It has been suggested that competitive interference reveals an early stage of direct competition between two targets that is followed by a second stage of processing in which the first target takes precedence (Potter et al., 2002). By this theory, competitive interference and the attentional blink reflect two distinct mechanisms at different stages of processing. This effect has also been attributed to enhanced backward masking of a T1 when it is immediately followed by a T2 (Olivers & Meeter, 2008).

Localized attentional interference (LAI)

Following the detection of a highly salient target at a particular location, there is a brief period of time during which it is more difficult to perceive a second salient stimulus within a spatial distance of several degrees (Mounts, 2000). Compared to the AB, LAI has a much tighter spatial extent, being largely absent when the two items are just a few degrees of visual angle apart, while the AB is observed across spatial separations of 5 degrees or more (e.g. Bay & Wyble, 2014). The time course of LAI has not been as well studied as that of the AB, but it does appear to require a brief temporal asynchrony between the onsets of the two critical stimuli (Mounts, 2000).

Theories.

The LAI effect is thought to reflect the engagement of attention on the first target, which acts by suppressing the processing of information in the spatial region of the visual field immediately surrounding the targets (Cutzu & Tsotsos, 2003). According to this explanation, LAI and the AB are likely the result of distinct attention mechanisms, since the AB also occurs at the same location as the T1, rather than just in the area around it.

Temporal order errors

When two targets are presented in close temporal proximity, participants will frequently report them in the wrong order. For example, during some AB experiments, participants will swap the order of T1 and T2 as much as 30 % of the time at a 100 ms separation, and these errors fall off rapidly as temporal separation increases (Bowman & Wyble, 2007; Chun & Potter, 1995; Hommel & Akyürek, 2005; Olivers et al., 2011). Note that temporal order errors are only possible in experiments for which T1 and T2 are interchangeable (e.g. they cannot occur in the classic AB paradigm of Raymond et al., 1992).

Theories.

There are two major theories of temporal order errors in paradigms involving RSVP. The first is prior entry, which proposes that two items presented at nearly the same time race against one another to be encoded first into working memory. If T2 wins the race, it enters working memory prior to the T1, which results in a bona fide perceptual experience of the items being in the wrong order (Olivers et al., 2011; Wyble et al., 2009). An alternative theory is that these temporal order swaps occur when T1 and T2 are combined together into a single representation, which has no temporal order information and thus participants must sometimes guess randomly which target came first (Akyürek et al., 2012; Bowman & Wyble, 2007; Hommel & Akyürek, 2005).

Mapping patterns of attentional interference

All of these effects have been extensively explored with a variety of distinct experimental designs. To consolidate their measurement into a common experimental framework, we have devised a new paradigm, the SpatioTemporal Attention Mapping Procedure (STAMP). This experimental protocol has the goal of measuring inter-target effects across a continuum of spatial and temporal separations. As in RSVP, the targets are presented among a continuous sequence of distractor items; however, unlike RSVP tasks, the targets and distractors can be centered at any pixel within a broad window on the display (a grid of 600 × 600 possible locations) and are scaled in size according to their distance from the center of the screen to compensate for reduced acuity in the periphery. Each item was present on the screen for 107 ms and was followed by a mask, centered at its own location. Targets were black letters and distractors were red letters. Participants were asked to report the two targets at the end of the sequence. The method section below describes the presentation sequence in more detail.

Method

Participants.

363 participants from the Pennsylvania State University, University Park campus Psychology participant pool (age 18–35, mean 18.9 years) participated in the experiment and received course credit. All participants reported normal or corrected to normal vision. All participants were instructed in American English. This experimental protocol was approved by the IRB at the Pennsylvania State University.

Apparatus

The experiment was run using MATLAB (build R2012b) with PsychToolbox 3 (Brainard, 1997; Pelli, 1997) on a Windows XP operating system running on a Dell Optiplex PC. The screen resolution was set to 1024 × 768 at 75 Hz refresh rate on cathode ray tube monitors with a diagonal of 17 inches. Participants were situated in a chinrest located 50 cm from the monitor.

Stimuli

The background of the screen was always gray (150, 150, 150 in RGB on a 255 scale). Stimuli for the primary experiments were of three kinds. Distractors were red letters (255,24,20 in RGB) in the proportional Andale sans-serif font. Targets were black letters in the same font. Masks were superimposed # and @ symbols in either red or black. All stimuli were presented in a 20.6 × 20.6 degree square that was centered on the screen.

The sizes of the letter stimuli were scaled according to their distance from a central fixation cross in 5 categories (see Figure 1a) in circular bands of 2.6 degrees. From 0–2.6 degrees, stimuli were 0.5 degrees wide by 0.7 degrees tall. In each additional distance band, stimulus width increased to 0.58, 0.7, 0.9, and 1.1 degrees with proportionate increases in height as well (see Figure 1b). These sizes were chosen to be well above threshold at each eccentricity so as to reduce crowding in case stimuli happened to occur closely together. The fixation cross was present throughout each trial and was 0.7 degrees wide. A dot or comma would also be presented at the end of each trial that was 0.2 degrees wide.
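The size-scaling rule described above can be expressed as a small lookup. The following Python sketch is illustrative only and is not the original MATLAB code: the function name is hypothetical, the height is assumed to keep the 0.7/0.5 ratio of the innermost band (the text states only that height increased proportionately), and eccentricities beyond the fifth band, such as the corners of the stimulus window, are assumed to take the largest size.

```python
# Widths (in degrees) for the five eccentricity bands described in the text.
BAND_WIDTH_DEG = 2.6
WIDTHS_DEG = [0.5, 0.58, 0.7, 0.9, 1.1]
HEIGHT_RATIO = 0.7 / 0.5  # assumption: constant height-to-width ratio


def stimulus_size(eccentricity_deg):
    """Return (width, height) in degrees for a stimulus at a given eccentricity."""
    band = int(eccentricity_deg // BAND_WIDTH_DEG)
    # Assumption: anything beyond the fifth band uses the largest size.
    band = min(band, len(WIDTHS_DEG) - 1)
    width = WIDTHS_DEG[band]
    return width, width * HEIGHT_RATIO
```

Under these assumptions, a letter 1 degree from fixation is 0.5 degrees wide, while a letter 12 degrees from fixation is 1.1 degrees wide.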

Fig. 1

A. Illustration of stimuli in proportion to display. Shown are distractors in red and masks in black. Distractors were red letters while targets were black letters. Masks were the same color as the stimulus they masked. Dotted lines are shown for illustration purposes only. Circles represent the distance thresholds for increasing the size of the stimuli. B. Illustration of two targets. The location of the T1 (the R) is determined randomly and the T2 is located in one of the 16 spatial arcs defined by the dotted and diagonal lines, which are included here only for illustration purposes, or could also occur at the same position as the T1. Note that the size of the targets is determined by their distance from the fixation cross regardless of their distance from one another.

Procedure

Fixation training procedure

Participants were trained to maintain fixation using a method that presents interleaved pixelation patterns rapidly at the center of the display (Guzman-Martinez, Leung, Franconeri, Grabowecky, & Suzuki, 2009) and then asks them to attend to stimuli that appear in the periphery without moving their eyes. The display alternated between the two pixelation patterns every frame (13.3 ms), as confirmed by Flip time stamps in PsychToolbox. This rate caused the pattern to wash out into a nearly uniform gray patch when the eyes were stationary, while eye movements caused an abrupt return of the high-contrast pixelation. Participants were first shown five trials of this alternation (400 × 400 pixels, 13.6 × 13.6 degrees, 300–500 alternations) with a central fixation and were instructed not to move their eyes so as to observe the gray pattern. This was followed by seven additional trials in which they had to report a letter appearing randomly on the left or right of fixation (17 degrees eccentricity) for 53 ms and were again instructed not to move their eyes while observing the letter.
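The reported durations are consistent with whole numbers of video frames at the 75 Hz refresh rate. The short Python sketch below shows the arithmetic; it is an illustration rather than the experiment code, and the frame counts it returns for 53 ms and 107 ms are inferred from the refresh rate, not stated in the text.

```python
REFRESH_HZ = 75  # refresh rate reported in the Apparatus section


def frames_to_ms(n_frames, refresh_hz=REFRESH_HZ):
    """Duration in milliseconds of n_frames video frames."""
    return n_frames * 1000.0 / refresh_hz


def ms_to_frames(duration_ms, refresh_hz=REFRESH_HZ):
    """Nearest whole number of frames for a nominal duration."""
    return round(duration_ms * refresh_hz / 1000.0)


print(round(frames_to_ms(1), 1))  # 13.3 (one frame)
print(ms_to_frames(53))           # 4
print(ms_to_frames(107))          # 8
```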

STAMP

Each trial began with a fixation cross for 300 ms, after which the stimuli began to appear in sequence. The sequence consisted of a series of distractors (red letters) with one or two targets (black letters) interspersed. The total length of the sequence was between 8 and 29 items. Each item in the sequence would be present on the screen for 107 ms, after which it would be masked by a 107 ms mask consisting of a superimposed @ and # symbol centered at its location. These masks matched the color and size of the stimulus being masked. Thus, throughout the presentation sequence there would usually be three stimuli on screen simultaneously: the fixation cross, a given item, and the mask for the previous item. The exceptions to this were for the case of simultaneous targets, the frame following two simultaneous targets, and the first and last frames in the sequence, which had only a target and a mask respectively. To emphasize central fixation, each trial was immediately followed by a dot or comma that replaced the fixation. Participants were told at the beginning of the experiment that there would be one or two targets per trial. At the end of the trial participants were asked to type in the two letters separately and in order. They were allowed to enter no response for either letter. Participants then answered whether they saw a dot or comma in a forced choice. Feedback was given at the end of each trial after participants made their responses.

The design used three factors (4 × 5 × 8), with the first two factors determining the direction and distance between T1 and T2 and the third factor determining their temporal separation. There were 160 trials per participant mixed within a single block.
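As a rough illustration of this factorial structure, the Python sketch below enumerates one trial per combination of direction, spatial separation band, and lag, using the levels listed in the sections that follow. The variable names and the use of a simple shuffle to mix trials within the block are assumptions made for illustration; the source does not describe how the trial order was randomized.

```python
import random
from itertools import product

DIRECTIONS = ["above", "left", "below", "right"]  # four 90-degree arcs
SEPARATIONS = [0, 1, 2, 3, 4]                     # sep-0 through sep-4
LAGS = [0, 1, 2, 3, 4, 5, 6, 8]                   # temporal separations

# One trial per combination of the three factors: 4 x 5 x 8 = 160.
trials = list(product(DIRECTIONS, SEPARATIONS, LAGS))

# Assumption: trial types are mixed by shuffling within the single block.
random.shuffle(trials)

print(len(trials))  # 160
```

Collapsing across direction leaves four trials in each combination of separation and lag, which matches the statement in the Analyses section that each participant contributes at most 4 data points per cell.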

Spatial locations of targets and distractors

Except for the T2, all of the presented items of the sequence, and their following masks, were presented at a randomly chosen location in the 20.6 degree square at the center of the screen, subject to the constraint that an item did not spatially overlap with the preceding item. This was enforced by setting a proximity limit that varied from 1 to 1.75 degrees, with 1 degree used for stimuli close to the fixation cross and 1.75 degrees for stimuli near the edge of the screen.

The location of the T2 was determined relative to T1 according to two factors that specified its relative distance and direction from the T1 (see Figure 1b). The first factor determined which of four 90-degree arcs of direction contained T2, specifying whether T2 was above, left of, below, or to the right of T1. The relative angle of T2 was uniformly distributed within the arc. The second factor determined the distance between T1 and T2 and was specified in the following distance bands: 0 (i.e. same location), 1 to 2 degrees (subject to the variable minimum interitem proximity specified above), 2 to 5 degrees, 5 to 10 degrees, and 10 to 13.7 degrees. These bands of spatial separation will be referred to as sep-0, sep-1, and so on. If T2 was presented immediately after T1, then the T1 mask was omitted. If T2 would occur off screen, or at lag 0 and at the same location as T1 (i.e. superimposed), then T2 and its mask were not shown, resulting in a T1-alone trial.
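The placement rule for T2 can be sketched as follows. This Python example is a simplified illustration under several assumptions that go beyond the text: distance is drawn uniformly within the band (the text specifies a uniform distribution only for the angle), "off screen" is treated as falling outside the 20.6 degree stimulus window, the y axis points upward, and the variable minimum proximity limit for sep-1 is ignored. The function and constant names are hypothetical.

```python
import math
import random

HALF_WINDOW_DEG = 20.6 / 2  # half-width of the square stimulus window
# Center of each 90-degree arc, in degrees counterclockwise from rightward.
ARC_CENTER_DEG = {"right": 0.0, "above": 90.0, "left": 180.0, "below": 270.0}
# Distance bands sep-0 through sep-4, in degrees of visual angle.
DISTANCE_BANDS_DEG = [(0.0, 0.0), (1.0, 2.0), (2.0, 5.0), (5.0, 10.0), (10.0, 13.7)]


def place_t2(t1_xy, direction, sep, rng=random):
    """Return T2 (x, y) in degrees from fixation, or None for a T1-alone trial."""
    low, high = DISTANCE_BANDS_DEG[sep]
    distance = rng.uniform(low, high)  # assumption: uniform within the band
    # The angle is uniform within the 90-degree arc around the chosen direction.
    angle = math.radians(ARC_CENTER_DEG[direction] + rng.uniform(-45.0, 45.0))
    x = t1_xy[0] + distance * math.cos(angle)
    y = t1_xy[1] + distance * math.sin(angle)
    if abs(x) > HALF_WINDOW_DEG or abs(y) > HALF_WINDOW_DEG:
        return None  # T2 would fall outside the window, so it is not shown
    return (x, y)
```

For example, a T1 placed 10 degrees to the right of fixation with a rightward T2 in sep-4 always yields a T1-alone trial in this sketch, because every candidate T2 position lies beyond the window edge.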

This procedure produces a slight spatial dependence between the T1 and T2, such that T2 occurs closer to T1 than would be expected by chance. This was necessary in order to measure interference at close separations because truly randomized positioning of T1 and T2 would have resulted in very few trials with a close spatial proximity.

Timing of targets

T1 could appear between 5 and 14 positions from the beginning of the presentation sequence. T2, if it appeared, could follow at lags 0, 1, 2, 3, 4, 5, 6, or 8 relative to T1. The presentation sequence continued for three to seven items after T2 (or the corresponding T2 time slot in a T1-alone trial).
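These timing parameters account for the range of sequence lengths reported in the Procedure. The Python sketch below enumerates every combination and checks the minimum and maximum; it assumes that T1 position is counted as an ordinal within the sequence and that a lag-0 T2 shares the T1 time slot, which is an interpretation of the text rather than something it states explicitly.

```python
T1_POSITIONS = range(5, 15)   # T1 is the 5th to 14th item in the sequence
LAGS = [0, 1, 2, 3, 4, 5, 6, 8]
TRAILING_ITEMS = range(3, 8)  # three to seven items after the T2 slot


def sequence_length(t1_position, lag, trailing):
    """Total number of item slots when the T2 slot falls `lag` positions after T1."""
    return t1_position + lag + trailing


lengths = [
    sequence_length(position, lag, trailing)
    for position in T1_POSITIONS
    for lag in LAGS
    for trailing in TRAILING_ITEMS
]
print(min(lengths), max(lengths))  # 8 29
```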

Participant exclusion

To eliminate participants with abnormally low performance, participants were excluded by combining their accuracy scores for trials containing two targets for lags 5, 6, and 8 without regard to spatial separation (mean averaged accuracy across both targets: 66 %, SD 27 %). Examination of the histogram of values across subjects revealed a largely bimodal distribution with one peak above 50 % and a smaller peak below 50 %. To eliminate subjects in the lower peak, participants with less than 30 % accuracy on this metric were removed. This filter excluded 53 participants, leaving 310 participants. This filter did not substantively change the pattern of the results.

Analyses

For all analyses, the angular direction between T1 and T2 was collapsed. For trials containing both a T1 and T2, there are effectively 39 cells (five spatial and eight temporal separations, excepting the cell sep-0, lag-0). Each participant could contribute at most 4 data points to each of these cells. Targets were scored as correct regardless of their temporal order at report. Confidence intervals were estimated using a bootstrapping method that resampled 310 participants from the original data set with replacement, 500 times. Bootstrapped confidence intervals for 95 % of the data were calculated from the resampled distributions and are labeled as BCI. Due to the small number of trials per condition, analyzing data as T2 conditional on T1 produces many empty cells. Therefore, analyses of trends across multiple levels of spatial and temporal separation were conducted using logistic regression (Peng, Lee, & Ingersoll, 2002) on individual trials, collapsing across the participant variable and grouping data points into cells that correspond to particular values of lag and separation. Analyses that compared one cell directly against another cell used a paired t test that compared data from the subset of participants with valid data in both cells. To confirm that these subsets had data similar to the entire data set, follow-up analyses compared the means of the dependent variable for each subset of participants against the mean of the entire data set; the subset means were always within 2 % of the means for the entire data set.
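The bootstrapping step can be illustrated with a minimal Python sketch. It resamples participants with replacement 500 times and takes the central 95 % of the resampled means. The use of the percentile method, the function name, and the fixed random seed are assumptions for illustration; the text does not specify which type of bootstrap interval was computed.

```python
import random


def bootstrap_ci(participant_means, n_resamples=500, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean across participants."""
    rng = random.Random(seed)
    n = len(participant_means)
    resampled_means = []
    for _ in range(n_resamples):
        # Draw n participants with replacement and average their scores.
        sample = [participant_means[rng.randrange(n)] for _ in range(n)]
        resampled_means.append(sum(sample) / n)
    resampled_means.sort()
    tail = (1.0 - level) / 2.0
    lower = resampled_means[int(tail * n_resamples)]
    upper = resampled_means[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper
```

Here `participant_means` would hold one accuracy value per participant for a given cell, so that resampling operates on participants rather than on individual trials.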

Results

Single-target trials

We first examined the probability of reporting a single target as a baseline measurement of accuracy. We analyzed trials containing only a single target, which were 28.6 % (10,291) of the total trials and found an average accuracy of 74.5 % (SE 1 %). Figure 2 illustrates a spatial map of single-target report probability across the extent of the 20 × 20 degree window. The similarity of report at different positions indicates that targets had similar detectability across the display due to the scaling of size with eccentricity. It should be noted as a caveat that this finding does not guarantee equivalent detectability of targets at all eccentricities under conditions of diminished attention or intertarget interference.

Fig. 2

Accuracy of reporting T1 at different locations within the display when it was the only target. The BCI of each cell ranged from +/- 3.2 % to 5.4 %. Units on the axes are in visual degrees.

Two-target trials

T1 accuracy

T1 accuracy was analyzed according to its temporal and spatial proximity to T2 divided into 39 different cells: eight temporal lags (0, 1, 2, 3, 4, 5, 6, and 8) and five spatial separations (0, 1 to 2, 2 to 5, 5 to 10, and 10 to 13.7 degrees), excluding the condition in which the targets would be superimposed. Trials in which T1 or T2 appeared within 1 degree of fixation were discarded to eliminate overlap with the fixation cross. The outcome of this analysis is illustrated in Figure 3a, which depicts spatial separation on the horizontal axis and temporal separation on the vertical axis using a bubble plot. As is typical of findings in the AB literature, T1 accuracy was slightly reduced when T2 occurred just after it, at lag-1 in the same spatial position (i.e. sep-0). A paired t-test found that T1 accuracy was reduced at sep-0, lag-1 compared to sep-0, lag-2, t(309) = 4.6, p < .0001, indicating competitive interference between the targets.

Fig. 3

A. Accuracy of T1 for each spatial and temporal separation relative to T2. B. Accuracy of T2|T1 for each spatial and temporal separation relative to T1. The BCI of each cell ranged from +/- 2.4 % to 5.5 %.

An important finding here would be that the degree of competitive interference at lag-1 is spatially graded. To determine if this was true, T1 performance at lag-1 was regressed across spatial separation for seps 0–4. For these trials, the full model was reliably preferred over the null model (χ2 = 9.4, df = 1, p < .003) with an odds ratio for spatial separation of 1.08, which means that the odds of correctly reporting T1 were increased by 8 % for each step increase in spatial separation.
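Because an odds ratio acts multiplicatively on the odds rather than on the probability, its effect on accuracy depends on the starting point. The Python sketch below shows the conversion; the baseline accuracy of .60 is a hypothetical value chosen for illustration and is not taken from the data.

```python
def apply_odds_ratio(probability, odds_ratio, steps=1):
    """Probability after multiplying the odds by odds_ratio once per step."""
    odds = probability / (1.0 - probability)
    new_odds = odds * odds_ratio ** steps
    return new_odds / (1.0 + new_odds)


# Hypothetical baseline of .60 at sep-0, followed by four steps of spatial
# separation with an odds ratio of 1.08 per step.
print(round(apply_odds_ratio(0.60, 1.08, steps=4), 3))  # 0.671
```

Under this hypothetical baseline, the four steps from sep-0 to sep-4 would raise predicted accuracy from .60 to about .67.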

This competitive interference extended to the condition in which T1 and T2 were simultaneous (see Footnote 2). A regression over data from seps 1–4 at lag 0 found that spatial separation significantly predicted T1 accuracy (χ2 = 56.5, df = 1, p < .0001) with an odds ratio of 1.28 per step increase in spatial separation (p < .0001), indicating that T1 report was increased when T1 and T2 were presented farther apart. This finding suggests that the competitive interference observed between two targets at lag-1 is larger when the targets are simultaneous.

T2|T1 accuracy at lags 0 and 1

As is typical of AB studies, we measure T2 accuracy conditionally on trials in which T1 was accurately reported (T2|T1), which is shown in Figure 3b, although similar effects are observed with an analysis of raw T2 accuracy. Our analysis will first focus on lag-0 (i.e. simultaneous presentation), followed by lag-1, and then the attentional blink at lags two through eight.
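The conditional measure can be made concrete with a short Python sketch. The function name and the trial representation (a pair of booleans per two-target trial) are hypothetical conveniences, not the format of the original data.

```python
def t2_given_t1(trials):
    """T2 accuracy restricted to trials on which T1 was reported correctly.

    trials: iterable of (t1_correct, t2_correct) booleans for two-target trials.
    """
    t2_outcomes = [t2 for t1, t2 in trials if t1]
    if not t2_outcomes:
        return None  # empty cell: no trials with T1 correct
    return sum(t2_outcomes) / len(t2_outcomes)


example = [(True, True), (True, False), (False, True)]
print(t2_given_t1(example))  # 0.5
```

The `None` return value corresponds to the empty cells that can arise when a participant has no T1-correct trials in a given combination of lag and separation.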

When T1 and T2 were in different locations, lag-0 performance was superior to lag-1 performance according to a regression of T2|T1 on spatial and temporal separation for seps 1–4 and lags 0–1. Over these trials, the full model was significantly different from the null model (χ2 = 141.7, df = 2, p < .0001). The odds ratio of spatial separation was 1.12 (p < .0003) and the odds ratio for temporal separation was .50 (p < .0001). Thus T2|T1 accuracy was higher when the targets were presented simultaneously rather than sequentially, and when the targets were presented farther apart. This finding replicates the finding of lag-0 sparing demonstrated by Bay and Wyble (2014) and indicates that it is easier to see two targets presented simultaneously than sequentially at a lag of 100 ms.

At lag-1, there was a clear pattern of sparing among trials in which T1 and T2 shared the same spatial location, as can be seen in an analysis comparing accuracy for T2|T1 between lag-1 (M = .66, BCI +/- .03) and lag-2 (M = .24, BCI +/- .03) at sep-0, paired t(296) = 15, p < .0001. Also, performance on T2|T1 at lag-1 was strongly reduced at sep-1 (M = .35, BCI +/- .03) compared to sep-0 (M = .66, BCI +/- .03), a difference that was also highly significant, t(284) = 9.9, p < .0001. This reduction at sep-1 indicates localized attentional interference.

T2|T1 accuracy during the attentional blink

To obtain a general measure of the influence of spatial and temporal separation on the AB, T2|T1 performance was regressed on the spatial and temporal separation between the targets for lags 2–6 and seps 0–4. This analysis revealed a highly significant effect of temporal separation but not of spatial separation, suggesting that the AB is not strongly tied to the location of T1. In the regression, the full model was significantly different from the null model (χ2 = 803.7, df = 2, p < .0001). The odds ratio for temporal separation was 1.37 (p < .0001), while the odds ratio for spatial separation was not significantly different from 1 (p > .15).

The overall pattern in the data suggests that the AB may be deeper at greater spatial separations. One data point that stands counter to this pattern is the particular case of sep-0 at lag-2, in which T2 accuracy is extremely low. However, this data point can be considered anomalous because in this one cell, T2 is sandwiched directly between two masks: the T1 mask and its own. To assess the depth of the attentional blink across spatial distances, T2|T1 accuracy was regressed on spatial and temporal separation without including sep-0 (i.e., lags 2–6, seps 1–4). In this regression, the spatial relationship was found to be negative (χ2 = 710, df = 2, p < .0001; odds ratio of temporal separation 1.3, p < .0001; odds ratio of spatial separation = .88, p < .0001).

Swaps

Temporal order errors were computed as the proportion of trials in which T1 and T2 were correctly reported but in the wrong order. The mean proportions of correctly reporting the order of T1 and T2 are provided in Figure 4 and several clear patterns are present. At sep-0, there was an elevated proportion of swaps at lag-1 (M = 33 %, BCI = +/- 6 %) relative to lag-2 (M = 8 %, BCI = +/- 6 %), t(142) = 6.0, p < .0001, as in Chun and Potter (1995). This analysis also revealed that there was a sharp decline in the proportion of swaps when T1 and T2 were not at the same spatial position, as shown by comparing swaps at lag-1 between sep-0 (M = 34 %, BCI = +/- 6 %) and sep-1 (M = 13 %, BCI = +/- 4 %), t(162) = 5.4, p < .0001. Moreover, at lag-1 there was a progressive decline in swaps as distance increased (χ2 = 61.4, df = 1, p < .0001; odds ratio of spatial separation .70, p < .0001). This graduated decline was also present, although much shallower, at lag-2. At later lags, the proportion of swaps is negligible. This is the first finding to our knowledge that swaps are affected in a graded fashion by spatial proximity.
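A swap measure of this kind can be sketched in Python as follows. The sketch assumes that the proportion is computed over trials on which both targets were reported, and that T1 and T2 are always different letters; the text does not state the denominator explicitly, so this is one plausible reading. The function name and trial format are hypothetical.

```python
def swap_proportion(trials):
    """Proportion of both-correct trials with the targets reported in reverse order.

    trials: iterable of (t1, t2, first_response, second_response) letters.
    """
    both_reported = 0
    swapped = 0
    for t1, t2, first, second in trials:
        # Order is ignored when deciding whether both targets were reported.
        if {first, second} == {t1, t2}:
            both_reported += 1
            if first == t2 and second == t1:
                swapped += 1
    if both_reported == 0:
        return None  # no trials with both targets reported
    return swapped / both_reported


example = [("R", "K", "K", "R"), ("R", "K", "R", "K"), ("R", "K", "R", "")]
print(swap_proportion(example))  # 0.5
```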

Fig. 4

Proportion of trials for which T1 and T2 were reported in the correct order. The BCIs of each cell ranged from ±1 % to ±6 %.

Dot–comma task

Across all trial types, participants correctly reported the dot or comma on 77 % (SE = .6 %) of trials. Trials were not excluded based on accuracy on the dot–comma task. However, a series of follow-up analyses determined that excluding trials with incorrect dot–comma responses did not alter the results of any of the significance tests or the overall pattern of results.

Discussion

These results suggest the existence of distinct spatial gradients depending on the temporal separation of two targets. We found that temporal order errors and competitive interference (which we define as reduced performance on T1 at lags 0 and 1) are both more severe when the targets are spatially proximal, peaking when the targets occur at the same location. This pattern differs from that observed during the attentional blink, which was weakly graded in the reverse direction (i.e., stronger when the targets were farther apart). Finally, there was evidence that a target elicits a brief, but intense spatial surround of inhibition around its location, replicating previous findings of LAI (Mounts, 2000). There was also a typical finding of lag-1 sparing for T2|T1 report when the targets were in the same location but not when they were in different locations. Also, T2|T1 report was higher at lag-0 than lag-1, replicating the finding of lag-0 sparing by Bay and Wyble (2014).

Despite our attempts to discourage eye movements through instruction, training and task constraints, some participants may have moved their eyes towards the T1. However, the fact that T2|T1 performance was not higher when the targets were in the same location (i.e., sep-0) compared to when they were in different locations suggests that such eye movements did not predominate. More importantly, such eye movements would have been too slow to affect T2 accuracy in the lag 0 and lag 1 conditions, which were used to measure the existence of competition and LAI effects. Eye movements could conceivably have contributed to the deeper AB at greater spatial separations, however.

It is also worth noting that while single targets were similarly detectable across the entire display (see Figure 2), fluctuations in attention may have altered the relationship between stimulus size and detectability, in much the same way that the attentional blink alters the minimum duration necessary to perceive a stimulus (Ghorashi et al., 2010). Since stimulus size was fixed according to eccentricity, it is beyond the scope of the STAMP to measure such changes, and this remains an intriguing question to explore.

Evidence for distinct mechanisms of interference

As attentional effects are typically studied in different experimental paradigms, it has been difficult to determine whether they result from distinct mechanisms, or instead are different manifestations of the same underlying mechanism. The present finding of distinct spatial gradients at different temporal offsets supports previous theories that there are distinct sources of interference in the visual system. For example, Potter et al. (2002) argued that different mechanisms of interference stem from different levels of processing with competitive interference in an early stage and the attentional blink at a later stage. Hommel and Akyürek (2005) also pointed out that competitive interactions between two targets at short SOAs might be another explanation for lag-1 sparing effects (see also Akyürek et al., 2012). According to their theory, two targets in close temporal proximity will either be integrated together into a single representation, or engage in competition that sometimes favors the T2. Modelling work has provided a computationally formalized account of this distinction. For example, in the episodic Simultaneous Type, Serial Token (eSTST) model (Wyble, Potter, Bowman, & Nieuwenstein, 2011), there is a mechanism of direct competition between two neural representations of targets presented closely in time. This competitive interference, simulated in the model by lateral inhibitory links between nodes representing the two targets, allows the model to simulate the finding that T1 accuracy is often reduced at lag-1 (e.g., Chun & Potter, 1995). Importantly, this inhibition is relatively weak and does not necessarily cause one stimulus to “win” over the other, and it is thereby possible for two stimuli to be simultaneously active. However, if one target has a much stronger representation than the other (due perhaps to differences in familiarity or attention), the stronger item may eliminate the representation of the weaker one.
In the eSTST model, the attentional blink is caused by a different circuit, which inhibits attention during consolidation of T1 into memory. This suppression is theorized to play a beneficial role in perception by segmenting visual information into episodes prior to storage in working memory (Wyble et al., 2009). Unlike lateral inhibition, which causes direct competition between targets that can extend retroactively, the suppression of attention underlying the AB primarily affects the processing of following rather than preceding targets.
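The behavior of weak lateral inhibition described above can be illustrated with a toy simulation of two mutually inhibiting units. This sketch is not the eSTST model and does not use its equations or parameters: the leaky-unit dynamics, the inhibition weight of 0.3, the input strengths, and the function name `simulate_pair` are all assumptions chosen only to show the qualitative pattern.

```python
from typing import Tuple


def simulate_pair(
    input_a: float,
    input_b: float,
    inhibition: float = 0.3,   # hypothetical weak lateral inhibition weight
    dt: float = 0.01,          # Euler integration step
    steps: int = 2000,
) -> Tuple[float, float]:
    """Integrate two leaky units that inhibit each other; return final activations."""
    a = b = 0.0
    for _ in range(steps):
        # Each unit decays, is driven by its own input, and is inhibited by the other.
        da = -a + input_a - inhibition * b
        db = -b + input_b - inhibition * a
        # Activations are rectified so that they cannot fall below zero.
        a = max(0.0, a + dt * da)
        b = max(0.0, b + dt * db)
    return a, b


if __name__ == "__main__":
    print(simulate_pair(1.0, 1.0))  # equally strong targets
    print(simulate_pair(1.0, 0.2))  # one target much stronger than the other
```

With equal inputs, both units settle at the same reduced but nonzero activation, so the two representations coexist. With very unequal inputs, the weaker unit is driven to zero while the stronger unit reaches its full activation, which mirrors the case in which a strong target eliminates a weak one.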

The present findings support a proposed distinction between different forms of interference operating at two separate stages of processing by revealing that there are interference effects with different spatial gradients at different temporal separations. When targets are separated by 0 or 107 ms, T1 performance is worse, particularly when the targets are spatially proximal. The spatially graded competitive interference effect is also present in conditional accuracy of simultaneous targets (i.e., T2|T1 at lag 0), in which accuracy increases systematically with increasing spatial separation. To illustrate how the spatial gradient of interference changes as the SOA between T1 and T2 increases, Figure 5 replots data from Figure 3b, showing T2|T1 for lag 0 and lag 3.

Fig. 5

Data from Figure 3b replotted for comparison. BCIs for all data points ranged from ±3 % to ±5.5 %.

The finding that competitive interference of T1 is stronger for more proximal targets is consistent with neurophysiological data from monkeys in which receptive fields of neurons late in the visual stream are spatially localized, but also relatively large (DiCarlo & Maunsell, 2003). Consequently, if competitive interference is caused by overlapping neural representations in later stages of the visual system, such interference would be expected to exhibit a spatial gradient in a radius of several degrees from either target, as observed here. These results are also compatible with the theory of attentional selection through biased competition, in which stimuli compete for representation among neurons that have overlapping spatial receptive fields (Luck, Chelazzi, Hillyard, & Desimone, 1997). Such a theory predicts that the closer two stimuli are, the more likely it is that they fall within the receptive field of the same neuron. While the biased competition theory is intended to explain the selection of targets among distractors, it is likely that a similar competition occurs when two targets are presented simultaneously.

This finding of a spatial gradient of competitive interference over a distance of several degrees argues against the possibility that the competitive tradeoff between T1 and T2 accuracy at short SOAs in RSVP tasks (Chun & Potter, 1995; Potter et al., 2002) is entirely the result of stronger masking of the T1 by the T2 as suggested by Olivers and Meeter (2008). A masking explanation would predict that the drop in T1 accuracy at lag 1 should only occur when the targets are very near the same location.

Our results suggest further that LAI (Mounts, 2000) is distinct from either competitive interference or the attentional blink due to the observation of sharply reduced performance in T2|T1 accuracy at sep-1, lag-1. This finding corresponds to the spatial and temporal profile observed for LAI in Mounts (2000). These findings thereby support the theory that localized attentional interference is the manifestation of an attentional mechanism that is triggered by detection of a target and produces a tightly focused inhibition of attention in the surrounding region (Cutzu & Tsotsos, 2003; Mounts, 2000).

It was also observed that, unlike the competition and LAI effects evident at temporal separations of less than 200 ms, the attentional blink does not diminish with increasing spatial separation. This observation fits with previous findings of a relatively similar AB when measured at the same or different locations (Bay & Wyble, 2014; Juola, Botella, & Palacios, 2004; Lunau & Olivers, 2010; Shih, 2000; Visser et al., 1999a, b; but see Du et al., 2011, and Kristjánsson & Nakayama, 2002, for evidence to the contrary) and is also compatible with the theory that the attentional blink reflects interference in processing at a central locus which is not tied to retinotopic or spatiotopic representations (Chun & Potter, 1995; Jolicoeur, 1998). The results of the STAMP revealed some evidence that the AB is deeper at farther separations, which agrees with the observations of Jefferies et al. (2007) and Jefferies and Di Lollo (2009). This finding may indicate a spatially inhomogeneous suppression of attention during the AB, although it should be noted that eye movements could potentially have contributed to this finding.

A conceptual model of multiple attentional effects

Based on these data, as well as previous computational modeling work (Wyble et al., 2009), we offer a candidate model of multiple interacting attentional effects that is summarized in Figure 6. This model depicts neural mechanisms underlying three interacting cognitive processes that are thought to be involved in target detection and encoding. These three processes include (1) the identification of stimuli (i.e., stage one in classical descriptions of two-stage models, such as Chun & Potter, 1995); (2) an attentional system that allows for attention to be triggered by the detection of targets; and (3) a memory system that can selectively encode information (i.e., stage two in two-stage models). Furthermore, this memory encoding system is capable of encoding multiple items at the same time, and can give rise to temporal order errors. In fact, we suggest that there are two potential sources of order encoding errors at different levels of processing as detailed below.

Fig. 6

Conceptual model of three distinct mechanisms of inter-target interference in a tripartite model of attention and memory. Circles represent individual neurons in a retinotopic cortical sheet and gray shades represent increasing levels of activation. Arrows represent excitatory and inhibitory synaptic connections. Visual input is processed by the ventral stream, activating neurons with large receptive fields in areas such as V4 and IT cortex that produce spatially graded, competitive interference (A). Detection of a target triggers the deployment of attention in a topographic neural representation that uses local surround suppression to improve attentional focus, resulting in localized attentional interference (B). When a target has been selected for encoding into working memory, attention is suppressed across the entire visual field, resulting in the attentional blink (C). This figure reflects the pattern of neural activity following detection of a single target.

This conceptual model suggests that each of these three systems exhibits a particular kind of interference that can be separately observed in the present data set. Note that there are many possible candidate models that could be proposed to explain the various effects observed here (e.g., Bundesen, Habekost & Kyllingsbaek, 2005; Cave, 1999; Foley, Grossberg & Mingolla, 2012; Hamker, 2005; Heinke & Humphreys, 2003; Itti & Koch, 2000; Mozer & Sitton, 1998; Olivers & Meeter, 2008; Tsotsos et al., 1995; Wolfe, 1994) and it is beyond the scope of this paper to review the existing literature on attention models. The unique contribution of the present account is to provide a comprehensive framework for understanding how distinct effects could emerge at different stages of processing.

The first kind of interference (denoted by A in Figure 6) occurs while processing the identity of the stimuli, which presumably occurs in a relatively late stage of the visual system in a region that has large receptive fields. In this stage of processing, we theorize that a spatial gradient of competitive interference can be observed when two different targets are simultaneously activated, such that T1 and T2 have overlapping neuronal representations in late stages of the visual system, such as IT cortex. This interference would occur most strongly when the targets are simultaneously presented, but would also occur at an inter-target lag of up to 100 ms in both temporal directions (i.e., T1 affects T2 and T2 affects T1). Findings of retroactive interference at longer temporal separation may also reflect interference at this stage of processing (Nieuwenstein & Wyble, 2014).

Temporal order errors may also arise at this stage of processing if the stimuli are perceived as being a single stimulus through integration rather than competition (Akyürek et al., 2012). However, as we also find order errors at considerable spatial separations, it seems unlikely that integration is an explanation for all of the observed temporal order errors.

Next, we assume the existence of an attention map, which provides a gradient-field of attention that represents the amount of attentional enhancement at different locations within the visual field (Bay & Wyble, 2014; Cheal, Lyon, & Gottlob, 1994; Tan & Wyble, 2014). In this map, it is presumed that localized surround inhibition plays a role in allowing attention to focus on one spatial region at the expense of surrounding regions (denoted by B in Figure 6) and this inhibition causes the LAI effect by reducing the excitability of attention in the region surrounding T1 (Mounts, 2000). As this is an attentional effect that restricts the initiation of T2 processing, it occurs in a temporally forward direction, such that T2 processing can be inhibited by T1 at nearby spatial locations, but not as easily vice versa. Furthermore, as this effect is only present during the selection of a briefly presented stimulus, it is brief in duration, lasting only 100 ms.
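One common way to express a focus of enhancement with a surrounding zone of inhibition is a difference of Gaussians, sketched below. This is a swapped-in illustrative formalism and not a component of the conceptual model in Figure 6: the function name `attention_profile`, the widths of the two Gaussians, the surround gain, and the use of arbitrary distance units are all hypothetical, and the sketch covers only the spatial profile, not its brief time course.

```python
import math


def attention_profile(
    distance: float,
    center_sd: float = 0.5,      # hypothetical width of the excitatory focus
    surround_sd: float = 1.5,    # hypothetical width of the inhibitory surround
    surround_gain: float = 0.6,  # hypothetical strength of the surround
) -> float:
    """Net attentional modulation at a given distance from T1 (arbitrary units)."""
    center = math.exp(-distance ** 2 / (2 * center_sd ** 2))
    surround = surround_gain * math.exp(-distance ** 2 / (2 * surround_sd ** 2))
    # Positive values indicate enhancement; negative values indicate inhibition.
    return center - surround


if __name__ == "__main__":
    for d in range(5):
        print(d, round(attention_profile(float(d)), 3))
```

With these illustrative parameters, the profile is positive at the T1 location, most negative at a nearby position, and close to zero at larger distances, which is the qualitative shape attributed to LAI in the text.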

Another possible source of LAI could arise without an explicit attentional map. For example, Tsotsos et al. (1995) describe a computational framework for understanding how focal attention can arise through interactions within a visual hierarchy, which also uses a surround inhibitory mechanism.

The third form of interference occurs once encoding of a target into memory has begun. We propose that visual attention is suppressed (denoted by C in Figure 6) by ongoing memory consolidation (Bowman & Wyble, 2007; Wyble et al., 2009) which gives rise to an attentional blink that can be observed across the entire visual field. Unlike competitive interference, and like LAI, this effect is forward going in time (i.e., T1 can affect T2 but not vice versa), because it occurs after T1 has been selected by attention, and memory consolidation has begun. This final stage of interference would also give rise to spatial attention deficits that can be observed across the visual field while processing a visual target (Olivers, 2004). Note that the present results suggest this inhibition of attention could be spatially graded such that it is weaker near the T1. Such a mechanism could also explain the contraction of attention towards the fovea which was observed in Olivers (2004), although in our case, attention would be found to contract towards the T1 location which might not have been at the fovea.

Some of the observed temporal order errors may arise at this stage as well due to the phenomenon of prior entry, in which the T2 enters working memory prior to T1 (Olivers et al., 2011). This would seem to be the only available explanation for order errors when T1 and T2 are spatially separated, since such stimuli would presumably not be integrated into a single percept.

While this conceptual model is not yet instantiated in a computational form, thus limiting its ability to generate new predictions, it does provide a framework for thinking about how multiple attentional effects might interact within the visual system to guide future research.

Conclusion

In conclusion, the findings from our STAMP support the development of more comprehensive models of visual attention by allowing us to compare distinct attentional effects within a common experimental paradigm. Future work using similar mapping procedures will help to inform the creation of broader theories that bridge spatial and temporal attention.

The data and metadata from this experiment will be hosted on ScholarSphere at this URL: https://scholarsphere.psu.edu/collections/x346dj339