Methods
Stimuli and procedure
The study used a 2 (location condition) × 2 (ISI condition) within-subject design, with conditions blocked. Stimuli and procedure closely followed Experiment 1 of Pertzov and Husain (2014), with the exception of the added ISI condition and a small change in the cue stimulus described below.
Twelve participants (nine female, mean age 26.5 years) performed the experiment after giving informed consent in accordance with the Declaration of Helsinki. All reported normal or corrected-to-normal visual acuity and showed normal color vision in an Ishihara color test. The number of participants was determined by a Bayesian stopping criterion (see “Statistical analysis”). Participants were seated in front of a computer monitor (27” LCD screen with a refresh rate of 166 Hz) at a viewing distance of 60 cm, with their head position stabilized by a head rest. Gaze direction was monitored using an infrared eye tracker (EyeLink 1000, SR Research) operating at 1000 Hz.
The task design is illustrated in Fig. 1. Each trial began with the presentation of a central fixation point, a white disk with a diameter of 0.25 degree of visual angle (dva), shown on a medium gray background. After 500 ms of maintained fixation on this point, four colored, oriented bars were presented sequentially in the periphery. Each bar had a length of 2 dva and a width of 0.3 dva, a unique color (red, green, blue, or yellow, in random order within each trial) and a random orientation drawn with uniform probability from the range of possible bar orientations [0∘,180∘), with the constraint that the orientations of any two bars had to differ by at least 10∘. In the different location condition, each bar was presented in a random location on an invisible circle with a radius of 6 dva around the fixation point, with a minimum distance of 3 dva between the centers of any two bars. In the same location condition, all bars within a trial were presented in the same location on this circle, but the location still varied randomly from trial to trial. Each bar was presented for 200 ms, with an ISI of 300 ms in the short ISI condition, and 600 ms in the long ISI condition.
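The stimulus constraints described above can be illustrated with a minimal rejection-sampling sketch. This is only one way to satisfy the stated constraints and is an assumption; the text does not specify how the constrained draws were implemented. The function names are hypothetical.

```python
import math
import random

def orientation_distance(a, b):
    """Absolute difference between two bar orientations in degrees (period 180)."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)

def sample_orientations(n=4, min_sep=10.0, rng=random):
    """Draw n orientations from [0, 180) with pairwise differences >= min_sep.

    Assumption: whole sets are redrawn until the constraint holds.
    """
    while True:
        thetas = [rng.random() * 180.0 for _ in range(n)]
        if all(orientation_distance(thetas[i], thetas[j]) >= min_sep
               for i in range(n) for j in range(i + 1, n)):
            return thetas

def sample_locations(n=4, radius=6.0, min_dist=3.0, rng=random):
    """Draw n angular positions (degrees) on a circle of the given radius (dva)
    so that all centre-to-centre distances are >= min_dist (dva)."""
    while True:
        angles = [rng.random() * 360.0 for _ in range(n)]
        points = [(radius * math.cos(math.radians(a)),
                   radius * math.sin(math.radians(a))) for a in angles]
        if all(math.dist(points[i], points[j]) >= min_dist
               for i in range(n) for j in range(i + 1, n)):
            return angles

# Example: one trial of the different-location condition.
rng = random.Random(0)
orientations = sample_orientations(rng=rng)
locations = sample_locations(rng=rng)
```

In the same-location condition, a single angle drawn once per trial would replace the call to `sample_locations`.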
After a memory delay of 1000 ms following the last sample stimulus, the fixation point was replaced by a central color cue in the form of an annulus with an inner diameter matching the length of the oriented bars. Participants then had to report the memorized orientation of the target stimulus (the bar matching this cue color) with a mouse. A probe bar appeared when the mouse pointer was first moved over the annulus, and its orientation could be continuously adjusted (following the angular position of the mouse pointer). The response was finalized by a mouse click. We used a colored annulus as the cue stimulus instead of the randomly oriented colored bar employed by Pertzov and Husain (2014) to minimize any possible interference of the cue with orientation memory (Souza, Rerko, & Oberauer, 2016). If participants lost fixation before onset of the response cue, the trial was aborted and repeated later in the same block.
For each of the four combinations of location (same or different) and ISI (short or long) conditions, participants completed 120 trials divided into three consecutive blocks of 40 trials each. Within each block, the sample item at each of the four ordinal positions was tested ten times (randomly interleaved). The order of conditions was counterbalanced across participants. Stimulus presentation and response collection were controlled using MATLAB (The MathWorks, Inc.) with the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997; Kleiner, Brainard, & Pelli, 2007) and Eyelink Toolbox (Cornelissen, Peters, & Palmer, 2002) extensions.
Response distributions and mixture model fits
We obtained histograms of response errors (angular deviations between the reported orientation and the orientation of the target item in each trial) for each participant and each task condition to visualize response distributions. We also determined histograms of response deviations from the orientations of the non-target items in each trial. A central peak in these histograms signifies the presence of swap errors (i.e., erroneous report of a non-target’s orientation). However, the minimum separation between the orientations of different items within a trial causes the histograms of non-target deviations expected by chance (i.e., without any swap errors) to be non-uniform.
To remove this effect, we applied a correction to these histograms using a shuffling method (Schneegans & Bays 2017; code available at https://bayslab.com/toolbox). We determined the deviations of the non-target orientations from the target orientation in one trial, A, and added these to the target orientation in another trial, B. This yields a new set of non-target orientations for trial B that still obey the minimum distance requirements both to the target feature and among each other, but are unrelated to the response in this trial. We then determined the deviation of the response made in trial B from these shuffled non-target orientations. We did this for every possible pair of trials (separately for each participant and task condition) to obtain an expected histogram of response deviations from non-target features in the absence of swap errors. Finally, we subtracted this expected histogram from the original histogram to determine the corrected histogram. Any remaining central peak in the corrected histogram indicates the occurrence of swap errors.
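The shuffling correction can be sketched as follows. This is an illustrative reimplementation, not the toolbox code referenced above. The trial data structure, the bin width of 15∘, and the restriction to pairs of distinct trials are assumptions.

```python
def wrap(d, period=180.0):
    """Wrap an orientation difference into [-period/2, period/2)."""
    return (d + period / 2.0) % period - period / 2.0

def histogram(values, n_bins=12, period=180.0):
    """Proportion of values in each equal-width bin over [-period/2, period/2)."""
    counts = [0] * n_bins
    width = period / n_bins
    for v in values:
        counts[min(int((v + period / 2.0) // width), n_bins - 1)] += 1
    return [c / len(values) for c in counts]

def corrected_nontarget_histogram(trials, n_bins=12):
    """trials: list of dicts with keys 'target', 'nontargets', 'response' (degrees).

    Returns the observed minus the expected histogram of response deviations
    from non-target orientations.
    """
    observed = [wrap(t["response"] - nt) for t in trials for nt in t["nontargets"]]
    expected = []
    for b in trials:              # trial B supplies target and response
        for a in trials:          # trial A supplies the non-target offsets
            if a is b:
                continue
            for nt in a["nontargets"]:
                shuffled_nt = b["target"] + wrap(nt - a["target"])
                expected.append(wrap(b["response"] - shuffled_nt))
    obs_h = histogram(observed, n_bins)
    exp_h = histogram(expected, n_bins)
    return [o - e for o, e in zip(obs_h, exp_h)]
```

Because both histograms are normalized to proportions, the corrected histogram sums to zero; a positive central bin indicates more responses near non-targets than expected by chance.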
Response distributions were fit with a three-component mixture model (Bays et al., 2009). In this descriptive model, each response is assumed to be drawn either from a von Mises (circular normal) distribution around the target value, a von Mises distribution centered on the feature of one non-target item in the same trial (a swap error), or from a uniform distribution. This yields the following probability density function,
$$ p(\hat{\theta}) = p_{T} \phi_{\circ}(\hat{\theta}; \theta, \kappa) + p_{NT} \frac{1}{3} \sum\limits_{i = 1}^{3} \phi_{\circ}(\hat{\theta}; \varphi_{i}, \kappa) + p_{U} \frac{1}{2\pi}. \tag{1} $$
Here, \(\hat{\theta}\) is the reported value, \(\theta\) is the true target value, and \(\varphi_{i}\) are the feature values of the non-targets in the trial. We denote with \(\phi_{\circ}(\hat{\theta}; \mu, \kappa)\) the von Mises distribution centered at \(\mu\) with concentration \(\kappa\), evaluated at the value \(\hat{\theta}\). The model has three free parameters, namely the proportions of swap errors, \(p_{NT}\), and of uniform responses, \(p_{U}\) (with the proportion of target responses \(p_{T} = 1 - p_{NT} - p_{U}\)), and the concentration parameter \(\kappa\) of the von Mises distribution.
A separate maximum-likelihood fit of the model was obtained for the response distribution of each participant in each of the four experimental conditions (using code available at https://bayslab.com/toolbox). We note that for the orientation responses, all feature values (which were in the range [0∘,180∘)) were scaled by a factor of two before applying the mixture model so that the von Mises distribution could be used in its standard formulation over the whole circle.
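For illustration, the density in Eq. 1 and the corresponding negative log-likelihood can be written as a short sketch. This is not the toolbox code referenced above. The Bessel function is approximated by its power series, which is adequate only for moderate κ, and the trial data structure is an assumption. The actual optimization over the three parameters is omitted.

```python
import math

def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total

def von_mises_pdf(x, mu, kappa):
    """Von Mises density on the full circle (angles in radians)."""
    return math.exp(kappa * math.cos(x - mu)) / (2.0 * math.pi * bessel_i0(kappa))

def to_circle(orientation_deg):
    """Scale an orientation in [0, 180) degrees by two and convert to radians."""
    return math.radians(2.0 * orientation_deg)

def mixture_pdf(resp, target, nontargets, p_nt, p_u, kappa):
    """Three-component mixture density of Eq. 1 (all angles in radians)."""
    p_t = 1.0 - p_nt - p_u
    swap = sum(von_mises_pdf(resp, nt, kappa) for nt in nontargets) / len(nontargets)
    return p_t * von_mises_pdf(resp, target, kappa) + p_nt * swap + p_u / (2.0 * math.pi)

def neg_log_likelihood(trials, p_nt, p_u, kappa):
    """trials: list of dicts with 'response', 'target', 'nontargets' in degrees."""
    nll = 0.0
    for t in trials:
        resp = to_circle(t["response"])
        target = to_circle(t["target"])
        nts = [to_circle(nt) for nt in t["nontargets"]]
        nll -= math.log(mixture_pdf(resp, target, nts, p_nt, p_u, kappa))
    return nll
```

A maximum-likelihood fit would minimize `neg_log_likelihood` over `p_nt`, `p_u`, and `kappa`, subject to the proportions being non-negative and summing to at most one.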
Following Pertzov and Husain (2014), we also used a simple heuristic to estimate the number of target responses and swap errors independently from the model fit. We determined the proportion \(\tilde {p}_{T}\) of responses that fell within a certain range of the target feature (15∘), as well as the proportion \(\tilde {p}_{NT}\) that fell within the same range around any of the non-target features in a trial. This measure does not make any specific assumptions about the shape of response distributions, and only relies on the expectation that an increase in the proportion of target or non-target responses should produce an increase in the frequency of response values in the vicinity of the target or non-target feature values, respectively. Note that for estimating the proportion of non-target responses, we use the histograms without correction for minimum feature separation. While the correction is useful for visualizing the occurrence of swap errors, it does not provide any specific advantages when comparing response frequencies across conditions. Using the uncorrected histograms reduces the reliance on any prior assumptions about response distributions, and also directly matches the method of Pertzov and Husain (2014).
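The heuristic measure amounts to counting responses within a fixed window of the target or of any non-target. A minimal sketch follows; the trial data structure and function names are assumptions.

```python
def wrap(d, period=180.0):
    """Wrap an orientation difference into [-period/2, period/2)."""
    return (d + period / 2.0) % period - period / 2.0

def heuristic_proportions(trials, window=15.0):
    """Return (p_T, p_NT): proportions of trials with the response within
    `window` degrees of the target, or of any non-target, respectively.

    trials: list of dicts with 'response', 'target', 'nontargets' in degrees.
    """
    n = len(trials)
    near_target = sum(
        abs(wrap(t["response"] - t["target"])) <= window for t in trials)
    near_nontarget = sum(
        any(abs(wrap(t["response"] - nt)) <= window for nt in t["nontargets"])
        for t in trials)
    return near_target / n, near_nontarget / n
```

Note that the two proportions are not mutually exclusive: a response can fall within the window of both the target and a non-target when their orientations are similar.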
Finally, we assessed the occurrence of swap errors at different temporal or spatial distances between target and non-target items, adapting a method used in Schneegans and Bays (2017). For the temporal distance effect, we grouped all non-target items according to their ordinal position relative to the target item (from preceding the target by three steps to succeeding it by three steps). For each group, we then determined the mean absolute deviation (MAD) of the response in a trial from the non-target feature values in the same trial. If the MAD is below the level expected by chance (in the absence of swap errors), this indicates the occurrence of swap errors specifically for items at a certain temporal separation.
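The grouping by relative ordinal position can be sketched as follows, assuming a hypothetical trial structure in which orientations are stored in presentation order. Negative lags denote non-targets that preceded the target.

```python
from collections import defaultdict

def wrap(d, period=180.0):
    """Wrap an orientation difference into [-period/2, period/2)."""
    return (d + period / 2.0) % period - period / 2.0

def mad_by_lag(trials):
    """Mean absolute deviation of the response from non-target orientations,
    grouped by ordinal position relative to the target.

    trials: list of dicts with 'orientations' (degrees, in presentation order),
    'target_pos' (0-based index of the cued item), and 'response' (degrees).
    """
    deviations = defaultdict(list)
    for t in trials:
        for pos, orientation in enumerate(t["orientations"]):
            lag = pos - t["target_pos"]
            if lag != 0:
                deviations[lag].append(abs(wrap(t["response"] - orientation)))
    return {lag: sum(v) / len(v) for lag, v in sorted(deviations.items())}
```

With four items per trial, the lags range from -3 to +3; each resulting MAD would then be compared against the chance level obtained from shuffled non-target values.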
For assessing effects of spatial distance, we similarly grouped non-targets according to their angular distance from the target location (in the different-location conditions only). We used four distance bins, the first covering angular distances up to 67.5∘, and each other spanning a 37.5∘ range up to 180∘ (the minimum spatial distance of 3 dva used in the experiment translates to an angular distance of approximately 30∘, so this spacing produces nearly equal numbers of non-targets falling into each bin). We then again determined the MAD of the response in each trial from the non-targets in the same trial that fall within a specific distance bin. The minimum distance between items’ feature values within a trial also affects the expected MAD in the absence of swap errors, which would otherwise be 45∘. We determined the expected deviation using the same shuffling method as described above, by determining the MAD of a response value from all shuffled non-target feature values.
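The bin layout follows from simple geometry: a chord of 3 dva on a circle of radius 6 dva subtends an angle of 2·arcsin(3/12), which is close to 29∘. The short sketch below reproduces this arithmetic and assigns a target/non-target pair to a distance bin; the function names are hypothetical.

```python
import math

RADIUS_DVA = 6.0
MIN_DIST_DVA = 3.0

# Angular separation corresponding to the minimum centre-to-centre distance.
min_angle = math.degrees(2.0 * math.asin(MIN_DIST_DVA / (2.0 * RADIUS_DVA)))

# Upper edges of the four distance bins in degrees: 67.5, 105, 142.5, 180.
BIN_EDGES = [67.5 + k * 37.5 for k in range(4)]

def angular_distance(a, b):
    """Absolute angular distance between two locations on the circle (degrees)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def distance_bin(target_angle, nontarget_angle):
    """Index (0 to 3) of the distance bin for a target/non-target pair."""
    d = angular_distance(target_angle, nontarget_angle)
    for i, edge in enumerate(BIN_EDGES):
        if d <= edge:
            return i
    return len(BIN_EDGES) - 1
```

Since no pair can be closer than about 29∘, the first bin effectively covers a range of roughly 38∘, comparable to the 37.5∘ width of the other three bins.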
Statistical analysis
We used Bayesian statistics to determine the evidence for an effect of the different experimental conditions on recall performance. We applied a two-factor (location condition and ISI condition) repeated measures Bayesian ANOVA on the obtained mixture model parameters as well as the heuristic measures for the proportion of target responses and swap errors. Subsequent paired-sample Bayesian t tests were performed where the ANOVA revealed evidence for interaction effects. We additionally performed a three-factor repeated measures Bayesian ANOVA on the mean absolute response errors, with ordinal position of the cued item as an additional factor, and on the MADs of responses from non-target features, with temporal separation as third factor. For the effects of spatial distance between targets and non-targets, we performed an ANOVA with factors ISI and distance bin (since this measure is only applicable for the different-location conditions). All tests were performed in JASP (version 0.14.0.0) using the standard parameters. For ANOVAs, we report the evidence in favor of inclusion of each factor and interaction, BFincl, estimated across matched models. For Bayesian t tests, we report the evidence in favor of an effect over the null hypothesis, BF10.
We further employed a Bayesian stopping criterion (Rouder, 2014) to determine the number of participants in the experiment. The main hypothesis tested in Experiment 1 was that the effect of the location condition on the proportion of swap errors observed by Pertzov and Husain (2014) is modulated by the length of the ISI. This predicts an interaction effect that can be tested in the Bayesian ANOVA; however, this cannot be computed analytically in standard Bayesian methods and is instead estimated by sampling, making it less suitable for a stopping criterion. We therefore used the difference-of-differences in the proportion of swap errors between conditions, ΔpNT, as a proxy for the interaction effect:
$$ \begin{aligned} {\varDelta} p_{NT} = {} & \left( p_{NT}(\mathrm{different, short}) - p_{NT}(\mathrm{same, short}) \right) \\ & - \left( p_{NT}(\mathrm{different, long}) - p_{NT}(\mathrm{same, long}) \right) \end{aligned} \tag{2} $$
We used a one-sample Bayesian t test as basis for a stopping criterion in the number of participants, terminating the experiment after strong evidence (Bayes factor > 10) either in favor of or against the hypothesis that ΔpNT≠ 0 was found, or after a maximum of 20 participants when this criterion was not reached. This Bayesian t test constitutes a more conservative criterion for stopping than the evidence for an interaction effect in the ANOVA.
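The stopping rule can be illustrated with a minimal sketch that computes ΔpNT (Eq. 2) per participant and evaluates a one-sample Bayes factor. The Bayes factor here uses the JZS formulation of Rouder et al. with a Cauchy prior scale of 0.707, evaluated by simple numerical integration. That this matches the "standard parameters" of the software used is an assumption, and all function names are hypothetical.

```python
import math

def delta_p_nt(p):
    """Difference of differences (Eq. 2); p maps (location, isi) to p_NT."""
    return ((p[("different", "short")] - p[("same", "short")])
            - (p[("different", "long")] - p[("same", "long")]))

def one_sample_t(xs):
    """t statistic for testing the mean of xs against zero."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return mean / (sd / math.sqrt(n))

def jzs_bf10(t, n, r=0.707, step=0.01):
    """One-sample JZS Bayes factor (two-sided), integrating over g = exp(u)."""
    nu = n - 1
    null_like = (1.0 + t * t / nu) ** (-(nu + 1) / 2.0)
    total = 0.0
    for i in range(int(40.0 / step) + 1):      # u runs from -10 to 30
        g = math.exp(-10.0 + i * step)
        a = 1.0 + n * g * r * r
        like = a ** -0.5 * (1.0 + t * t / (a * nu)) ** (-(nu + 1) / 2.0)
        prior = (2.0 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1.0 / (2.0 * g))
        total += like * prior * g * step       # extra factor g: dg = g du
    return total / null_like

def should_stop(deltas, threshold=10.0, max_n=20):
    """Return (stop, BF10) for the per-participant values of delta p_NT."""
    bf = jzs_bf10(one_sample_t(deltas), len(deltas))
    stop = bf > threshold or bf < 1.0 / threshold or len(deltas) >= max_n
    return stop, bf
```

After each new participant, `should_stop` would be called on all values of ΔpNT collected so far; data collection ends as soon as it returns `True` in its first element.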
Results
In Experiment 1, we sequentially presented four colored, oriented bars, and participants had to report the orientation of one bar cued by its color. Two factors were varied in a blocked within-subjects design: stimulus location (same or different for the stimuli within a trial) and ISI (300 or 600 ms).
We first determined the effects of the task conditions and the ordinal position of the cued item on mean absolute response error, as a model-free measure of performance (Fig. 2). A three-factor Bayesian ANOVA (with factors location, ISI, and ordinal position) produced overwhelming evidence for an effect of ordinal position (BFincl = \(2.72 \cdot 10^{45}\)). There was weak evidence against an effect of location (BFincl = 0.37) and moderate evidence against an effect of ISI (BFincl = 0.25), as well as weak to moderate evidence against any interaction effects (all BFincl between 0.14 and 0.42). This suggests that overall recall performance was comparable across task conditions. The effect of ordinal position takes the form of a recency benefit, which is broadly consistent with previous studies (Gorgoraptis et al., 2011).
To analyze effects of task conditions on specific response errors, we fit a mixture model (Bays et al., 2009) to the response distributions of each participant in each condition (pooled over ordinal positions). This yields estimates of recall precision and proportions of target, non-target, and uniform responses. Histograms and model fits of response deviations from target and non-target orientations are shown in Fig. 3, and estimated mixture model parameters in Fig. 4.
Based on the previous findings of Pertzov and Husain (2014), we expected to find a specific effect of location on the proportion of swap errors for short ISIs (with more swap errors in the same-location condition). Based on the results of Harrison and Bays (2018), however, we hypothesized that this effect would not generalize to long ISIs, and that we consequently would find an interaction effect of location and ISI conditions on the proportion of swap errors. We employed a Bayesian stopping criterion for this interaction effect (expressed as a difference of differences) to determine the number of participants in the experiment. The criterion was reached after 12 participants, with strong evidence in favor of an interaction (BF10 = 15.5).
A subsequent Bayesian ANOVA confirmed this interaction effect (BFincl = 22.1), while results were inconclusive regarding a single-factor effect of location (BFincl = 0.76) and showed weak evidence against an effect of ISI (BFincl = 0.41). Separate Bayesian t tests on the effect of location within each ISI condition confirmed that the interaction took the form that we had hypothesized (Fig. 4c): For short ISIs, there was strong evidence that the proportion of swap errors was higher in the same-location than in the different-location condition (BF10 = 26.4), while for long ISIs, there was weak evidence against an effect of location (BF10 = 0.60).
The occurrence of swap errors can be visualized by plotting the histograms of response deviations from the non-targets of each trial, as shown in Fig. 3c and d (corrected by subtracting the distribution that would be expected in the absence of swap errors). Following the method of Pertzov and Husain (2014), we determined the proportion of trials in the two central bins of this histogram (within ± 15∘ of the non-target feature) as a heuristic measure for the proportion of swap errors, and compared them across conditions. However, a Bayesian ANOVA on this measure was inconclusive regarding an interaction of location and ISI (BFincl = 1.21), even though within each ISI condition, the findings from the mixture model were supported (higher proportion of swap errors for the same-location condition with short ISI, BF10 = 18.8, no effect of location for long ISI, BF10 = 0.29). Visual inspection of the histograms suggests that many trials outside of the range of ± 15∘ contributed to the proportion of swap errors, and a post hoc test indeed showed moderate evidence for an interaction effect when the range was extended to ± 30∘ (BFincl = 6.03).
We also applied the Bayesian ANOVA to the other parameters of the mixture model fit. We note that comparisons for these parameters are more likely to show weak or inconclusive evidence since our sample size was determined by a stopping rule on the proportion of swap errors, being the main variable of interest and the one we expected to show the largest effects.
For the concentration parameter κ (Fig. 4a), the results provided weak to moderate evidence against an effect of location (BFincl = 0.29), ISI (BFincl = 0.82), and an interaction of these factors (BFincl = 0.61). Similarly, we found weak to moderate evidence against an effect of location (BFincl = 0.29), ISI (BFincl = 0.29), and an interaction (BFincl = 0.41) on the proportion of target responses (Fig. 4b). Applying the heuristic approach to estimate the proportion of target responses from the response histograms likewise yielded weak evidence against an effect of location (BFincl = 0.52) or ISI (BFincl = 0.42), and results were equivocal regarding an interaction effect (BFincl = 1.30).
For the proportion of uniform responses (Fig. 4d), there was weak evidence against an effect of location (BFincl = 0.72) and ISI (BFincl = 0.39). However, we found moderate evidence for an interaction of these two factors (BFincl = 6.83). Subsequent Bayesian t tests showed that the form of this interaction was complementary to the one observed for the proportion of swap errors: At short ISIs, the proportion of responses captured by the uniform component of the model was lower in the same-location condition compared to the different-location condition (BF10 = 8.06), while for long ISIs, there was weak evidence against an effect of location (BF10 = 0.47).
To further elucidate the patterns of swap errors in different task conditions, we analyzed the deviation of responses from non-target features at different temporal separations (based on the ordinal positions of target and non-target items in the sequence of stimuli within each trial) and for different spatial distances (based on angular locations). Effects of temporal separation are shown in Fig. 5a. If swap errors occur for certain separations, this will decrease the MAD below chance levels (shown as dotted line; Schneegans & Bays 2017). Due to the minimum distance between the features of different items within a trial, the MAD for other separations can then be increased above the chance level.
A repeated measures Bayesian ANOVA with factors location, ISI, and temporal separation produced overwhelming evidence for an effect of temporal separation (BFincl = \(2.98 \cdot 10^{18}\)), with MAD values below chance level for non-targets immediately preceding or succeeding the target item (Table S1). We also found weak evidence for a location-separation interaction (BFincl = 1.58) and a location-ISI-separation interaction (BFincl = 1.73). Notably, the MAD for the item immediately following the target is decreased in the same-location condition for short ISIs, in which we observed a specific increase in swap errors. All other factors and interactions showed evidence against an effect (BFincl between 0.04 and 0.69).
We also assessed the effects of spatial distance in the different-location conditions (Fig. 5b). An ANOVA with factors ISI and distance bin showed strong evidence for an effect of target-to-non-target distance (BFincl = 16.9), with lower MADs in the two bins for smaller distances (Table S2). We found evidence against an effect of ISI (BFincl = 0.22) and an interaction (BFincl = 0.49).
Discussion
We successfully reproduced a key finding from the main experiment of Pertzov and Husain (2014), namely that presenting memory sample stimuli sequentially at the same location selectively increased the proportion of swap errors when using short ISIs. However, we also found strong evidence for an interaction of this effect with ISI, and no positive evidence for an effect of location remained at longer ISIs. This confirms our main hypothesis.
Pertzov and Husain (2014) had tested the effect of location at longer ISIs in a control experiment (reported in their supplementary material) and found support for the same effect as for short ISIs. Converting the result of their t test (t(7) = 2.6, p = 0.03) into a Bayes factor shows that their evidence for a location effect is only weak (BF10 = 2.46), while we found weak evidence against such an effect (BF10 = 0.60). It therefore remains an open question whether or not some location effect persists at the longer ISI. However, the within-subjects design employed here produced clear evidence that the effect decreases with increasing ISI.
We note that even in the short ISI condition, presenting all sample items at the same location did not lead to a complete breakdown of color-orientation binding. Although the estimated proportion of swap errors approximately doubled compared to the different-location condition (from 12% to 25%), a majority of responses was still classified as target reports (67%, compared to no more than 25% that would be expected by chance). This is consistent with previous results (Gorgoraptis et al., 2011; Pertzov & Husain, 2014), and indicates that feature binding in VWM does not entirely rely on spatial separation of stimuli even at shorter ISIs.
On the other hand, when sample items were presented at different locations, we found evidence that swap errors occurred more frequently between spatially close items, indicating a role for location in feature binding at least in the different-location condition. An increase of swap errors with spatial proximity has been observed in previous studies (Emrich & Ferber, 2012; Rerko, Oberauer, & Lin, 2014; Bays, 2016; Schneegans & Bays, 2017), but this is to our knowledge the first time this effect has been found when location was not a task-relevant feature.
Unlike Pertzov and Husain (2014), we found that in the short ISI condition, the decrease in swap errors when items were presented at different locations was largely balanced by an increase in the proportion of uniform responses, rather than an increase in the proportion of target responses. This may reflect an (intentional or implicit) strategy aimed at producing the most likely correct response from noisy memory representations, given different levels of certainty as to which memory item is being cued. This interpretation is based on evidence that the retrieved features of different sample items are associated with differing precisions, and that humans have at least partial knowledge of these precisions (Fougnie, Suchow, & Alvarez, 2012; van den Berg, Shin, Chou, George, & Ma, 2012; van den Berg, Yoo, & Ma, 2017; Schneegans, Taylor, & Bays, 2020). Consider the case that the target item in a trial is retrieved with very low precision. If the cue identifies the target item with high certainty, the participant should always attempt to produce that item’s orientation as a response, even if it is of such low precision that it is likely to be categorized as a random response in the mixture model. However, if there is uncertainty about which item is cued, it may be advantageous to report an orientation that is retrieved with high precision, even if it belongs to an item that is somewhat less likely to be the actual target. This would result in an increase of swap errors.
This account is still generally consistent with the hypothesis of Pertzov and Husain (2014) that memory for feature bindings is impaired in the same-location condition. This condition presumably led to greater uncertainty about the cued item, leading to the observed shift from uniform responses towards swap errors. However, such uncertainty does not necessarily imply an impairment of feature binding, as it would also be expected if memory for the item’s cue feature (here, color) is impaired by sequential presentation at the same location.
Critically, the effect did not generalize to the long ISI condition, where we found no positive evidence for a location effect on any parameter of the mixture model. This suggests that it is not the shared location of sample items alone that impairs recall, but the specific pairing of shared location with rapid presentation. The effect may therefore be attributed to masking or temporal crowding (Yeshurun, Rashal, & Tkacz-Domb, 2015) leading to impaired encoding of items in memory, rather than a necessary role of location for binding.
This interpretation is also consistent with the finding of higher swap frequencies in the short-ISI same-location condition specifically between a target and a directly succeeding non-target. This effect is reminiscent of increased swap rates between directly succeeding target items reported in rapid serial visual presentation tasks, which have likewise been explained as encoding errors (Wyble, Bowman, & Nieuwenstein, 2009; Wyble, Potter, Bowman, & Nieuwenstein, 2011). We note that the MAD measure we used to assess effects of temporal distance does not discriminate between swap errors and response biases towards non-target features. However, biases should result in decreased recall precision in the mixture model fits, which we did not observe, and therefore swap errors provide the most plausible explanation for the combined results.
We considered a possible alternative to this account, namely that the observed differences between the two ISI conditions were the result of a verbalization strategy. The longer ISI may have allowed more time for forming verbal representations that could supplement visual working memory and compensate for binding deficits in the same-location condition. Such a strategy should have resulted in more categorical responses in the long-ISI conditions. We tested this by producing scatter plots of all pairs of target feature and response feature, and density plots of responses over the space of possible orientations (Fig. S1; Hardman, Vergauwe, & Ricker, 2017). While we observed a strong oblique effect (Appelle, 1972; De Gardelle, Kouider, & Sackur, 2010), there were no clear signatures of responding categorically, and crucially no systematic differences in response densities between ISI conditions. This indicates that verbalization did not contribute substantially to recall performance.