Methods
Participants
Experiment 2 was conducted at the Istituto Italiano di Tecnologia, Genova, Italy. Twenty-eight new participants took part, receiving an honorarium for their service. Participants were randomly assigned to one of two action–effect contingency groups, as in the first experiment. Furthermore, in each group, half of the participants started with the cue prediction condition and the other half with the action prediction condition. One participant was excluded due to chance-level performance. Participants’ age range was 18–31 (M = 25.7) years; two were left-handed, and 13 were male. All participants self-reported normal or corrected-to-normal vision.
Written informed consent was given by each participant. The study was approved by the local ethical committee (Comitato Etico Regione Liguria) and was conducted in accordance with the Code of Ethics of the World Medical Association (Declaration of Helsinki). Data were stored and analyzed anonymously.
Procedure
The procedure was generally the same as in Experiment 1, though measuring reaction times necessitated a few modifications. First, we focused solely on the cue and action prediction conditions, presenting them in counterbalanced order across participants. For this reason, participants now also started the exposure trials preceding the cue prediction phase by randomly pressing the left or right key, as in the action–effect association phase (which preceded the action prediction phase), but the subsequently displayed distractor appeared randomly on the left or the right side, so that participants would unlearn any action–effect associations they may have acquired previously. The second modification related to the task response, which became speeded. Hence, there was no staircase procedure; search displays were presented until response, and there were no post-display masks; participants had to use two different response options to indicate the target probe line orientation as fast and as accurately as possible. The search display was again triggered by a keyboard press and appeared after 100 ms. Participants issued their target orientation responses by pressing one of two foot pedals. We used foot pedals because we deemed it potentially confusing for participants (and liable to give rise to interference) to produce another keyboard response so shortly after initiating the trial by a manual key press.
Results
Accuracy was generally at ceiling level; 95% CI for the mean of individual accuracies = [0.943, 0.968]. An ANOVA on the individual accuracies yielded a significant main effect of distractor presence (F[1, 26] = 13.5, p = 0.0011, \(\eta_{\text{G}}^{2}\) = 0.026): slightly more errors were made when a distractor was present rather than absent (accuracy of 0.951 vs. 0.964); critically, however, neither the main effect of prediction type nor the prediction type × distractor presence interaction was significant (both F[1, 26] < 0.8, p > 0.39). Similarly, an ANOVA on the medians of the individual reaction times (RTs) revealed a significant main effect of distractor presence (F[1, 26] = 63.8, p < 0.001, \(\eta_{\text{G}}^{2}\) = 0.050): RTs were overall slower on distractor-present (M = 1168, SD = 244 ms) compared to distractor-absent trials (M = 1061, SD = 237 ms). However, neither the main effect of prediction type (F[1, 26] = 0.11, p = 0.743, \(\eta_{\text{G}}^{2}\) = 0.0005) nor the prediction type × distractor presence interaction (F[1, 26] = 1.06, p = 0.312, \(\eta_{\text{G}}^{2}\) = 0.0004) was significant (see Fig. 3). Including the group factors (order of conditions, natural versus reversed action–effect mapping) in the ANOVA design did not reveal any additional significant (main or interaction) effects. We followed up the non-significant interaction of prediction type and distractor presence by comparing the distractor interference RT costs (RT distractor present minus RT distractor absent) between the two prediction type conditions. A Bayes factor analysis using a Cauchy prior with a recommended scale of 0.707 yielded modest evidence for a null effect: BF01 = 3.03 (Rouder et al., 2009). Finally, we conducted an ANOVA on the so-called inverse-efficiency scores, computed as median RT divided by accuracy (proportion of correct responses), as a potentially more sensitive, aggregate measure of performance (Townsend & Ashby, 1983).
However, once again, this analysis yielded the same pattern of results, with a significant main effect of distractor presence (F[1, 26] = 63.8, p < 0.001), but a non-significant main effect of prediction type and a non-significant interaction of these two factors (both F < 0.76, p > 0.39).
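As an illustration of the dependent measures used in these analyses, the following minimal Python sketch computes the median RT, accuracy, inverse-efficiency score (median RT divided by proportion correct), and distractor interference cost (distractor present minus distractor absent) for a single participant. The function names, the trial format, the toy data, and the restriction of the median to correct trials are assumptions made for illustration; they are not taken from the original analysis scripts.

```python
from statistics import median


def summarize_cell(trials):
    """Summarize one participant's trials in one design cell.

    trials: list of (rt_ms, correct) tuples (hypothetical data format).
    """
    correct_rts = [rt for rt, correct in trials if correct]
    accuracy = len(correct_rts) / len(trials)  # proportion of correct responses
    median_rt = median(correct_rts)  # assumption: median over correct trials only
    return {
        "median_rt": median_rt,
        "accuracy": accuracy,
        "ies": median_rt / accuracy,  # inverse-efficiency score
    }


def interference_cost(present, absent, measure="median_rt"):
    """Distractor interference cost: distractor present minus distractor absent."""
    return present[measure] - absent[measure]


# Toy data for one participant in one prediction condition (made up).
present_trials = [(1200, True), (1100, True), (1300, False), (1150, True)]
absent_trials = [(1000, True), (1050, True), (980, True), (1100, True)]

present = summarize_cell(present_trials)
absent = summarize_cell(absent_trials)

rt_cost = interference_cost(present, absent)           # RT cost in ms
ies_cost = interference_cost(present, absent, "ies")   # cost on the aggregate measure
```

Costs computed in this way per participant and prediction condition are the quantities that would enter a comparison between the cue and action prediction conditions.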
Thus, the follow-up experiment, which employed a potentially more sensitive reaction time measure, likewise does not provide evidence in favor of a difference between the cue and action prediction conditions.
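The Bayes factor reported above can be illustrated with the JZS formula for a one-sample (or paired) t-test given by Rouder et al. (2009), generalized to a Cauchy prior scale r. The sketch below is a plain-Python approximation under stated assumptions: the function name `jzs_bf01`, the midpoint-rule integration, and the number of integration steps are illustrative choices, and treating the interaction as a paired t-test on the interference costs (t = sqrt(F), N = 27) is an assumption about how the analysis was set up rather than a description of the original scripts.

```python
import math


def jzs_bf01(t, n, r=0.707, steps=20000):
    """Approximate JZS Bayes factor BF01 for a one-sample/paired t-test.

    t: observed t statistic; n: sample size; r: Cauchy prior scale.
    The integral over g in (0, inf) is mapped to u in (0, 1) via g = u / (1 - u)
    and evaluated with a simple midpoint rule (an illustrative choice).
    """
    nu = n - 1
    # Marginal likelihood under the null (up to a constant shared with the alternative).
    null_term = (1 + t * t / nu) ** (-(nu + 1) / 2)

    total = 0.0
    for i in range(steps):
        u = (i + 0.5) / steps
        g = u / (1 - u)
        a = 1 + n * g * r * r
        integrand = (
            a ** -0.5
            * (1 + t * t / (a * nu)) ** (-(nu + 1) / 2)
            * (2 * math.pi) ** -0.5
            * g ** -1.5
            * math.exp(-1 / (2 * g))
        )
        total += integrand / (1 - u) ** 2  # Jacobian of the substitution
    alt_term = total / steps

    return null_term / alt_term


# Assumption: the interaction F[1, 26] = 1.06 corresponds to a paired t-test
# on the interference costs of the 27 analyzed participants.
bf01 = jzs_bf01(math.sqrt(1.06), 27)
```

Under these assumptions, the sketch returns a value close to 3, in line with the reported BF01; values above 1 favor the null hypothesis, and the function decreases as |t| grows.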
Discussion
The present study was designed to examine two questions: (1) would the opportunity to predict the presence and location of an item that is task-irrelevant but attention-capturing by means of one’s own actions or by an informative cue interfere with task performance to a greater degree, as posited by the ‘attentional white bear’ hypothesis, or to a lesser degree, relative to when no prediction regarding the distracting item is possible? And (2) would the type of predictive information influence the degree to which the distractor interferes with task performance, specifically: is there evidence for a special role of motor stimulus identity prediction, as posited by optimal motor control theories, or is non-motor identity prediction sufficient for explaining the effect of distractor predictability on performance?
To examine these questions, we adapted an additional-singleton compound visual search task. In the first experiment, the search displays were presented only for a limited exposure duration and then immediately overwritten with post-display masks. Using this task design, we opted for a measure of distractor interference in terms of accuracy, rather than RT, costs, which is arguably better suited to capture effects arising at early, perceptual processing stages of attentional stimulus selection and discrimination, unaffected by later, post-selective processes of response selection and execution (Santee & Egeth, 1982). Of note, our study is one of only very few that successfully demonstrated attentional capture using this type of paradigm and measure. In the second experiment, the display was presented until response and reaction times were measured, to examine whether the results obtained in Experiment 1 would generalize to another dependent variable, namely reaction time. To address our research questions, we manipulated the way in which the presence and location of the distractor were predicted, namely, by either providing participants with an explicit informative external cue or making them internally generate a prediction in terms of the anticipated effect of a motor action they chose to perform. In all conditions, we controlled for factors such as the presence of an action, cognitive load, temporal predictability, and temporal control, which are common confounds in other studies on action–effect prediction (Hughes et al., 2013), to isolate the specific effects of non-motor and motor stimulus identity prediction. In this respect, we believe our study to be unique in the literature on the potential influences of motor prediction on attention.
In the first experiment, we found that for both prediction by an action and prediction by a cue, the distractor interference was reduced, compared to a (non-predictive) baseline condition. Because our action and external cues provided predictive information simultaneously about the presence and location of the distractor, future studies are needed to disentangle the respective contribution of these two aspects of prediction. Of note, the interference reduction was of a comparable magnitude whether the distractor was predicted by an external cue or by the choice of an action. Predictive information of either type about the absence of a distractor had no noticeable effect compared to the baseline, suggesting that the prediction indeed influenced the processing of the distractor item, rather than the performance improvement being due to some other facilitatory processes related to the provision of predictive information as such.
Our attempt to capture, as well as possible, any specific effects of motor stimulus identity (action–effect) prediction came with a methodological cost, namely, presentation of the three prediction conditions in a fixed order. In particular, the action prediction condition had to be administered last because of the action–effect association (learning) phase that was required for this condition. Implementing this phase earlier on in our within-participants design would have influenced any other (i.e., the baseline and/or cue prediction) condition(s) participants would have performed after it (e.g., pressing the left button in a baseline condition performed after the action condition might have attenuated the intensity of a stimulus that happened to occur at the location previously associated with this action). However, our results provide no evidence that there was a learning effect within the three conditions, that is, there was no systematic reduction of distractor interference with increasing time on a particular task (prediction) condition (see the Appendix and Fig. 4), possibly owing to the long exposure to distractors in the ‘exposure’ (or ‘association’) phase before each condition proper and the number of practice (96) and staircase (on average 96) trials at the start of the experiment (cf. Müller et al., 2009). In addition, across conditions, it is unlikely that the change in task between the baseline and the cue and action prediction conditions as such brought about a step-like change in performance due to some factor other than the predictive information provided by the cues, such as novelty or increased arousal. First of all, there was no difference in performance on distractor-absent trials among the three conditions, and for distractor-present trials, any increase in general arousal would, arguably, have led to increased distractor interference (assuming that arousal would have boosted the saliency of the distractor as well as that of the target; e.g., Zou, Müller, & Shi, 2012), rather than the reduction in interference we actually observed.
In any case, we do not believe that our main conclusions with regard to the two questions we set out to answer were compromised by our sequential condition order. First, our results clearly show that distractor prediction did not cause an ‘attentional white bear’ (AWB) effect: the AWB hypothesis predicts a performance cost associated with the cues (i.e., reduced accuracy on distractor-present trials in the cue and action prediction conditions relative to the baseline), rather than the performance benefit that we actually observed. Second, optimal motor control theories predict that action–effect prediction has a specific, namely, an attenuating effect on the predicted stimulus, over and above the effect of cue prediction. However, we failed to find a significantly greater interference reduction for the action prediction versus the cue prediction condition, which may be taken to argue against optimal motor control theories (as further discussed below). That said, despite having evidence favoring the null hypothesis (BF01 = 4.22), there was a small numerical difference, and we cannot definitively rule out that self-generated action cues may be somewhat more effective in reducing interference than external cues (a difference we may have been unable to detect with the present experimental designs and sample sizes).
Cue prediction and attentional white bear
With respect to prediction by external cue, participants were told they could use the cue information in any way that could help them perform the task better. Although most people reported no consistent usage of the information provided by the cue (see Appendix), the cue clearly had a positive effect on performance for most participants. This indicates that the external cue was actually being used by the majority of participants without them being explicitly aware of this, perhaps in an automatic manner, even without some kind of association phase as implemented in the action prediction condition. This is consistent with previous reports that people can extract cue information without being aware of this (Decaix, Siéroff, & Bartolomeo, 2002; Peterson & Gibson, 2011). A similar case can be made for the action prediction condition, in which participants presumably lacked a reason to deliberately and consciously guide their attention according to the button they pressed (although participants were not explicitly questioned about this at the end of the experiment).
Action–effect prediction processes
The difference in performance between the baseline and cue prediction conditions was supposed to reveal the influence of what Hughes et al. (2013) referred to as ‘non-motor identity prediction’ processes, that is, predicting the stimulus (and its properties) in a general manner (not necessarily related to motor processes). Importantly, any difference between the cue and action prediction conditions was supposed to directly reflect the contribution of specific ‘motor identity prediction’ processes, in line with optimal motor control-based theories (Waszak et al., 2012; Wolpert & Flanagan, 2001). We failed to observe such an additional effect; rather, both types of prediction resulted in very similar effects, in terms of both the overall interference reduction and spatial distance effects (see Appendix). While we cannot definitively rule out that this null difference is simply a false negative finding (owing to a lack of statistical power), we did achieve a power of 0.82 and 0.78 in our two experiments for observing an effect of the expected size, and our Bayes factor analyses revealed more evidence for the null hypothesis of no effect than for the hypothesis of an effect.
Conceivably, our design may have been too different from that of Cardoso-Leite et al. (2010) in that instead of providing predictive information about a near-threshold stimulus our distractor was a highly salient display item. Forward model theories postulate that predicted sensory consequences of self-generated actions are subject to sensory attenuation, but the specific mechanism bringing about this attenuation is unclear. It is possible that such sensory signals are attenuated in a non-linear fashion, depending on the original strength of the stimulus, such that, for instance, very salient stimuli cannot be attenuated. However, Reznik, Henkin, Levy, and Mukamel (2015) found that while self-produced supra-threshold auditory stimuli were attenuated, near-threshold stimuli were enhanced. If their finding generalizes to the visual domain, our salient distractor should be subject to sensory attenuation. Another nonlinearity, described by Zehetleitner et al. (2013), may also make it possible that the sensory strength of the distractor was actually attenuated by motor prediction, but not enough to measurably reduce attentional capture (over and above the reduction with external cues). Zehetleitner et al. (2013) showed that the probability with which a distractor captures attention on a given trial is a psychometric function of the difference in salience between the distractor and the target: if the distractor is much more salient than the target, a small decrease in distractor salience—for instance due to the presumed attenuation of the sensory consequences of self-generated actions—would not translate into any, or only a very small, reduction of the probability of attentional capture.
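The nonlinearity argument can be made concrete with a small numerical sketch. Assuming, purely for illustration, that capture probability is a logistic function of the distractor–target salience difference (the functional form, slope, midpoint, and arbitrary salience units are assumptions, not the function fitted by Zehetleitner et al., 2013), the same attenuation of distractor salience reduces capture probability far less on the saturated part of the curve than near its midpoint.

```python
import math


def capture_probability(salience_diff, midpoint=0.0, slope=1.0):
    """Hypothetical logistic psychometric function.

    salience_diff: distractor salience minus target salience (arbitrary units).
    """
    return 1.0 / (1.0 + math.exp(-slope * (salience_diff - midpoint)))


def capture_reduction(salience_diff, attenuation, midpoint=0.0, slope=1.0):
    """Drop in capture probability when distractor salience is lowered by `attenuation`."""
    before = capture_probability(salience_diff, midpoint, slope)
    after = capture_probability(salience_diff - attenuation, midpoint, slope)
    return before - after


# Same attenuation (1 unit) applied at two points of the curve.
reduction_saturated = capture_reduction(6.0, 1.0)  # distractor far more salient: roughly 0.004
reduction_midrange = capture_reduction(0.5, 1.0)   # near the midpoint: roughly 0.24
```

On this toy function, an attenuation that lowers capture probability by about a quarter near the midpoint has an almost unmeasurable effect when the distractor is much more salient than the target, which is the scenario considered above.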
Overall, while we cannot exclude the existence of a sensory attenuation effect for action-specific, motor-identity prediction (Hughes et al., 2013), we observe no evidence in its favor in our experiments. We may only speculate that a more general mechanism may be engaged in both the action and cue prediction conditions. A highly prominent proposal of such a general principle is ‘predictive coding’, or, more generally, ‘predictive processing’ (Clark, 2013), and we therefore believe it is worth discussing how our results may fit into it.
On this view, only prediction errors are propagated to higher levels in a processing hierarchy, and this signal should thus be lower for a predicted than an unpredicted distractor, which could cause sensory attenuation. Importantly, the prediction errors are also weighted by their expected precision, where this precision weighting is generally taken as corresponding to the cognitive-psychological concept of attention (Feldman & Friston, 2010; Hohwy, 2012). Exactly what expected precision should be assigned to a salient but task-irrelevant distractor remains an open issue. Multiple factors come into play here. It has been proposed that task-irrelevant stimuli have reduced expected precision (Kok, Rahnev, Jehee, Lau, & de Lange, 2012). By contrast, we are thought to have a prior expectation (innate or acquired) that strong stimuli have a high signal-to-noise ratio and are thus more precise (Feldman & Friston, 2010). Arguably, therefore, the theory cannot readily answer the critical question whether prediction of the distractor would make it more or less interfering. What the theory would predict is that both cue- and action prediction should influence processing in a very similar manner, because both sources of prediction have the same accuracy, namely 100%, and also no variability in prediction errors—that is, they have the same precision. However, the theory also allows for a potential additional effect of action-specific prediction: The principle of ‘active inference’ posits that we need to decrease the precision of proprioceptive and somatosensory states to make a movement possible (Brown et al., 2013), though it remains unclear whether, how, and to what extent this might also concern visual processing.
Note though that our results are merely consistent with ‘predictive processing’, and it could be objected that this framework can accommodate all manner of possible result patterns. Despite the promises of this framework, we see it as still young and not yet sufficiently developed, especially with regard to explaining attentional phenomena (Ransom, Fazelpour, & Mole, 2017). Better, and ideally computationally explicit, models are thus required to derive more concrete testable predictions. For instance, Kok, Rahnev, et al. (2012) proposed a model of how attention interacts with prediction in a Posner-type cueing scenario, though their model essentially equates attention with task relevance, as they consider only prediction of task-relevant information. Our data on the interaction of attention and prediction of task-irrelevant stimuli might thus be useful for testing future, more complete models.