Reducing failures of working memory with performance feedback
Fluctuations in attentional control can lead to failures of working memory (WM), in which the subject is no better than chance at reporting items from a recent display. In three experiments, we used a whole-report measure of visual WM to examine the impact of feedback on the rate of failures. In each experiment, subjects remembered an array of colored items across a blank delay, and then reported the identity of items using a whole-report procedure. In Experiment 1, we gave subjects simple feedback about the number of items they correctly identified at the end of each trial. In Experiment 2, we gave subjects additional information about the cumulative number of items correctly identified within each block. Finally, in Experiment 3, we gave subjects weighted feedback in which poor trials resulted in lost points and consistent successful performance received “streak” points. Surprisingly, simple feedback (Exp. 1) was ineffective at improving average performance or decreasing the rate of poor-performance trials. Simple cumulative feedback (Exp. 2) modestly decreased poor-performance trials (by 7 %). Weighted feedback produced the greatest benefits, decreasing the frequency of poor-performance trials by 28 % relative to baseline performance. This set of results demonstrates the usefulness of whole-report WM measures for investigating the effects of feedback on WM performance. Further, we showed that only a feedback structure that specifically discouraged lapses using negative feedback led to large reductions in WM failures.
Keywords: Visual working memory; Cognitive and attentional control; Feedback
Failures of attention are frequent and have unintended consequences ranging in severity from variable reaction times on simple laboratory tasks to fatal car accidents in the real world (Reason, 2003; Robertson, Manly, Andrade, Baddeley, & Yiend, 1997). Given that ongoing attentional fluctuations lead to deficits in simple reaction time measures, the effect of impaired attention on demanding tasks can be even more exaggerated. Consistent with this idea, previous research has found a strong relationship between working memory (WM) capacity and propensity toward periods of mind wandering and failed executive attention (McVay & Kane, 2012). Here, we investigated the potential for feedback about task performance to reduce attentional failures during a difficult visual WM task.
Performance feedback might improve WM performance for a variety of reasons. First, subjects are relatively unaware of periods of inattention to the task at hand (Reichle, Reineberg, & Schooler, 2010; Schooler et al., 2011), but bringing subjects’ attentional state into awareness allows them to re-engage (deBettencourt, Cohen, Lee, Norman, & Turk-Browne, 2015). Performance feedback should alert subjects that their current attentional state is not sufficient to perform well. In addition to alerting subjects to failures, feedback may also improve performance by increasing subjects’ baseline motivation and arousal levels. Cognitive feedback can act as an extrinsic reward (Aron et al., 2004), and game-like visual feedback can increase subjects’ intrinsic motivation (Miranda & Palmer, 2013). If subjects are relatively unmotivated in typical laboratory settings, they may underperform their true ability; providing feedback could increase task engagement and overall performance levels.
To provide informative feedback to subjects, we must first have a reliable indicator of trial-by-trial fluctuations in performance. Performance fluctuations during simple attention tasks have been extensively studied (Cohen & Maunsell, 2011; Esterman, Noonan, Rosenberg, & DeGutis, 2013; Manly, 1999; Smallwood, Riby, Heim, & Davies, 2006; Unsworth & McMillan, 2014; Weissman, Roberts, Visscher, & Woldorff, 2006), but there are few observations of performance fluctuations during complex WM tasks (Adam, Mance, Fukuda, & Vogel, 2015). Trial-by-trial performance fluctuations are difficult to measure in WM tasks because of the partial-report nature of common WM measures. For example, in a typical change-detection task, subjects are asked to remember a large number of items (e.g., six to eight) and are tested randomly on one of the items. However, capacity is extremely limited, so subjects will remember only three to four items on average. As such, even if a subject was performing quite well (e.g., three items out of six) on 100 % of trials, they would receive feedback that they were incorrect on 50 % of trials. Such unreliable feedback is unlikely to be informative to the subject.
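The arithmetic behind the change-detection example can be sketched as follows. The function name is hypothetical, and the sketch assumes, purely for illustration, that the stored items are a random subset of the array and that every item is equally likely to be tested. It computes only how often the single tested item falls outside memory, which is the point at which single-probe feedback stops reflecting what the subject actually stored.

```python
from fractions import Fraction


def prob_probe_not_stored(items_stored: int, set_size: int) -> Fraction:
    """Probability that one randomly probed item was not stored.

    Illustrative assumption: the stored items are a random subset of the
    array and each item is equally likely to be probed.
    """
    if set_size <= 0 or not 0 <= items_stored <= set_size:
        raise ValueError("need set_size > 0 and 0 <= items_stored <= set_size")
    return 1 - Fraction(items_stored, set_size)


# A subject who stores three of six items on every trial
p_miss = prob_probe_not_stored(3, 6)
print(float(p_miss))  # 0.5
```

Under these assumptions, the tested item is missing from memory on half of the trials even though the subject stored three items every time.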
Unlike change-detection tasks, recall tasks allow for trial-by-trial feedback about the number of correctly recalled items. We took advantage of a whole-report visual WM task to test the effects of feedback. In this task, subjects report the identity of all items in the array. Because all items are tested, performance can be calculated for every trial. Additionally, by holding set-size constant across all trials, fluctuations in memory performance can be observed without the confounding factor of intermixed difficulty from multiple set-sizes. Using this task, Adam et al. (2015) found that performance in the whole-report task was highly predictive of typical partial-report capacity measures, and that performance fluctuated strongly from trial to trial. Importantly, these results revealed that nearly all subjects have substantial numbers of WM failure trials, in which they perform no better than chance for the set of six items.
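Because every item is tested, a whole-report trial can be scored directly. The following is a minimal sketch of such scoring, not the authors' analysis code; the function names, the color labels, and the two example trials are hypothetical.

```python
from collections import Counter


def score_trial(presented, reported):
    """Count the positions at which the reported color matches the presented one."""
    if len(presented) != len(reported):
        raise ValueError("every presented item needs exactly one response")
    return sum(p == r for p, r in zip(presented, reported))


def score_distribution(trials):
    """Tally how many trials ended with each number of items correct."""
    return Counter(score_trial(p, r) for p, r in trials)


# Two hypothetical set-size-six trials (not data from the study)
array = ["red", "green", "blue", "yellow", "cyan", "white"]
trials = [
    (array, ["red", "green", "blue", "black", "orange", "magenta"]),
    (array, ["black", "orange", "blue", "magenta", "red", "green"]),
]
print([score_trial(p, r) for p, r in trials])  # [3, 1]
```

With set size held constant, the resulting distribution of scores across trials is what allows good and poor trials to be counted separately.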
Here, we designed a series of experiments to provide different amounts of information to subjects about their trial-by-trial performance. In Experiment 1, we provided a simple form of feedback: the number correct for the trial. Simple feedback of this type is commonly used by researchers with the rationale that the feedback will increase motivation and task compliance. However, the effectiveness of such feedback is often not quantified. In Experiment 2, we added a reminder of ongoing performance by tallying the number of items correct for all trials within a block. After completing the first block of trials, subjects tried to beat their personal “high score” on subsequent blocks. We hypothesized that giving subjects a long-term goal of improving their high score would further boost performance. Finally, in Experiment 3, we used a weighted points system instead of simple number of items correct on the trial. With this weighted points system, subjects lost points if they performed poorly and gained points if they performed consistently well. Note, however, that the subjects understood that these points were arbitrary and were not associated with any financial payout or other outcome. Unlike the other two feedback conditions, the weighted points system was designed to reinforce a particular strategy; to perform optimally, subjects needed to minimize the number of failure trials. In the other two feedback conditions, subjects could have instead attempted to maximize the number of stored items on good trials without necessarily reducing the number of poor trials. We predicted that specifically encouraging subjects to reduce the number of failure trials would maximally boost performance.
Materials and methods
All participants gave written informed consent according to procedures approved by the University of Oregon institutional review board. Participants were compensated for participation with course credit or payment (US $10/h). Forty-five subjects (21 male) participated in Experiment 1, 44 (22 male) in Experiment 2, and 56 (23 male) in Experiment 3. Subjects were excluded from analyses for non-compliance with task instructions (one in Exp. 1, one in Exp. 2 and three in Exp. 3) or for leaving the experiment early (one in Exp. 1, and one in Exp. 3).
Stimuli were generated in MATLAB (The MathWorks, Natick, MA) using the Psychophysics toolbox (Brainard, 1997). Participants were seated approximately 60 cm from an 18-in CRT monitor; distances are approximate as subjects were not head-restrained. Stimuli were presented on a gray background (RGB values: 127.5 127.5 127.5), and subjects fixated a small dot (0.25° visual angle). In all experiments, colored squares (2.5°) served as memoranda. Each square could appear in one of nine colors, sampled without replacement (RGB values: Red = 255 0 0; Green = 0 255 0; Blue = 0 0 255; Yellow = 255 255 0; Magenta = 255 0 255; Cyan = 0 255 255; Orange = 255 128 0; White = 255 255 255; Black = 1 1 1) within an area extending 12.8° horizontally and 9.6° vertically from fixation. At response, a 3 × 3 grid of all nine colors appeared at the location of each remembered item. After response, a feedback screen displayed information about task performance in size 24 Arial font.
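The color sampling described above can be illustrated with a short sketch. The original stimuli were generated in MATLAB; this Python version is only an illustration, the function name is hypothetical, and the default set size of six is an assumption taken from the six-item arrays mentioned earlier. The RGB values are copied from the list above.

```python
import random

# RGB values as listed in the stimulus description
COLORS = {
    "Red": (255, 0, 0),
    "Green": (0, 255, 0),
    "Blue": (0, 0, 255),
    "Yellow": (255, 255, 0),
    "Magenta": (255, 0, 255),
    "Cyan": (0, 255, 255),
    "Orange": (255, 128, 0),
    "White": (255, 255, 255),
    "Black": (1, 1, 1),
}


def sample_memory_array(set_size=6, rng=None):
    """Draw set_size distinct colors (sampling without replacement)."""
    if rng is None:
        rng = random.Random()
    names = rng.sample(list(COLORS), set_size)
    return [(name, COLORS[name]) for name in names]


# A seeded generator makes the example reproducible
memory_array = sample_memory_array(rng=random.Random(0))
print(len(memory_array))  # 6
```

Sampling without replacement guarantees that no color repeats within an array, so each response can be matched unambiguously to one item.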
In the “no feedback” condition, subjects saw a blank gray screen after responding to all items, then clicked the mouse to initiate the next trial. In the “feedback” condition, subjects saw a screen with text-based feedback about their performance. After viewing the feedback, subjects clicked the mouse to initiate the next trial. The main difference between the experiments was the content displayed on the feedback screen; an example of a typical feedback screen for each experiment is shown in Fig. 1b.
Points assigned for trial outcomes (number of items correct)
Experiment 1: Simple feedback
Experiment 2: Cumulative simple feedback
Subjects reported an average number of 2.87 (SD = .51) items correct in the no-feedback condition, and 3.02 (SD = .53) items in the feedback condition, and subjects performed significantly better in the feedback condition, t(42) = 4.27, P < .001, 95 % CI [.08, .22]. The proportion of good-performance trials was 27.90 % (SD = 16.10 %) in the no-feedback condition, and 32.57 % (SD = 16.72 %) in the feedback condition. This difference was significant, indicating that subjects had slightly more good-performance trials in the feedback condition, t(42) = 3.62, P < .001, 95 % CI [2.07, 7.28]. Similarly, there was a small reduction in the proportion of poor-performance trials between the no-feedback (35.69 %, SD = 15.44 %) and feedback (31.72 %, SD = 13.89 %) conditions, t(42) = –2.94, P = .005, 95 % CI [–6.7, –1.2]. Performance distributions are shown in Fig. 2. The average percent change in good-performance trials was +30.57 % (SD = 56.27 %), and the percent change in poor-performance trials was –7.23 % (SD = 29.07 %; Fig. 3).
Experiment 3: Cumulative weighted feedback
Subjects reported an average number of 2.95 (SD = .47) items correct in the no-feedback condition and 3.26 (SD = .49) items in the feedback condition, and subjects performed significantly better in the feedback condition, t(51) = 6.70, P < .001, 95 % CI [.21, .40]. The average proportion of good-performance trials was 31.47 % (SD = 16.85 %) in the no-feedback condition, and 40.22 % (SD = 18.98 %) in the feedback condition. This difference was significant, indicating that subjects had more good-performance trials in the feedback condition, t(51) = 5.92, P < .001, 95 % CI [5.78, 11.72]. Similarly, there was a large reduction in the proportion of poor-performance trials between the no-feedback (32.85 %, SD = 15.11 %) and feedback (22.26 %, SD = 13.37 %) conditions, t(51) = –7.27, P < .001, 95 % CI [–13.52, –7.67]. Change scores relative to baseline rates are shown in Fig. 3 (full range in Fig. S1). The average percent change in good-performance trials was +37.56 % (SD = 40.9 %), and the percent change in poor-performance trials was –27.91 % (SD = 39.1 %).
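The percent-change measure can be sketched as below. This is an illustration rather than the authors' analysis code: the function names and the three per-subject values are hypothetical, and the idea that percent change was computed per subject and then averaged is an assumption. That reading fits the reported numbers, because applying the formula to the Experiment 3 group means gives a different value from the reported average of –27.91 %.

```python
from statistics import mean


def percent_change(baseline, feedback):
    """Percent change relative to baseline: 100 * (feedback - baseline) / baseline."""
    if baseline == 0:
        raise ValueError("baseline rate must be nonzero")
    return 100.0 * (feedback - baseline) / baseline


def mean_percent_change(pairs):
    """Average of per-subject percent changes (assumed order of operations)."""
    return mean(percent_change(b, f) for b, f in pairs)


# Hypothetical per-subject poor-trial rates in %: (no feedback, feedback)
subjects = [(40.0, 30.0), (20.0, 18.0), (50.0, 40.0)]
print(round(mean_percent_change(subjects), 1))  # -18.3

# Same formula applied to the reported Experiment 3 group means
print(round(percent_change(32.85, 22.26), 1))  # -32.2
```

Averaging per-subject ratios weights every subject equally regardless of baseline rate, which is why it need not match the change in group means.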
Between experiments analyses
To compare the change in performance across experiments, we calculated change scores (feedback minus no feedback) for each subject and ran a one-way ANOVA using Experiment as a between-subjects factor. First, we looked at the change in mean number of items correct and found a significant effect of Experiment, F(2, 135) = 11.02, P < .001, partial η² = .14. Post hoc tests (Tukey’s HSD) revealed that the change in mean performance was larger for Experiment 3 than for either Experiment 1 (P < .001) or Experiment 2 (P = .014). However, the change in mean performance between feedback conditions was equivalent for Experiments 1 and 2 (P = .211).
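The between-experiments analysis can be sketched as follows. This is a hand-rolled one-way ANOVA for illustration, not the authors' analysis code; the function names and the small groups of change scores are hypothetical. The two consistency checks use only values reported in the text: the sample sizes after exclusions, and the reported F for mean items correct.

```python
from statistics import mean


def change_scores(no_feedback, feedback):
    """Per-subject change score: feedback minus no feedback."""
    return [f - n for n, f in zip(no_feedback, feedback)]


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way between-subjects ANOVA."""
    group_means = [mean(g) for g in groups]
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = sum(len(g) for g in groups) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def partial_eta_squared(f_stat, df_between, df_within):
    """Effect size recovered from F and its degrees of freedom."""
    return (f_stat * df_between) / (f_stat * df_between + df_within)


# Error degrees of freedom implied by the sample sizes after exclusions
n_per_experiment = [45 - 2, 44 - 1, 56 - 4]
print(sum(n_per_experiment) - len(n_per_experiment))  # 135

# Effect size implied by the reported F for mean items correct
print(round(partial_eta_squared(11.02, 2, 135), 2))  # 0.14

# Hypothetical change scores for three small groups (not the study's data)
f_stat, df1, df2 = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(round(f_stat, 2), df1, df2)  # 13.0 2 6
```

Both checks agree with the reported statistics: 138 analyzed subjects in three groups give 135 error degrees of freedom, and an F of 11.02 on (2, 135) degrees of freedom corresponds to a partial eta squared of about .14.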
Similarly, we examined the change in the proportion of good- and poor-performance trials (calculated as proportion in feedback condition minus the proportion in the no-feedback condition). We found a significant effect of Experiment on the proportion of good-performance trials, F(2, 135) = 5.77, P = .004, partial η² = .08. Tukey’s HSD tests revealed a significant difference between Experiment 1 and Experiment 3 (P = .003) but no significant difference between Experiments 1 and 2 (P = .54) or between Experiments 2 and 3 (P = .08). Finally, we found a significant effect of Experiment on the proportion of poor-performance trials, F(2, 135) = 11.45, P < .001, partial η² = .15. Post hoc tests (Tukey’s HSD) revealed that the feedback in Experiment 3 led to a greater reduction in poor-performance trials than either Experiment 1 (P < .001) or Experiment 2 (P = .002). However, there was no difference between Experiments 1 and 2 (P = .56).
We have demonstrated that a behavioral feedback manipulation can lead to global improvement in WM performance for supra-capacity arrays. Critically, however, not all forms of feedback led to the same level of improvement. These results are an important reminder to test the effects of feedback manipulations. While researchers often assume that feedback is always beneficial, feedback interventions can sometimes lead to no improvement, or even to a decline in performance (Kluger & DeNisi, 1996). Moreover, the present results suggest that maximum performance may be slightly underestimated at baseline motivational levels.
Somewhat surprisingly, providing subjects simple feedback about performance (Experiment 1) did not improve average performance, though this effect approached conventional significance (P = .06). The Experiment 1 feedback manipulation was similar to the feedback condition in Heitz et al. (2008), who similarly found only a small effect of feedback on reading-span performance. In the present dataset, the marginal difference in performance was driven exclusively by an increased proportion of good-performance trials. Similarly, the increase in good-performance trials in Experiment 2 was much larger than the reduction in poor-performance trials. This asymmetry suggests that subjects attempted a sub-optimal strategy of maximizing the number of items held in mind on good trials without attempting to reduce the frequency of failure trials. Overall, the persistence of lapse trials in Experiments 1 and 2 suggests that lapses of attention can remain frequent even when subjects are explicitly made aware of poor performance.
The weighted feedback manipulation in Experiment 3 was the most effective, reducing poor-performance trials by 28 %. Why was this manipulation so much more effective than the others? Streak bonuses and the conjunction of positive and negative feedback were unique design features in Experiment 3, and they could both affect performance dramatically. Miranda and Palmer (2013) found that a visual feedback system with streaks and negative feedback increased subjects’ subjective ratings of intrinsic motivation during a visual search task. Increased intrinsic motivation as a mechanism of improvement dovetails nicely with previous findings that feedback serves as an extrinsic reward (Aron et al., 2004). The conjunction of positive and negative feedback, in particular, may increase the effectiveness of feedback manipulations by engaging both pathways of the dopaminergic reward system (Frank, Seeberger, & O'Reilly, 2004). However, given the present data, we cannot say whether the addition of streak bonuses (emphasizing positive feedback) or negative feedback (punishing lapses) was more critical for performance improvement in Experiment 3. These two types of feedback are intertwined in the current design; successful streaks are perfectly anti-predictive of negative feedback. Future experiments are needed to disentangle the relative impact of each.
We would also like to emphasize the potential importance of providing an attainable performance goal in Experiment 3. Our prior work revealed that nearly all subjects are capable of accurately reporting at least three items (Adam et al., 2015). By setting a performance goal of three items, we encouraged subjects to perform consistently over a series of trials, rather than to maximize the number of items stored on individual trials. Indeed, an inappropriate performance goal could undermine the motivational benefits of feedback. If the goal were too easy (one item correct), then subjects would have an incentive to perform below their capacity. Alternatively, if the goal were too hard (six items correct), then subjects might become frustrated and similarly underperform.
Finally, our results raise some interesting questions that could be addressed by future studies. First, the observed feedback benefit dissipates shortly after the feedback is taken away (Fig. S2). However, more extensive training with feedback may help subjects learn to implement a lapse-reduction strategy without ongoing feedback. Given that behavioral feedback is relatively unobtrusive and inexpensive, there is potential for such interventions in real-world settings. Second, it will be important to disentangle the relative contributions of positive feedback, negative feedback, and performance goals to WM improvement. Finally, markers such as pupil dilation (Unsworth & Robison, 2015), and frontal theta power in EEG (Adam et al., 2015) may identify mechanisms underlying the reduction of lapses, including changes in arousal and consistency in the deployment of controlled attention.
This research was supported by National Institutes of Health (NIH) grant 2R01 MH087214-06A1 and Office of Naval Research grant N00014-12-1-0972. Datasets for all experiments are available online on Open Science Framework at https://osf.io/nu8jd/.
K.A. and E.V. designed the experiments and wrote the manuscript. K.A. collected the data and performed analyses.