
Journal of Cognitive Enhancement, Volume 1, Issue 4, pp 419–433

Training Change Detection Leads to Substantial Task-Specific Improvement

  • Martin Buschkuehl
  • Susanne M. Jaeggi
  • Shane T. Mueller
  • Priti Shah
  • John Jonides
Original Article

Abstract

Previous research has demonstrated that adaptive training of working memory can substantially increase performance on the trained task. Such training effects have been reported for performance on simple span tasks, complex span tasks, and n-back tasks. Another task that has become a popular vehicle for studying working memory is the change-detection paradigm. In a typical change-detection trial, one has to determine whether a set of stimuli is identical to a set that was presented just previously. Here, we developed an adaptive training regimen comprised of increasingly difficult change-detection trials to assess the degree to which individuals’ change-detection performance can be improved with practice. In contrast to previous work, our results demonstrate that participants are able to dramatically improve their performance in change detection over the course of 10 training sessions. We attribute this improvement to the current training method that adaptively adjusted the set size of the change-detection task to the proficiency of the trainee. Despite these considerable training effects, an exploratory investigation revealed that these improvements remained highly task specific and may not generalize to untrained tasks.

Keywords

Working memory · Visual array comparison · Practice

Introduction

In the past decade, training of working memory (WM) has gained much interest in the research community. The goal of WM training is to improve specific underlying cognitive processes that are shared across many other non-trained tasks. With the improvement of such basic processes, the assumption is that there may be improvement in tasks that were not part of the training but that might depend on WM capacity. In the current WM training literature, researchers have used many different training tasks, but there are three main categories into which most of these tasks fall. These categories represent classical tasks that are not only used for training but are also widely used for assessing WM skills: simple-span tasks (e.g., Klingberg et al. 2005), complex-span tasks (e.g., Chein and Morrison 2010), and n-back tasks (e.g., Jaeggi et al. 2008). Another category of WM task that has been used in research for several decades is the visual change-detection paradigm1 (e.g., Phillips 1974). In this paradigm, the participant is briefly presented with an array of stimuli; the number of presented items is often referred to as its set size. After a short delay, a second array of stimuli is presented after which the participant is required to decide whether the two stimulus presentations differ from each other (Fig. 1). Although this paradigm is popular in the WM literature, it has rarely been used for training purposes. The reason for this might be early unsuccessful attempts to find training effects on similar tasks (Sperling 1960; Whipple 1910). In addition, the claim has been made that visual WM capacity may be particularly impervious to training (Olson et al. 2005), and a recent study even found that individual differences across participants remained stable over 30 days of practice on set sizes four, six, and eight (Xu et al. 2017). Furthermore, others have argued that change detection is a fairly stable trait (Rouder et al. 2008; Zhang and Luck 2011) and resistant to motivational incentives (Zhang and Luck 2011). It may be that this imperviousness to training is due to the fact that change detection, unlike other WM tasks used for training that incorporate sequential processing and/or combine processing and storage components, targets the ability to store multiple items simultaneously. Despite this rather pessimistic outlook on the trainability of change detection, improving change-detection performance could be desirable because it has repeatedly been demonstrated that WM capacity as measured with a change-detection paradigm correlates well with measures of intelligence (e.g., Cowan et al. 2005; Cowan et al. 2006; Johnson et al. 2013). More specifically, it has been shown that it is the number of representations that can be held simultaneously in WM that mediates the relationship between change-detection performance and intelligence (Fukuda et al. 2010). Based on these results, the question arises whether increasing the number of such representations would lead to changes in untrained but correlated tasks.
Fig. 1

Illustration of the trials operationalized in the two training tasks, the two change-detection criterion tasks, and the color resolution task. In the current change-detection paradigm, participants are briefly (250 ms) presented with a set of stimuli (i.e., set size four). Following a blank screen of 1000 ms, the same stimuli are presented again but now one of the stimuli is highlighted, for example, by a circle around the square. The participant is instructed to decide whether the encircled stimulus has changed its color relative to the initial display. a Training task of training group 1. b Training task of training group 2. c Simple change-detection task. d Complex change-detection task. e Color resolution task

Although there are several studies indicating that training on a WM task (in this case, n-back) can improve change-detection performance (Kundu et al. 2013; Owens et al. 2013; Schwarb et al. 2016), direct training on change detection has been more challenging. One set of such studies specifically aimed at improving change-detection skills has focused on whether participants are able to improve their performance if the same array of stimuli is presented repeatedly (Olson and Jiang 2004; Olson et al. 2005). The main idea tested in these studies was whether a repeated array presentation would establish a long-term memory trace that would help improve task performance. Indeed, Olson et al. found improved performance for repeated arrays with the same repeated changes. Whereas the participants in these experiments trained for only one session, Eng et al. (2005) trained participants for 10 sessions; nevertheless, participants’ WM capacity did not increase. Similarly, Zimmer et al. (2012) trained participants for 12 sessions on change detection using unfamiliar Chinese characters. Performance increased selectively for trained Chinese characters but not for untrained ones. Zimmer et al. concluded that their training regimen demonstrated only familiarity effects but did not improve processing skills. In another, more recent study, Kuo et al. (2014) trained five participants for a total of 24 h over the course of 12 weeks. Participants had to decide whether the probe differed in color, position, or shape from the initial display, and they trained with two, four, six, and eight items in the presented arrays. Significant accuracy improvements were reported, but unfortunately, the sample size was quite small, which limits the generalizability of the reported results. Furthermore, due to restrictions of the study design, it is not clear whether participants would have been able to improve beyond set size eight.

We note one key difference between the visual change-detection training paradigms used in the studies described above and other WM training studies that typically show substantial task-specific improvements (Au et al. 2016), namely, adaptivity. Specifically, it has been argued that an important feature of successful training paradigms is that they increase task difficulty as participants improve on the trained task (e.g., Jaeggi et al. 2014). Non-adaptive training regimens may be less effective than adaptive-difficulty training because adaptive-difficulty methods are constantly presenting a challenge that is difficult but still attainable and thus represents a “desirable difficulty” (Bjork and Bjork 2014). Indeed, Gaspar et al. (2013) used an adaptive training procedure to improve change-detection performance in young and older adults. Their participants trained for 16 one-hour sessions distributed across 8–9 weeks. Participants trained with either set size three or five. The adaptivity algorithm adjusted the presentation time of the initial display according to the proficiency of the trainee. The authors found significant improvements in the trained tasks but no transfer to two untrained change-detection tasks. A closer inspection of the training data revealed that the improvement occurred in the first three training sessions, and from then on, performance remained fairly constant for the remaining 13 training sessions. Thus, the authors concluded that adaptively changing the presentation time of the initial display might not be the ideal variable to adjust.

By selecting the presentation time of the initial display as the adaptivity variable, Gaspar et al. (2013) put the emphasis of their training approach on improving processing speed. In contrast, in the present study, we focused more on the number of representations that can be held simultaneously in WM because it has been shown that this is the variable that mediates the relationship between change-detection skills and measures of intelligence (Fukuda et al. 2010). We assumed that because of this mediation, adaptive training on different set sizes has a higher chance of resulting in more generalized improvement that would not only result in better learning but also increase the chances for improvement in untrained tasks. Accordingly, we implemented an adaptive procedure that adjusted the set size (i.e., the number of stimuli) to the proficiency of the trainee every 20 trials. Although the central focus of our experiment was the training improvement, we also explored whether participants would be able to increase their WM capacity beyond the three to four items that participants can typically hold in memory (Cowan 2001), and whether the intervention might also lead to generalized improvements beyond the trained task. Consequently, we tested participants on several outcome measures before and after the intervention. First, we used two variants of the change-detection paradigm on which the participants trained (but with different types of stimuli) in order to test whether any training-specific effects would generalize to other variants of change detection. Further, the participants were tested on a color-resolution task (Sungur and Boduroglu 2012; Zhang and Luck 2008) and a spatial-resolution task (Boduroglu et al. submitted) in order to investigate whether the training would have an impact on the amount of detailed color and/or location information the participants could encode. It is possible that successfully increasing the number of to-be-remembered objects depends on coding information with less resolution; by contrast, it is also possible that increased capacity corresponds to greater resolution. Finally, to assess potential generalizing effects on WM capacity more broadly, we used a set of tasks that shared fewer features with the training tasks. They comprised a computerized block-tapping task (cf. Schellig 1997), a computerized version of the symmetry-span task (Redick et al. 2012), as well as an n-back task in which we systematically varied the number of lures2 (cf. Au et al. 2016). We varied the number of lures in order to investigate whether the intervention had an impact on the attentional control of interference because a lure trial elicits more interference than a non-lure trial. The training task and all outcome measures were implemented in the visual domain because we assumed that this overlap in modalities would more likely result in generalizing effects.

In order to observe whether different training task parameters have an impact on the training progress, the participants were trained on one of two tasks, one of which we assumed would make it easier for the participants to improve (training group 1 (TG1)) and the other of which we assumed would make it more difficult to improve (training group 2 (TG2)). The features of TG1 that presumably facilitated training progress included a feedback screen after each trial and no masking between the initial presentation of the stimuli and the test display. Feedback generally seems to improve learning (Shute 2008), and in the current context, we thought that it would be especially helpful for trainees to more easily detect inappropriate task strategies. The absence of a mask would further allow the participants to make use of iconic memory, which we thought would facilitate training progress (Sligte et al. 2008). In contrast, the TG2 condition did not provide any feedback and included a patterned mask (Fig. 1). We hypothesized that the feedback and the absence of a mask in TG1 would lead to higher initial performance and a faster learning curve compared to TG2. Finally, in the change-detection literature, the test display is typically presented in one of two ways: either as a single-item probe or as a whole-array probe without cuing a particular item. The whole-array probe usually results in worse performance, especially when an item is cued, presumably due to an increase in interference from distraction and the requirement for multiple decisions (e.g., Wheeler and Treisman 2002). Despite the fact that whole-array test displays result in worse performance when used in non-training tasks, we hypothesized that the configurational information might provide an affordance for the participants to increase their performance, for example, because continued training leads to better coping with the interference and distraction that attend whole-array test displays, which in turn allows participants to benefit from the additional information conveyed by the configuration. Accordingly, we sought to investigate the impact of both test-display variants on training performance; therefore, TG1 trained with a whole-array probe and TG2 with a single-item probe.

Method

Participants

A total of 45 participants (32 women) from two university communities were recruited for this study and were randomly assigned to one of two interventions. The average age of the sample was 21.1 years (SD = 3.7). Participants were compensated $10 per hour. Four participants withdrew from the study following the pre-test session; five participants were excluded from the analyses due to irregularities in their training schedules (i.e., a gap of five or more days between two consecutive training sessions), leaving 19 participants in TG1 and 17 participants in TG2.

Training Tasks

Training Group 1 (TG1)

The participants were asked to train once per day for 10 sessions over the course of 2 weeks. Each training session consisted of 15 blocks, and each block consisted of 20 trials. One training session lasted approximately 25 min. The participants started with a set size of two in their first training session. After each block, performance was evaluated and if accuracy was higher than 85%, the set size in the next block was increased by one. It was decreased by one if accuracy was below 70%. In all other cases, the set size remained unchanged. The starting set size of subsequent training sessions was determined by the set size of the last block in the previous training session minus two in order to provide a warm-up period. The theoretical maximum set size the software could handle was 20, but this limit was not reached by any of the participants. In order to characterize training performance in a session, the set sizes over all 15 blocks were averaged and used as the dependent variable.
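
For illustration, the block-to-block difficulty adjustment can be written as a simple rule. The following R sketch is our own illustration of the procedure described above; the function and argument names are ours, not part of the authors' training software.

```r
# Minimal sketch of the set-size adaptivity rule described above (illustrative
# only). After each 20-trial block, the set size for the next block is
# adjusted based on accuracy in that block.
next_set_size <- function(current_size, block_accuracy,
                          min_size = 2, max_size = 20) {
  if (block_accuracy > 0.85) {
    new_size <- current_size + 1        # > 85% correct: increase set size
  } else if (block_accuracy < 0.70) {
    new_size <- current_size - 1        # < 70% correct: decrease set size
  } else {
    new_size <- current_size            # otherwise: keep set size unchanged
  }
  min(max_size, max(min_size, new_size))
}

# Example: 18 of 20 trials correct (90%) raises the set size from 5 to 6.
next_set_size(5, 18 / 20)
```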

A trial started with a fixation cross presented in the center of the screen for 1000 ms. The initial screen with the array of squares was presented on a dark gray background for 250 ms, followed by a 1000 ms blank screen, followed by the test display containing the initial array. In the test display, one of the squares was encircled and the participants were instructed to press “A” if the square was of the same color as in the initial screen and “L” if it differed. The encircled square was presented either in the same color as shown initially, or in one of the other six colors used for the stimuli. The test display remained on the screen until a key press had been made. At the end of each trial, feedback was provided indicating whether the given response was correct or not. Assuming a viewing distance of 50 cm on average, the array of squares was presented within a horizontal viewing angle of approximately 11° and a vertical viewing angle of approximately 8.25°. The squares appeared at randomly determined positions in every trial with the restriction that their center-to-center distance was at least 2.25° of visual angle and that they could not appear within about 2.25° of the center of the screen. Each square was approximately 0.8° × 0.8° in visual angle. The colors used for the squares were black, blue, green, purple, red, white, and yellow.3 The same color appeared up to three times in the same array. An example trial is illustrated in Fig. 1a.

Training Group 2 (TG2)

The training parameters were identical to the TG1 condition. Further, trial specifications in this task were similar to the ones used in TG1. The main differences were that (a) no feedback was provided, (b) there was a mask after the initial display preventing carry-over effects from iconic memory, and (c) only one of the squares was shown in the test display, thus removing potential context and configuration effects. After a trial had started with a fixation cross presented in the center of the screen for 1000 ms, an array of colored squares was presented on a dark gray background for 250 ms, followed by a 200 ms blank screen. Afterwards, a mask consisting of colored striped squares appeared in the same locations where the squares were initially shown. The stripe colors were drawn from the same colors as the stimuli: black, blue, green, purple, red, white, and yellow, and randomly determined for each square within a trial. The mask remained on the screen for 700 ms. Following a 100 ms blank screen, one of the squares shown in the initial array of squares was presented again either in the same color or in one of the other six colors used for the stimuli until a key was pressed. All other task details were identical to the training task of TG1. An example trial is illustrated in Fig. 1b.

Outcome Measures—Change-Detection Tasks

Simple Criterion Task (Simple Change-Detection Task)

A trial in this task was the same as in the TG2 condition with the exception that the stimuli appeared at the same fixed positions throughout the task. An example trial is illustrated in Fig. 1c. The participants were given task instructions through the computer program and went through ten practice trials to familiarize themselves with the task. During practice, arrays of two, four, or six squares (i.e., three set sizes) were presented, the squares appeared at random positions on each trial, and accuracy feedback was given at the end of each trial; no feedback was presented during the test trials. Following practice, a total of 150 test trials were given for set sizes two, four, six, eight, and ten; 30 trials were used for each set size. On half of the 30 trials per set size, the probe differed in color from the initial display. Set sizes and trial type (color change vs no color change) were selected randomly by the program. The dependent variable of interest was Cowan’s k (Cowan et al. 2005), a measure of WM capacity.
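
For a single-probe change-detection task, Cowan's k is commonly computed from the hit and false alarm rates at each set size. The sketch below is our own helper (not the authors' code), where a "hit" is a correctly reported change and a "false alarm" is a reported change on a no-change trial.

```r
# Cowan's k for single-probe change detection (cf. Cowan et al. 2005):
# k = set size x (hit rate - false alarm rate).
cowans_k <- function(set_size, hit_rate, false_alarm_rate) {
  set_size * (hit_rate - false_alarm_rate)
}

# Example: at set size 8, 75% hits and 15% false alarms give k = 4.8 items.
cowans_k(8, 0.75, 0.15)
```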

Complex Criterion Task (Complex Change-Detection Task)

This task was very similar to the task used in TG1 and its general outline is illustrated in Fig. 1d. One of the differences between the two tasks was that instead of colored squares, random shapes were used. The shapes were identical to the ones used in Jaeggi et al. (2003) but presented in black color and in a smaller size; each shape was approximately 1° × 1° in visual angle. The trials were different from the TG1 task: the initial screen was presented for 500 ms followed by a 1000 ms blank screen. Next, the whole array from the first screen was shown again, with one of the shapes indicated with a black circle having a thickness of 1 pixel and a radius of approximately 0.75° of visual angle. The participants had to decide whether the encircled shape was the same as the one that had been originally presented in this location. The next trial started as soon as the participant pressed another key. The stimuli appeared at randomly determined positions on every trial. Other task details and the dependent measure of this task were identical to the ones of the simple criterion task.

Color Resolution Task

We included this task in order to assess whether the participants would be better able to recall the exact color of the squares used in the training tasks. The task is illustrated in Fig. 1e. For this purpose, a set of three squares appeared for 500 ms at three of eight predefined locations, arranged on an imaginary circle in the center of the screen. After a blank screen of 900 ms, a placeholder in the form of a black frame appeared where each colored square had been located just before. One of the black frames was slightly thicker, and the participants were instructed to select the color that matched the color of the square that had appeared in this location. The color could be selected with the mouse pointer from a color wheel that surrounded the squares. The colors of the three squares appearing together were always different, were drawn from the colors represented on the color wheel, and differed from each other by at least 20° on the color wheel. The wheel remained on the screen until the participant selected a color. Finally, the participants had to indicate whether they guessed the color or whether they were able to remember it. The task consisted of 115 trials, and after every 19 trials, the participants were allowed to take a short break. The locations where the colored squares appeared, as well as their colors, were randomly determined with the abovementioned constraints, and the stimuli were identical and presented in the same order for each participant.

To determine the dependent variable, we used a mixture model to categorize each response into one of two groups (Boduroglu et al. submitted; Zhang and Luck 2008). The response distribution of each participant (i.e., the selected color expressed in degrees on the color wheel) was fitted with a mixture of a von Mises distribution and a uniform distribution in order to distinguish between memory precision and random guessing. The model classified each response as being either correct (the von Mises distribution) or an error (a random guess represented by a continuous uniform distribution across the 360-degree color wheel) using a maximum likelihood criterion. Estimates were made using the expectation-maximization (E-M) algorithm (Agostinelli and Lund 2013; Dempster et al. 1977; R Core Team 2013) to produce maximum likelihood estimates of the κ parameter and to compute the density of the von Mises distribution. A separate model was estimated for the pre- and the post-test session of each participant. The variable of interest was the precision parameter κ of the von Mises distribution. Because κ is inversely related to the spread of the distribution, and in order to enable easier comparison with the spatial resolution task, we used 1/κ as the dependent measure.
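
The R sketch below illustrates the kind of E-M fit described above, assuming the response errors (in degrees on the color wheel) are available as a numeric vector; it uses the circular package cited above, fixes the von Mises mean at zero for simplicity, and is an illustration rather than the authors' actual model code.

```r
# Illustrative E-M fit of a von Mises + uniform mixture to color-report errors.
# Assumes 'errors_deg' holds response errors in degrees (-180 to 180); the von
# Mises mean is fixed at 0 (no systematic bias) for simplicity.
library(circular)  # Agostinelli & Lund (2013)

fit_vonmises_uniform <- function(errors_deg, n_iter = 100) {
  x <- circular(errors_deg * pi / 180)   # convert degrees to radians
  mu <- circular(0)
  p_mem <- 0.8                           # starting value: proportion of memory responses
  kappa <- 5                             # starting value: concentration parameter
  d_unif <- 1 / (2 * pi)                 # uniform (guessing) density on the circle
  for (i in seq_len(n_iter)) {
    # E-step: probability that each response came from the von Mises component
    d_vm <- dvonmises(x, mu = mu, kappa = kappa)
    w <- (p_mem * d_vm) / (p_mem * d_vm + (1 - p_mem) * d_unif)
    # M-step: update mixing weight and concentration from weighted responses
    p_mem <- mean(w)
    r_bar <- sqrt(sum(w * cos(x))^2 + sum(w * sin(x))^2) / sum(w)
    kappa <- A1inv(r_bar)                # concentration from mean resultant length
  }
  list(p_memory = p_mem, kappa = kappa, precision = 1 / kappa)
}
```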

Spatial Resolution Task

Similar to the color resolution task, with this task, the goal was to assess whether the participants would improve their ability to remember the exact location where the squares appeared on the screen. For this purpose, on each trial, the participants were presented with three squares, one in the color red, one in the color blue, and one in the color green. The squares appeared within an imaginary circle and were presented for 500 ms followed by a 900 ms blank screen. One of the colored squares was then presented in the center of the screen and the task was to position the square with the computer mouse to the location where the square in this color appeared in the initial display. Following each response, the participants indicated whether they guessed the location or whether they were able to remember it. The task consisted of 115 trials, and after every 19 trials, the participants were allowed to take a short break. The locations where the colored squares appeared were randomly determined with the abovementioned constraints, and the stimuli were identical and presented in the same order for each participant.

We used a mixture modeling approach to determine the dependent variable. For that purpose, we subtracted the coordinates of the correct answer on each trial from the coordinates of the given answer on each trial. The deviations measured in number of pixels (x, y pairs) were then fitted with a mixture of a bivariate normal distribution and a uniform distribution. Each response was classified to be either correct (the normal distribution) or incorrect (i.e., the uniform guessing distribution) using a maximum likelihood criterion as part of the E-M algorithm. The bivariate normal distribution used three parameters, an x-offset, a y-offset, and a standard deviation that was equal along the x and y dimensions. The uniform distribution had a single parameter representing the entire field of possible guesses, which we identified as a square having a side length of 200 pixels, and it was not directly estimated from the data. Estimating the size of the uniform distribution directly from the error data is theoretically possible, but the estimates are highly unstable because of the small number of guessing trials typically produced. The dependent variables of interest were the standard deviation parameter from the bivariate normal distribution, and the number of trials categorized as remember versus guess.
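
As an illustration of the classification step for the spatial task, the sketch below assigns each (x, y) deviation to the "remembered" or "guess" component by comparing weighted densities; the offsets of the bivariate normal are omitted (assumed zero) for brevity, and all names are ours rather than the authors' code.

```r
# Illustrative classification of spatial deviations (in pixels) into
# "remember" vs "guess" responses, given a fitted SD and mixing weight.
classify_spatial <- function(dx, dy, sd_xy, p_mem, field_side = 200) {
  d_norm <- dnorm(dx, sd = sd_xy) * dnorm(dy, sd = sd_xy)  # equal SD on x and y
  d_unif <- 1 / field_side^2            # fixed 200 x 200 pixel guessing field
  ifelse(p_mem * d_norm >= (1 - p_mem) * d_unif, "remember", "guess")
}

# Example: a small deviation is classified as remembered, a large one as a guess.
classify_spatial(dx = c(3, 90), dy = c(-2, 75), sd_xy = 10, p_mem = 0.9)
```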

Outcome Measures—Working Memory Tasks

Block-Tapping Task

In this computerized task, nine white squares, arranged in a similar configuration as the cubes in the block-tapping test (Schellig 1997) were presented on a black background on a computer screen. The participants were required to reproduce a sequence of positions in the given order. At the start of a trial, one of the white squares changed its color from white to red and remained red for 1000 ms before it changed its color back to white. After 500 ms, the next square changed its color and so on. Following the presentation of the sequence, the participants had to reproduce it by clicking on the appropriate white squares using the computer mouse. The participants received feedback about whether the reproduced sequence was correct or not. Sequence lengths (i.e., set sizes) varied from three to nine. There were three trials per set size. The task was terminated by the program after three consecutive errors. There was a forward condition and a backward condition. In the former, the participants had to reproduce the sequence in the presented order; in the latter, the participants had to reproduce the sequence in reverse order. Different sequences were shown in forward and backward conditions and in the pre- and the post-test sessions. The dependent variable in both conditions was the highest set size for which at least two trials were answered correctly. These two values were then averaged, and this single performance indicator was used to evaluate the intervention effect.

Symmetry Span Task

We used a computerized version of the symmetry span task as a complex measure of WM capacity (Redick et al. 2012). The task required the participants to recall a sequence of spatial locations in the correct order in addition to completing a distracting processing task that required deciding whether an image is symmetrical or not (cf., Conway et al. 2005). We presented three trials for each set size (i.e., the number of stimuli to be recalled) from two to five. Different task materials were used in the pre- and post-test sessions. The sum of correctly recalled sets served as the dependent measure.

N-Back Task

The participants were presented with a series of stimuli in the center of the screen, one at a time. The stimuli consisted of colored circles presented in one of the following colors: black, blue, brown, green, magenta, orange, red, and yellow. The participants had to respond with the left hand whenever the current stimulus was the same as the one presented n items back in the series; otherwise, they had to respond with their right hand. The stimulus remained on the screen for 500 ms followed by a 2500 ms inter-stimulus interval. The participants were asked to perform nine blocks of two-back, followed by nine blocks of three-back. In the two-back condition, a block consisted of 18 trials; and in the three-back condition, a block consisted of 19 trials. In these nine blocks, we systematically varied the number of lures. A lure was defined to be either an n − 1 lure or an n + 1 lure, i.e., in the two-back condition, both a one-back match and a three-back match were considered to be lures. Across the nine blocks, three blocks contained no lures at all, three contained one n − 1 lure and one n + 1 lure, and three blocks contained three n − 1 lures and three n + 1 lures. The three block types were randomly presented in each n-back load condition. Prior to both n-back load conditions, the participants received computerized instructions and practice on the corresponding n-back level. The dependent variable was the proportion of hits minus the proportion of false alarms for each n-back load level as a function of the three types of lure blocks. Additionally, we created an overall index per load level by averaging the results from the three different lure blocks.
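
To make the lure structure concrete, the following R sketch (our own illustration, not the experimental software) labels each position in a stimulus stream as a target, an n − 1 lure, an n + 1 lure, or a non-target.

```r
# Classify each trial in an n-back stream: a target matches the item n back;
# lures match the item n-1 or n+1 positions back (e.g., in 2-back, a 1-back
# or a 3-back match).
classify_nback_trials <- function(stimuli, n) {
  sapply(seq_along(stimuli), function(i) {
    if (i > n && stimuli[i] == stimuli[i - n])           return("target")
    if (i > n - 1 && stimuli[i] == stimuli[i - (n - 1)]) return("lure (n-1)")
    if (i > n + 1 && stimuli[i] == stimuli[i - (n + 1)]) return("lure (n+1)")
    "non-target"
  })
}

# Example 2-back stream: position 3 is a target (matches position 1);
# position 5 is a 3-back lure (matches position 2).
classify_nback_trials(c("A", "B", "A", "C", "B", "D"), n = 2)
```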

Procedure

At pre-test, the participants were tested on the criterion tasks and a set of non-trained measures and were then randomly assigned to either TG1 or TG2. The first training session was completed in the laboratory in order to give the participants the opportunity to ask questions about the training task and procedure if they had any. The training program was installed on the personal laptops of the participants4, and they trained on those laptops for the remainder of the study. In order to check for compliance, the participants were required to send the training data that were generated after each training session via email to our laboratory. The participants were required to complete ten training sessions (no more than one per day) within 14 days. Following the training period, the participants were then tested again in the laboratory on all measures administered at pre-test in order to evaluate the impact of the intervention on untrained tasks.

Analysis

Analyses were carried out with R (R Core Team 2013) and JASP (JASP Team 2017). In order to characterize how the average performance (set size) changed across training sessions, we performed a cursory regression analysis with linear and logarithmic transformations of session number. The parameters of these models served as indicators of whether the training progress was negatively accelerated or not. We also analyzed accuracy and reaction times as a function of training session to further investigate participants’ improvement during training. Further, we analyzed accuracies and reaction times as a function of session and of whether the changed color of the probe was a color that was already presented in the initial array or a new color. Potential effects on the non-trained outcome measures were analyzed using paired t tests comparing pre-test values with post-test values. In order to further explore the evidence in favor of the alternative hypothesis, we also calculated Bayes factors (BF10) using default priors for each paired comparison. Finally, in order to test for group differences in each task, we also calculated an analysis of covariance (ANCOVA) with group (TG1 vs TG2) as the between-subjects factor, using pre-test performance as a covariate and post-test performance as the dependent variable, and we calculated Bayes factors using default priors to further probe any effects. Note that in the absence of an additional control group, this design and analysis strategy does not allow us to cleanly separate test-retest effects from the effects of the intervention. However, it informs us whether there is potential for generalizing effects and whether there might be any differential effects as a function of the specific intervention approach. If there are no improvements in non-trained tasks, any further investigations looking into transfer will be unnecessary, at least with the current set of outcome measures.
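
A sketch of the curve-fitting comparison is given below; the data frame and column names ('training', 'session', 'set_size') are placeholders for the per-session training averages, not the authors' actual variable names.

```r
# Compare a linear and a logarithmic model of training progress.
# 'training' is assumed to contain one row per session with columns
# 'session' (1-10) and 'set_size' (average set size in that session).
fit_linear <- lm(set_size ~ session, data = training)
fit_log    <- lm(set_size ~ log(session), data = training)

summary(fit_linear)$r.squared  # linear model: steady improvement, no asymptote
summary(fit_log)$r.squared     # logarithmic model: negatively accelerated gains
```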

Results

Training Tasks

Figure 2 shows the training functions for both training groups. TG1 (red line) started at an average set size of 6.31 (SE = 0.26) and finished training at an average set size of 8.82 (SE = 0.47), a 40% performance increase. TG2 (black line) started at an average set size of 5.02 (SE = 0.21) and finished at an average set size of 7.56 (SE = 0.54), which corresponds to a performance gain of 51%. A cursory curve-fitting analysis was performed comparing a negatively accelerated model (a logarithmic model) with a linear model that does not theoretically asymptote at a ceiling level. The training function of TG1 was best fit by a logarithmic transform of session number, indicating that the gains diminished over time and asymptoted after approximately seven training sessions (linear model: R² = 0.56; model with logarithmic transform of session number: R² = 0.82). The same model provided the best fit for TG2 (model with logarithmic transform of session number: R² = 0.96); however, the fit of the linear model was also respectable (linear model: R² = 0.86). A repeated measures ANOVA with the factors group (TG1 vs TG2) and session (1 through 10) revealed significant main effects of group (F(1, 33) = 11.64, p < 0.01, ηp² = 0.26) and session (F(4.27, 140.87) = 24.99, p < 0.01, ηp² = 0.43); however, the group by session interaction was not significant (F(4.27, 140.87) = 1.52, p = 0.20, ηp² = 0.04).
Fig. 2

Training performance of the two training groups. Solid lines represent average set sizes per session. Error bars represent standard errors of the mean. The dotted lines represent the models that best fit the training data. A data table with the average set sizes per group and session is provided below the horizontal axis

Next, we examined accuracies over the course of training as a function of training group and whether the probe changed or not. As Fig. 3 shows, accuracy remained roughly constant over time, which reflects the adaptive difficulty adjustment of the training; also note that the absolute difference in accuracy between trial types was larger for TG2 than for TG1 (M TG1 = 0.06, SD = 0.05; M TG2 = 0.20, SD = 0.09; t(23.16) = 5.93, p < 0.001, d = 2.01).
Fig. 3

Accuracies averaged for each training session as a function of training group (training group 1 vs training group 2) and trial type (probe changed vs probe did not change). Error bars represent standard error of the mean

Next, we performed a similar analysis for average response times. For that purpose, we included only reaction times that were larger than 200 ms and smaller than 10,000 ms, which resulted in the exclusion of 0.001% of all reaction time data collected during training. As revealed in Fig. 4, the participants became numerically faster over time despite the fact that set sizes tended to increase over the same period; however, analyzing change and non-change trials combined, comparing the first to the last session, the effect was significant in TG1 (t(18) = 3.41, p < 0.01, d = 0.78) but not TG2 (t(16) = 1.39, p = 0.18, d = 0.34).
Fig. 4

Reaction times averaged for each training session as a function of training group (training group 1 vs training group 2) and trial type (probe changed vs probe did not change). Error bars represent standard error of the mean

Finally, in order to investigate whether the participants showed differential training effects in change trials, we examined only the trials in which the probe display was different from the initial display and parsed these trials into instances in which the probe had a new color that was not presented in the initial display vs instances in which the probe had a color that was already presented in the initial display. Specifically, this exploratory analysis sought to reveal whether a new probe color was, or became, more or less salient over time. It was conducted to gain more detailed insight into how the participants solved the task as a function of group assignment. The data are provided in Fig. 5; a paired t test showed that TG1 did not differ significantly between the two trial types (M color presented before = 0.79, SD = 0.03; M new color = 0.77, SD = 0.06; t(18) = −1.57, p = 0.13, d = 0.36). In contrast, TG2 was significantly better on trials in which the probe was presented in a new color (M color presented before = 0.89, SD = 0.05; M new color = 0.94, SD = 0.03; t(16) = 5.15, p < 0.01, d = 1.25).
Fig. 5

Accuracies on trials in which the probe was different from the initial display (change trials) as a function of whether the probe had a color that was present in the initial display or not. Error bars represent standard error of the mean

Outcome Measures: Change-Detection and WM Tasks

The descriptive data for the criterion change-detection tasks, along with statistical indices from the paired t tests and Bayes factors, are reported in Table 1.5 We report and interpret our results using a false discovery rate control of 10% (Benjamini and Hochberg 1995). We provide detailed descriptive data and results with raw p values in our two tables so that interested readers can interpret the data without such correction if desired. There were two significant improvements: both TG1 and TG2 improved on the simple criterion change-detection task at post-test compared to pre-test. The Bayes factors for these comparisons further provide evidence in favor of an effect. We note that the participants in TG1 also showed a medium effect size (d = 0.53) on the complex criterion task, but this effect is not supported by the Bayesian analysis (BF10 = 1.61).
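
For reference, the Benjamini-Hochberg correction at a 10% false discovery rate can be applied to a vector of raw p values as sketched below; the p values shown are hypothetical, and the family of comparisons entering the correction follows the authors' choice.

```r
# Benjamini-Hochberg false discovery rate control at 10%, applied to a vector
# of raw p values from the paired t tests (hypothetical values shown here).
raw_p <- c(0.004, 0.012, 0.045, 0.240, 0.610)
p.adjust(raw_p, method = "BH") < 0.10   # TRUE = significant under FDR = 10%
```
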
Table 1

Descriptive data for the criterion tasks as a function of group

 

| Measure | N | Pre-test M (SD) | Pre-test [Min, Max] | Post-test M (SD) | Post-test [Min, Max] | BF10 | t | p | r | ES |
|---|---|---|---|---|---|---|---|---|---|---|
| Training group 1 | | | | | | | | | | |
| Simple criterion task | 18 | 4.80 (1.03) | [2.92, 6.96] | 5.74 (1.24) | [4.00, 8.60] | 7.60 | 3.10 | *0.006* | 0.36 | 0.73 |
| Complex criterion task | 17 | 2.79 (1.27) | [1.07, 5.33] | 3.80 (1.57) | [1.20, 8.00] | 1.61 | 2.18 | 0.045 | 0.12 | 0.53 |
| Color resolution | 18 | 14.06 (3.57) | [8.71, 24.09] | 13.83 (3.59) | [9.34, 23.94] | 0.30 | 0.67 | 0.513 | 0.92 | 0.16 |
| Spatial resolution—SD | 18 | 15.95 (4.94) | [10.08, 30.48] | 15.50 (4.19) | [9.54, 25.86] | 0.31 | 0.72 | 0.483 | 0.84 | 0.17 |
| Spatial resolution—errors | 18 | 4.72 (3.51) | [1.00, 13.00] | 4.11 (3.64) | [0.00, 16.00] | 0.28 | 0.54 | 0.598 | 0.09 | 0.13 |
| Training group 2 | | | | | | | | | | |
| Simple criterion task | 17 | 4.95 (1.51) | [2.40, 8.00] | 6.20 (1.84) | [3.18, 8.70] | 3.56 | 2.68 | *0.017* | 0.35 | 0.65 |
| Complex criterion task | 17 | 3.07 (1.28) | [1.33, 6.00] | 3.00 (0.78) | [1.73, 4.00] | 0.25 | −0.22 | 0.831 | 0.21 | −0.05 |
| Color resolution | 17 | 14.11 (3.39) | [7.94, 20.72] | 13.53 (2.81) | [8.12, 18.97] | 0.53 | 1.33 | 0.203 | 0.85 | 0.32 |
| Spatial resolution—SD | 17 | 16.84 (6.46) | [8.76, 33.84] | 16.32 (5.86) | [9.56, 35.29] | 0.31 | 0.70 | 0.493 | 0.88 | 0.17 |
| Spatial resolution—errors | 17 | 5.41 (4.84) | [1.00, 19.00] | 5.06 (5.78) | [1.00, 25.00] | 0.28 | 0.49 | 0.634 | 0.86 | 0.12 |

Using a false discovery rate of 10%, two comparisons reached statistical significance (in italics). Note that the group (training group 1 vs training group 2) by time (pre vs post) interaction was not significant for any of the tests. Data from one participant in training group 1 were lost with the exception of five pre-test measures; we excluded this participant from the analysis and this table. For another participant in training group 1, the pre-test data of the complex criterion task were missing

BF10 = Bayes factor indicating the evidence in favor of an effect; r = correlation between pre-test and post-test; ES = effect size taking into account the pre-test–post-test correlation

In Table 2, the descriptive data for the WM tasks together with the results from the paired t tests and Bayes factors are presented. We again used a false discovery rate control of 10%. Two significant results emerged: TG1 improved significantly on overall two-back performance and on the many-lures condition of the two-back task. Both results are accompanied by Bayes factors indicating strong evidence in favor of an effect.
Table 2

Descriptive data for the measures that shared fewer features with the training tasks than the measures included in Table 1

 

| Measure | N | Pre-test M (SD) | Pre-test [Min, Max] | Post-test M (SD) | Post-test [Min, Max] | BF10 | t | p | r | ES |
|---|---|---|---|---|---|---|---|---|---|---|
| Training group 1 | | | | | | | | | | |
| Block tapping | 18 | 6.28 (1.10) | [4.50, 8.00] | 6.36 (0.74) | [5.00, 7.50] | 0.26 | 0.37 | 0.717 | 0.52 | 0.09 |
| Symmetry span | 18 | 30.00 (7.83) | [17.00, 42.00] | 33.28 (6.36) | [19.00, 41.00] | 1.85 | 2.27 | 0.036 | 0.65 | 0.54 |
| 2-back—overall | 17 | 0.72 (0.19) | [0.40, 0.95] | 0.82 (0.12) | [0.63, 0.96] | 9.46 | 3.24 | *0.005* | 0.74 | 0.79 |
| 2-back—no lures | 17 | 0.76 (0.17) | [0.48, 1.00] | 0.80 (0.14) | [0.58, 1.00] | 0.39 | 1.02 | 0.322 | 0.43 | 0.25 |
| 2-back—few lures | 17 | 0.74 (0.18) | [0.35, 1.00] | 0.83 (0.14) | [0.51, 0.98] | 1.56 | 2.16 | 0.046 | 0.40 | 0.52 |
| 2-back—many lures | 17 | 0.67 (0.26) | [0.06, 1.00] | 0.84 (0.15) | [0.46, 1.00] | 16.13 | 3.54 | *0.003* | 0.68 | 0.86 |
| 3-back—overall | 17 | 0.42 (0.25) | [−0.02, 0.85] | 0.48 (0.28) | [−0.03, 0.98] | 1.88 | 2.28 | 0.037 | 0.94 | 0.55 |
| 3-back—no lures | 17 | 0.46 (0.28) | [−0.06, 0.93] | 0.49 (0.28) | [−0.05, 1.00] | 0.30 | 0.66 | 0.518 | 0.74 | 0.16 |
| 3-back—few lures | 17 | 0.42 (0.31) | [−0.03, 0.91] | 0.51 (0.29) | [−0.10, 1.00] | 1.00 | 1.85 | 0.083 | 0.78 | 0.45 |
| 3-back—many lures | 17 | 0.39 (0.23) | [−0.02, 0.84] | 0.43 (0.32) | [−0.09, 0.93] | 0.33 | 0.79 | 0.441 | 0.72 | 0.19 |
| Training group 2 | | | | | | | | | | |
| Block tapping | 17 | 6.12 (1.02) | [4.00, 9.00] | 6.18 (0.79) | [4.50, 7.50] | 0.26 | 0.30 | 0.768 | 0.63 | 0.07 |
| Symmetry span | 17 | 29.18 (10.70) | [7.00, 42.00] | 31.06 (10.21) | [6.00, 42.00] | 0.49 | 1.26 | 0.225 | 0.83 | 0.31 |
| 2-back—overall | 17 | 0.63 (0.37) | [−0.48, 0.98] | 0.72 (0.35) | [−0.05, 1.00] | 1.19 | 1.97 | 0.066 | 0.84 | 0.48 |
| 2-back—no lures | 17 | 0.62 (0.37) | [−0.49, 1.00] | 0.71 (0.36) | [−0.13, 1.00] | 0.76 | 1.64 | 0.121 | 0.82 | 0.40 |
| 2-back—few lures | 17 | 0.65 (0.40) | [−0.45, 1.00] | 0.74 (0.31) | [0.02, 1.00] | 1.27 | 2.02 | 0.060 | 0.88 | 0.49 |
| 2-back—many lures | 17 | 0.61 (0.37) | [−0.51, 1.00] | 0.72 (0.38) | [−0.17, 1.00] | 0.89 | 1.76 | 0.098 | 0.76 | 0.43 |
| 3-back—overall | 17 | 0.30 (0.40) | [−0.81, 0.91] | 0.40 (0.37) | [−0.14, 0.98] | 0.42 | 1.10 | 0.289 | 0.52 | 0.27 |
| 3-back—no lures | 17 | 0.35 (0.42) | [−0.81, 0.93] | 0.45 (0.39) | [−0.13, 1.00] | 0.38 | 0.97 | 0.347 | 0.44 | 0.24 |
| 3-back—few lures | 17 | 0.29 (0.40) | [−0.79, 0.91] | 0.41 (0.39) | [−0.20, 0.93] | 0.42 | 1.10 | 0.290 | 0.43 | 0.27 |
| 3-back—many lures | 17 | 0.26 (0.43) | [−0.82, 0.93] | 0.35 (0.39) | [−0.21, 1.00] | 0.37 | 0.95 | 0.357 | 0.54 | 0.23 |

Using a false discovery rate of 10%, two comparisons reached statistical significance (in italics). Note that the group (training group 1 vs training group 2) by time (pre vs post) interaction was not significant for any of the tests. Data from one participant in training group 1 were lost with the exception of five pre-test measures; we excluded this participant from the analysis and this table. Another participant in training group 1 performed almost at floor level in the n-back tasks, especially in the post-test, and was excluded as well. A further participant was identified with a very low hit rate and a very high false alarm rate in the n-back tasks. This participant was included in the table above; excluding this participant did not change the pattern of the data or their interpretation

BF10 = Bayes factor indicating the evidence in favor of an effect; r = correlation between pre-test and post-test; ES = effect size taking into account the pre-test–post-test correlation

Comparing the two training groups at pre-test with independent t tests, we did not find a statistically significant difference for any of the measures (all ps > 0.19). Across both tables, one ANCOVA with group (TG1 vs TG2) as the between-subjects variable, the pre-test value as the covariate, and the post-test value as the dependent variable approached significance: the complex criterion task (F(1, 31) = 3.79, p = 0.06, ηp² = 0.11). All other comparisons were not significant (all ps > 0.31). In order to further investigate the evidence in favor of an effect, we calculated Bayes factors comparing the model consisting of the group term against the null model (which also included the covariate). The largest Bayes factor was BF10 = 1.37 for the complex criterion task; all other Bayes factors were smaller than 0.55.

Discussion

The present study aimed at improving the participants’ change-detection skills using two variants of training that differed in whether the participants received feedback on a trial-by-trial basis, whether a pattern mask was included, and how the test display was presented. Both variants implemented an adaptivity algorithm that adjusted task difficulty in accordance with participants’ performance, using set size as the adaptivity criterion. Indeed, the participants in both training conditions improved considerably on the trained task; TG1 improved by 40% and TG2 improved by 51% over the course of ten training sessions. In the simple criterion task, this improvement was also apparent in Cowan’s k, where we found a significant increase of 20% in TG1 and of 25% in TG2. This finding stands in contrast to the lack of improvement reported in earlier attempts to train visual change detection (Olson and Jiang 2004; Olson et al. 2005; Whipple 1910; Zimmer et al. 2012) but seems roughly in line with the results observed by Kuo et al. (2014). However, while Kuo et al. trained participants for 24 h, we obtained our effects after only 4 h of training. It is conceivable that the main reason for the observed improvement is the adaptive adjustment of set size. Furthermore, our training approach did not focus on the involvement of long-term memory (e.g., Olson and Jiang 2004; Olson et al. 2005; Zimmer et al. 2012), given that we used randomly determined stimulus configurations and presentation conditions (250 ms for the stimuli with a 1000 ms inter-stimulus interval) that presumably minimized the involvement of long-term storage (Lin and Luck 2012).

The training curves in Fig. 2 demonstrate that the participants were able to improve their performance on the trained task by more than 1.5 items on average in only 3 days, regardless of training condition. The participants improved to a similar degree in both training conditions as indicated by the non-significant interaction term of the ANOVA that compared both training curves. Additionally, a cursory curve-fitting analysis revealed that the training curves of TG1 and TG2 are best described by a logarithmic function. However, the training curve of TG2 was also well described by a linear model, suggesting that learning in TG2 was slower and more incremental. We note that the training curve in TG1 seems to asymptote after 7 days of training. In the last 3 days of training, performance stagnated and even decreased numerically, which could reflect motivational issues. It is an open question whether TG2 would stagnate at a similar set size after prolonged training. The significant main effect of the ANOVA that compared both training conditions indicated that TG1 produced better overall performance. An inspection of the overall accuracies revealed that accuracy remained constant across time, reflecting the adaptive nature of the training task. Further, we found that the change versus non-change difference was much smaller in TG1 than in TG2. TG2 demonstrated the highest relative accuracy in correctly identifying change trials (yellow curve in Fig. 3); on the other hand, they showed relatively low accuracy for non-change trials (red curve in Fig. 3), indicating that TG2 participants were biased towards classifying trials as change trials even when no change had occurred. This pattern was not observed for TG1. Focusing on reaction times, we found that TG1 participants became significantly faster over the entire training period despite the fact that the set size increased on average with subsequent training sessions. We further note that TG1 seems to asymptote in terms of set size and reaction times, whereas TG2 does not show this asymptote after the 10 sessions implemented here. Future research with more training sessions should test whether prolonged training on such change-detection tasks might result in further performance increases in either set size or reaction time. It is conceivable that even when set size no longer increases, reaction times could still continue to decrease, an effect that could then be attributed to training. Finally, by parsing the change trials into instances in which the probe consisted of a color that was not presented in the initial display versus instances in which the probe consisted of a color that was already presented in the initial display, we found that it was relatively easy for TG2 participants to identify new probe colors (yellow curve in Fig. 5). The participants in TG1 were numerically better at identifying probe colors that had been presented before (black curve in Fig. 5), but the difference relative to new probe colors was not significant. This finding could point towards strategic differences between the two groups. The participants in TG1 may have focused on identifying configurations whereas the participants in TG2 may have focused on maintaining distinct items in memory, which could explain the response bias mentioned before.

Previous research has shown that especially high-functioning participants are able to utilize configural information in a change-detection paradigm (Boduroglu and Shah 2009). This is consistent with a strategy that may have been developed in TG1. It is conceivable that this group learned to focus and to capitalize on configural information. This makes intuitive sense because the probe display in TG1 consisted of the whole array which allowed configural information to be considered. By contrast, TG2 was not provided with configural information at the probe stage because the probe consisted of just a single item.

Our exploratory analysis of the outcome measures revealed that both training groups significantly improved in the simple criterion task. It is conceivable that learning to make use of configural information is a skill that can be more easily applied to other untrained variants of the change-detection task, especially when configural information can be considered. It is an ongoing debate whether memory capacity varies as a function of information load (e.g., Fukuda et al. 2010). By including two tasks that assess the detail of information the participants can hold in mind (color resolution task and spatial resolution task), we aimed to contribute to this line of research. Our results demonstrated that although both groups numerically improved in both resolution tasks, these effects were rather small (average effect size across both groups: d = 0.18) and statistically not significant. Therefore, we conclude that the form of training we implemented here has neither a positive nor a negative impact on the amount of detail one can recall after training. However, we note that the Bayes factors supporting the null hypothesis (BF01)6 range between 1.90 and 3.62, with five out of six factors being larger than 3, indicating that there is more support for the null than for the alternative hypothesis. Finally, apart from an improvement in two-back performance in TG1, our analyses of the WM outcome measures suggest that the change-detection training did not result in a general improvement in WM capacity, at least with the assessments used here. This finding is in line with a previous study that used encoding time as the adaptively adjusted variable (Gaspar et al. 2013). As such, the apparent inability of change-detection training to generalize beyond task-specific effects stands in contrast to simple-span and complex-span WM interventions, as well as n-back training, all of which have shown transfer to non-trained variants of WM tasks (e.g., Karbach and Verhaeghen 2014; Soveri et al. 2017; Weicker et al. 2016). In line with the perceptual learning literature (Dosher et al. 2013), we speculate that training on a change-detection paradigm might have an impact on earlier representations but not so much on higher cognitive processes. This stands in contrast to the higher cognitive processes that are involved in other WM tasks (e.g., interference resolution and inhibitory control, which seem to be involved in n-back tasks). Such a limited impact on earlier representations would also explain the lack of transfer to untrained measures, which is not an uncommon finding in perceptual training (Ball et al. 2002; Fahle and Morgan 1996; but see Deveau et al. 2014). Another analogy can be drawn to the training work published by Ericsson et al. (1980), in which participant S.F. improved his digit-span performance from 7 to almost 80 digits after 230 h of practice, an improvement that seems impossible at first sight. Although our participants did not improve as dramatically as S.F., after just 4 h of training, their performance level was already 1.5 times as large as what one would expect based on previous (non-training) work with the change-detection paradigm. Yet, despite these substantial training improvements, the effects do not seem to generalize to untrained stimulus material in Ericsson et al. or to untrained tasks in the present research. Similar to what Ericsson et al. reported, it is conceivable that our participants also developed mnemonic associations and retrieval structures; however, these strategies seem highly specific to the task and task material so that they cannot easily be used in different contexts. We note that our study design did not include a control group, and therefore, no conclusions can be drawn about whether the intervention might lead to effects that go beyond test-retest effects. However, our design was well suited to detect the potential for generalization in case of significant improvements from the pre-test session to the post-test session. Nevertheless, only a handful of training studies on change detection are currently available, and future research needs to address both the training and the transfer potential in other populations, such as older adults, patients, and participants with initially low change-detection performance.

Finally, despite the impressive improvements we observed in the trained tasks, the question remains: what do participants learn by training on the two change-detection tasks, given the lack of generalization to other WM tasks? As mentioned before, it seems that the improvement in TG1 points to a “configuration learning” effect. In contrast, the TG2 effect might be based more on learning how to remember distinct items. The stable performance in the spatial and the color resolution tasks indicates that neither training condition had an impact on the precision with which information is encoded or stored. In sum, although the training effects are substantial, they seem to be highly task specific. As such, it seems likely that the training as implemented here might reduce the predictive power of change-detection performance for intellectual aptitude (cf. Cowan et al. 2006), similar to the situation in which practicing on matrix reasoning tests that assess fluid intelligence subsequently decreases the tests’ g loading (Ackerman 1987; Bors and Vigneau 2001; te Nijenhuis et al. 2007). Overall, although we were able to demonstrate reliable improvements in the trained task that go beyond other intervention work with the change-detection paradigm, future research is needed to further detail the exact mechanisms underlying the effects we have documented.

Footnotes

  1. We note at this point that definitions of WM often stress the involvement of storage and processing components (e.g., Oberauer 2005). The change-detection paradigm is commonly described as a WM task, although it relies more on a storage than a processing component.

  2. In a two-back task, for example, a one-back target and a three-back target both constitute a lure.
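
For illustration, the following minimal R sketch (not the authors' task code; the function name and example stream are made up for this illustration) classifies each item in a letter stream as a two-back target, a lure, or a non-target:

    classify_nback <- function(stream, n = 2) {
      sapply(seq_along(stream), function(i) {
        # a match exactly n positions back is a target
        if (i > n && stream[i] == stream[i - n]) return("target")
        # matches n - 1 or n + 1 positions back are lures
        if ((i > n - 1 && stream[i] == stream[i - (n - 1)]) ||
            (i > n + 1 && stream[i] == stream[i - (n + 1)])) return("lure")
        "non-target"
      })
    }

    classify_nback(c("A", "B", "A", "A", "B", "A"))
    # "non-target" "non-target" "target" "lure" "lure" "target"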

  3. The corresponding RGB values were black (0, 0, 0), blue (0, 0, 255), green (0, 128, 0), purple (128, 0, 128), red (255, 0, 0), white (255, 255, 255), and yellow (255, 255, 0).

  4. Note that, owing to variation in computer hardware (different screen sizes and screen resolutions), the stimulus size as expressed in degrees of visual angle could vary between participants.
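
As an illustration of this dependence, the following R sketch (with assumed screen parameters, not the study's actual hardware) converts an on-screen stimulus size in pixels to degrees of visual angle:

    # Illustrative sketch with assumed values; not the study's actual setup.
    visual_angle_deg <- function(size_px, screen_width_cm, screen_width_px,
                                 viewing_distance_cm) {
      size_cm <- size_px * screen_width_cm / screen_width_px    # pixels -> cm
      2 * atan(size_cm / (2 * viewing_distance_cm)) * 180 / pi  # cm -> degrees
    }

    # The same 50-pixel stimulus viewed from 60 cm on two hypothetical monitors:
    visual_angle_deg(50, screen_width_cm = 33, screen_width_px = 1280, viewing_distance_cm = 60)  # ~1.23 deg
    visual_angle_deg(50, screen_width_cm = 52, screen_width_px = 1920, viewing_distance_cm = 60)  # ~1.29 deg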

  5. Concerning the modeling approaches, we note that for the spatial resolution task the appropriateness of the model is reflected both in the small number of “error” trials detected (M = 4.9 out of 115 trials; SD = 4.4) and in the high test-retest reliability of the resulting precision statistics (the 1/SD of the pre- and post-training point distributions correlated at r = 0.81; after each participant was fitted with the model, the corresponding 1/SD precision scores correlated at r = 0.88). The number of trials categorized as guesses was also significantly correlated between pre- and post-test (r = 0.65; t(38) = 5.2, p < 0.001). For the color resolution task, substantially more trials were categorized as guesses (M = 20.8 out of 125 trials; SD = 10.8), and the precision value (kappa) fitted to each participant (assuming no guessing) showed a similar test-retest correlation (r = 0.56 for the raw scores; r = 0.52 for the mixture model). Interestingly, the test-retest correlation for the number of trials categorized as guesses (r = 0.73) was higher than that for the precision scores.
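
To make the modeling approach concrete, the following R sketch fits a mixture of this general kind: recall errors are modeled as a von Mises “memory” component plus a uniform “guessing” component, estimated with EM (in the spirit of Zhang and Luck 2008 and Dempster et al. 1977). This is a simplified sketch, not the authors' fitting code; the function name, starting values, and simulated data are assumptions made for illustration.

    fit_guess_mixture <- function(err, n_iter = 200) {
      dvm <- function(x, kappa) {  # von Mises density with mean 0
        exp(kappa * (cos(x) - 1)) / (2 * pi * besselI(kappa, 0, expon.scaled = TRUE))
      }
      kappa_from_R <- function(R) {  # approximate ML estimate of kappa from mean resultant length
        if (R < 0.53) return(2 * R + R^3 + 5 * R^5 / 6)
        if (R < 0.85) return(-0.4 + 1.39 * R + 0.43 / (1 - R))
        1 / (R^3 - 4 * R^2 + 3 * R)
      }
      g <- 0.2; kappa <- 5  # starting values
      for (i in seq_len(n_iter)) {
        # E-step: posterior probability that each trial was a guess
        p_guess <- g / (2 * pi)
        p_mem   <- (1 - g) * dvm(err, kappa)
        w_guess <- p_guess / (p_guess + p_mem)
        # M-step: update guess rate and concentration of the memory component
        g     <- mean(w_guess)
        w_mem <- 1 - w_guess
        R_bar <- sum(w_mem * cos(err)) / sum(w_mem)
        kappa <- kappa_from_R(max(R_bar, 1e-6))
      }
      # for large kappa, circular SD is approximately 1/sqrt(kappa), so 1/SD ~ sqrt(kappa)
      list(guess_rate = g, kappa = kappa, precision_1_over_sd = sqrt(kappa))
    }

    # Simulated example: 80% of trials remembered (kappa = 8), 20% random guesses
    set.seed(1)
    err <- c(rnorm(80, 0, sqrt(1 / 8)), runif(20, -pi, pi))
    fit_guess_mixture(err)

Test-retest reliabilities of the kind reported in this footnote can then be obtained by correlating the pre- and post-training estimates (e.g., with cor()).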

  6. Note that we present BF10 in the tables.
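
The BF01 values reported in the Discussion are simply the reciprocals of the tabled BF10 values, as in this minimal R illustration (the BF10 values below are hypothetical, chosen only to reproduce the BF01 range mentioned above):

    bf10 <- c(0.526, 0.276)  # hypothetical table values, for illustration only
    bf01 <- 1 / bf10         # about 1.90 and 3.62, i.e., evidence favoring the null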

Notes

Acknowledgements

The authors would like to thank Nelson Cowan for comments on the initial results of this experiment.

Funding Information

This work was supported by the Office of Naval Research Grant N00014-09-0213 and the Institute of Education Sciences Grant R324A090164.

Compliance with Ethical Standards

Conflict of Interest

SMJ has an indirect financial interest in the MIND Research Institute. MB is employed at the MIND Research Institute, whose interests are related to this work.

References

  1. Ackerman, P. L. (1987). Individual differences in skill learning: an integration of psychometric and information processing perspectives. Psychological Bulletin, 102(1), 3–27.
  2. Agostinelli, C., & Lund, U. (2013). R package “circular”: Circular Statistics (Version 0.4–7). Retrieved from https://r-forge.r-project.org/projects/circular/
  3. Au, J., Katz, B., Buschkuehl, M., Bunarjo, K., Senger, T., Zabel, C., et al. (2016). Enhancing working memory training with transcranial direct current stimulation. Journal of Cognitive Neuroscience, 28(9), 1419–1432. https://doi.org/10.1162/jocn_a_00979.
  4. Ball, K., Berch, D. B., Helmers, K. F., Jobe, J. B., Leveck, M. D., Marsiske, M., et al. (2002). Effects of cognitive training interventions with older adults: a randomized controlled trial. JAMA: The Journal of the American Medical Association, 288(18), 2271–2281.
  5. Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B (Methodological), 57(1), 289–300.
  6. Bjork, E. L., & Bjork, R. A. (2014). Making things hard on yourself, but in a good way: creating desirable difficulties to enhance learning. In Psychology and the real world: Essays illustrating fundamental contributions to society (2nd ed., pp. 59–68). New York (NY, USA): Worth.
  7. Boduroglu, A., & Shah, P. (2009). Effects of spatial configurations on visual change detection: an account of bias changes. Memory & Cognition, 37(8), 1120–1131. https://doi.org/10.3758/MC.37.8.1120.
  8. Boduroglu, A., Mueller, S., Ng, A., & Shah, P. (submitted). Representation resolution is correlated with short-term memory capacity.
  9. Bors, D. A., & Vigneau, F. (2001). The effect of practice on Raven’s advanced progressive matrices. Learning and Individual Differences, 13(4), 291–312. https://doi.org/10.1016/S1041-6080(03)00015-3.
  10. Chein, J. M., & Morrison, A. B. (2010). Expanding the mind’s workspace: training and transfer effects with a complex working memory span task. Psychonomic Bulletin & Review, 17(2), 193–199. https://doi.org/10.3758/PBR.17.2.193.
  11. Conway, A. R., Kane, M. J., Bunting, M. F., Hambrick, D. Z., Wilhelm, O., & Engle, R. W. (2005). Working memory span tasks: a methodological review and user’s guide. Psychonomic Bulletin & Review, 12(5), 769–786.
  12. Cowan, N. (2001). The magical number 4 in short-term memory: a reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24, 87–185.
  13. Cowan, N., Elliot, E. M., Saults, J. S., Morey, C. C., Mattox, S., Hismjatullina, A., & Conway, A. R. A. (2005). On the capacity of attention: its estimation and its role in working memory and cognitive aptitudes. Cognitive Psychology, 51, 42–100.
  14. Cowan, N., Fristoe, N. M., Elliott, E. M., Brunner, R. P., & Saults, J. S. (2006). Scope of attention, control of attention, and intelligence in children and adults. Memory & Cognition, 34(8), 1754–1768.
  15. Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.
  16. Deveau, J., Ozer, D. J., & Seitz, A. R. (2014). Improved vision and on-field performance in baseball through perceptual learning. Current Biology, 24(4), R146–R147.
  17. Dosher, B. A., Jeter, P., Liu, J., & Lu, Z.-L. (2013). An integrated reweighting theory of perceptual learning. Proceedings of the National Academy of Sciences of the United States of America, 110(33), 13678–13683. https://doi.org/10.1073/pnas.1312552110.
  18. Eng, H. Y., Chen, D., & Jiang, Y. (2005). Visual working memory for simple and complex visual stimuli. Psychonomic Bulletin & Review, 12(6), 1127–1133.
  19. Ericsson, K. A., Chase, W. G., & Faloon, S. (1980). Acquisition of a memory skill. Science, 208, 1181–1182.
  20. Fahle, M., & Morgan, M. (1996). No transfer of perceptual learning between similar stimuli in the same retinal position. Current Biology, 6(3), 292–297.
  21. Fukuda, K., Awh, E., & Vogel, E. K. (2010a). Discrete capacity limits in visual working memory. Current Opinion in Neurobiology, 20(2), 177–182. https://doi.org/10.1016/j.conb.2010.03.005.
  22. Fukuda, K., Vogel, E., Mayr, U., & Awh, E. (2010b). Quantity, not quality: the relationship between fluid intelligence and working memory capacity. Psychonomic Bulletin & Review, 17(5), 673–679. https://doi.org/10.3758/17.5.673.
  23. Gaspar, J. G., Neider, M. B., Simons, D. J., McCarley, J. S., & Kramer, A. F. (2013). Change detection: training and transfer. PLoS One, 8(6), e67781. https://doi.org/10.1371/journal.pone.0067781.
  24. Jaeggi, S. M., Seewer, R., Nirkko, A. C., Eckstein, D., Schroth, G., Groner, R., & Gutbrod, K. (2003). Does excessive memory load attenuate activation in the prefrontal cortex? Load-dependent processing in single and dual tasks: functional magnetic resonance imaging study. NeuroImage, 19(2 Pt 1), 210–225.
  25. Jaeggi, S. M., Buschkuehl, M., Jonides, J., & Perrig, W. J. (2008). Improving fluid intelligence with training on working memory. Proceedings of the National Academy of Sciences of the United States of America, 105(19), 6829–6833. https://doi.org/10.1073/pnas.0801268105.
  26. Jaeggi, S. M., Buschkuehl, M., Shah, P., & Jonides, J. (2014). The role of individual differences in cognitive training and transfer. Memory & Cognition, 42(3), 464–480. https://doi.org/10.3758/s13421-013-0364-z.
  27. JASP Team. (2017). JASP (Version 0.8.2). https://jasp-stats.org/faq/how-do-i-cite-jasp/.
  28. Johnson, M. K., McMahon, R. P., Robinson, B. M., Harvey, A. N., Hahn, B., Leonard, C. J., et al. (2013). The relationship between working memory capacity and broad measures of cognitive ability in healthy adults and people with schizophrenia. Neuropsychology, 27(2), 220–229. https://doi.org/10.1037/a0032060.
  29. Karbach, J., & Verhaeghen, P. (2014). Making working memory work: a meta-analysis of executive control and working memory training in younger and older adults. Psychological Science, 25(11), 2027–2037. https://doi.org/10.1177/0956797614548725.
  30. Klingberg, T., Fernell, E., Olesen, P. J., Johnson, M., Gustafsson, P., Dahlström, K., et al. (2005). Computerized training of working memory in children with ADHD—a randomized, controlled trial. Journal of the American Academy of Child & Adolescent Psychiatry, 44(2), 177–186. https://doi.org/10.1097/00004583-200502000-00010.
  31. Kundu, B., Sutterer, D. W., Emrich, S. M., & Postle, B. R. (2013). Strengthened effective connectivity underlies transfer of working memory training to tests of short-term memory and attention. The Journal of Neuroscience, 33(20), 8705–8715. https://doi.org/10.1523/JNEUROSCI.5565-12.2013.
  32. Kuo, C.-C., Zhang, C., Rissman, R. A., & Chiu, A. W. L. (2014). Long-term electrophysiological and behavioral analysis on the improvement of visual working memory load, training gains, and transfer benefits. Journal of Behavioral and Brain Science, 4(5), 234–246. https://doi.org/10.4236/jbbs.2014.45025.
  33. Lin, P.-H., & Luck, S. J. (2012). Proactive interference does not meaningfully distort visual working memory capacity estimates in the canonical change detection task. Frontiers in Psychology, 3, 42. https://doi.org/10.3389/fpsyg.2012.00042.
  34. Morrison, A. B., & Chein, J. M. (2011). Does working memory training work? The promise and challenges of enhancing cognition by training working memory. Psychonomic Bulletin & Review, 18(1), 46–60. https://doi.org/10.3758/s13423-010-0034-0.
  35. Oberauer, K. (2005). The measurement of working memory capacity. In O. Wilhelm & R. W. Engle (Eds.), Handbook of understanding and measuring intelligence (pp. 393–407). Thousand Oaks: Sage Publications.
  36. Olson, I. R., & Jiang, Y. (2004). Visual short-term memory is not improved by training. Memory & Cognition, 32(8), 1326–1332.
  37. Olson, I. R., Jiang, Y., & Sledge Moore, K. S. (2005). Associative learning improves visual working memory performance. Journal of Experimental Psychology: Human Perception and Performance, 31(5), 889–900. https://doi.org/10.1037/0096-1523.31.5.889.
  38. Owens, M., Koster, E. H. W., & Derakshan, N. (2013). Improving attention control in dysphoria through cognitive training: transfer effects on working memory capacity and filtering efficiency. Psychophysiology, 50(3), 297–307. https://doi.org/10.1111/psyp.12010.
  39. Phillips, W. A. (1974). On the distinction between sensory storage and short-term visual memory. Perception & Psychophysics, 16(2), 283–290. https://doi.org/10.3758/BF03203943.
  40. R Core Team. (2013). R: a language and environment for statistical computing. Vienna (Austria): R Foundation for Statistical Computing. Retrieved from http://www.R-project.org/.
  41. Redick, T. S., Broadway, J. M., Meier, M. E., Kuriakose, P. S., Unsworth, N., Kane, M. J., & Engle, R. W. (2012). Measuring working memory capacity with automated complex span tasks. European Journal of Psychological Assessment, 28(3), 164–171. https://doi.org/10.1027/1015-5759/a000123.
  42. Rouder, J. N., Morey, R. D., Cowan, N., Zwilling, C. E., Morey, C. C., & Pratte, M. S. (2008). An assessment of fixed-capacity models of visual working memory. Proceedings of the National Academy of Sciences of the United States of America, 105(16), 5975–5979. https://doi.org/10.1073/pnas.0711295105.
  43. Schellig, D. (1997). Block-tapping test. Frankfurt am Main: Swets Tests Services.
  44. Schwarb, H., Nail, J., & Schumacher, E. H. (2016). Working memory training improves visual short-term memory capacity. Psychological Research, 80(1), 128–148. https://doi.org/10.1007/s00426-015-0648-y.
  45. Shute, V. J. (2008). Focus on formative feedback. Review of Educational Research, 78(1), 153–189. https://doi.org/10.3102/0034654307313795.
  46. Sligte, I. G., Scholte, H. S., & Lamme, V. A. F. (2008). Are there multiple visual short-term memory stores? PLoS One, 3(2), e1699. https://doi.org/10.1371/journal.pone.0001699.
  47. Soveri, A., Antfolk, J., Karlsson, L., Salo, B., & Laine, M. (2017). Working memory training revisited: a multi-level meta-analysis of n-back training studies. Psychonomic Bulletin & Review, 24(4), 1077–1096. https://doi.org/10.3758/s13423-016-1217-0.
  48. Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs: General and Applied, 74(11), 1–29. https://doi.org/10.1037/h0093759.
  49. Sungur, H., & Boduroglu, A. (2012). Action video game players form more detailed representation of objects. Acta Psychologica, 139(2), 327–334. https://doi.org/10.1016/j.actpsy.2011.12.002.
  50. te Nijenhuis, J., van Vianen, A. E. M., & van der Flier, H. (2007). Score gains on g-loaded tests: no g. Intelligence, 35(3), 283–300. https://doi.org/10.1016/j.intell.2006.07.006.
  51. Weicker, J., Villringer, A., & Thöne-Otto, A. (2016). Can impaired working memory functioning be improved by training? A meta-analysis with a special focus on brain injured patients. Neuropsychology, 30(2), 190–212. https://doi.org/10.1037/neu0000227.
  52. Wheeler, M. E., & Treisman, A. M. (2002). Binding in short-term visual memory. Journal of Experimental Psychology: General, 131(1), 48–64.
  53. Whipple, G. M. (1910). The effect of practise upon the range of visual attention and of visual apprehension. The Journal of Educational Psychology, 1(5), 249–262.
  54. Xu, Z., Adam, K. C. S., Fang, X., & Vogel, E. K. (2017). The reliability and stability of visual working memory capacity. Behavior Research Methods. https://doi.org/10.3758/s13428-017-0886-6.
  55. Zhang, W., & Luck, S. J. (2008). Discrete fixed-resolution representations in visual working memory. Nature, 453(7192), 233–235. https://doi.org/10.1038/nature06860.
  56. Zhang, W., & Luck, S. J. (2011). The number and quality of representations in working memory. Psychological Science, 22(11), 1434–1441. https://doi.org/10.1177/0956797611417006.
  57. Zimmer, H. D., Popp, C., Reith, W., & Krick, C. (2012). Gains of item-specific training in visual working memory and their neural correlates. Brain Research, 1466, 44–55. https://doi.org/10.1016/j.brainres.2012.05.019.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2017

Authors and Affiliations

  • Martin Buschkuehl (1)
  • Susanne M. Jaeggi (2)
  • Shane T. Mueller (3)
  • Priti Shah (4)
  • John Jonides (4)

  1. MIND Research Institute, Irvine, USA
  2. School of Education, University of California, Irvine, Irvine, USA
  3. Department of Cognitive and Learning Sciences, Michigan Technological University, Houghton, USA
  4. Department of Psychology, University of Michigan, Ann Arbor, USA
