Introduction

There is increasing evidence that individual differences in cognitive and academic performance share high variance with basic capacity-limited processes, first and foremost with working memory (WM), defined as the capacity to retain and manipulate information (e.g., Engle, Tuholski, Laughlin, & Conway, 1999; Pickering, 2006; Shah & Miyake, 1999). Explanations of the shared variance between WM, higher cognitive performance, and scholastic abilities are based on evidence of common capacity constraints (Halford, Cowan, & Andrews, 2007), attentional control processes (Kane et al., 2004), and overlapping neuronal networks in the lateral prefrontal and parietal cortices (Gray, Chabris, & Braver, 2003). These findings have motivated attempts to improve WM capacity, and with it the academic abilities of children, through training regimens targeting WM; such efforts challenge the long-held view that WM capacity is primarily inherited and fixed (e.g., Engel, Heloisa Dos Santos, & Gathercole, 2008). Many of these attempts have demonstrated that an intense period of short-term WM training with children leads to improvement in training-related tasks such as visual-spatial WM and executive functions (near transfer; see Diamond & Lee, 2011, for a review). Even though most of these training studies do not demonstrate far transfer or IQ improvements (e.g., Bergman-Nutley et al., 2011; Mackey, Hill, Stone, & Bunge, 2011; see Diamond & Lee, 2011, for a review), some do, for instance, by demonstrating improvements in measures of fluid intelligence (Jaeggi, Buschkuehl, Jonides, & Shah, 2011) or crystallized intelligence (Alloway, Bibile, & Lau, 2013).

Gains in measures of general intelligence after WM training are still rare, and evidence of improved scholastic abilities after WM training with healthy school-aged children is even sparser. On the one hand, some studies found positive evidence, giving rise to optimism (Titz & Karbach, 2014): for example, a laboratory study showed increased reading abilities after WM training (Loosli, Buschkuehl, Perrig, & Jaeggi, 2011), and an investigation in a school setting demonstrated that WM-trained academically low-achieving children made significantly greater progress across the academic year in mathematics and English than matched untrained pupils (Holmes & Gathercole, 2014). On the other hand, there are also WM training studies which did not find any significant progress in children’s academic or higher-order cognitive performance, thus casting doubt on the general scope of WM-training-related improvements (e.g., Dunning, Holmes, & Gathercole, 2013; Thorell, Lindqvist, Nutley, Bohlin, & Klingberg, 2009).

In recent years, this heterogeneity of training research results has given rise to a series of critical reviews and meta-analyses (e.g., Melby-Lervåg & Hulme, 2012; Shipstead, Redick, & Engle, 2010, 2012). They primarily identify methodological reasons for the inconsistency of results, such as inadequate control of study groups (Shipstead et al., 2012), and suggest interpreting results with caution. However, methodological weaknesses should not obscure the fact that WM training shows clear potential: The general trainability of WM can be considered reliable, since near transfer to non-trained WM tasks is a consistent result in most of the training studies with participants of every age (Melby-Lervåg & Hulme, 2012), and several findings of far transfer to intelligence measures indicate the impact of WM training on important intellectual abilities (cf. Au et al., 2014; Buschkuehl & Jaeggi, 2010; Bryck & Fisher, 2012; Klingberg, 2010).

These inconsistent training results have led to a recent shift in research attention to the influence of individual differences on training outcomes (e.g., Jaeggi, Buschkuehl, Shah, & Jonides, 2014; Studer-Luethi, Jaeggi, Buschkuehl, & Perrig, 2012). However, such investigations are still in their infancy. Whereas active compliance with training, stress, need for cognition, and beliefs about the malleability of intelligence have been found to influence training outcomes in adults (Bagwell & West, 2008; Jaeggi et al., 2014; Valentijn et al., 2005), perceived training task difficulty has been found to interfere with training benefits in children (Jaeggi et al., 2011). Only very few studies have investigated the influence of personality characteristics on cognitive training outcomes. Yesavage (1989) suggested that subjects with high neuroticism scores profit least from memory training, and Bäckman, Hill, and Rosell (1996) state that subjects with depressive symptoms have difficulty in activating the necessary cognitive resources to achieve improvement after training. In contrast, training studies in different fields found conscientiousness to be the strongest predictor of training success (e.g., Barrick, Stewart, & Piotrowski, 2002; Tziner, Fisher, Senior, & Weisberg, 2007). Therefore, in a previous study, we investigated the moderating effect of the personality traits neuroticism and conscientiousness on WM training outcomes in young adults (Studer-Luethi et al., 2012). Results revealed a significant interaction of neuroticism and intervention in terms of training efficacy, in that the demanding WM training task was more effective for participants low in neuroticism. Furthermore, conscientiousness was associated with higher WM training scores and improvement in near-transfer measures.

To our knowledge, no study has yet investigated the moderating effect of these two prominent personality traits on cognitive training outcomes in samples of children. In children, these individual differences can be found at the level of temperament: Whereas neuroticism, representing negative affectivity and increased emotional reactivity, is a temperament factor that can already be observed in childhood (Eysenck, 1967), conscientiousness is a personality factor which only fully develops after adolescence. Rothbart and her colleagues (2001) identified a temperament factor in childhood representing the developmental process underlying conscientiousness, naming it effortful control (cf. Ahadi & Rothbart, 1994; Blair & Razza, 2007). Together, neuroticism and effortful control represent the two temperament categories reactivity and self-regulation (Rothbart, Derryberry, & Posner, 1994).

Even though results regarding the interplay between these temperament factors and general performance in cognitive tests are very heterogeneous (see, e.g., Owens, Stevenson, Norgate, & Hadwin, 2008; Seipp, 1991, for meta-analyses), the theoretical assumptions and findings can serve as a framework for hypotheses about the influence of temperament on cognitive training outcomes.

Dispositional temperament factors and their relationship to working memory (WM)

Temperament refers to individual differences in emotional reactivity and the regulation of this reactivity (e.g., Ahadi & Rothbart, 1994; Posner & Rothbart, 2000). At a neural system level, emotional reactivity is mainly associated with the limbic systems in the ventromedial prefrontal cortex, whereas self-regulation, the innate ability to maintain optimal levels of emotional, motivational, and cognitive arousal (Eisenberg, Hofer, & Vaughan, 2007; Liew, 2011), is mainly rooted in the lateral prefrontal cortex and the anterior cingulate cortex (ACC) (Botvinick et al., 2001). Temperament factors have received increased attention in recent years, as they have been shown to play key roles in children’s academic success. In a study by Blair and Razza (2007), self-regulation accounted for unique variance in scholastic achievement independent of general intelligence (see also Chamorro-Premuzic & Furnham, 2006; De Fruyt & Mervielde, 1996; Duckworth & Seligman, 2005; Tangney, Baumeister, & Boone, 2004). This relationship seems to be mediated by WM capacity (e.g., Owens, Stevenson, Hadwin, & Norgate, 2014), or more specifically by attentional efficiency (see Rueda, Posner, & Rothbart, 2005).

One dispositional component of emotional reactivity is the hyperarousability of the limbic systems, labeled neuroticism in Eysenck’s model of personality (1967). Neuroticism is related to higher excitability and emotional responsiveness, resulting in a higher variability of emotional and motivational states and the tendency to experience more negative emotions, such as anxiety or distress. On the one hand, neuroticism and related traits (e.g., anxiety) seem to generally diminish processing efficiency through a disadvantageous arousal level as well as through emotional and cognitive resource-demanding interferences, such as worrisome thoughts or negative emotions. On the other hand, in accordance with many studies showing that distress impairs operations in the prefrontal cortex (for a review, see Arnsten, 2009), neuroticism-related characteristics seem to mainly reduce the resources available to control attention by impairing processes in the central executive of WM (e.g., Bishop, 2009; Derakshan & Eysenck, 2009; Eysenck & Calvo, 1992; Eysenck, Derakshan, Santos, & Calvo, 2007; Gray et al., 2005; Schmeichel, Volokhov, & Demaree, 2008; Shackman et al., 2006). These assumptions were confirmed in neuroimaging studies revealing that neuroticism is associated with reduced neuronal efficiency and impoverished recruitment of prefrontal attention control mechanisms during a WM task (Bishop, 2009; Gray et al., 2005). However, lower efficiency does not necessarily mean lower efficacy, that is, lower quality of performance. Investigations regarding WM task performance in relation to neuroticism-related characteristics are very sparse, and the published findings are inconsistent. To date, to our knowledge, there is no study demonstrating a significant association between neuroticism and general WM performance in children.
There is one study demonstrating a negative association between trait anxiety and verbal WM, but not spatial WM (Visu-Petra et al., 2010), whereas another investigation found a positive association between neuroticism and verbal WM in adults (Arbune et al., 2015). These examples demonstrate that more research is needed to disentangle the interaction of neuroticism and WM task performance (see also Hadwin, Brogan, & Stevenson, 2005; DeYoung et al., 2009). As Eysenck and Calvo (1992) put it, individuals with high anxiety often apply compensatory strategies such as enhanced effort, which can explain why they often reach task performance comparable to that of individuals with low anxiety.

One predisposition for self-regulation skills is the temperament factor called effortful control (Rothbart & Bates, 2006). It is believed to be associated with early-appearing individual differences in self-regulation, and with it the developmental process underlying conscientiousness (Ahadi & Rothbart, 1994). It allows individuals to voluntarily regulate their behavior in relation to current and future needs, for instance, by inhibiting a dominant response in favor of a subdominant response (Blair & Diamond, 2008; Derryberry, Reed, & Pilkenton-Taylor, 2003). In temperament questionnaires, effortful control emerges from factors including shifting and focusing attention, inhibitory control, perceptual sensitivity, and low-intensity pleasure. As Rothbart and Rueda (2005) postulate, the systems of this temperament factor provide the flexibility required to master negative affect and consider potential actions in the light of principles. Effortful control overlaps substantially with inhibitory control (see Diamond, 2013), but while executive functions emerge from a neural system approach with a historical focus on volitional control of cognitive self-regulation, effortful control historically focused more tightly on automatic or nonconscious emotional regulation (cf. Blair & Razza, 2007; see also Eisenberg, Spinrad, & Eggum, 2010; Mischel & Ayduk, 2002). Effortful control was found to facilitate performance efficiency by helping to suppress distracting stimuli and to maintain an optimal arousal level for a given task (e.g., Blair & Diamond, 2008; Eisenberg et al., 2004; Rothbart, Ellis, Rueda, & Posner, 2003), as well as by improving internalized control (Kochanska, Murray, & Harlan, 2000; Kochanska, Murray, Jacques, Koenig, & Vandegeest, 1996) and the ability to deal with conflict (see Rueda, Posner, & Rothbart, 2005).
Regarding the interplay of effortful control and task effectiveness (quality of performance), results show positive associations with scholastic abilities (e.g., Deater-Deckard, Mullineaux, Petrill, & Thompson, 2009), but mixed findings regarding performance in executive functions (e.g., Bridgett et al., 2013). Some authors postulate the association of effortful control with heightened levels of evaluation apprehension or the tendency to be self-deceptive as an explanation for these inconsistent findings (e.g., Martocchio & Judge, 1997).

The current study

The current study investigates the effects of school-based WM training in a sample of non-selected primary-school children. We chose this setting because there is still a lack of findings from natural learning settings as opposed to optimized laboratory settings, and we chose this sample because this provides a strong basis for the assessment of individual influences on training results. The aim of this investigation was twofold.

Firstly, we aimed to investigate the near and far transfer effects of WM training. On the basis of the empirical findings discussed above, we hypothesized that WM training improves performances in WM, and that there might be far-transfer effects on fluid or crystallized intelligence and on scholastic performance of children who participate in WM training in comparison to children who participate in an alternative training or in no training.

However, moderator variables are critical for understanding the generalizability of training results to subgroups. For intervention research, moderator variables may reflect subgroups of persons for whom the training is more or less effective than for other groups (see MacKinnon, 2011). Therefore, secondly, we sought to build upon previous findings in adults showing the influence of neuroticism and conscientiousness on WM training outcomes (Studer-Luethi et al., 2012). Focusing on the corresponding temperament factors in children, we predicted that neuroticism and effortful control would moderate training and transfer measures by affecting the ability to deal with frustration and negative affect during a WM training period:

1) WM training tasks largely rely on attentional control. Based on the assumption that subjects with high neuroticism experience cognitive and emotional interferences which decrease the attentional control, processing, and storage resources of the WM system, we assumed that neuroticism would negatively predict WM training efficiency and with it training effectiveness regarding transfer. However, since the WM training tasks are not too complex, compensatory effects could enable individuals with high neuroticism to improve their training performance. Consequently, for the subgroup analyses, we hypothesized that children with low neuroticism scores would show higher average WM training performance than, but comparable training gain to, children with high scores, and that WM training would show superior effects on untrained measures in comparison to the two control groups only in this subgroup of children.

2) Successful suppression of distracting stimuli and the monitoring of optimal arousal are necessary for an effective training process. Based on the assumption that subjects with high effortful control have increased internalized control and a higher ability to deal with frustration and conflicting tendencies, such as wanting to stop the task after a failure but at the same time wanting to improve task performance and keep up with others, we assumed that effortful control would positively predict WM training performance and benefit regarding transfer. Consequently, for the subgroup analyses, we hypothesized that children with high effortful control would show greater WM training improvement than the subgroup with low effortful control, and that WM training would show superior effects on untrained measures in comparison to the two control groups only in this subgroup of children.

Method

Participants

A total of 99 second-grade elementary school children (36 % female) were recruited in four public schools in Switzerland. At the time of the first data collection, the mean age of the children was 8 years and 3 months (SD = .50). Written parental consent was required for participation; apart from this, no exclusion criteria were applied at recruitment. Most of the children (76.8 %) reported German as their first language, and 23.2 % reported another language as their first language. None of them had any problems understanding and speaking German. As a reward, all children received a medal after the completion of the study.

After the first data collection, we allocated the children of each class to three study groups, matched for age, gender, and general intelligence. The experimental group completed computer-based WM training (n = 34; mean age = 8.28 years; SD = .43; 21 male), the active control group participated in computer-based reading training (n = 31; mean age = 8.15 years; SD = .38; 19 male), and the remaining children were assigned to a no-contact control group (n = 30; mean age = 8.49 years; SD = .58; 22 male). We had to exclude the data of four children from longitudinal analyses due to their infrequent attendance at training (a minimum attendance of 17 sessions was required).

Procedure

At the beginning of the study, teachers’ ratings (effortful control) and parents’ ratings (effortful control, neuroticism) were collected to assess the temperamental factors of the children. To assess performance in cognitive and academic abilities, all children completed a battery of cognitive and academic ability tests during two regular school lessons, 1–4 days before the training period and then 1–4 days and 3 months after it. A and B versions were used for the pre- and the post-tests in counterbalanced order. With the exception of the memory span task, which was individually administered, all the tests were carried out as a group with the whole class. The self-report personality questionnaire measuring neuroticism was administered to the children at pretest.

The children of the WM-training and reading-training groups completed daily training sessions of 15 min in groups of six to 13 children in the computer laboratories of the schools for four consecutive weeks on school days, resulting in 17–20 training sessions. The children of the no-contact control group stayed in the classroom with their respective teacher.

Material

Training tasks

WM training comprised two different tasks, the single n-back task and the animal span task. In every training session, both of the training tasks were applied in the same order.

WM single n-back task

We chose an adaptive visual-spatial single n-back task, similar to the task used in other WM training studies (e.g., Jaeggi, Studer-Luethi et al., 2010). A sequence of visual stimuli was shown to the children. Each stimulus was presented for 500 ms and was followed by a 2,500-ms interstimulus interval. During this interval, the children had to respond by pressing a pre-defined key whenever the location of the current stimulus was identical to the one presented n positions back in the sequence; no response was required for non-targets. The stimulus material consisted of squares in a different color for each level of n. The level of n was increased by one if the child made fewer than three mistakes, and it was decreased by one if the child made more than five mistakes. One training session comprised 5–6 blocks consisting of 6 + n trials. After each block, children received feedback concerning their performance (percent correct). The average level of n of every training session served as the dependent variable defining the training performance, whereas the difference between the last two training sessions and the first two training sessions served as the dependent variable training gain.
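The adaptive rule and the two dependent variables described above can be sketched as follows. This is a minimal illustration, not the training software used in the study: the function names, the lower bound of n = 1, and the use of the mean of two sessions when computing the gain are assumptions made for the example.

```python
def next_n_level(n: int, mistakes: int, min_n: int = 1) -> int:
    """Return the n-back level for the next block.

    Rule from the text: fewer than three mistakes -> n + 1,
    more than five mistakes -> n - 1, otherwise unchanged.
    The floor at min_n is an assumption (not stated in the text).
    """
    if mistakes < 3:
        return n + 1
    if mistakes > 5:
        return max(min_n, n - 1)
    return n


def trials_in_block(n: int) -> int:
    """Each block consists of 6 + n trials."""
    return 6 + n


def is_target(locations: list[int], i: int, n: int) -> bool:
    """A trial is a target if its location matches the one n positions back."""
    return i >= n and locations[i] == locations[i - n]


def training_gain(session_means: list[float]) -> float:
    """Mean level of the last two sessions minus mean level of the first two.

    Averaging within each pair of sessions is an assumption.
    """
    first = sum(session_means[:2]) / 2
    last = sum(session_means[-2:]) / 2
    return last - first
```

For example, a child at n = 2 who makes four mistakes in a block would stay at n = 2 under this reading of the rule, since four mistakes fall between the two thresholds.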

WM animal span task

As a second training task, we chose an adaptive WM span task as used before (see Loosli et al., 2011). In this task, children were presented with a sequence of pictures of animals either normally oriented or upside-down. Firstly, they were asked to decide as quickly as possible on the orientation of the animal by pressing the right or the left mouse button. If children waited longer than 3,000 ms to give their answer, they were reminded to respond more quickly. Secondly, at the end of each animal sequence, they were asked to reproduce the chronological presentation order by clicking on the animals. Children received performance feedback, and the next sequence length was increased by one if the child made no mistakes in orientation decisions and reproduction of the sequence. Similarly, it was reduced by one if the sequence was not correctly reproduced. The averaged sequence length of every training session served as the dependent variable defining the training performance, whereas the difference between the last two training sessions and the first two training sessions served as the dependent variable training gain.
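The adaptation of the sequence length can be sketched in the same way. The function below is an illustrative reading of the rule, not the original program: the name `next_sequence_length`, the minimum length of one, and the decision to leave the length unchanged when the order is reproduced correctly but an orientation decision was wrong are assumptions, since the text does not specify that case.

```python
def next_sequence_length(length: int,
                         orientation_errors: int,
                         order_correct: bool,
                         min_length: int = 1) -> int:
    """Return the sequence length for the next animal span trial.

    Rule from the text: +1 if there were no orientation mistakes and the
    order was reproduced correctly; -1 if the order was not reproduced
    correctly. The remaining case (correct order, but at least one
    orientation mistake) is left unchanged here, which is an assumption.
    """
    if order_correct and orientation_errors == 0:
        return length + 1
    if not order_correct:
        return max(min_length, length - 1)  # floor is an assumption
    return length
```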

Reading training

We used a computer-based reading training program (Lesewerkstatt; Isler, Bünzli, Fehr-Biscioni, & Tresch, 2010), which included a number of different reading games targeting reading comprehension, syntax, and word recognition, among other elements.

Scholastic ability tests

Reading

We applied a widely used German reading ability assessment which demonstrated good external validity, the Knuspels Lesetest (KNUSPELS-L; Marx, 1998). Of four subtests, three were selected for the current study: phonological encoding, phonological recoding, and reading comprehension. The first subtest requires children to read pronounceable pseudowords and to decide whether they sound like a real German word. In the second subtest, they have to decide whether pronounceable pseudoword pairs have the same pronunciation, although spelt differently. The third subtest requires children to carefully read sentences and exactly execute the task described. The sum of the correct responses in all the subtests was taken as the dependent variable indicating general reading ability.

Mathematics

To measure mathematical abilities, we used the Deutscher Mathematiktest für zweite Klassen (DEMAT 2+; Krajewski, Liehm, & Schneider, 2004). The test contains subtests dealing with characteristics of numbers, comparison of length, addition and subtraction, duplication and bisection, division, and counting with money. All subtests were included in our study. We chose the sum of all subtest scores as the dependent variable.

Cognitive ability tasks

Proxy for crystallized intelligence (Gc)

A vocabulary task was used as a measure of Gc, the Wortschatztest taken from the Culture Fair Intelligence Test (CFT; Grundintelligenztest CFT 20; Weiss, 1991). The test consisted of 30 words of colloquial vocabulary, which are not part of the basic vocabulary of the German language. Each task included a key word, and the children were asked to choose the word with the same or closest meaning from a sample of five words. They were allowed to work for 6 min. The number of correct word choices served as the dependent variable.

Proxy for fluid intelligence (Gf)

Fluid intelligence was assessed using either the even or odd items of Raven’s Progressive Matrices in counterbalanced order (RPM, 30 items; Raven, 1998). After two practice trials, children were allowed to work for 10 min and to mark the correct solution for each task. The number of correct solutions provided within this time limit was used as the dependent variable.

WM and inhibition

WM

To measure WM capacity, a backwards color recall task was included (see Roethlisberger, Neuenschwander, Michel, & Roebers, 2010). The task was carried out individually with each child. Children were presented with a sequence of colored discs on a computer screen and were asked to recall the sequence in the reverse order. The presentation time for each disc was 1 s. Sequence length was two at the beginning and increased by one item when the child correctly recalled two of three sequences on a particular level. The dependent variable was the number of trials of correctly reproduced sequences of colors (Schmid, Zoelch, & Roebers, 2008).

Stroop task

To measure cognitive control (inhibition component) of children, we used the fruit and vegetable Stroop task (Roethlisberger et al., 2010), an adapted form of the fruit Stroop task used by Archibald and Kerns (1999). The child had four tasks, each consisting of a practice trial and an experimental trial. In the first task, colored squares were presented, and in the second task, fruit and vegetables in their correct colors. Children were asked to name the colors they saw as fast as possible (congruent condition). In the third and fourth tasks, the fruit and vegetables were black and white and in wrong colors, respectively. Children were asked to name the correct color of the fruit and vegetables as fast as possible (incongruent condition). The degree of interference served as the dependent variable and was computed according to the formula in Archibald and Kerns (1999).

Temperament questionnaires

Effortful control

Information on a child’s effortful control was obtained from teachers (21 items; Cronbach’s α = .93) and parents (nine items; Cronbach’s α = .66) using the questions from the Children’s Behavior Questionnaire (CBQ; Putnam & Rothbart, 2006; adapted from Blair & Razza, 2007; cf. Michel, Roethlisberger, Neuenschwander, & Roebers, 2011). None of the other subscales of the CBQ were assessed. Teachers and parents responded on a 7-point Likert scale to express their opinion of how well a description of a behavior fitted that of the child. Questions referred to attention (e.g., “shows strong concentration when drawing or coloring in a book”), inhibitory control (e.g., “is good at following instructions”), anger (e.g., “gets angry when s/he can’t find something”), and approach (e.g., “becomes very excited before an outing”). The return rates of teacher and parent questionnaires were 100 % and 92 %, respectively. The average score of both questionnaires was used as the dependent variable indicating the level of effortful control of the participants.

Neuroticism

We used a self-reported questionnaire, form 1 of the Hamburger Neurotizismus- und Extraversionsskala für Kinder und Jugendliche (HANES, KJ; Buggle & Baumgaertel, 1975) and a parent-reported questionnaire, the Hierarchical Personality Inventory for Children, designed for children between 6 and 12 years of age (HiPIC; Mervielde & De Fruyt, 1999), for the assessment of the personality traits. The HANES questionnaire is based on Eysenck’s (1967) model of personality and is one of the German personality questionnaires most commonly used for children and adolescents. The questions were read out loud to the class and children responded with yes or no to each question. The HiPIC assesses the dimensions of the five-factor model of personality with the different facets hierarchically organized under these higher-order factors. Anxiety and self-confidence are the two facets regarding neuroticism. Parents were asked to indicate on a 5-point Likert scale the degree to which each statement was characteristic of their child. For this study, only the neuroticism subscale was included in the analysis.

Results

First, data from the temperament questionnaires were analyzed. The intercorrelation of the neuroticism scale in the self-reported and parent-reported questionnaires was r = .32 (p < .01). Regarding effortful control, the intercorrelation of the teachers’ and parents’ ratings was r = .44 (p < .001). Therefore, for both traits, the average score of both questionnaires was used in our data analysis. Furthermore, a correlation between the effortful control score and the behavioral result from the cognitive control test, the Stroop task, was detected (r = −.26, p < .01), supporting the validity of the questionnaires.

Next, we calculated standardized gain scores for the cognitive measures and the scholastic ability tests (gain divided by the standard deviation of the whole sample at pretest; cf. Jaeggi et al., 2011).
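As a worked illustration of this definition, the sketch below computes standardized gain scores for a few made-up values. The function name and the data are hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, as the text does not say which estimator was used.

```python
from statistics import stdev


def standardized_gains(pre: list[float],
                       post: list[float],
                       sample_pre: list[float]) -> list[float]:
    """Gain (post - pre) divided by the pretest SD of the whole sample.

    `sample_pre` holds the pretest scores of all participants, not only
    those of the group whose gains are computed.
    """
    sd_pre = stdev(sample_pre)  # sample SD (n - 1); an assumption
    return [(b - a) / sd_pre for a, b in zip(pre, post)]


# Hypothetical scores for three children (here the whole sample).
pre_scores = [2.0, 4.0, 6.0]
post_scores = [4.0, 4.0, 7.0]
gains = standardized_gains(pre_scores, post_scores, pre_scores)
```

Dividing by the pretest standard deviation of the whole sample puts the gains of all groups on a common scale, so that they can be compared across measures with different raw units.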

For the analyses of the subgroups based on individual differences in neuroticism and effortful control, subjects were assigned to three subgroups (with low, medium, and high scores) based on values around the mean and ± 1 standard deviation (SD) from the mean, as this is often done for personality traits (cf. Jokela et al., 2013). Because of the intercorrelation of neuroticism and effortful control (r = −.21, p < .05), a combined measure of the mean z-scores of both temperament traits was computed. Neuroticism scores were recoded to have the same direction as effortful control. That is, a higher combined temperament score represents higher effortful control and emotional stability. This combined measure represents self-regulation, as it combines innate temperamental predispositions to exercise better self-regulation and to maintain an optimal level of emotional arousal (Diamond, 2013). The same subgroup recoding was done for this combined measure.
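The combined measure and the subgroup assignment can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis script: the cut-offs (low below the mean minus 1 SD, high above the mean plus 1 SD, medium otherwise) are one possible reading of the description, and all function names and the sample standard deviation are assumptions.

```python
from statistics import mean, stdev


def z_scores(values: list[float]) -> list[float]:
    """Standardize values to mean 0 and SD 1 (sample SD; an assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def combined_temperament(effortful_control: list[float],
                         neuroticism: list[float]) -> list[float]:
    """Mean of z(effortful control) and reversed z(neuroticism).

    Higher values represent higher effortful control and emotional stability.
    """
    z_ec = z_scores(effortful_control)
    z_n = z_scores(neuroticism)
    return [(ec - n) / 2 for ec, n in zip(z_ec, z_n)]


def subgroup(score: float, m: float, sd: float) -> str:
    """Assign low / medium / high using cut-offs at the mean -/+ 1 SD.

    The exact placement of the cut-offs is an assumption.
    """
    if score < m - sd:
        return "low"
    if score > m + sd:
        return "high"
    return "medium"
```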

Correlations among the cognitive and scholastic scores of all participants as well as from the training and transfer variables of the WM training group, together with the reliability data, are reported in Table 1. Note that only the two transfer factors that showed significant improvement patterns are included in the table.

Table 1 Correlations between cognitive baseline, training, and temperament variables

Overall WM training data

Children significantly improved their performance in the WM training tasks (animal span: t(32) = 8.85, d = 3.13; n-back: t(32) = 5.42, d = 1.92; both ps < .001), from an average animal span level of 2.28 (.27) in the first two sessions to a level of 3.28 (.46) in the last two sessions, and from an average n-back level of 1.41 (.27) to an n-back level of 1.89 (.46) (see Fig. 1). For the reading training group, no training data were registered.
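The text does not state how the effect sizes d were derived. As a plausibility check only, the reported values are reproduced by the common conversion d = 2t / √df applied to the reported t statistics; whether this conversion was actually used is an assumption.

```python
import math


def d_from_t(t: float, df: int) -> float:
    """Convert a t statistic to Cohen's d via d = 2t / sqrt(df).

    Offered only as a conversion that is numerically consistent with the
    reported values; the source does not name the formula it used.
    """
    return 2 * t / math.sqrt(df)


d_animal_span = round(d_from_t(8.85, 32), 2)  # reported: 3.13
d_n_back = round(d_from_t(5.42, 32), 2)       # reported: 1.92
```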

Fig. 1

a Gain scores in memory span and b gain scores in vocabulary (proxy for Gc, crystallized intelligence) as a function of the intervention group. Error bars represent standard errors of the mean. *p < .05. WM = working memory

Overall transfer data

To examine training benefits, we conducted ANOVAs for repeated measures (pretest and post-test session) and analyzed the improvement pattern as a function of group (WM training vs. reading training vs. no-contact control). Regarding near transfer on WM performance, the group × session interaction on the improvement in the backward color span task was significant only at the p < .10 level (group × test session: F(2,92) = 2.09, p = .065, ηp² = .06); the WM training group was the only one that significantly improved from pretest to post-test (t(34) = 3.64, p < .05; see Fig. 1a). There was no such interaction regarding performance in the Stroop task (F < 1).

Regarding far transfer, our analyses revealed a significant differential training effect on the measure of vocabulary (group × test session: F(2,92) = 4.42, p = .02, ηp² = .10), again establishing the WM training group as the only group with significant improvements from pretest to post-test (t(32) = 2.55, p < .05; see Fig. 1b). We found no significant training group interaction on the performance in the Raven’s Progressive Matrices (F(2,92) = 1.57, p = .22, ηp² = .004).

Regarding scholastic abilities, the children of the WM training group tended to show greater improvement than the rest of the sample, but the group × test session interaction did not reach significance (math: F(2,92) = 1.22, p = .15, ηp² = .02; reading: F(2,92) = 2.33, p = .10, ηp² = .04).

Considering individual differences in transfer, there was a negative association between WM task performance at pretest and near transfer (r = −.50, p < .01) as well as far transfer (r = −.12, p < .05), suggesting stronger profit for children with initially lower WM capacity (see Table 1).

Long-term effects of training

We found no significant long-term effects in the variables memory span, cognitive control, Gf, Gc, and scholastic tests (all ts < 1.4).

Moderator variables: Neuroticism and effortful control

Correlations of the temperament factors with the cognitive baseline and training measures are shown in Table 1. Effortful control was positively related to pretest performance in the scholastic measures and intelligence tests Gc and Gf. Regarding the WM training group, effortful control was positively associated with the average training level during the 4 weeks of training, the training gain, and by trend with the gain score in Gc. In contrast to this, neuroticism was negatively associated with Gf and math scores and, regarding the WM training group, it was negatively related to the average training level and the gain score in Gc.

Generally, WM performance in pretest was negatively correlated with the transfer gain scores in WM and Gc, suggesting that children with lower WM capacity profited most from the WM training.

Effortful control predicts WM training success

To disentangle the effects of the temperament variables on training outcome beyond the influence of initial training task performance, general linear models were computed separately for the mean training level and the training gain score as dependent variables. As predictors, the score in the first two training sessions was entered in step 1, and both temperament variables were entered in step 2 (see Table 2).

Table 2 Results of regression analyses testing main effects of neuroticism and effortful control on working memory training performance

Results indicated that temperament variables accounted for 20 % of the variance of the average training score, which significantly improved the prediction model (p < .01, f2 = .25) after controlling for the influence of initial training task performance, which accounted for 41 % of the variance. Regarding training gain, temperament variables accounted for 16 % of the variance (p < .05, f2 = .19), whereas initial task performance did not account for any variance in this measure. Looking at the contributions of the temperament variables to the prediction models, it appears that only effortful control was a significant unique predictor, whereas the effects of neuroticism disappeared.
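The f2 values reported for step 2 can be reproduced from the stated variance proportions with f2 = R2 / (1 − R2), applied to the variance explained by the temperament variables alone. The sketch below assumes that formula because it matches the reported numbers; the function names are hypothetical. For comparison it also shows the alternative convention for hierarchical models, which divides the increment by the residual variance of the full model.

```python
def cohens_f2(r2: float) -> float:
    """Cohen's f2 for a proportion of explained variance: R2 / (1 - R2)."""
    return r2 / (1.0 - r2)


def incremental_f2(r2_full: float, r2_reduced: float) -> float:
    """Alternative convention: (R2_full - R2_reduced) / (1 - R2_full)."""
    return (r2_full - r2_reduced) / (1.0 - r2_full)


# Variance attributed to the temperament variables in the text
f2_training_level = cohens_f2(0.20)  # average training score
f2_training_gain = cohens_f2(0.16)   # training gain

print(round(f2_training_level, 2))  # 0.25
print(round(f2_training_gain, 2))   # 0.19

# Assumption: full-model R2 = .41 + .20 = .61 for the average training score
print(round(incremental_f2(0.61, 0.41), 2))  # 0.51
```

The two conventions give clearly different values, so the formula used should be named whenever f2 is reported for an incremental step.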

Figure 2 shows the performance in both WM training tasks of the subgroups with high, medium, and low neuroticism and effortful control, respectively. Regarding neuroticism, the performance difference between the subgroups was not significant (see Fig. 2a and b). Regarding effortful control, high scorers clearly reached higher levels of training performance (average training level in both tasks = 2.61 (.29)) than children with average (2.29 (.27); t(27) = 3.03, p = .003, d = 1.14) and low effortful control (2.15 (.27); t(20) = 3.23, p = .002, d = 1.64). More importantly, children with high effortful control showed significantly higher training gain (average training gain in both tasks = 0.82 (.39)) than children with low effortful control (0.37 (.34); t(19) = 2.29, p = .015, d = 1.23). As shown in Fig. 2d, the moderation effect of effortful control was most obvious in the animal span task, a less complex and more monotonous WM training task than the more stimulating n-back task.
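The subgroup effect sizes above can be recovered from the reported means and standard deviations if d is taken as the mean difference divided by the root mean square of the two SDs. This is a sketch under that assumption; the source does not state the pooling rule, and `cohens_d` is a hypothetical helper.

```python
import math


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Cohen's d using the root mean square of both SDs as the standardizer.

    Assumption: equal weighting of the two groups, chosen because it
    reproduces the reported values.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_a - mean_b) / pooled_sd


# Training level: high vs. medium and high vs. low effortful control
d_high_vs_medium = cohens_d(2.61, 0.29, 2.29, 0.27)
d_high_vs_low = cohens_d(2.61, 0.29, 2.15, 0.27)
# Training gain: high vs. low effortful control
d_gain_high_vs_low = cohens_d(0.82, 0.39, 0.37, 0.34)

print(round(d_high_vs_medium, 2))    # 1.14
print(round(d_high_vs_low, 2))       # 1.64
print(round(d_gain_high_vs_low, 2))  # 1.23
```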

Fig. 2

Mean training level (n-back task, animal span task) obtained during working memory training as a function of neuroticism (Neuro) and effortful control (EC) (recoded in three groups: high/medium/low scores). Error bars represent standard errors of the mean. *p < .05

Causal interaction of intervention and temperament on pre-post gain

To test moderator effects of temperament on pre-post gain in the intervention groups, multiple regressions were performed separately for the near- and far-transfer measures with standardized gain score entered as the dependent variable and, in the first step, intervention group as well as temperament traits as independent variables. In the second step, the interaction term (intervention group × temperament) was added to the equation (see Baron & Kenny, 1986). Interaction terms were created by multiplying the centered temperament scores with the group dummy variables.
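The two-step procedure described above can be sketched as follows. The data are synthetic and all variable names are hypothetical. For brevity the sketch uses a single group dummy (training vs. control), whereas the study compared three groups and would need two dummies and two interaction terms.

```python
import numpy as np


def r_squared(predictors: np.ndarray, y: np.ndarray) -> float:
    """R2 of an ordinary least squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), predictors])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return 1.0 - residuals.var() / y.var()


# Synthetic illustration data (not the study's data)
rng = np.random.default_rng(0)
n = 200
group = rng.integers(0, 2, size=n).astype(float)  # 1 = training, 0 = control
temperament = rng.normal(50.0, 10.0, size=n)      # raw temperament score

# Center the moderator, then build the interaction term
centered = temperament - temperament.mean()
interaction = centered * group

# Simulated gain with a built-in moderation effect, then standardized
gain = 0.3 * group + 0.02 * centered + 0.05 * interaction + rng.normal(0.0, 1.0, n)
gain_z = (gain - gain.mean()) / gain.std()

# Step 1: main effects only; step 2: main effects plus interaction
r2_step1 = r_squared(np.column_stack([group, centered]), gain_z)
r2_step2 = r_squared(np.column_stack([group, centered, interaction]), gain_z)
delta_r2 = r2_step2 - r2_step1

print(f"R2 step 1 = {r2_step1:.3f}, R2 step 2 = {r2_step2:.3f}, change = {delta_r2:.3f}")
```

Centering the moderator before multiplying keeps the main-effect coefficients interpretable at the sample mean and reduces collinearity between the main effects and the interaction term.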

Neither the neuroticism × intervention nor the effortful control × intervention interaction term became significant. However, the interaction of the combined temperament measure self-regulation × intervention improved the prediction of WM gain at trend level (β = .22, t(84) = 1.53, p = .06, f2 = .26) and significantly improved the prediction of vocabulary gain (β = .20, t(83) = 1.88, p = .03, f2 = .24). That is, the relationship between training group and pre-post gain varies according to the level of temperament, as demonstrated in the following subgroup analyses.

Subgroup analyses of training effects

Figure 3 visualizes the improvements in WM and Gc performance of the WM training group at different levels of neuroticism (Fig. 3a and b) and effortful control (Fig. 3c and d). As can be seen, the subgroups with high neuroticism and low effortful control, respectively, showed no transfer effects, justifying presenting data of the combined self-regulation measure (see Fig. 4).

Fig. 3

Gain in working memory (WM) performance (backward color span task) and gain in vocabulary performance (proxy for Gc, crystallized intelligence) as a function of the intervention group, WM training group (WMT) versus active control group (AC) versus no-contact control group (PC), and of the combined temperament trait self-regulation (recoded in three groups: high/medium/low self-regulation (SR)). Error bars represent standard errors of the mean. *p < .05

Fig. 4

Near transfer to working memory (WM) performance (backward color span task) and to vocabulary performance (proxy for Gc, crystallized intelligence) as a function of neuroticism (Neur) and effortful control (EC) (recoded in three groups: low/medium/high scores) of the WM training group. Error bars represent standard errors of the mean. *p < .05

General linear models (ANOVAs) with group (WM training vs. reading training vs. no-contact control) as the independent variable and the gain scores of WM and Gc performance as dependent variables were computed separately for the subgroups with high/medium/low self-regulation. Regarding near transfer on WM, a significant intervention group × session interaction on the improvement in the measure of WM was found only in the subgroup with high self-regulation (group × test session: F(2,39) = 3.42, p < .05, ηp2 = .14). There was no such significant interaction in the subgroup with medium self-regulation (F(2,28) = 1.28, p = .49, ηp2 = .04) or in the subgroup with low self-regulation (F(2,10) = .056, p = .95, ηp2 = .02). Regarding far transfer, subgroup analyses again revealed a significant differential training effect on the measure of vocabulary only in the subgroup with high self-regulation (group × test session: F(2,42) = 5.54, p < .01, ηp2 = .20), whereas no such interaction was detected in the subgroup with medium self-regulation (F(2,30) = 2.62, p = .09, ηp2 = .07) or in the subgroup with low self-regulation (F(2,12) = .64, p = .55, ηp2 = .03). These results are presented in Fig. 4 and show that children scoring high in self-regulation reach the highest transfer effects in the WM training group, clearly outperforming the children from the active and the no-contact controls.

Discussion

Studies investigating effects of WM training in healthy children have led to inconsistent conclusions about near and far transfer effects on non-trained tasks. This study aimed to test such training effects in a school setting and to examine whether individual differences in neuroticism and effortful control can account for some differences in training and transfer success exhibited by children.

In this context, emphasis should be placed on two results from our analysis. First, WM training brings about a significant improvement in both a near-transfer WM task and a far-transfer task of crystallized intelligence. However, we intended to answer not only the question of “if it works” but also “for whom it works” (see Wu & Zumbo, 2008). That is, the moderator analysis allows us to understand to what degree WM training is more or less effective for subgroups of children. Our second result is therefore no less important, as it shows the critical role of temperament dispositions regarding training performance and transfer profit: Only children with low neuroticism or high effortful control, or, taking both together, with high self-regulation, profited from the WM training and outperformed the active and the no-contact controls, whereas this intervention did not show any impact for children with high neuroticism and low effortful control.

With regard to the first finding, children who took part in the WM training significantly improved their performance both in a backward color recall task, a measure of WM, and in a vocabulary test, a proxy for crystallized intelligence (Gc), in comparison to an active and a no-contact control group. Participants in the WM training group had an average increase of almost 2 to 3 test points in WM and Gc measures, respectively, which represents a gain of 40–50 % in performance. Comparable to previous findings, children with initially low WM and therefore more room to improve showed higher transfer (see Au et al., 2014, for a review). Furthermore, transfer was found to be positively related to the average WM training score and, at trend level, to training gain (cf. Jaeggi et al., 2011). Regarding near-transfer performance on the WM task, the results replicate findings from studies demonstrating improved performance in non-trained WM tasks after an intense training period (e.g., Dahlin et al., 2008; Schmiedek, Lövdén, & Lindenberger, 2010). Bearing in mind the significance of WM capacity for scholastic achievements, this result indicates promising possibilities for school settings, for instance in the support of children with poor WM. Our far transfer result on a measure of Gc is comparable to the outcome of a WM training study by Alloway and her colleagues (2013). Based on the assumption that WM capacity is a crucial factor for learning and the ability to acquire new knowledge (cf. Gathercole, Alloway, Willis, & Adams, 2006), we suppose that the WM training applied in this study improved WM capacity and with it the acquisition of new knowledge, on the one hand, and the activation of present knowledge, on the other. Furthermore, as the CFT vocabulary task requires choosing a synonym from a choice of five words, it puts considerable strain on WM. The meaning of the target word needs to be decoded and memorized while decoding the five response options and comparing their meanings with the target word. Even though it would be premature to draw strong conclusions about this result, a supposition regarding the improvement in vocabulary is that WM training can be especially beneficial for children with word-activation problems, a hypothesis that needs to be corroborated in further research.

The question of whether and how long transfer effects last beyond the training period is still unresolved. In our study, we did not find long-term maintenance of transfer effects of WM training, which is in line with other short intervention studies (e.g., Buschkuehl et al., 2008; Kronenberger et al., 2011). We can only speculate about reasons, such as reduced motivation in the follow-up testing, or simply that the transfer effects were not strong enough to withstand cognitive interference following the intervention. However, there are a handful of studies providing encouraging evidence of long-term maintenance of transfer effects (e.g., Alloway et al., 2013; Borella et al., 2013; Holmes, Gathercole, & Dunning, 2009; Jaeggi et al., 2011; Salminen, Strobach, & Schubert, 2012). Much more research is needed into why training profit sometimes lasts and sometimes disappears, and into whether methods such as occasional booster sessions might be necessary to achieve better long-term effects (see, e.g., Cepeda, Pashler, Vul, Wixted, & Rohrer, 2006).

Furthermore, we did not find significant transfer to matrix reasoning as a proxy for Gf, or to the scholastic ability tests. On the one hand, our data fail to replicate some earlier findings from WM training studies with school-aged children, which found improvement in Gf (e.g., Jaeggi et al., 2011) or in reading skills (Chein & Morrison, 2010; Looslie et al., 2011). On the other hand, the results are in line with a handful of studies which did not find far transfer on Gf or scholastic abilities (e.g., Holmes et al., 2010; Thorell et al., 2009). Some researchers presume that the lack of transfer to scholastic achievement as measured with standardized test instruments might not reflect actual improvement in everyday school performance (cf. St. Clair-Thompson, Stevens, Hunt, & Bolder, 2010). In any case, the aim is to enhance scholastic achievement rather than performance on laboratory-based assessments; our results therefore leave open questions about, and point to limitations of, the utility and effectiveness of WM training. Beyond that, these findings point to the need to further investigate moderating variables regarding training regimen and individual factors which might explain some sources of these inconsistent results regarding WM training effects.

Regarding our second goal, investigating the effects of two prominent temperament variables, namely neuroticism and effortful control, on training outcomes, we found that they do account for significant variance in training performance and improvement in untrained tasks. It is important to note that we observed no differences in baseline WM and cognitive task performances as a function of temperament. Therefore, none of the described effects of temperament on training and transfer can be attributed to initial performance differences.

Regarding training performance and gain, even though effortful control did not relate to WM performance at pretest, our analyses revealed that effortful control is a good predictor of WM training mean and gain scores. It seems that individuals with low effortful control can show improved WM performance by increasing effort for the task, but that good effortful control is needed for successful WM training, because it enables a child to efficiently regulate emotional and cognitive processes and to maintain motivation and the optimal arousal for the training task (see Blair & Diamond, 2008). Effortful control is furthermore associated with successfully regulating external or internal distractions, which in our case was crucial for successful training, since training was conducted in groups in the classroom (cf. Carver, 2004). The differences between the subgroups were most obvious in the animal span task. This training task is less complex and more monotonous than the n-back task. Therefore, this result is consistent with the suggested role of effortful control in keeping up training motivation and focused concentration even when the training task becomes monotonous and tedious.

Regarding neuroticism, we found that it was related to a lower average WM training score (r = −.32). This replicates our previous result found in a sample of young adults (r = .24; Studer-Luethi et al., 2012). This effect can be attributed to cognitive and emotional interferences or stressful thoughts that adversely affect WM training performance, as the training largely relies on attentional control. Accordingly, when effortful control and neuroticism were entered in the same prediction model for training performance, neuroticism did not explain additional significant variance in training performance. The experience of stress was shown to impair complex operations in the prefrontal cortex, but it could also improve the performance of simple or well-rehearsed tasks (see Arnsten, 2009). This is in line with the finding that neuroticism is not negatively related to WM training gain (r = −.09, n.s.), again replicating our previous finding in young adults (r = −.03, n.s., Studer-Luethi et al., 2012). Correspondingly, children with high neuroticism were able to improve their WM training scores to levels comparable to those of emotionally more stable children. Another explanation comes from Eysenck and Calvo (1992), who postulated that highly anxious individuals are afraid of negative evaluation and therefore highly motivated to improve their performance. That is, as the WM training tasks were not too complex, individuals with high neuroticism managed to compensate for their lower performance efficacy and reach good task performance by increasing their effort and using strategies. Furthermore, this finding is in line with accumulating research demonstrating that WM training can help emotionally vulnerable subjects to improve cognitive performance and neural function and, beyond that, to increase regulation skills (see, e.g., Owens, Koster, & Derakshan, 2013).

Regarding transfer after training, our results show that the combined temperament measure representing self-regulation moderates observed improvements on WM and vocabulary in the WM training group. In other words, near and far transfer found in this investigation are critically dependent on the participants’ level of neuroticism and effortful control, showing the best result if combined in a self-regulation measure: WM training leads to superior gains in near- and far-transfer measures in comparison to the control groups only in subjects with good self-regulation, i.e., with low levels of neuroticism and high levels of effortful control. A speculative interpretation of this result would be that the cognitive load imposed by the cognitive tasks and by low self-regulation skills (e.g., distracting thoughts and emotions, experience of distress), as well as suboptimal levels of arousal, impede complex operations in the prefrontal cortex and diminish transfer processes to higher cognitive abilities (see also Arnsten, 2009). That is, the effectiveness of the WM training seems to depend on the dispositional self-regulation abilities of a child, which are needed to control stressful thoughts and avoid detrimental influences on prefrontal cortex operations. This result is important from a treatment point of view, because WM training should not be claimed to be efficacious for children in general if there is evidence that there are subgroups for which the intervention is ineffective. For these subgroups, alternative interventions focusing on other cognitive or emotional abilities may be more effective.

To sum up, our findings demonstrate the potential of an adaptive WM training, implemented in a regular school setting, to improve task performance in a near-transfer measure and in a far-transfer measure, and that neuroticism and effortful control are relevant variables when seeking to explain individual differences in both the training achievement and the transfer performance. That is, effortful control abilities seem to be necessary to perform well in WM training and show significant improvement in the training task, which can be attributed to the greater ability to focus attention on the training task and to inhibit impulsive responses, such as letting boredom and demotivation take over. Also, good effortful control and emotional stability, which can be summarized as self-regulation, seem to be necessary for the benefits of the training to extend to non-trained abilities. This is an important message to be considered in future training research and training regimens.

Limitations and implications

Some limitations have to be considered. The sample size for the comparison of the subgroups was rather small, so some of the null effects might have resulted from a lack of power. Also, we did not statistically control for multiple comparisons and need to acknowledge that the transfer effects found are preliminary and require replication.

The results of the present study contribute to the call for evidence on factors that moderate WM training success (cf. Jaeggi et al., 2011; Morrison & Chein, 2011) and to the growing body of literature linking temperament with cognitive performance and learning (e.g., Duncan et al., 2007; McClelland, Morrison, & Holmes, 2000). The knowledge of subgroups for which WM training seems ineffective is important from a program development perspective because this can spur further research to find out what works for these groups so that they are not marginalized. It can move us closer to the goal of personalized treatment programs that match the needs of particular groups and individuals. That is, regarding children with good self-regulation, our findings indicate the potential of fostering their cognitive abilities by means of adaptive WM training. For children with low self-regulation abilities, on the one hand, it could be more beneficial to promote self-regulation so as to enhance learning and training processes, strengthen the ability to acquire new knowledge, and develop scholastic abilities (see also Blair & Razza, 2007). Even though most researchers recognize that some individual differences in the capacity for mastering emotionally challenging tasks are biologically grounded, most approaches assume that a significant proportion of self-regulation skills can be improved with practice and training (cf. Diamond, Barnett, Thomas, & Munro, 2007). For example, Lyons and Beilock (2011) concluded from their findings that schools should implement educational interventions which emphasize self-regulation rather than additional skill training in order to support children with scholastic weaknesses. A study which implemented such a classroom curriculum demonstrated improved cognitive control of preschool children (Diamond et al., 2007). Additionally, a growing number of intervention studies have demonstrated that specific training programs targeting attention, focusing, and control can scaffold attentional control and self-regulation skills (e.g., Rueda, Rothbart, McCandliss, Saccomanno, & Posner, 2005; Tang et al., 2007). On the other hand, results from studies implementing cognitive training, such as emotional WM training, point to the high potential of cognitive training to benefit affective and attentional control in both healthy and emotionally vulnerable subjects (e.g., Owens, Koster, & Derakshan, 2013; Schweizer et al., 2013).

Finally, the aim should be to design effective programs that focus on the unique needs of an individual. Such interventions should increasingly be promoted in schools and other institutions. More research is warranted to further disclose the role of individual differences in cognitive abilities, training, and transfer.