Attentional control is more important than ever in our modern everyday lives. It enables us to focus on specific tasks in the midst of a plethora of information. The possibility of enhancing it through training is attractive for many groups in the population, from young to older adults. Indeed, the observation that the human brain is plastic and that cognitive processes become more efficient as a result of regular and focused mental exercise has encouraged researchers to examine the effects of targeted interventions, above all by means of computerized cognitive tasks (see, e.g., von Bastian and Oberauer, 2014). During the last decade, this research has yielded promising findings, demonstrating learning effects and transfer to untrained tasks; most of these studies have used the n-back task as a cognitive training paradigm (see Au et al., 2015; Schwaighofer et al., 2015; Soveri et al., 2017a, for recent meta-analyses). These findings have stirred widespread interest among researchers and the public at large (Simons et al., 2016) but have also attracted considerable criticism (e.g., Melby-Lervåg and Hulme, 2013). Many critics have pointed to a lack of insight into how training tasks produce training effects (e.g., Shipstead, Redick, and Engle, 2012a). One way to better understand such mechanisms is to investigate the differential effects of diverse training approaches. Thus, the present study compared the effects of the widely used and much-praised dual n-back task with those of a dichotic listening task, which places a heavy load on selective attention, and with active and passive control conditions.

In the dual n-back task, trainees simultaneously see and hear a series of stimuli. For each stimulus, they must indicate whether it is the same as the one seen or heard n items back. The task is assumed to train working memory (WM), our ability to simultaneously store and process information and to keep it available for complex cognition at a given moment (Oberauer and Hein, 2012). WM has been defined as one core component of executive functions, a set of top–down mental processes needed for paying attention. Besides WM, the two other core executive functions are inhibition and shifting (Miyake et al., 2000).
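As an illustration of the target rule only, the following minimal Python sketch flags which trials in a stimulus stream are n-back targets. It assumes that each modality is represented as a plain list of stimulus labels (grid positions or letters); the function name `nback_targets` and the example streams are hypothetical and not taken from any training software. In the dual version, the same rule is simply applied to the visual and the auditory stream independently.

```python
def nback_targets(stream, n):
    """Return, for each trial, whether its stimulus matches the one n trials back.

    The first n trials can never be targets because they have no n-back partner.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [i >= n and stream[i] == stream[i - n] for i in range(len(stream))]


# Dual n-back: the same rule is applied to each modality independently.
positions = [3, 5, 3, 1, 3, 1]            # hypothetical visual stream (grid cells)
letters = ["K", "B", "K", "B", "P", "B"]  # hypothetical auditory stream
n = 2
visual = nback_targets(positions, n)
auditory = nback_targets(letters, n)
for trial, (v, a) in enumerate(zip(visual, auditory)):
    print(trial, "visual target" if v else "-", "auditory target" if a else "-")
```

A real task would also control how often targets occur in each stream; this sketch only scores a given sequence.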

WM has been linked to a number of important skills, such as attentional control, reasoning, and general intellectual capacity (Engle, 2018; Kane et al., 2007; Shipstead et al., 2012b; Wongupparaj et al., 2015). Indeed, training with the n-back task has been shown not only to enhance performance in the trained task but also to generalize to untrained tasks of WM and attention (Lilienthal et al., 2013; Pergher et al., 2018; Studer-Luethi et al., 2016), here referred to as near transfer, and to higher-order cognition (Jaeggi et al., 2008; Jaušovec and Jaušovec, 2012; Klingberg, 2010; Soveri et al., 2017b), here referred to as far transfer. Even though far transfer to general intelligence has been observed most often after dual n-back training (Au et al., 2015; Blacker et al., 2017), such effects seem to be smaller and less consistent than the more reliably observed near-transfer effects (Soveri et al., 2017a).

In the forced-choice dichotic listening (DL) task, trainees are presented with spoken words via headphones; one word is played to the right ear, and a different word is simultaneously played to the left ear. The participant is instructed to direct attention to one of the ears and decide on the category of the word presented there (e.g., natural vs. artificial). The input to each ear crosses over to the contralateral cerebral hemisphere, while the ipsilateral input is automatically inhibited (Tallus et al., 2015). The task is assumed to train attentional capacity by requiring trainees to focus their attention on one source of information while inhibiting the other (cf. Rothen and Meier, 2018). Thus, the DL task places high demands on the core executive function of inhibition, the capacity to withhold dominant responses and to suppress the influence of interfering information (e.g., Bexkens et al., 2015). The DL task has been used to assess impairments in attention, working memory, and executive functions (Hugdahl, 2011) and has been found to be beneficial for participants with auditory, verbal, or neurological impairments (Helland et al., 2018; McCullagh and Palmer, 2017; Osisanya and Adewunmi, 2018). Apart from that, little research has been done with this task, but some evidence has indicated improvements in auditory attention and attentional control after 4 weeks of DL training (Soveri et al., 2013). A more recent study demonstrated increased post-training attentional control at the neuronal level but no behavioral improvements (Tallus et al., 2015).

In response to inconsistencies in cognitive training results, some studies have focused on the potential modulatory roles of individual differences in personality and motivation (e.g., Jaeggi et al., 2014; Studer-Luethi et al., 2016; Zhao et al., 2018). While most researchers agree on the relevance of individual, motivational, and emotional factors, findings on this topic are rather inconclusive (see, e.g., Borella et al., 2017; Katz et al., 2014; Linares et al., 2019; Maraver et al., 2016). It therefore seems worthwhile to include personal and motivational factors in cognitive intervention designs to clarify possible links.

But what are the mechanisms through which training-induced improvements occur (see, e.g., Meiran et al., 2019)? The mismatch model of cognitive plasticity predicts that a rise in demand on cognitive processes results in increased resources for cognitive functioning (Lindenberger, 2014). Accordingly, when training tasks continually and sufficiently challenge the upper limits of attention and memory, trainees' performance should improve across various cognitive tasks. Related to this model is the dual-task practice advantage, which suggests that dual-task training is more beneficial than single-task training for performance in demanding cognitive tasks (see Strobach, 2020). Finally, other approaches assume that cognitive training enhances the efficiency of the specific processes involved in the training (e.g., Dahlin et al., 2008) or develops the highly specific skills required to perform specific cognitive tasks (Gathercole et al., 2019). In this case, training tasks targeting different components of executive functions (that is, WM updating/shifting vs. inhibition) are expected to produce differential improvements on transfer tasks with similar versus dissimilar cognitive demands (cf. Miyake et al., 2000).

The Present Study

The present study investigates whether WM training is effective in a sample of adults at a range of ages and whether various cognitive training approaches lead to differential cognitive improvements. Specifically, we aimed to compare the effects of 4 weeks of training with a new tablet-based version of the dual n-back task with a new version of the DL task on the same set of near-transfer measures of attention and memory and far-transfer measures of intelligence and daily life memory. To estimate the significance of the training effects, we compared them to an active control group using a simple listening (SL) training task and a no-training control group. The SL task followed the same structure as the DL task but consisted of identical auditory stimuli simultaneously presented to both ears. Therefore, no directing or shifting of attention was required in this task. In both versions of the listening tasks, we implemented a prospective memory task in the second part: participants were asked to react to a specific word (e.g., “dog”) by pressing a special button. The main reason for this addition was to keep the task demanding and interesting for the participants.

In terms of method, the WM task and the listening training tasks all involve a steady flow of information presented simultaneously on two channels (n-back: one visual and one auditory stimulus; DL: two different auditory stimuli, one presented to each ear). In addition, all the training tasks combine attentional and memory demands.

We were interested in whether possible training-related changes could be explained by the cognitive processes involved in the specific training tasks. Our assumption was that, while all three training tasks require executive functions such as attentional control, the n-back task and the DL task focus on WM and inhibition, respectively. That is, the n-back task places high demands on the WM components of updating and shifting, whereas the DL task places high demands on the attentional component of inhibition (of irrelevant information) and thus on selective attention.

If cognitive training effects are unspecific, no differential gains should emerge across the three training conditions. If cognitive training is less specific but its benefits for general cognitive processes depend on high attentional load and processing speed, we expected greater benefits for both the n-back and the DL training than for the SL training. The same expectation (an advantage of the two dual-task training conditions over the single-task condition regarding broader cognitive benefits) follows from the dual-task practice advantage. If the high WM demands of n-back training produce a specific effect, we expected greater improvements, especially in far-transfer measures, in the dual n-back condition than in the other conditions. In this case, the DL training was assumed to produce specific improvements in inhibition. In contrast, if cognitive training is not effective, we expected no differential retest improvements between the trained and the untrained participants.

Furthermore, cognitive training effects should ultimately be evaluated with measures that more closely reflect real-life experience (cf. Soveri et al., 2017a). We were interested in whether training participants noticed any impact of the intervention on the mindfulness and memory performance they experienced in daily life.

Finally, we were interested in whether we would find associations of personality, emotion regulation, self-efficacy, and training motivation with training outcomes, since such individual variables can change the engagement, commitment, and persistence of trainees.

Method

Participants

One hundred thirty participants (62 male) with a mean age of 26.26 years (SD = 10.62; range = 18–55) were recruited from the personal networks of the study leaders. Participants were required to be adults between the ages of 18 and 55, in good health, and not taking any drugs. The participants were not paid, but they received our collection of cognitive training tasks after the completion of the study. All participants received the same information, reported normal vision and hearing, and provided written informed consent before participation.

Participants were randomly assigned to the three training groups, with matching for gender and age. To complete the study design, participants for the passive control group were recruited later, after the three training groups had finished the training. A larger number of participants, who were also matched for gender and age, was included in this group to enhance statistical power. The final sample consisted of 28 participants (mean age = 24.5 years; SD = 7.46; 11 male) in the dual n-back training, 30 participants (mean age = 25.6 years; SD = 10.01; 10 male) in the DL training, 24 participants (mean age = 26.70 years; SD = 11.45; 9 male) in the active control group (SL training), and 48 participants (mean age = 27.5 years; SD = 12.14; 32 male) in the no-training group.
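The text does not specify how matching for gender and age was implemented. One common approach is stratified block randomization, sketched below in Python as an assumption rather than as the study's actual procedure: participants are grouped by gender, sorted by age, and each consecutive block of three similar-aged participants receives a random permutation of the three training conditions. The function `matched_random_assignment` and the example data are hypothetical.

```python
import random
from collections import defaultdict


def matched_random_assignment(participants, groups, seed=None):
    """Randomly assign participants to groups while balancing gender and age.

    participants: iterable of (participant_id, gender, age) tuples.
    groups: list of group labels, e.g. the three training conditions.
    Returns a dict mapping participant_id -> group label.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for pid, gender, age in participants:
        strata[gender].append((age, pid))

    assignment = {}
    for members in strata.values():
        members.sort()  # neighbours in the sorted list are similar in age
        # Each consecutive block of len(groups) participants receives one
        # randomly ordered copy of the group labels.
        for start in range(0, len(members), len(groups)):
            block = members[start:start + len(groups)]
            labels = list(groups)
            rng.shuffle(labels)
            for (_, pid), label in zip(block, labels):
                assignment[pid] = label
    return assignment


if __name__ == "__main__":
    # Hypothetical participants: (id, gender, age)
    people = [(f"P{i:02d}", "f" if i % 2 else "m", 18 + 3 * i) for i in range(12)]
    result = matched_random_assignment(people, ["dual n-back", "DL", "SL"], seed=1)
    for pid, gender, age in people:
        print(pid, gender, age, result[pid])
```

If a stratum does not divide evenly into blocks, the last block is only partly filled, so group sizes within that stratum can differ by one.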

Procedure

The recruited participants were assigned to one of the four experimental groups before taking any tests. They completed the pretraining behavioral tests in groups of around 20 participants in a computer room at the university. After completing the pretests, each training participant received a tablet and training instructions to take home. Participants were instructed to schedule five training sessions per week for 4 weeks, for a total of 20 sessions. Finally, all participants were retested 2 to 5 days after their last training session.


Cognitive Tasks

Choice Reaction Task

In this task, arrows pointing to the right or left were presented on the screen (presentation time of max. 5000 ms, interval between 300 and 500 ms), followed by a black screen (500 ms). Participants were requested to press the corresponding arrow on the keyboard as fast as possible. Mean accuracy served as the dependent variable.

Task Switching

A total of 32 numbers (1–10) were serially presented on the screen. Participants were required to assign the numbers to one of two categories by pressing a predefined key as fast as possible. Crucially, the task changed from odd/even to lower/higher than five in an AABB order, thus enabling switch costs to be calculated. The dependent variable was the difference in accuracy between task change and task repetition.
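The switch-cost score can be illustrated with a short Python sketch. It assumes that each trial is coded with its task label and whether the response was correct, and that the cost is computed as accuracy on repetition trials minus accuracy on switch trials, so that a positive value means poorer performance after a switch; the text does not state the sign convention, and the function name is hypothetical. In an AABB sequence every second trial is a switch, and the first trial is neither a switch nor a repetition.

```python
def switch_cost_accuracy(tasks, correct):
    """Accuracy on task-repetition trials minus accuracy on task-switch trials.

    tasks: task label per trial (e.g. "parity" / "magnitude" in AABB order).
    correct: True/False per trial.
    The first trial is skipped because it has no preceding task.
    """
    switch_trials, repeat_trials = [], []
    for i in range(1, len(tasks)):
        if tasks[i] != tasks[i - 1]:
            switch_trials.append(correct[i])
        else:
            repeat_trials.append(correct[i])
    if not switch_trials or not repeat_trials:
        raise ValueError("need at least one switch and one repetition trial")
    repeat_acc = sum(repeat_trials) / len(repeat_trials)
    switch_acc = sum(switch_trials) / len(switch_trials)
    return repeat_acc - switch_acc


if __name__ == "__main__":
    # 32 trials in AABB order: parity, parity, magnitude, magnitude, ...
    tasks = [("parity", "parity", "magnitude", "magnitude")[i % 4] for i in range(32)]
    # Hypothetical responses: errors on two switch trials (indices 2 and 6).
    correct = [i not in (2, 6) for i in range(32)]
    print(f"switch cost (accuracy): {switch_cost_accuracy(tasks, correct):.3f}")
```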

Processing Speed

The digit symbol substitution test (DSST) of the Wechsler Adult Intelligence Scale (WAIS; Wechsler, 1958) consists of nine digit–symbol pairs (e.g., 1/-, 8/X) followed by a list of digits. Participants were required to write the corresponding symbol under each digit as fast as possible. The number of correct symbols within the time allowed (120 s) served as the dependent measure.

Fluid Intelligence

We used Raven’s Standard Progressive Matrices test (RPM; Raven, Raven, and Court, 1998) separated into two forms of 30 items (items were split into odd and even sets and counterbalanced across testing times). Participants saw a 3 × 3 matrix of shapes presented with the last shape missing and were required to choose the item that completed the pattern from a set of six to eight choices. Participants were given 10 min to complete the task. The number of correctly answered items served as the dependent variable.

Working Memory

Verbal working memory capacity was assessed individually with the backward digit span task of the Wechsler Memory Scale (Wechsler, 1997). Starting with two digits, increasingly long sequences of digits between 1 and 9 were read aloud, and the participant was required to repeat each sequence in reverse order. The number of correctly reproduced sequences served as the dependent variable.

Episodic Memory

A total of 48 words consisting of a maximum of 9 letters were presented serially and dichotically through headphones (interval of 200 ms). Participants were instructed to pay attention only to the words presented to one ear and decide on the category (part 1: flower vs. tree; part 2: furniture vs. clothes). Half of the word pairs presented were congruent (identical words), and the other half were incongruent (different words). After completing the tasks, participants were asked to recall as many of the words as possible in 2 min. The number of correctly recalled words served as the dependent variable (Muhmenthaler and Meier, 2019).

Self-Reported Measures

Mindfulness

We used the German version of the Freiburg Mindfulness Inventory (FMI; Walach et al., 2006), which consists of 14 items (e.g., "I feel connected to my experience in the here-and-now"). Answers are given on a Likert scale ranging from 1 (rarely) to 5 (almost always).

Memory in Everyday Life

We used the Prospective and Retrospective Memory Questionnaire (PRMQ; Smith et al., 2000) as a self-report measure of prospective and retrospective memory slips in everyday life. The questionnaire consists of 16 questions, 8 asking about retrospective memory failures (e.g., "Do you forget what you watched on television the previous day?") and 8 concerning prospective failures (e.g., "Do you decide to do something in a few minutes' time and then forget to do it?"). Answers are given on a Likert scale ranging from 1 (never) to 5 (very often).

Neuroticism and Conscientiousness

These two personality traits of the Big Five model developed by McCrae and Costa (1999) were measured with 24 items from the NEO-FFI questionnaire (Costa and McCrae, 1992): 12 items concerning neuroticism (e.g., "I'm often tense and nervous") and 12 items concerning conscientiousness (e.g., "I try to conscientiously finish given tasks"). Participants were asked to rate their agreement with each statement on a Likert scale from 1 (strong disagreement) to 5 (strong agreement).

Emotion Regulation

We used the Emotion Regulation Skills Questionnaire (ERSQ; Berking and Znoj, 2008) to measure emotion regulation skills. The 27 items assess the competencies of awareness, clarity, sensation, understanding, acceptance, resilience, self-support, willingness to confront, and modification (e.g., "I can influence my negative emotions"). Answers are given on a Likert scale ranging from 1 (very often) to 5 (very rarely).

Self-Efficacy

To assess belief in one's own capacity to handle difficulties and challenges in everyday life, we used the General Self-Efficacy Short Scale (GSE; Beierlein et al., 2013). Answers to the four items (e.g., "When I am confronted with a problem, I can usually find several solutions") are given on a Likert scale ranging from 1 (strong disagreement) to 4 (strong agreement).

Training Tasks

All three training tasks are part of our cognitive training task collection designed for use on tablets and smartphones (Studer-Luethi et al., 2017).

Dual N-Back Task

We used the dual n-back procedure described by Jaeggi et al. (2008) and created a version with motivating features and constant direct feedback. In our version of the task, an animal such as a rabbit or mole appears at different locations on the screen (presentation time 500 ms, interstimulus interval 2500 ms). Simultaneously, a letter of the alphabet is presented through the earphones. During each interval, the trainee decides whether the current location of the animal, and whether the heard letter, is the same as n positions back in the sequence; for a match, the trainee touches a predefined target button on the tablet screen, and in any other case, a predefined nontarget button. Immediate feedback on each response is provided at the top of the screen for both the visual and the auditory modality (see Fig. 1a). For every level of n, there are three field sizes with 4, 8, and 11 grid compartments. If the trainee makes fewer than three mistakes in a block, the field size increases, and the level of n increases after successful completion of the third block. Conversely, the field size decreases after more than five mistakes, but the level of n decreases only after three unsuccessful blocks. Each block consists of 20 + n trials and is followed by performance feedback. Each training session consisted of 15 blocks and lasted approximately 20 min.
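The adaptive rules above leave some details open, so the following Python sketch is one possible reading, not the actual implementation. It tracks a single overall level with three field sizes per value of n, which matches the level mapping in the caption of Fig. 2 (levels 1–3 correspond to 1-back, levels 4–6 to 2-back, and so on). It further assumes that "the third block" means the block with the largest (11-compartment) field, that the three unsuccessful blocks must be consecutive and occur at the smallest field, and that blocks with three to five mistakes leave the difficulty unchanged. The class name `DualNBackStaircase` is hypothetical.

```python
from dataclasses import dataclass

FIELD_SIZES = (4, 8, 11)  # grid compartments available at each value of n


@dataclass
class DualNBackStaircase:
    """Hypothetical model of the adaptive rules; three levels per value of n."""
    level: int = 1           # 1-based overall level (1-3 = 1-back, 4-6 = 2-back, ...)
    floor_failures: int = 0  # consecutive unsuccessful blocks at the smallest field

    @property
    def n(self) -> int:
        return (self.level - 1) // 3 + 1

    @property
    def field_size(self) -> int:
        return FIELD_SIZES[(self.level - 1) % 3]

    @property
    def trials_per_block(self) -> int:
        return 20 + self.n

    def update(self, mistakes: int) -> None:
        """Adjust difficulty after a block, given the number of mistakes in it."""
        if mistakes < 3:
            # Larger field; after the 11-compartment field this moves to n + 1.
            self.level += 1
            self.floor_failures = 0
        elif mistakes > 5:
            if self.field_size != FIELD_SIZES[0]:
                self.level -= 1  # shrink the field, keep n
                self.floor_failures = 0
            else:
                self.floor_failures += 1
                if self.floor_failures >= 3 and self.n > 1:
                    self.level -= 1  # back to n - 1 at its largest field
                    self.floor_failures = 0
        else:
            self.floor_failures = 0  # 3-5 mistakes: difficulty unchanged


if __name__ == "__main__":
    s = DualNBackStaircase()
    for mistakes in [0, 1, 2, 6, 6, 6, 4]:
        s.update(mistakes)
        print(f"mistakes={mistakes}: level={s.level}, n={s.n}, field={s.field_size}")
```

Under this mapping, n is the level divided by 3 and rounded up, so the reported mean final level of 9.69 lies between the largest 3-back field (level 9) and the smallest 4-back field (level 10).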

Fig. 1

a Dual n-back task with animated animal appearing at various locations and the direct feedback function at the top of the screen. b Example response screen for the dichotic and simple listening tasks (concrete/abstract decision)

DL Task

In the forced-choice dichotic listening task, the participant is presented on each trial with two different words over the headphones, one to each ear. In the first part of the task, the participant is instructed to direct auditory attention to either the left or the right ear and to assign the word heard there to one of two categories by touching the corresponding button on the right or left side of the screen (see Fig. 1b). Across the 20 training sessions, the categories varied (concrete/abstract, English/German, male/female voices, natural/artificial sounds, smaller/bigger objects), as did the to-be-attended ear (i.e., left vs. right). In the second part, a prospective memory task is added by instructing participants to react to a predefined word (e.g., "dog") or category (e.g., animal) by pressing a special key (Meier et al., 2011). Each part of the task consisted of 90 words and lasted approximately 20 min.

SL Task

In the simple listening task, the participant is presented on each trial with the same word to both ears over the headphones. Thus, auditory attention does not need to be directed to one ear as in the dichotic listening task. Apart from that, the procedure is identical for the simple and dichotic listening tasks.


Data Processing

Following outlier analysis, values more than 3 SD above or below the mean (5% of the data) were trimmed to the ±3 SD boundary. To analyze transfer effects, we compared pretest and post-test performance across the three training groups and the no-training control group. Descriptive data for pretest and post-test, as well as within-group changes, are presented in Table 1. Importantly, the four groups did not differ significantly in any test performance at pretest (all t < 0.78, p = n.s.).
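The trimming step can be sketched in Python as clamping each score to the mean ± 3 SD boundary. This is an assumption about the exact procedure: the text does not say whether the bounds were computed per group, per time point, or across the whole sample, and the sketch simply computes them once over the list it is given. The function name `trim_to_sd_bounds` and the example scores are hypothetical.

```python
from statistics import mean, stdev


def trim_to_sd_bounds(values, k=3.0):
    """Clamp scores lying more than k SDs from the mean to the mean +/- k*SD boundary.

    The mean and (sample) SD are computed once on the untrimmed scores.
    """
    m, sd = mean(values), stdev(values)
    lower, upper = m - k * sd, m + k * sd
    return [min(max(v, lower), upper) for v in values]


scores = [10] * 19 + [100]  # hypothetical data: one extreme value among 20 scores
trimmed = trim_to_sd_bounds(scores)
print(f"extreme score trimmed from {scores[-1]} to {trimmed[-1]:.2f}")
```

Note that in very small samples (ten or fewer scores) a single extreme value inflates the sample SD so much that it can never lie more than 3 SD from the mean, which is why the example uses 20 scores.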

Table 1 Means (M) and standard deviations (SD) of performance in the untrained cognitive measures and within-group changes in these measures (t values and effect size Cohen's d for repeated measures)

Participants were excluded from the transfer analysis if they did not complete the post-tests (n = 4) or if they completed fewer than 17 of the 20 training sessions (dual n-back group: n = 4; DL training group: n = 3; SL training group: n = 6). This left a sample of N = 113 for the training and transfer analyses.

Training Performance and Motivation

All three training groups showed improved performance across the 20 training sessions (see Fig. 2). The dependent measure for the dual n-back training was the average level achieved in each session; for the DL and SL training, it was the average response accuracy in each session. Mean performance in the dual n-back task increased from level 1.36 to level 9.69 (approximately equivalent to 4-back in the classical dual n-back task; F(1, 19) = 81.91, p < 0.001, ηp² = 0.79). Mean accuracy in the DL and SL tasks increased from 0.92 to 0.95 (F(1, 19) = 3.17, p < 0.01, ηp² = 0.10) and from 0.91 to 0.94 (F(1, 19) = 4.41, p < 0.001, ηp² = 0.17), respectively. Changes in reaction time differed: there were no significant changes in the dual n-back task (F < 1), but participants in the DL training decreased their reaction time from 1419 to 1204 ms (F(1, 19) = 39.36, p < 0.001, ηp² = 0.69) and those in the SL training from 1747 to 1438 ms (F(1, 19) = 28.66, p < 0.001, ηp² = 0.61).

Fig. 2

a Dual n-back task performance during the four training weeks (levels 1–3 correspond to 1-back, levels 4–6 correspond to 2-back, levels 7–9 to 3-back, levels 10–12 to 4-back). b Reaction time in the listening tasks during the four training weeks. In the dichotic listening task, the focus was on the right ear in the first 2 weeks and on the left in the second 2 weeks. c Accuracy in the listening tasks during the four training weeks. In the dichotic listening task, the focus was on the right ear in the first 2 weeks and on the left in the second 2 weeks

We also collected feedback after the training with the following questions: (1) How motivated were you for the training? (2) How much did you enjoy the training? (3) To what extent did you feel it improved (a) your concentration, (b) your responsiveness, and (c) your memory performance? We found no differences between the three training groups on any of these variables. Regarding associations with training performance, training enjoyment was positively related to DL training performance (r = 0.34, p < 0.01) and, at trend level, to n-back training level (r = 0.29, p = 0.08). However, we did not find any significant association between these motivational variables and transfer performance in any of the training groups.


We conducted repeated-measures ANOVAs for the transfer variables with the factors group (dual n-back training group, DL training group, SL active control group, no-training control group) and time (pre- and post-training assessment). Post hoc comparisons of mean differences (Δ) were corrected for multiple comparisons with the Bonferroni correction. We also computed within-group changes as the effect size Cohen's d corrected for repeated measures, as proposed by Morris (2007). The resulting transfer effects are presented in Table 1 and Fig. 3.

Fig. 3

Percentage change from pre- to post-training performance in all transfer measures

Cognitive Tasks

Choice Reaction Accuracy

There was no main effect of time (F(1, 112) = 0.993, p = 0.321, ηp² = 0.009) and a marginally significant time × group interaction (F(3, 112) = 2.07, p = 0.05, ηp² = 0.053). Post hoc comparisons revealed that only the DL training group showed significant improvement (Δ = 1.20, p = 0.015), whereas the other groups did not change their performance (all p > 0.10).

Switching Cost Accuracy

There was a significant main effect of time (F(1, 109) = 3.34, p = 0.035, ηp² = 0.03) but no significant time × group interaction (F(3, 109) = 1.12, p = 0.34, ηp² = 0.03).

Processing Speed

All experimental groups improved their processing speed (F(1, 114) = 44.43, p < 0.001, ηp² = 0.28), and this improvement did not differ between groups (time × group interaction: F(3, 114) = 0.17, p = 0.92, ηp² = 0.004).

Working Memory

There was a general improvement in WM performance over time (F(1, 85) = 11.98, p = 0.001, ηp² = 0.123) and no significant time × group interaction (F(3, 107) = 0.96, p = 0.42, ηp² = 0.033). Post hoc comparisons revealed that the no-training control group was the only group with no significant change (Δ = 0.26, p = 0.47), while all training groups significantly increased their WM span (dual n-back training: Δ = 0.56, p = 0.05; DL training group: Δ = 1.13, p = 0.01; SL training group: Δ = 0.93, p = 0.04).

Episodic Memory

There was a significant main effect of time (F(1, 111) = 10.40, p = 0.002, ηp² = 0.086) and a trend toward a time × group interaction (F(3, 111) = 1.78, p = 0.07, ηp² = 0.046). Post hoc comparisons revealed significant improvements in the dual n-back training (Δ = 2.48, p < 0.001) and DL training (Δ = 1.86, p < 0.001) groups but no improvements in the SL training (Δ = 0.01, p = 0.91) and no-training control (Δ = 0.90, p = 0.09) groups.

Fluid Intelligence

None of the experimental groups significantly improved their performance in the RPM test (F(1, 107) = 0.53, p = 0.47, ηp² = 0.005), and there was no significant time × group interaction (F(3, 107) = 0.32, p = 0.81, ηp² = 0.009).

Self-Report Measures

Prospective and Retrospective Memory

Participants reported no improvement in their daily life memory performance (F(1, 80) = 0.051, p = 0.82, ηp² = 0.001), and there was no significant time × group interaction (F(3, 80) = 2.84, p = 0.54, ηp² = 0.026).

Mindfulness

There was a significant main effect of time on mindfulness (F(1, 80) = 2.82, p = 0.045, ηp² = 0.034), and this change did not differ significantly between groups (time × group interaction: F(3, 80) = 0.94, p = 0.43, ηp² = 0.034). However, post hoc comparisons revealed that the DL training group was the only group with a significant increase in mindfulness score (Δ = 1.70, p = 0.015); the other groups did not show any change in perceived mindfulness (all p > 0.54; see Fig. 4).

Fig. 4

Change in self-reported mindfulness in daily life from pretest (t1) to post-test (t2) for each group

Neuroticism and Conscientiousness

We found no significant correlations of neuroticism or conscientiousness with training or transfer performance (all r < 0.20).

Self-Efficacy

Self-efficacy was positively related to the average n-back training level (r = 0.39) and to n-back training gain (r = 0.34, both p < 0.05) but not to pre–post transfer gains (all r < 0.22).

Emotion Regulation

There was no significant association between emotion regulation and training outcomes (all r < 0.20), but there was a positive association with overall training motivation (r = 0.22, p < 0.05) and training enjoyment (r = 0.26, p < 0.05).

Discussion

Our study tested the generality of training benefits across two paradigms: the widely used dual n-back training task and a new DL task. Specifically, we compared the effects of 4 weeks of training with either of these tasks on attention, memory, reasoning, self-reported memory in daily life, and mindfulness with those of SL training and no training. Participants were healthy adults aged 18–55 years. The results demonstrated improved performance on the trained tasks but only weak evidence for distinct training benefits or for differential effects of the n-back and listening training approaches on objective and subjective transfer measures. Nevertheless, trends suggest benefits of n-back training on memory performance in comparison to no training. This effect, however, was no stronger than in the DL and SL training groups, which showed comparable memory changes. Additionally, we found enhanced attentional control, and even indications of a possible positive impact on mindfulness, after the DL training.

Training and Near-Transfer Effects

Participants significantly improved their performance on the trained tasks: the n-back training group increased their level of n, and the DL and SL training groups increased their response accuracy and reduced their reaction times. But did these enhancements transfer to broader measures of memory and processing speed?

We found marginally significant between-groups results for choice reaction and memory performance but not for our measures of task switching and processing speed. DL training led to higher choice reaction performance than the less attention-stimulating SL training, the dual n-back task training, or the no-training control group, indicating a specific near-transfer effect of the trained processes. It seems that the high demand of the DL training task on inhibitory control by focusing on one of the dichotically presented spoken words while ignoring the other improved this specific cognitive skill and transferred to the choice reaction performance (cf. Gathercole et al., 2019; Holmes et al., 2019). The improvement found here in attentional control is in line with the few relevant studies that exist, even though behavioral improvements were not consistently found (Tallus et al., 2015).

We also found increased WM performance in all three training groups, in contrast to the no-training control group, and post hoc analyses revealed significant episodic memory gains in the dual n-back and DL training groups. This pattern indicates some general training benefits on memory without a clear advantage for any one training approach. This result is less surprising for the dual n-back task, given its high load on divided attention and on temporary storage, updating, and retrieval of stimuli in two different modalities, and it is in line with other findings demonstrating that n-back training can elicit significant improvements in attention and WM span (e.g., Harrison et al., 2013; Richmond et al., 2014; von Bastian and Oberauer, 2013). Such training effects have also been found to be reflected in changes in frontal alpha power (Blacker et al., 2017) and in the frontostriatal system (see Salmi et al., 2018, for a recent meta-analysis). The comparable WM gains in both listening training groups are rather surprising, as the integrated prospective memory task was the only active memory component of these training tasks, keeping the memory load quite low. Moreover, we were surprised by the lack of distinction between the two listening tasks, since the DL training demands far more attentional and inhibitory resources than the SL training. It is possible that the earphones, which shield the wearer from external stimuli, facilitated concentration and therefore increased decoding ability. Another possibility is that the demands of all the training tasks on executive functions, such as controlled attention while temporarily storing items, increased individuals' ability to keep larger quantities of information active (cf. Chein and Morrison, 2010). This would indicate a general rather than specific improvement and would be in line with the first mechanism suggested by von Bastian and Oberauer (2014). Alternatively, all the training tasks could have strengthened general abilities such as persistence, focus, willpower, and the motivation to use one's memory and certain strategies, resulting in an unspecific enhancement of memory performance (cf. Gibson et al., 2013).

Conversely, the lack of improvement in task switching and processing speed supports the assumption that the interventions increased memory efficiency through improvements in specific skills and strategies acquired during training, which would be in line with the second mechanism suggested by von Bastian and Oberauer (2014) (cf. Fellman et al., 2020; Nutley and Söderqvist, 2017). The same is true for the effect of DL training on choice reaction performance, which appears to be a training-specific enhancement of a specific cognitive skill. These acquired skills seem to be quite specific, but not as specific as postulated by Gathercole et al. (2019), since our WM and episodic memory measures were structurally different from the trained tasks.

It is likely that the processes suggested above interact with each other and are not mutually exclusive (cf. Söderqvist and Nutley, 2015).

Far Transfer

We found no evidence of far transfer to performance in a fluid intelligence test. We did not even find a practice effect, in that none of our experimental groups improved their performance in the test (see also Redick et al., 2013; Thompson et al., 2013). Thus, we were unable to replicate the transfer of dual n-back training to measures of fluid intelligence and reasoning reported by others (e.g., Jaeggi et al., 2008; Jaeggi et al., 2010; Jaušovec and Jaušovec, 2012; Rudebeck et al., 2012; Stephenson and Halpern, 2013). This result is consistent with previous failures to find far transfer to fluid intelligence following WM training (Chooi and Thompson, 2012; Colom et al., 2013; Harrison et al., 2013; Minear et al., 2016; Redick et al., 2013; Schwarb et al., 2016; Sprenger et al., 2013) and is in line with the conclusion that training improvements do not transfer to untrained cognitive tasks that are not closely related to the training task (e.g., Chooi and Thompson, 2012; De Simoni and von Bastian, 2018; Gordon et al., 2019; Lawlor-Savage and Goghari, 2016; Melby-Lervåg et al., 2016).

Commitment and motivation may also have contributed to this negative finding. For instance, our sample, with its range of ages and backgrounds, may have invested less in the training programs than samples of university students, who may be more achievement driven (cf. Jaeggi et al., 2014; regarding the influence of achievement motivation, see Zhao et al., 2018). The training level reached by our sample was indeed somewhat lower (level 4) than some reported levels (level 5; e.g., Jaeggi et al., 2008) but similar to others that found post-training improvements in fluid intelligence (e.g., Jaeggi et al., 2010; but see Tidwell et al., 2014, for a discussion of the significance of training gain). Our participants reported that they had been fairly motivated for the training (average of 3.4 points on a scale from 1 to 5) and had moderately enjoyed it (average of 3 points). They reported experiencing few cognitive benefits from the training (average of 2.5). The lack of comparable values in other training studies makes it difficult to interpret or draw conclusions about the possible modulating effects of training motivation (cf. Minear et al., 2016).

Self-Reported Measures of Memory in Daily Life and Mindfulness

Following the suggestion of Soveri et al. (2017a) that training effects should ultimately be evaluated with measures that more closely reflect real-life experience, we assessed self-reported retrospective and prospective memory in daily life as well as mindfulness. Theoretical considerations led us to expect that the dual n-back task, which demands the temporary storage, updating, and retrieval of stimuli, would stimulate retrospective memory. By contrast, the listening tasks require reactions to predefined items and were therefore assumed to stimulate prospective memory. However, our participants did not report any significant improvement in retrospective or prospective memory in everyday life, regardless of experimental condition. This is in line with other investigations that failed to find prospective memory improvements after cognitive intervention (e.g., Zhao et al., 2019).

Post hoc analyses showed that the DL training group was the only group to report increased mindfulness in daily life, even though the effect of experimental condition did not reach significance. One speculative explanation is that elements of DL training, especially sustained selective attention and attention switching, increased participants' experience of mindfulness after training by strengthening attentional control. This is a weak but nevertheless promising exploratory finding with practical implications, given that mindfulness is linked to increased positivity, a greater sense of coherence, better quality of life, more empathy, more satisfying relationships, and greater hope (Vago and Silbersweig, 2012). Although there is evidence that mindfulness training leads to improvements in cognitive abilities (van Vugt and Jha, 2011) and stress coping (cf. Grossman et al., 2004; Ramasubramanian, 2017), further research should investigate the possibility of increasing mindfulness through specific cognitive training.

Moderating Role of Individual Differences

We found no modulatory effect of neuroticism, conscientiousness, or effortful control on training and transfer outcomes. This contrasts with earlier studies that demonstrated an interaction between training improvements and neuroticism, conscientiousness (or the closely related construct of grit), or the related concept of achievement motivation (Studer-Luethi et al., 2012; Zhao et al., 2018). However, ours is not the only study that failed to reveal significant modulatory effects of these variables (e.g., Minear et al., 2016; Sprenger et al., 2013; Thompson et al., 2013). Possible reasons are that the tested variables are not directly associated with differences in the effort put into cognitive training, that the range of scores on these measures was not sufficiently large, or simply that our analyses were underpowered to detect such associations.

In contrast, we found some moderating effects of emotional abilities. There was a positive association between self-efficacy and n-back training performance and improvement. Self-efficacy is generally defined as the belief in one's capacity to achieve desired goals in particular situations (Bandura, 1997). In line with our result, other studies have shown that individuals high in self-efficacy are more likely to adhere to an exercise regime (Marcus et al., 1994) and that older adults with higher levels of self-efficacy showed greater responsiveness to reasoning training (Payne et al., 2012). It is possible that our participants with stronger self-efficacy beliefs regulated their training behavior with strategies that boosted task performance and/or allocated more effort and attentional engagement to the training (see Barnett, 2014). We also found that the better participants' ability to regulate emotions, the higher their training motivation and enjoyment. These results suggest that cognitive training benefits from participants' emotional ability to cope with negative emotions and frustration. Other research has demonstrated positive effects of WM training on emotion regulation abilities (Xiu et al., 2018).

Limitations and Outlook

The most common limitation of training studies like ours is small sample size. According to Karbach and Verhaeghen (2014) and Lawlor-Savage and Goghari (2016), samples of more than 300–400 trainees are necessary to reach 80% power. Furthermore, research is still lacking on exactly what various cognitive tasks actually measure and on their reliability over time. Together, these factors result in weak statistical power and consequently lower chances of detecting beneficial effects of training.
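Why samples of this size are needed can be illustrated with the standard normal-approximation formula for comparing two independent group means, where the required sample size per group is n = 2(z₁₋α/₂ + z₁₋β)² / d² for a standardized effect size d. The sketch below is illustrative only: the two-group design, two-sided α = .05, 80% power, and the effect sizes are assumptions chosen for demonstration, not the design or estimates used in the cited power analyses.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided comparison of two
    independent means with standardized effect size d (normal approximation)."""
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


# Hypothetical effect sizes, used only to show how n grows as d shrinks
for d in (0.5, 0.3, 0.2):
    n = n_per_group(d)
    print(f"d = {d}: {n} per group, {2 * n} in total")
```

Under these assumptions, a small-to-moderate effect of d = 0.3 requires about 175 trainees per group (350 in total), and d = 0.2 about 393 per group. The normal approximation slightly underestimates the sample size obtained from an exact t-test calculation.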

Another limitation of our investigation was the use of individual tasks to assess pre–post training changes, which runs counter to the call for construct-level variables (Chooi and Thompson, 2012; Colom et al., 2013; Shipstead, Redick, and Engle, 2012a). We faced a conflict between the theoretical demand to assess each construct with multiple measures and the practical limits on participants' time commitment, as they had already invested a great deal of time in the training.

Furthermore, the use of conceptually different cognitive training tasks makes it more difficult to compare effects and draw inferences about the mechanisms underlying transfer. Nevertheless, the approach applied here, which compares several promising training approaches and their benefits, has practical implications for implementing cognitive training. In any case, divergent research methods make it hard to compare training results across studies (see Pergher et al., 2020).

In addition, even though we assessed self-reported memory in daily life, conclusions regarding the benefits of cognitive training in daily life must remain tentative. We assume that most training-induced changes are fine-grained and not consciously experienced. Further research should assess memory performance in daily life as well as the experience of mindfulness at different time points.

Our results agree with findings from many other studies indicating that neither the efficacy of diverse cognitive training tasks nor the mechanisms underlying possible training benefits are well understood. Our results also do not confirm a general advantage of dual-task training over single-task training as proposed by Strobach (2020). Critical future directions for the cognitive training field are to collect data from larger samples, to compare more diverse types of cognitive training tasks, to implement these in extended interventions, to systematically investigate transfer to construct-level variables and changes in memory performance and well-being in daily life, and to identify neural changes following cognitive training.


It is important for cognitive training research to compare variants of training tasks and thus shed light on the task features and mechanisms behind training benefits. Our aim was to investigate what distinguishes the effects of training tasks that require simultaneous processing of two stimulus sources and place different demands on core executive functions. We compared n-back training, DL training, simple listening training, and a no-training control condition with respect to their effects on both objective and self-reported cognitive performance in a sample of participants of diverse ages and backgrounds. Our evidence suggests that the effects of these types of training are neither strong nor distinct but nonetheless promising. All three training conditions showed some improvement on measures of episodic memory and WM, in contrast to the no-training control group. Additionally, DL training improved attentional control and tended to increase self-reported mindfulness in daily life. Moreover, participants with high self-efficacy reached higher training levels, indicating the importance of personal beliefs for successful training. Our results also confirmed the importance of participants' emotion regulation abilities for their training motivation and enjoyment. More research is needed to explore the bidirectional relation between emotional competence and cognitive training.

The question remains which task features and cognitive demands are crucial for cognitive benefits. These may include memory and inhibition demands and the demand for simultaneous multimodal information processing. However, such simultaneous task demands do not seem sufficient for cognitive benefits to occur, as found in the present study as well as in others (e.g., Jaeggi et al., 2010). Further work is needed to develop cognitive training tasks into effective tools for increasing attentional control and general cognitive performance. Our results contribute to the field of WM training by demonstrating that a dichotic listening task with prospective memory components is comparable to a dual n-back task in slightly improving memory.