1 Introduction

Executive functions (EFs) are a family of general cognitive abilities that are important for paying attention, goal-directed behavior, and problem solving. They include components such as inhibition, working memory, and cognitive flexibility. These general cognitive abilities are often the target of research programs, as EFs are important predictors for academic success, socioeconomic status, health, and quality of life (Diamond 2013). It is therefore crucial to strengthen these abilities even in early childhood. Empirical research has shown that children’s EFs can be enhanced by different interventions (Diamond and Lee 2011). Musical training could be one such promoting intervention. Theoretically, it seems possible that learning a musical instrument could also benefit children’s EFs inasmuch as playing music requires different components of EFs (Jäncke 2009). Empirically, recent studies have indicated that musical experiences can increase EFs in children (see, e.g., Bugos and DeMarie 2017; Frischen et al. 2019, 2021; Holochwost et al. 2017). However, studies investigating the potential influence of music training on EFs differ tremendously with regard to study design (i.e., methodological quality), kind of music intervention, and EFs included. Therefore, it is important to review the existing literature systematically to provide an overview of our current knowledge about this relation, indicate research gaps, and build a basis for potential practical implications.

1.1 Executive functions: definition, components, and development

EFs are a set of domain-general cognitive abilities that enable us to pay attention, perform goal-directed behavior, and solve problems successfully (Zelazo et al. 2008). Components that can be subsumed under the term EFs include inhibition, fluency, set shifting, working memory, planning, and organization. Although there are many different EFs, it is generally agreed that they can be encapsulated in the three core EFs: inhibition, working memory, and set shifting (Diamond 2013; Miyake et al. 2000). The term EFs is primarily understood as describing pure analytical problem-solving as measured with abstract, decontextualized tasks (e.g., the Wisconsin Card Sorting Test; Milner 1963). However, it has been shown empirically that the EF family should include not only “cold” analytical problem-solving, but also problem-solving that involves emotional or motivational components (Zelazo and Carlson 2012; Zelazo et al. 2005). Such “hot” EFs have thus been integrated into the existing model of EFs. Whereas hot EFs are related more closely to decision-making in everyday life and employed when a problem or task is of emotional significance for an individual, cold EFs tend to be associated with abstract thinking (Prencipe et al. 2011).

The development of EFs begins in early childhood (Diamond 2006) and continues rapidly during preschool and early school years (Zelazo et al. 2003); their maturation, however, is not completed until adulthood (Zelazo and Carlson 2012). It is important to note that single components of EFs differ in their developmental trajectories (Best and Miller 2010). In the preschool years, inhibition seems to be the most important EF (Wiebe et al. 2008), showing strong improvement (Best and Miller 2010). Similarly, working memory also develops early in life. More complex EFs develop later, in school-age children and young adults (Diamond 2013). Set-shifting relies on inhibition and working memory and thus takes the longest to develop and is least developed in preschool-age children (Crone et al. 2004; Senn et al. 2004). Similarly, fluency and planning show slower development as well (Romine and Reynolds 2005).

Compared to cold EFs, hot EFs show a more prolonged development. For example, the results of studies on decision making indicate that performance in the Iowa Gambling Task improves with age, with best performance being achieved only in adulthood (Crone and van der Molen 2004; Prencipe et al. 2011). Given that cold, analytic processes are equally required for completing hot EF tasks, the finding that hot EFs exhibit a delayed developmental trajectory seems plausible (Peterson and Welsh 2014). This knowledge of developmental trajectories is particularly important as it can be assumed that the effectiveness of training at a certain age is linked to the development of a given EF. This in turn makes it clear that a systematic review that includes information about different components of EFs and age groups is highly relevant, for example, for planning successful interventions at a particular age (but see also the Cognitive Complexity and Control theory [CCC] of Zelazo and Frye 1998).

1.2 Training of executive functions: potential contributions from musical training

The successful training of EFs in children is of high practical significance for promoting successful participation in the school classroom. Since executive functions such as inhibition undergo essential development in early childhood (Best and Miller 2010), and given that executive functions are important for educational success (Diamond 2013), effective interventions at this age are of great importance for helping children who are just beginning their school career to master school-relevant skills.

According to Diamond and Lee (2011) there are some aspects that are crucial for the successful training of EFs. They point out that (a) the training should be engaging (address children’s interests and passions). Additionally, activities should (b) give children a feeling of social belonging and acceptance (foster emotional and social development). Therefore, it is important that these activities are not too narrowly focused on the training itself. Moreover, the training should (c) be challenging, which can be effected by increasing difficulty and continuity; the feeling of being challenged is important for improvement. Given that it naturally includes all components for successful training, it appears that musical training might be perfect for enhancing EFs.

Making music is a complex activity (Jäncke 2009) that draws on several different cognitive abilities. Active music-making requires selective attention as well as set-shifting. Selective attention, for example, is needed when the musician listens carefully to the sound of another player. Set-shifting is important when new rules introduced by accidentals need to be taken into account. Planning is needed for monitoring practice and achievement. And inhibition is required, e.g., to initiate the correct movements on an instrument or to use hands and feet independently for musical performances (e.g., drum set or organ).

Musical training offers an opportunity for children to decide in playful group lessons (e.g., music kindergarten) which instruments or roles they would like to play. Later, while taking music lessons, children may decide together with their teacher which pieces they would like to work on. Their interests can thus be included in training, which fosters engagement, i.e., aspect (a) above. Playing music is a joyful activity that is often associated with playing in a group, such as an ensemble or orchestra. This automatically builds a foundation for social and emotional development and often brings the group closer together inasmuch as they share a passion, thus constituting social belonging and acceptance, i.e., aspect (b) above. The difficulty level increases automatically, so that playing the musical instrument remains challenging, i.e., aspect (c) above. Consequently, playing music combines all the aspects that, according to Diamond and Lee (2011), account for the effective training of EFs. It therefore appears likely that instrumental musical training has the potential to benefit the development of EFs in children.

1.3 Musical training and executive functions

Indeed, as postulated above, research in recent years has demonstrated several associations between music training and EFs in children. These studies found positive relations between instrumental music training and different measures of EFs, such as inhibition (see, e.g., Joret et al. 2017), working memory (Schellenberg 2011), fluency, set shifting (Degé et al. 2011; Zuk et al. 2014), processing speed (Zuk et al. 2014), and planning (Degé et al. 2011). Although correlational studies suggest a positive association between instrumental musical training and various measures of EFs, their correlational design leaves it unclear whether musical training has an impact on EFs. Therefore, these studies have not been considered in this systematic review. In addition to correlational evidence, longitudinal studies can show a potential influence of musical training on EFs. For example, musical training seems to benefit different measures of working memory, inhibition, cognitive flexibility, and planning in children (Bugos and DeMarie 2017; Frischen et al. 2019, 2021; Holochwost et al. 2017; Jaschke et al. 2018; but see also Alemán et al. 2017). These studies differed in their study design, ranging from longitudinal data on pre-existing groups (Habibi et al. 2018) to randomized controlled trials (see, e.g., Bugos and DeMarie 2017). Some of them implemented comprehensive musical trainings consisting of multiple components, such as instrumental lessons, music theory, singing, and orchestra rehearsal; others used instrumental training on its own or pc-based training. Additionally, they focused on different age groups, from preschoolers to secondary-school children. Moreover, different EFs were investigated either in isolation or as a group. These considerations show that systemizing the age groups, EF components, kinds of music intervention, and above all the study designs would benefit our understanding of the potential impact of musical training on EFs. Therefore, we have decided for our systematic review to exclude correlational evidence (as mentioned above) and to report the longitudinal evidence in a two-step manner, with the first step including all longitudinal studies and the second step only those studies that used a form of randomization.

There are currently no studies that review the positive effects of musical training on EFs in children. Existing published systematic reviews have concentrated more broadly on a range of different cognitive abilities (see, e.g., Jaschke et al. 2013), but not on EFs specifically, given that most of the published longitudinal studies about musical training and EFs have appeared within the last few years. Recent meta-analyses (Sala and Gobet 2017, 2020) claim that there is no general cognitive benefit from music training. However, these meta-analyses also did not focus explicitly on EFs and, moreover, did not consider every cognitive skill individually but grouped them together. Following Jaschke et al. (2013), we would suggest that it is worth considering individual cognitive skills more precisely and focusing on the different types of musical training. Moreover, as EFs may develop particularly strongly in certain time windows and not every kind of musical training may necessitate executive functions to the same extent, we will include these aspects in our synthesis.

1.4 Aims of the review

This systematic review aims to explore the effects of musical training or musical interventions on EFs in children up to the age of 10. The main rationale for this review is to report the existing research exploring the impact of musical training on EFs in order to indicate which solid findings about relations already exist and where there may be opportunities for future research, and to lay the foundation for potential practical implications. We believe that these aims can best be reached by means of our envisaged two-step analysis of the existing literature.

2 Methods

2.1 Literature search

We searched the electronic databases PubMed, PsycINFO, Web of Science, and Scopus for published research addressing the topic of the review. For the literature search we used terms related to music instruction (“music training” or “music lesson” or “music intervention” or “music”) and executive functions (“executive function” or “executive control” or “cognitive control” and “child”), and we searched the titles, abstracts, and keywords of the indexed records.
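The search terms above can be combined into a single Boolean string, as in the following minimal sketch. The grouping is an assumption: the text does not state exactly how “child” was combined with the other terms, so it is treated here as a third group joined with AND. The function names (`or_group`, `build_query`) are hypothetical, and each database has its own field syntax that would still have to be added by hand.

```python
# Search terms as listed in Sect. 2.1; the three-group split is an assumption.
MUSIC_TERMS = ["music training", "music lesson", "music intervention", "music"]
EF_TERMS = ["executive function", "executive control", "cognitive control"]
POPULATION_TERMS = ["child"]


def or_group(terms):
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query():
    """Join the music, EF, and population groups with AND (assumed grouping)."""
    groups = [MUSIC_TERMS, EF_TERMS, POPULATION_TERMS]
    return " AND ".join(or_group(group) for group in groups)


if __name__ == "__main__":
    print(build_query())
```

Keeping the term lists separate from the joining logic makes it easy to rerun the same search with an added synonym, which helps when a search has to be documented and repeated.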

2.2 Inclusion and exclusion criteria

Our review includes studies involving children up to the age of 10 that address any kind of musical training and measure at least one EF component or investigate neural networks associated with executive functions. Another inclusion criterion was that the studies be published in English. Therefore, studies that investigated age groups older than ten years and/or measured cognitive abilities other than executive functions as well as studies published in other languages were excluded. We concentrated on studies dealing with normally developing children as developmental disorders are a particular challenge and could lead to different results. Moreover, we excluded all studies with a cross-sectional design given that these give no indication of potential causal effects. For the best evidence of a systematic review, it was desirable to include only studies with a rigorous experimental design. However, only a few studies used designs with random treatment allocation and appropriate control groups. For this reason, we refrained from defining the inclusion criteria too narrowly and decided to include all studies that used at least a longitudinal design. The study quality of the selected articles was assessed by performing a quality assessment (for more details, please see Sect. 2.4 Risk of Bias Analysis below).

2.3 Process of study selection and coding

In the first step, the results of the database searches were screened for duplicates. After the removal of duplicates, the remaining abstracts were transferred to the online software Rayyan (https://www.rayyan.ai/), which is a tool for conducting systematic reviews. Two reviewers (the authors) independently screened titles and abstracts against the inclusion criteria. After they were unblinded, the reviewers discussed conflicting ratings until they came to an agreement. Subsequently, the reviewers screened the full texts of the remaining articles, again against the inclusion criteria, and assigned labels regarding study design, age group, type of musical training, and outcome measures to help sort the studies based on these criteria.

2.4 Risk of bias analysis

To assess the quality of the studies we conducted a risk of bias analysis similar to that described in Burkhardt and Brennan (2012). The two reviewers independently assessed the quality of each study using a brief checklist. The checklist consisted of five criteria addressing the study design (trained control group, random treatment allocation, blind assessment of outcomes, maximum dropout of 20%, treatment and control group comparable at pretest). For each criterion the reviewers scored one point if the criterion was met or zero points if it was not met and added all points for each study (see Footnote 1). Accordingly, the maximum score that could be awarded for a study was 5 and the minimum was 0. Scores of 0–1 indicated a high risk of bias and low quality of study design; scores of 2–3 indicated a moderate risk of bias and medium quality of study design; and scores of 4–5 indicated a low risk of bias and a high quality of study design.
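The scoring rule just described can be written down compactly. The sketch below is illustrative only: the class and function names (`BiasChecklist`, `quality_score`, `risk_of_bias`) are hypothetical, and the example study is invented to show the calculation, not taken from the review.

```python
from dataclasses import dataclass, fields


@dataclass
class BiasChecklist:
    """The five yes/no criteria named in Sect. 2.4 (field names are hypothetical)."""
    trained_control_group: bool
    random_allocation: bool
    blind_outcome_assessment: bool
    dropout_at_most_20_percent: bool
    groups_comparable_at_pretest: bool


def quality_score(checklist):
    """One point per criterion met, so the score ranges from 0 to 5."""
    return sum(int(getattr(checklist, f.name)) for f in fields(checklist))


def risk_of_bias(score):
    """Map a score to the categories in the text: 0-1 high, 2-3 moderate, 4-5 low."""
    if not 0 <= score <= 5:
        raise ValueError("score must be between 0 and 5")
    if score <= 1:
        return "high"
    if score <= 3:
        return "moderate"
    return "low"


if __name__ == "__main__":
    # Invented example: every criterion met except blind outcome assessment.
    example = BiasChecklist(True, True, False, True, True)
    score = quality_score(example)
    print(score, risk_of_bias(score))
```

Because every criterion carries the same weight, a study without random allocation can still reach the low-risk category if it meets the other four criteria.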

3 Results

The literature search identified 278 titles across the four databases. After the removal of duplicates, the titles and abstracts of the remaining n = 133 articles were screened by the two reviewers with regard to the inclusion criteria. The reviewers identified n = 50 reports to be considered for inclusion in the review. Of these, two were non-peer-reviewed publications and were therefore removed. The full texts of the remaining 48 articles were again independently screened by both reviewers. Finally, the reviewers identified 21 reports for inclusion in the review. For cold EFs, only reports that applied a task to measure EFs were included. In particular, for the study by Alemán et al. (2017), results based on the self-report measure were not included. Since only a few reports assessed hot EFs, we retained both ratings and tasks for these. In accordance with the PRISMA guidelines (Page et al. 2021), an overview of the literature search process and study selection is shown in a flowchart (see Fig. 1).
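The number of records dropped at each selection stage is not stated in the text but follows from the reported counts by subtraction. The sketch below makes that arithmetic explicit, assuming that each stage is a subset of the previous one; the stage labels and the function name `dropped_between_stages` are hypothetical.

```python
# Counts reported in the text, in the order of the selection process.
STAGES = [
    ("records identified", 278),
    ("after duplicate removal", 133),
    ("considered for inclusion", 50),
    ("peer-reviewed full texts screened", 48),
    ("included in the review", 21),
]


def dropped_between_stages(stages):
    """Return (transition label, number of records dropped) for adjacent stages."""
    return [
        (f"{name_a} -> {name_b}", n_a - n_b)
        for (name_a, n_a), (name_b, n_b) in zip(stages, stages[1:])
    ]


if __name__ == "__main__":
    for label, dropped in dropped_between_stages(STAGES):
        print(f"{label}: {dropped} dropped")
```

Checking that the dropped records sum to the difference between the first and last stage is a quick way to catch transcription errors in a flow diagram.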

Fig. 1 Flow diagram of the literature search strategy and study selection process

The characteristics and the results of the selected articles are summarized in Table 1.

Table 1 Overview of studies included in the review

3.1 Overview of all studies applying a longitudinal design

From the 21 studies that were finally included in the systematic review, we were able to categorize 15 as realizing a form of randomization (randomized per individual, cluster randomized). Six studies used a longitudinal design based on existing groups without randomization (Fasano et al. 2019; Habibi et al. 2018; Hennessy et al. 2019; Maróti et al. 2019; Trainor et al. 2009; Vazou et al. 2020). We will first report on all 21 studies that we included and later narrow our report down to the studies with randomization.

The age of participants in the included studies ranged from 3 to 4 years up to 6 to 14 years (see Table 1). We included studies whose subjects’ age range mainly matched our intended age range (up to 10 years).

With regard to the musical training used in the included studies, we found a huge variety. Two studies worked with a pc-based music program; 8 studies reported a form of instrumental training; 11 studies were categorized as applying a multimodal music training (relying on several different skills like vocal performance skills, rhythmic skills, gross-motor skills etc.); 3 studies applied singing exercises; 4 studies involved a rhythm-based training; 2 included movement elements; 2 featured orchestral training; 1 entailed theory training; and 1 study explicitly incorporated Suzuki piano lessons. Some studies (those offering several training features) were included in more than one category. Although the form of music training was very diverse, most of the music training programs took place in group settings (18 out of 21). The duration of musical training ranged from a short intervention of 20 days (see, e.g., Moreno et al. 2011) to longer music programs of up to four years (Hennessy et al. 2019).

All included studies measured cold EFs. Only 4 studies additionally measured hot EFs. Surprisingly, no study focused on hot EFs exclusively. In 20 of the 21 studies at least one positive effect of music training was reported. In only one study was no effect found (Alemán et al. 2017). Among the 20 studies with a positive effect, three revealed null findings (Janus et al. 2016; Park et al. 2015; Vazou et al. 2020), which means that statistical hypothesis testing found no relationship. In these studies, the music group did show improvement in an EF, but so did the trained control group. To elaborate in more detail: 15 studies reported a positive impact of musical training on one or more EFs; 4 studies found brain changes that might be interpreted as a positive effect of musical training (Habibi et al. 2018; Hennessy et al. 2019; Moreno et al. 2011; Trainor et al. 2009); and three studies found an improvement in both groups (Janus et al. 2016; Park et al. 2015; Vazou et al. 2020). Note that some studies measured both neuronal and behavioral data and therefore appear twice in the list above. When the positive effects and null effects are sorted according to EFs, musical training is most often reported as influencing inhibition. However, a few studies also found null effects for inhibition. The included studies also found mixed results for working memory, with some indicating positive effects and others suggesting no effects of musical training. Only a few (between 1 and 3) studies reported a positive effect of musical training on selective attention, delay discounting, shifting, planning, fluency, and emotion regulation. A number of studies (between 1 and 6) showed null findings for shifting, planning, fluency, and emotion regulation.

Regarding the assessed risk of bias, there were 12 studies in the low-risk group and nine in the moderate-risk group.

In general, we aimed at lowering the number of moderate-risk studies included in the review by focusing on studies involving randomization. This would help to clarify which EFs would show consistently positive effects of musical training and for which null effects might predominate.

3.2 Overview of studies with random assignment

Overall, 15 of the 21 studies were categorized as applying a form of randomization. Eight studies allocated children randomly; 4 studies reported a cluster randomization; and 3 studies reported pseudo-randomization (random at first, then with some allocations to ensure equal groups).

Notably, but not surprisingly, the range of duration of musical training was smaller (randomized groups are harder to keep in training over longer periods than are self-selected groups), lasting from 20 days (Janus et al. 2016; Moreno et al. 2011) to 36 months (Holochwost et al. 2017). Focusing on the studies with randomization yielded a better ratio of low- to moderate-risk studies, i.e., 12 low-risk to 3 moderate-risk studies. Within these studies, each EF and its results are portrayed in more detail in order to gain a systematic impression of the results.

With regard to inhibition, we found that 87% of the included studies (13 out of 15) applied a measure of inhibition. Ten of these studies reported positive effects of musical training on inhibition. One of these ten studies provided evidence of a positive effect of music training on inhibition for neuronal as well as behavioral data (Moreno et al. 2011). Six of these positive reports worked with a sample of children who were 6 years old or younger, including one study with neuronal data (Moreno et al. 2011), whereas three studies worked with participants aged 7 and older. Altogether there were slightly more studies with younger children. In three studies, null effects for inhibition were found (Alemán et al. 2017; Guo et al. 2018; Williams and Berthelsen 2019). A comparison of applied forms of musical trainings and their durations showed that the musical trainings that yielded an impact on EFs were very mixed (instrumental, multimodal, pc-based, and rhythmical). This was also true for the studies that reported null findings (instrumental, rhythmical, and multimodal). However, rhythmical exercises were included in the training programs yielding a positive impact slightly more often than the other forms of training. Six of the 10 studies with a positive effect reported rhythmical exercises either as a part of the multimodal approach or as a distinct focus. As for duration of training, it appears that the studies with null findings involved shorter training phases (a maximum of three months) than the studies with positive effects (all but one lasting more than 3 months).

With regard to shifting, our systematic review showed that 53% of the included studies (8 out of 15) assessed shifting. Four studies reported positive effects of musical training on shifting. However, in one of those studies, only boys improved (Williams and Berthelsen 2019), and in another the trained control group (creative movement) also showed improvements (Park et al. 2015). The four studies were evenly distributed across younger samples (≤ 6 years) and older samples (> 7 years). This was similar for the four studies with null findings (younger sample: 2, older sample: 2). No systematic difference between applied form of musical training was evident: The studies with positive effects used instrumental, multimodal, and rhythmical training programs, whereas the studies with null findings applied multimodal, rhythm and singing, and instrumental training. Similarly, training duration did not differ systematically; the duration in studies with positive effects ranged between 8 weeks and 36 months. For the studies with null findings, the range lay between 8 weeks and 24 months.

With respect to working memory, 67% of the included studies (10 out of 15) applied at least one measure of working memory. However, the working memory component is special in that several studies measured auditory as well as visual working memory. This resulted in 6 measures of auditory working memory and 6 measures of visual working memory. Here it is noteworthy that the tests applied for one memory component differed between studies. There were equal numbers of positive findings and null findings in the studies. In all, two studies revealed positive effects for auditory working memory and three studies found the same for visual working memory. Four studies reported no effect of music training on auditory working memory and four found no effect of music training on visual working memory. For both positive effects and null effects, younger and older samples were equally represented. Similarly, no systematic influence could be revealed for type of training: Positive effects were yielded for both auditory and visual working memory by instrumental training and multimodal approaches. The duration of training was heterogeneous, ranging from 6 weeks to 36 months. Studies that reported null findings applied instrumental, multimodal, and pc-based trainings (auditory working memory) or multimodal, pc-based, and rhythmical trainings (visual working memory). Here, too, the duration of training was diverse, ranging from 20 days to 36 months.

With regard to planning, only 33% (5 out of 15) of the included studies assessed planning. Three of those studies found no impact of music training on planning in older samples (> 7 years). Only two studies reported a positive effect of music training: one in a younger sample (≤ 6 years) (Bowmer et al. 2018) and one in an older sample (> 7 years). The positive influence was found with 8 weeks to 24 months of multimodal training. In the studies that reported null findings, instrumental and multimodal trainings were used. Training duration ranged from 8.5 months up to 24 months.

Concerning attention or selective attention, only 2 of the 15 studies (13%) measured attention or selective attention. Both studies tested the influence of music training on attention in samples of children 6 years old and older. The study by Frischen et al. (2021) found a positive influence of 8.5 months of instrumental training on selective attention. By contrast, the study by Alemán et al. (2017) showed no influence of 12 months of instrumental training and singing on attention.

With respect to fluency, only 2 out of 15 studies (13%) applied a measure of fluency as well. The study by Janus et al. (2016) found an improvement in verbal fluency after 20 days of a pc-based training in 4‑ to 6‑year-old children. However, the control group that received language-based training also improved. The study by Frischen et al. (2021) reported no influence of 8.5 months of instrumental training on design fluency in 6‑ to 7‑year-old children.

For both measures of hot EFs (emotional control and delay discounting), only one study per construct was included. The study by Williams and Berthelsen (2019) found a positive effect of 8 weeks of rhythm and movement training on emotional control in 4‑ to 5‑year-old children. For delay discounting, by contrast, no influence of 12 months of instrumental and singing training could be shown in 6‑ to 14-year-old children (Alemán et al. 2017).
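The shares of randomized studies that assessed each cold EF can be recomputed from the counts reported in this section. The sketch below does so to one decimal place; the dictionary layout and the function name `coverage_percent` are hypothetical, and the counts are copied from the text above.

```python
N_RANDOMIZED = 15  # studies with some form of randomization

# Number of randomized studies that applied a measure of each cold EF.
STUDIES_PER_EF = {
    "inhibition": 13,
    "working memory": 10,
    "shifting": 8,
    "planning": 5,
    "attention": 2,
    "fluency": 2,
}


def coverage_percent(n_studies, total=N_RANDOMIZED):
    """Share of studies assessing an EF, as a percentage with one decimal."""
    return round(100 * n_studies / total, 1)


if __name__ == "__main__":
    for ef, n in STUDIES_PER_EF.items():
        print(f"{ef}: {n}/{N_RANDOMIZED} = {coverage_percent(n)}%")
```

The counts sum to more than 15 because most studies measured several EFs, so the percentages describe coverage per construct rather than a partition of the studies.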

4 Discussion

The majority (20 of 21) of the studies in the part of the review covering all longitudinal studies found a positive effect of music training on at least one EF. This was the case despite studies of different quality and features being reviewed together. Hence, there is support for the idea that musical training can enhance EFs. The ratio of studies reporting a positive effect to studies with null findings is similar for the part of the review that comprised randomized studies only: 14 studies reported a positive effect of music training on at least one EF, and only one study did not find any effect on directly measured EFs (Alemán et al. 2017). However, two of the 14 studies finding a positive effect found an improvement in EFs for the music training group as well as for the trained control group (Janus et al. 2016; Park et al. 2015) and are therefore counted as null findings from here on. The part of the review concerning studies with randomization supports the hypothesis that musical training may enhance EFs, and at the same time it suggests that other forms of potentially engaging and challenging training, such as second-language training (Janus et al. 2016) or creative movement training (Park et al. 2015), can lead to similar effects. Some studies included in our review applied neuronal measures associated with executive functions and found consistently positive results in relation to musical training. It would thus appear that musical training can indeed lead to structural neural plasticity. However, most of these studies, except Moreno et al. (2011) and Park et al. (2015), used a longitudinal design without randomization. Thus, we cannot make reliable statements on causal effects.

Our review further indicated that the included studies were very diverse with respect to type and duration of musical training. It would be important to systemize this in future studies. In addition, the review showed that most studies have assessed cold EFs and that measures related to hot EFs have so far been neglected. This might be due to the fact that there are fewer reliable, well-evaluated tests to measure hot EFs. Nevertheless, this is also something that should be addressed in future research. Our risk of bias analysis demonstrated that some studies of high methodological quality already exist. However, as mentioned above, these studies differ in many methodological details, which makes a general comparison difficult. Therefore, there is still a need for more methodologically high-quality research to disentangle and better explain the partly inconsistent results. It is important that studies with low risk of bias be conducted for all EFs in order to come to a firm conclusion about causal relations and to clarify whether only individual measures or all EFs can benefit from musical training.

The EFs targeted most often by research projects were inhibition, working memory, and shifting. Within this highly researched group, the results for inhibition seem to be the clearest. The vast majority of studies (10 out of 13) in the randomization part of our review reported positive effects of musical training on inhibition. This impact was relatively independent of the type of musical training and the age of the sample, although it seems that trainings that involved rhythm-guided movement and stimulated beat synchronization could be especially effective in younger samples (Frischen et al. 2019; Williams 2018). Generally, it seems that studies investigating inhibition were more often effective with younger samples (6 years old and younger). This makes sense especially when one considers that in earlier childhood EFs tend to constitute a unitary model, with inhibition being the key factor (Wiebe et al. 2008). Furthermore, shorter training durations appeared to be less effective in these studies (i.e., they yielded null findings more often). In sum, the reviewed studies can be interpreted as strong support for the positive impact of musical training on inhibition. Future research should endeavor to further systemize age and kind of training.

Results regarding working memory proved more inconclusive than those for inhibition. Positive effects and null findings were reported with nearly equal frequency, and no systematic pattern with respect to age or type of training was observable. During the review it became clear that although the studies differentiated between auditory and visual working memory, this categorization might not be fine-grained enough. It might even be reasonable to systemize on the test level. However, many more high-quality research studies would be needed for such systematization.

Similarly, there are no clear results for shifting. Here, too, we found nearly equal numbers of positive and null findings. Overall, our review demonstrates the relative paucity of high-quality studies, and those that do exist take very different approaches not only to the age of their subjects, but also to the type and duration of musical training. It is, therefore, difficult to identify any systematic pattern at the moment.

For the non-core cold EFs, such as planning, selective attention, and fluency, very little research has thus far been carried out. Here, too, the positive and null findings occur with equal frequency. Findings concerning selective attention show that different tasks may yield different results. This is also true for the studies that measured fluency. On the task level, it is evident that design fluency (Frischen et al. 2021) may be different from verbal fluency (Janus et al. 2016), although both are applied to measure fluency.

With regard to hot EFs, our review indicated that studies measuring them were extremely underrepresented. There were only a few among the longitudinal studies and even fewer among the randomized studies. More comprehensive research regarding musical training and hot EFs should be undertaken in future studies. It would likely be a good idea to assess the impact of musical training on hot and cold EFs in conjunction with each other in order to make a comparison between the two constructs. For example, the influence of musical training on inhibition together with delay discounting could be interesting to observe inasmuch as delay discounting can be considered the emotional (hot) counterpart of (cold) inhibition.

Overall, the results of the systematic review suggest that music training may be a promising activity for stimulating cold EFs. This kind of training seems particularly suitable for enhancing EFs because it contains all the components of successful EF training according to Diamond and Lee (2011) and, in addition, engages many EFs simultaneously (Jäncke 2009; Okada and Slevc 2018) and therefore strongly captures attention. In this respect, music training differs from other leisure activities, such as creative arts or sports, in which EFs are not demanded to the same extent. The temporal structure of music, which is less present in, for example, creative arts activities, may also be an important aspect, as it may contribute to holding attention and to the high demand on EFs. Another great advantage is that, in addition to professional training, musical activities can easily be integrated into everyday life (e.g., welcome songs or other short musical sessions in the school context or at home). However, despite the outstanding suitability of music training, some study results also indicate that music is not the only leisure activity that can promote EFs and that other complex activities can also improve EFs (Janus et al. 2016; Park et al. 2015).

4.1 (Practical) implications

Musical training as an intervention to enhance EFs appears to be promising. To be more precise, musical training can be a valuable tool in promoting inhibition. This knowledge could be of practical significance, for example, in the preparation of children for the transition from kindergarten to school. Musical training could, for example, be used in the same way as phonological awareness training to promote school readiness.

We have little knowledge about the effects that the type of musical training has on EFs. The included studies were very diverse with respect to type of training, and these very different types, whether instrumental or PC-based, all had an impact on EFs (Frischen et al. 2021; Moreno et al. 2011). The only common feature of most of the musical training programs is that they involved a group setting. This may be an important feature, an idea supported by Kirschner and Tomasello (2009), who showed that joint drumming promotes synchronization in kindergarteners. Belonging and acceptance were also identified by Diamond and Lee (2011) as important criteria for the successful training of EFs. For practical purposes, this implies among other things that musical training programs for preschoolers (as mentioned above) should take place in groups.

Concerning the length of music training, the review again shows a very diverse picture. However, for inhibition, the best-studied component so far, it looks as if lengthier (and probably more continuous) programs are more likely to yield positive effects. The studies with a positive effect all had training durations of more than 3 months, ranging up to 24 months.

Regarding the age of participants, our review demonstrated that we lack systematic research here as well and that some studies have worked with huge age ranges: 6 to 14 years, for example, in Alemán et al. (2017). This is potentially problematic as it groups together children who are in very different developmental phases of EFs. Hence, developmental trajectories might have overshadowed the effects of music training.

Our risk of bias analysis and the procedure of study selection (see Fig. 1) demonstrate that more high-quality studies with low risk of bias are needed in order to make solid inferences about EFs other than inhibition.

4.2 Limitations of the review

Our review may have been influenced by publication bias, which can be considered a major threat to this kind of approach (Dickersin 1990). It affects the entire body of papers included in the systematic review. This systematic bias emerges because journals tend to publish papers that report significant effects more often and because researchers tend to pay more attention to significant results when deciding what to prepare for publication. Therefore, a (systematic) review can contain a higher proportion of significant results that does not only reflect a genuinely higher proportion of significant effects compared to non-significant effects, but is also partly due to publication practices. Thus, the alpha level of the overall body of evidence might be higher because of this bias. Hence, it is important to be aware of this threat to validity when interpreting the results of a systematic review. Unfortunately, there is no single statistical test that can guarantee a bias-free systematic review or document an existing bias with certainty (Ioannidis 2008; Peters et al. 2010). Normally, it is also impossible to find all the unpublished non-significant studies. Thus, being aware of publication bias is a good way of dealing with this problem. Moreover, applying the PRISMA guidelines at least provides a detailed report of how the different studies found their way into the systematic review. For future reviews, it would be valuable to search grey literature as well or to carry out the whole systematic review only on preregistered studies, which should help to minimize publication bias. Furthermore, the scope of our review may have been limited by including only papers published in English. Moreover, we did not search for grey literature or unpublished studies; however, this was done intentionally so as to include only peer-reviewed studies, which are expected to be of better research quality.

Altogether, the part of the review covering the randomized studies included mostly studies categorized as having a low risk of bias, which makes the results of the whole systematic review more reliable. However, the studies in the longitudinal part are based on self-selected groups without randomization and should therefore be treated with more caution.

The generalizability of the reported results could be problematic for some EFs (with the exception of inhibition), as many more studies would be needed for generalization. For some of the non-core EFs, extremely few studies have been conducted so far.

Another potential limitation of our results is that the included studies applied different tests to measure the same construct. Future systematic reviews should evaluate the possibility of systematizing at the level of individual tests, and future empirical research should consider this carefully.

Finally, many studies lacked information about treatment fidelity, such as checklists (about attendance and training fidelity), training manuals, or training of trainers (professional or not). Sometimes even information essential for the risk of bias analysis was not reported in detail in the studies.

5 Conclusion

Taken together, the majority of the included studies (longitudinal or RCT) point to a positive effect of musical training on EFs. However, we cannot rule out an effect of publication bias in our systematic review. Since most studies researched inhibition, findings for the impact of musical training on inhibition are the most reliable. In all, more high-quality studies are needed that systematize type of training and training duration, apply randomization, involve trained control groups, and provide for treatment fidelity.