Executive functions (EFs) are a set of top-down mental processes that enable us to pay attention and to consciously guide behavior (Diamond 2013). Although there is no consensus on exactly which components count as EFs, there is general agreement on three core components: cognitive flexibility, inhibition, and working memory (Hunter and Sparrow 2012). EFs are positively associated with academic success (Borella et al. 2010), IQ (Ardila et al. 2000), health (Miller et al. 2011), and quality of life (Davis et al. 2010). Playing music draws heavily on EFs (Jäncke 2009), so it is possible that playing music regularly benefits EFs. During the past 10 years, several studies have suggested positive associations between music training and EFs in adults (e.g. Amer et al. 2013; Okada and Slevc 2018) and children (e.g. Degé et al. 2011; Joret et al. 2017; Zuk et al. 2014). In addition, a growing body of experimental research suggests an influence of music training on executive functions (e.g. Bolduc et al. 2020; Bugos and DeMarie 2017; Degé et al. 2020; Frischen et al. 2019, 2021). However, nearly all studies have focused on EFs measured with purely analytic tasks, the so-called cold EFs. By contrast, hot EFs involve processes driven by emotion and motivation. Since learning a musical instrument requires a great deal of motivation and self-discipline, music training may be related to hot EFs as well. However, research on this association is scarce, and the few studies addressing hot EFs and music training report partly conflicting results (Hou et al. 2017; Smayda et al. 2018). Moreover, studies involving children are lacking entirely. Since hot EFs continue to develop into adulthood (e.g. Crone and van der Molen 2004), children and adults may show different patterns of association with music training.
To determine whether music training is associated with EFs in general or only with their cold, analytic aspects, hot and cold EF measures need to be studied in the same sample. Therefore, the aim of the present work is to investigate the relationship between music training and hot and cold EFs in adults and children.

1 Executive functions

The concept of EFs refers to a family of top-down mental processes that we need for paying attention, concentrating, and solving problems. EFs allow individuals to mentally play with ideas, to resist temptations, to give reflective responses, to finish tasks, and to pursue long-term goals (Diamond 2013). The many different EFs can be grouped into three core EFs: inhibition, working memory, and cognitive flexibility. Inhibition is the ability to control attention, to give a reflective rather than an impulsive response, and to resist temptations. Working memory is the ability to store, update, and mentally work with information. Cognitive flexibility refers to the ability to shift between mental sets or tasks, to change perspectives, and to create new ideas (Diamond 2013; Miyake et al. 2000). When the term executive functions is used in a more “classical” sense, it usually refers to purely analytical problem solving measured with abstract, decontextualized tasks (e.g. the Wisconsin Card Sorting Test; Milner 1963). However, real-world problems also contain emotional and motivational components, whose significance varies with the task and context. Individuals differ in their susceptibility to these hot environmental factors, which influence problem solving (Peterson and Welsh 2014). Accordingly, the model of EFs has been expanded: alongside the purely analytical cold EFs, it now includes hot EFs, which are engaged by tasks with affective or motivational components. Hot EFs are more closely related to everyday decision making and are used when a problem or task is emotionally significant for an individual, whereas cold EFs are associated with abstract thinking (Prencipe et al. 2011).

Typically, hot executive functions are measured with procedures that involve affective decision making about events with significant consequences, such as gains and/or losses (Peterson and Welsh 2014), for example, the Iowa Gambling Task (IGT; Bechara et al. 1994). The IGT was originally designed to study patients with damage to the orbitofrontal or ventromedial prefrontal cortex. Lesion studies showed that patients with damage to these areas performed normally on cold analytic tasks but poorly on affective decision-making tasks (e.g., Bechara et al. 1998). Since then, the IGT has been used in numerous clinical and non-clinical studies to measure decision making (Toplak et al. 2010) and appears to be an established measure of hot EFs (e.g. Kerr and Zelazo 2004; Prencipe et al. 2011; Zelazo and Carlson 2012). Other tasks used to capture hot EFs include risky decision-making tasks (Rogers et al. 1999), delay discounting (Monterosso et al. 2001), and delay of gratification paradigms (Mischel et al. 1989). All of these tasks capture flexible decision making about events that have emotionally significant consequences (Hongwanishkul et al. 2005). It should be noted, however, that the organization of hot EFs has been less researched and is less well understood than that of cold EFs, and while a vast number of tasks exist to assess cold EFs, far fewer are available for hot EFs. Moreover, it is debatable whether there is a clear distinction between the two constructs and whether measures of hot EFs have acceptable construct validity (Welsh and Peterson 2014). Nevertheless, in our study we chose two established measures of hot EFs that are applicable to both adults and children.

1.1 Music training and cold executive functions

Playing a musical instrument draws on many EFs, such as inhibition, selective attention, flexibility, and monitoring (Jäncke 2009; Okada and Slevc 2018). It is therefore reasonable to assume a connection between music training and EFs. Empirical studies support the theoretically suggested association between music training and several cold EFs in adults and children (e.g. Amer et al. 2013; Bialystok and DePape 2009; Degé et al. 2011; Joret et al. 2017; Zuk et al. 2014). More recent research, however, found associations with working memory but not with other measures of EFs (D’Souza et al. 2018; Okada and Slevc 2018; Slevc et al. 2016). It is therefore unclear whether music training is associated with improved EFs in general or only with specific EFs. Apart from these inconsistent findings, correlational research cannot tell us anything about a potential causal impact of music training on EFs. However, particularly in recent years, several experimental studies have been conducted that also suggest causal relationships (Bolduc et al. 2020; Bugos et al. 2007; Bugos and DeMarie 2017; Degé et al. 2020; Frischen et al. 2019, 2021; Holochwost et al. 2017; but see also Alemán et al. 2017; D’Souza and Wiseheart 2018).

1.2 Music training and hot executive functions

On closer inspection, it is striking that all of the studies reported above addressed the relationship between music training and abstract cognitive tasks without affective or motivational significance. Yet learning a musical instrument requires a great deal of motivation and self-discipline. Learners need to resist temptations (e.g. continuing to practice instead of watching TV or relaxing) and to pursue long-term goals, because improvement requires extensive practice. Music training might therefore also be related to cognitive processes that involve motivational aspects, such as decision making or delay of gratification. To our knowledge, only two recent studies have addressed the relationship between music lessons and hot EFs (Hou et al. 2017; Smayda et al. 2018). Both investigated the association between music training and risky decision making in adults, and they came to partially conflicting results. Hou et al. (2017) showed that the duration of music training was not related to decision making in general, but that adults who started music training early performed better than late-trained adults or adults without musical training on a computerized version of the Iowa Gambling Task (IGT; Bechara et al. 1994), which assesses decision making under unknown and known risk. In contrast, Smayda et al. (2018) found that decision making under known risk was better in late-trained musicians than in early-trained musicians or non-musicians. Given these inconsistent results and the limited number of studies on this question, especially in children, the relation between music training and decision making as a measure of hot EFs needs further investigation.

2 Objectives

The overall aim of the studies was to investigate the relationship between music training and hot EFs in adults and children and to compare it with the relationship between music training and the better-researched cold EFs. Moreover, the studies were designed to help clarify the previously conflicting results on music training and decision making in young adults (Hou et al. 2017; Smayda et al. 2018) (Study 1) and, for the first time, to examine the association between music training and hot EFs in 9‑to-12-year-old children (Study 2). In addition to decision making, another component of hot EFs, delay of gratification, was covered. To allow comparisons with existing studies, the Iowa Gambling Task (IGT; Bechara et al. 1994) was used to measure decision making under unknown and known risk. Furthermore, we administered the Delay of Gratification Test for Adults (DoG‑A; Forstmeier et al. 2011) as a second measure of hot EFs, capturing delay of gratification as a component of self-control. The three core components of cold EFs were also assessed so that hot and cold EFs could be compared in their relation to music training. Based on the current state of research, we expected no relationship between the duration of music training and hot EFs as measured by IGT performance, but a relationship between the age at which participants started playing music and IGT performance. With respect to cold EFs, we expected positive associations with the duration of music training in adults and children, based on the findings of previous studies (e.g. Amer et al. 2013; Bialystok and DePape 2009; Degé et al. 2011; Zuk et al. 2014).

3 Methods (Study 1)

This study was performed as part of a doctoral thesis. Research was conducted in accordance with the ethical guidelines of the ethics committee of the department of Psychology and Sports Science of Justus-Liebig-University Giessen, Germany. Informed consent was obtained from each participant prior to participation.

3.1 Participants

The sample comprised 136 undergraduate students (118 female) from a university in Germany. Participants were recruited via the university’s mailing list and ranged in age from 18 to 33 years. Most of the students (90%) had taken private music lessons, for an average of M = 7.51 years (SD = 6.67 years), and 74% had played regularly in a music group (e.g. ensemble or band), for an average of M = 7.4 years (SD = 6.4 years). Only a few students (6%) were still taking music lessons at the time of study participation, and 19% were still playing regularly in a music group. An overview of the sample’s demographics is presented in Table 1.

Table 1 Demographic Characteristics of Participants of Study 1 and Study 2

3.2 Measures

3.2.1 Independent variables

Total music training was assessed with a questionnaire that asked participants about all instruments (including voice) they had studied. For each instrument, they reported the age at which they started taking lessons, the age at which they stopped, and how long they took lessons (in months). Group playing (e.g. orchestra, choir, or ensemble) was assessed in the same way. We calculated the independent variable “total music training” by summing the total duration of music lessons and the total duration of playing in a musical group. In addition, we used the age at which participants started music training as a second independent variable.
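The two independent variables can be sketched in Python as follows. This is a minimal illustration, not the authors' scoring script: the `MusicActivity` record, its field names, and the rule that age of onset is the earliest start age across all reported lessons and group activities are assumptions made for the example. Note that a plain sum of durations counts overlapping activities (e.g. two instruments studied in parallel) twice.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MusicActivity:
    """One questionnaire entry: lessons on one instrument, or one music group.

    Hypothetical record layout chosen for illustration.
    """
    kind: str         # "lessons" or "group" (hypothetical labels)
    start_age: float  # age in years when the activity started
    months: int       # reported duration of the activity in months


def total_music_training_months(activities: List[MusicActivity]) -> int:
    """Sum the durations of all lessons and all group playing, in months."""
    return sum(a.months for a in activities)


def age_of_onset(activities: List[MusicActivity]) -> Optional[float]:
    """Earliest start age across all entries (assumed definition); None if no music training."""
    if not activities:
        return None
    return min(a.start_age for a in activities)


# Usage: piano lessons from age 7 (5 years), guitar from 10 (2 years), choir from 9 (3 years)
entries = [
    MusicActivity("lessons", 7, 60),
    MusicActivity("lessons", 10, 24),
    MusicActivity("group", 9, 36),
]
print(total_music_training_months(entries) / 12, "years")  # 120 months = 10.0 years
print(age_of_onset(entries))
```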

3.2.2 Dependent variables

The Delay of Gratification Test for Adults (DoG‑A; Forstmeier et al. 2011) is a behavioral measure of motivational self-regulation. It was originally designed to measure delay of gratification in older adults but is suitable for all ages from nine years on (Göllner et al. 2017). The test contains four decision-making tasks with four different kinds of rewards (hypothetical money, snack delay, real money, and magazines), which are embedded in a board game. In our study, we replaced the real money reward with a real present (a small present now versus a bigger present in one month), because this seemed more appropriate for children than a money reward. As suggested by Forstmeier et al. (2011), we calculated two subscores (snack delay and hypothetical money), each ranging from 0 to 8 (0 = never delayed the reward; 8 = always delayed the reward), as well as a composite score. The DoG‑A has a low internal consistency (α = 0.4). Criterion validity coefficients between the subscores and Delay Discounting (another measure of self-regulation) range from r = −0.22 to r = −0.46 (hypothetical money); the correlations are negative because higher Delay Discounting rates indicate lower self-regulation.
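A minimal sketch of the subscore logic follows. For illustration only, it assumes that each subscale is recorded as eight binary decisions (delayed vs. immediate reward), so that the count of delayed choices yields the 0–8 range, and that the composite is the sum of the subscores. The actual item structure and composite definition are given in Forstmeier et al. (2011) and may differ.

```python
from typing import Dict, Sequence

ITEMS_PER_SUBSCALE = 8  # assumption: 8 binary decisions -> score range 0-8


def dog_subscore(delayed_choices: Sequence[bool]) -> int:
    """Number of decisions in which the delayed reward was chosen (0-8)."""
    if len(delayed_choices) != ITEMS_PER_SUBSCALE:
        raise ValueError(f"expected {ITEMS_PER_SUBSCALE} decisions, got {len(delayed_choices)}")
    return sum(1 for chose_delayed in delayed_choices if chose_delayed)


def dog_composite(subscores: Dict[str, int]) -> int:
    """Composite score, assumed here to be the plain sum of the subscores."""
    return sum(subscores.values())


snack = dog_subscore([True] * 5 + [False] * 3)  # delayed 5 of 8 times
money = dog_subscore([True] * 8)                # always delayed
print(snack, money, dog_composite({"snack": snack, "hypothetical_money": money}))
```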

The Iowa Gambling Task (IGT) is a measure of affective decision making. It was originally designed to assess decision-making deficits in patients with lesions in the ventromedial prefrontal cortex (Bechara et al. 1994) but has since become a well-established measure of decision making in healthy participants as well (e.g. Smayda et al. 2018). We used a computerized version of the IGT as described in Bechara et al. (2000). The card decks in the IGT differ in their net gains and losses. Participants are initially unaware of the quality of each deck but should learn the decks’ properties after 20 (Maia and McClelland 2004) to 40 (Brand et al. 2007) selections. Consequently, the IGT can be used as a task of decision making under uncertainty in the first trials and under known risk once the properties of the decks have been learned (Gansler et al. 2011). The outcome measure of the IGT is the number of cards selected from advantageous decks minus the number selected from disadvantageous decks; it is calculated automatically for overall performance (100 trials). For comparability with a previous study (Smayda et al. 2018), and to obtain separate measures of decision making under ambiguity and under known risk, we additionally calculated this score for each of five 20-trial blocks. Higher scores indicate more advantageous choices and less risky behavior. No psychometric quality criteria are available for the IGT.
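The net score described above can be computed as in the sketch below. It assumes the conventional IGT deck layout in which decks A and B are disadvantageous and decks C and D are advantageous (Bechara et al. 1994); the function names and the list-of-deck-labels input format are hypothetical.

```python
from typing import List, Sequence

ADVANTAGEOUS = {"C", "D"}     # conventional IGT layout (assumed)
DISADVANTAGEOUS = {"A", "B"}


def net_score(choices: Sequence[str]) -> int:
    """Advantageous minus disadvantageous selections."""
    unknown = set(choices) - ADVANTAGEOUS - DISADVANTAGEOUS
    if unknown:
        raise ValueError(f"unknown deck labels: {sorted(unknown)}")
    advantageous = sum(1 for deck in choices if deck in ADVANTAGEOUS)
    return advantageous - (len(choices) - advantageous)


def block_net_scores(choices: Sequence[str], block_size: int = 20) -> List[int]:
    """Net score for consecutive blocks (five 20-trial blocks for 100 trials)."""
    return [net_score(choices[i:i + block_size]) for i in range(0, len(choices), block_size)]


# Hypothetical participant who gradually learns to prefer decks C and D
trials = ["A"] * 20 + ["B", "C"] * 10 + ["C"] * 20 + ["D"] * 15 + ["A"] * 5 + ["D"] * 20
print(block_net_scores(trials))  # [-20, 0, 20, 10, 20]
print(net_score(trials))         # 30
```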

The cold EFs inhibition, flexibility, and working memory were assessed with the German test battery for attention (TAP; Zimmermann and Fimm 2014), a fully computerized neuropsychological assessment consisting of 13 subtests. For our studies, we chose three subtests: Go/NoGo (inhibition), working memory, and flexibility. The outcome measures differ between subtests and are described below for each test. All tests were administered with standardized instructions on a computer screen. Age-normed T‑values are provided for all outcomes and are calculated automatically for each participant.
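To make the meaning of a T‑value concrete: a T‑value expresses a raw score relative to an age-matched norm group on a scale with mean 50 and standard deviation 10, T = 50 + 10z. The TAP derives its T‑values from its own norm tables, so the linear conversion below is only an illustration. The norm mean and SD in the example are made up, and reversing the sign for reaction times (where lower is better) is an assumption about orientation, not a statement about how the TAP reports them.

```python
def t_value(raw: float, norm_mean: float, norm_sd: float, higher_is_better: bool = True) -> float:
    """Linear T-value: 50 + 10 * z relative to a (hypothetical) age-matched norm group."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd
    if not higher_is_better:
        z = -z  # e.g. reaction times: faster than the norm should yield T > 50
    return 50 + 10 * z


# Made-up norms: median RT of 500 ms with SD 50 ms in the participant's age band
print(t_value(450, norm_mean=500, norm_sd=50, higher_is_better=False))  # 60.0
```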

To measure inhibition, the Go/NoGo task was administered in two conditions. In condition 1, participants had to detect one critical stimulus out of two stimuli; in condition 2, they had to detect two critical stimuli out of five. We chose the median reaction time as the outcome measure. According to the manual, for the first condition the retest reliability of the median reaction time is r = 0.56 and its odd-even reliability is r = 0.92; for the second condition, the retest reliability is r = 0.51 and the odd-even reliability is r = 0.93.
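The median reaction time can be illustrated with a small sketch. It assumes a hypothetical trial format of (critical stimulus?, responded?, reaction time in ms) and takes the median over correct responses to critical stimuli only. A count of responses to non-critical stimuli (false alarms) is included because such commission errors are the usual index of failed inhibition in Go/NoGo tasks, although the outcome used here is reaction time.

```python
from statistics import median
from typing import List, Optional, Tuple

# (is_critical_stimulus, responded, reaction_time_ms or None) -- hypothetical format
Trial = Tuple[bool, bool, Optional[float]]


def median_hit_rt(trials: List[Trial]) -> float:
    """Median RT over correct responses to critical (go) stimuli."""
    rts = [rt for critical, responded, rt in trials if critical and responded and rt is not None]
    if not rts:
        raise ValueError("no correct responses to critical stimuli")
    return median(rts)


def false_alarms(trials: List[Trial]) -> int:
    """Responses to non-critical stimuli, i.e. failures to withhold a response."""
    return sum(1 for critical, responded, _ in trials if responded and not critical)


session = [(True, True, 400), (True, True, 500), (False, False, None),
           (True, True, 450), (False, True, 300)]
print(median_hit_rt(session), false_alarms(session))  # 450 1
```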

Working memory was assessed with an N-back task. As for the Go/NoGo task, we chose the median reaction time as the outcome measure. Age-normed T‑values are provided for ages 19 to 89 years. The retest reliability of the median reaction time is r = 0.60, and its odd-even reliability is r = 0.85.
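For readers unfamiliar with the N-back paradigm, the sketch below shows how target trials are defined: a stimulus is a target if it matches the one presented n positions earlier. The value n = 2 and the digit sequence are illustrative assumptions; the TAP's specific stimulus material and level of n are documented in its manual.

```python
from typing import List, Sequence


def nback_targets(stimuli: Sequence[int], n: int = 2) -> List[int]:
    """Indices of trials whose stimulus matches the stimulus n trials back."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [i for i in range(n, len(stimuli)) if stimuli[i] == stimuli[i - n]]


sequence = [3, 7, 3, 1, 3, 1, 5]
print(nback_targets(sequence, n=2))  # [2, 4, 5]
```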

Flexibility was measured with a set-shifting task, which the TAP provides in a verbal and a nonverbal condition. We administered both conditions and chose the index of overall performance (which takes into account errors and the median reaction time) as the outcome measure. Low values represent weak performance (high error rate and/or slow reactions), whereas high values indicate good performance (low error rate and fast reactions). Age-corrected T‑values are provided for ages 20 to 90 years. For the verbal condition, the retest reliability is r = 0.83 for the median reaction time and r = 0.41 for errors, and the odd-even reliabilities are r = 0.75 for errors and r = 0.98 for the median reaction time. For the nonverbal condition, no retest reliabilities are available; the odd-even reliabilities are r = 0.75 for errors and r = 0.99 for the median reaction time.

3.2.3 Control variables

IQ (fluid intelligence) was assessed with the revised version of the Culture Fair Test (CFT 20‑R; Weiß 2006). The CFT 20‑R consists of two parts, which include the four subtests series, classifications, matrices, and typologies. In our study, participants completed only the first part. The test provides standardized, age-normed values for each part separately. The subscales of the CFT 20‑R show construct validities of r = 0.78 to r = 0.83 with the g factor, and test-retest reliabilities of r = 0.92 for the first part and r = 0.96 for the whole test (parts one and two).

To assess personality, we administered the German version of the Big Five Inventory (BFI; Rammstedt and Danner 2017). The BFI is a 45-item self-report questionnaire measuring the Big Five personality factors (openness, agreeableness, extraversion, conscientiousness, and neuroticism). Following the instructions of Rammstedt and Danner (2017), we calculated the mean for each personality factor. Higher scores indicate a stronger expression of the respective trait. The five scales of the German BFI show internal consistencies of α = 0.74 to α = 0.86 and test-retest reliabilities of r = 0.78 to r = 0.93.
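Scale scoring of this kind can be sketched as follows, assuming a 5-point response format and reverse-keyed items recoded as 6 − response. Which items belong to which factor, and which are reverse-keyed, are defined in Rammstedt and Danner (2017); the item numbers below are placeholders, not the actual BFI key.

```python
from statistics import mean
from typing import Dict, Iterable, Set


def scale_mean(responses: Dict[int, int], items: Iterable[int], reverse_keyed: Set[int],
               scale_max: int = 5) -> float:
    """Mean of one factor's items, recoding reverse-keyed items as (scale_max + 1) - response."""
    scored = []
    for item in items:
        value = responses[item]
        if not 1 <= value <= scale_max:
            raise ValueError(f"item {item}: response {value} out of range")
        scored.append(scale_max + 1 - value if item in reverse_keyed else value)
    return mean(scored)


# Placeholder key: items 1, 2, 3 form one factor, item 2 is reverse-keyed
answers = {1: 4, 2: 2, 3: 5}
print(scale_mean(answers, items=[1, 2, 3], reverse_keyed={2}))
```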

Socioeconomic status (SES) and non-musical leisure activities were assessed with a questionnaire. Non-musical leisure activities were recorded in months; if participants had engaged in more than one activity at a time, the months of involvement were summed. SES was measured by the parents’ highest level of education and by family income. Although our participants were adults, we collected parental education and income because these factors have been shown to influence whether children take music lessons (Corrigall et al. 2013), which in turn affects instrumental playing in adulthood. Parental education was coded as 2 = both parents hold a university degree, 1 = one parent holds a university degree, and 0 = no parent holds a university degree. Family income was assessed on a categorical scale ranging from 1 = less than 1000 € to 6 = more than 5000 €.
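The parental-education code above simply counts the parents holding a university degree. The sketch below shows this together with an income categorizer. The intermediate income bands (steps of 1000 €), the treatment of exactly 5000 €, and the restriction to two parents are assumptions, since only the two endpoint categories are stated and the income reference period is not given.

```python
from typing import Sequence


def parental_education_code(parent_has_degree: Sequence[bool]) -> int:
    """0 = no parent, 1 = one parent, 2 = both parents hold a university degree."""
    if len(parent_has_degree) != 2:
        raise ValueError("expected information for two parents (assumed format)")
    return sum(1 for has_degree in parent_has_degree if has_degree)


def income_category(income_eur: float) -> int:
    """Category 1 (< 1000 EUR) to 6 (top band); the 1000-EUR bands are an assumption."""
    if income_eur < 0:
        raise ValueError("income must be non-negative")
    return min(6, int(income_eur // 1000) + 1)


print(parental_education_code([True, False]), income_category(2500))  # 1 3
```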

3.3 Procedure

Participants completed one individual and one group test session, administered by a master’s student and a research assistant who were well trained in administering all tests and blind to the amount of music lessons the participants had received. In the individual session, we assessed hot and cold EFs and administered the questionnaire on demographic variables and musical background. Individual sessions lasted between 70 and 90 min and took place in one of our labs in the department. Personality and intelligence were assessed in a group session (10 to 20 participants), which lasted around 60 min and took place in one of the university’s seminar rooms. Every participant received three hours of participation credit as well as the presents and snacks from the DoG‑A as compensation for taking part in the study.

4 Results (Study 1)

An a priori power analysis conducted with G*Power (Faul et al. 2007) indicated that a sample of at least 111 participants was required to detect a medium effect of r = 0.3 (α = 0.05, power (1 − β) = 0.95).
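As a rough cross-check, the required sample size for detecting a correlation can be approximated in closed form with Fisher’s z transformation, n ≈ ((z_(1−α) + z_(1−β)) / artanh(r))² + 3 for a one-tailed test. This is not the exact procedure G*Power uses, so it does not reproduce 111 exactly; it gives a value slightly above 111 for a one-tailed test. The number of tails is not stated above and is a parameter here.

```python
import math
from statistics import NormalDist


def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.95, tails: int = 1) -> int:
    """Approximate N to detect a population correlation r (Fisher z approximation)."""
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and lie strictly between -1 and 1")
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    z_alpha = NormalDist().inv_cdf(1 - alpha / tails)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)               # quantile for the desired power
    effect = math.atanh(abs(r))                        # Fisher z of the assumed effect
    return math.ceil(((z_alpha + z_beta) / effect) ** 2 + 3)


print(n_for_correlation(0.3, tails=1), n_for_correlation(0.3, tails=2))
```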

We computed Pearson correlations to uncover associations between the independent variables (total music training, age at onset of music training) and the dependent variables (hot and cold EFs). To detect potential confounding variables, we first correlated the independent variables with the control variables and, in a second step, the dependent variables with the control variables. If a control variable was correlated both with the independent variable and with the dependent variable, we calculated partial correlations in a third step to control for it. In addition to the correlational analyses, we performed multiple regression analyses to investigate whether music training predicts performance on hot and cold EF tasks. In these analyses, we entered the identified control variables in a first step and the predictor (music training or age at onset of music training) in a second step. To consider hot and cold EFs in one model, we also reversed the roles of predictors and outcome, although we were not interested in the question of which EF profile might predict music training. In two multiple regression analyses, we entered the identified control variables (if any) in the first step and hot and cold EFs in the second step to predict total music training and age at onset of music training, respectively. This procedure allows a direct comparison of hot and cold EFs and helps to check whether possible correlations with hot EFs can be explained by cold EFs.
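The three-step screening and the partial correlation can be sketched in plain Python. The partial correlation uses the standard first-order formula r_xy·z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)). For the significance screen, the sketch substitutes a normal approximation to Fisher’s z for the exact t test that statistics packages report, so p-values for small samples are approximate. The function names and toy data are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p(r: float, n: int) -> float:
    """Two-tailed p for H0: rho = 0 via Fisher z (normal approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def is_confounder(iv, dv, control, alpha: float = 0.05) -> bool:
    """Steps 1-2: the control must correlate with both the IV and the DV."""
    n = len(iv)
    return approx_p(pearson(iv, control), n) < alpha and approx_p(pearson(dv, control), n) < alpha


def partial_r(x, y, z) -> float:
    """Step 3: correlation of x and y with z held constant."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Toy data: control z drives both the independent (x) and the dependent (y) variable
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [1, 3, 2, 4, 6, 5, 7, 8]
y = [2, 1, 3, 5, 4, 6, 8, 7]
if is_confounder(x, y, z):
    print(round(pearson(x, y), 2), round(partial_r(x, y, z), 2))
```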

4.1 Preliminary analysis

Mean IQ was M = 112.7 (SD = 12.5), which is above the population average but plausible for a student sample. The IQ scores of six participants deviated by more than two standard deviations from the sample mean and were excluded from further analyses. Descriptive statistics of the dependent measures are shown in Table 2. Because norm values are missing for some ages, the number of participants differs between dependent variables.
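The exclusion rule can be sketched as below. Whether the sample or the population standard deviation was used is not stated; the sketch assumes the sample SD (`statistics.stdev`), and the IQ values are invented.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def split_by_sd(scores: Sequence[float], k: float = 2.0) -> Tuple[List[int], List[int]]:
    """Return (kept, excluded) indices; excluded = more than k sample SDs from the sample mean."""
    m, s = mean(scores), stdev(scores)
    kept, excluded = [], []
    for i, score in enumerate(scores):
        (excluded if abs(score - m) > k * s else kept).append(i)
    return kept, excluded


iq = [100, 100, 100, 100, 100, 100, 100, 100, 100, 160]
kept, excluded = split_by_sd(iq)
print(excluded)  # [9]
```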

Table 2 Descriptive statistics of dependent measures Study 1 (adults)

Correlations between independent and control variables are shown in Table 3. The amount of music experience is positively associated with parental income, parental education, and the personality factors extraversion and openness to experience. Accordingly, these control variables were partialed out in further analyses if they were also correlated with the dependent variable.

Table 3 Correlations between independent and control variables Study 1 (adults)

Intercorrelations between dependent variables are presented in Table 4. There were positive correlations among the measures of cold EFs but no correlations between measures of cold and hot EFs.

Table 4 Correlations between dependent measures Study 1 (adults)

4.2 Main analyses

4.2.1 Hot executive functions

The analyses did not show a significant correlation between music training and the IGT total score, nor between music training and any of the 20-trial blocks (all ps > 0.05). The correlations between the age at onset of music training and the IGT blocks revealed a small negative correlation in the fifth block, indicating that the younger participants were when they started playing music, the more often they chose from the advantageous decks in the last block of the IGT. The regression analyses showed that age at onset of music training predicted IGT performance, F(1,113) = 4.52, p = 0.036, explaining 4% of the variance, t(113) = −2.13, p = 0.036.

The analyses of music training and delay of gratification initially showed no significant correlation between music training and the DoG‑A total score (p > 0.05). At the level of the subscales, however, we found a small significant correlation between music training and the item “present” (r = 0.21, p < 0.05), indicating that participants with more music training chose the delayed reward more often. We found no significant correlation between the age at which participants started music training and any of the scales of the test (p > 0.05). Since the identified control variables were not correlated with the DoG‑A total score or with any of its subscales or items, we did not calculate partial correlations. An additional regression analysis revealed a significant model, F(1,130) = 6.67, p = 0.014, with music training explaining 5% of the variance, t(130) = 2.48, p = 0.014.

4.2.2 Cold executive functions

Regarding inhibition, the analyses revealed significant correlations between music training and both the first and the second condition of the Go/NoGo task. Since parental education was correlated with both music training and the first condition of the Go/NoGo task, we calculated the partial correlation holding parental education constant. The correlation then became smaller and nonsignificant (p > 0.05). Since the second condition of the Go/NoGo task was not correlated with any control variable, no partial correlation was needed. The correlations between the age at which participants started music training and both conditions of the Go/NoGo task were not significant (ps > 0.05). An additional regression analysis showed a significant model, F(2, 123) = 4.15, p = 0.018, indicating that music training accounts for 6% (ΔR²) of the variance in the second condition of inhibition, t(125) = 2.45, p = 0.016, when parental education was statistically controlled.
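The ΔR² reported for these two-step regressions is the gain in explained variance when music training is added to a model that already contains parental education. With one control variable c and one predictor p, both R² values follow directly from the correlations: R²_full = (r_yc² + r_yp² − 2·r_yc·r_yp·r_cp) / (1 − r_cp²) and ΔR² = R²_full − r_yc². The sketch below is an illustration with invented correlations, not a re-analysis of the study data; `f_change` is the usual F test for an R² increment.

```python
def r2_two_predictors(r_yc: float, r_yp: float, r_cp: float) -> float:
    """R^2 of the standardized regression of y on a control c and a predictor p."""
    if abs(r_cp) >= 1:
        raise ValueError("control and predictor must not be perfectly correlated")
    return (r_yc ** 2 + r_yp ** 2 - 2 * r_yc * r_yp * r_cp) / (1 - r_cp ** 2)


def delta_r2(r_yc: float, r_yp: float, r_cp: float) -> float:
    """Variance explained by p over and above the control c (step 2 minus step 1)."""
    return r2_two_predictors(r_yc, r_yp, r_cp) - r_yc ** 2


def f_change(delta: float, r2_full: float, n: int, k_added: int = 1, k_total: int = 2) -> float:
    """F statistic for the R^2 change with df = (k_added, n - k_total - 1)."""
    return (delta / k_added) / ((1 - r2_full) / (n - k_total - 1))


# Invented correlations: outcome-control .30, outcome-music .30, control-music .50
d = delta_r2(0.30, 0.30, 0.50)
print(round(d, 3))  # 0.03
```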

The analyses of cognitive flexibility showed a significant correlation between music training and the nonverbal condition (r = 0.27, p < 0.01). Since none of the control variables correlated with cognitive flexibility, partial correlations were not necessary. A multiple regression analysis showed a significant model, F(2, 116) = 4.39, p = 0.015, with music training accounting for 7% (ΔR²) of the variance, t(116) = 2.92, p = 0.004, when parental education was statistically controlled. Music training was not significantly correlated with the verbal condition of cognitive flexibility (p > 0.05). Furthermore, the age at which participants started music training was not significantly correlated with either cognitive flexibility condition (ps > 0.05). Likewise, neither music training nor age at onset of music training was significantly correlated with working memory (ps > 0.05). An overview of the correlations between independent and dependent variables is shown in Table 5.

Table 5 Correlations (and partial correlations in parentheses) between dependent and independent measures Study 1 (adults)

The additional regression analyses comparing hot and cold EFs in one model showed a pattern similar to the previous results: hot and cold EFs together significantly predicted total music training, accounting for 13% of the variance, F(5,114) = 3.12, p = 0.011. Within this model, delay of gratification, t(114) = 2.35, p = 0.020, b* = 0.21, and flexibility, t(114) = 2.11, p = 0.037, b* = 0.22, were significant predictors of music training. The model predicting age at onset of music training was not significant, F(5,99) = 1.72, p = 0.137.

5 Discussion (Study 1)

In Study 1, we investigated the associations between music training and both hot and cold EFs in a sample of young adults. We studied the relationship between music training and decision making as a measure of hot EFs to clarify the conflicting results of previous research. As a second measure of hot EFs, we administered the DoG‑A to examine whether music training is also related to delay of gratification. Moreover, we compared the pattern of associations between music training and cold EFs with the pattern of associations between music training and hot EFs. We tested 136 undergraduates between 18 and 30 years of age with different amounts of music training. Regarding hot EFs, we found a small correlation between music training and delay of gratification as well as a small correlation between the age at onset of music training and decision making. Moreover, music training predicted delay of gratification, explaining 5% of the variance, suggesting that more music training predicts a greater tendency to delay rewards. Since learning a musical instrument requires a great deal of self-discipline, a focus on long-term goals, and resistance to temptations, it seems plausible that it is associated with the ability to delay gratification. In addition, age at onset of music training explained 4% of the variance in decision making. In sum, the pattern of relations with hot EFs appears more complicated and less clear, involving not only the total amount of music training but also the age at which lessons were started. For cold EFs, we found small correlations with music training, and music training was a significant predictor, explaining 6–7% of the variance, indicating that more music training predicts better performance on the cold EF measures. For all outcomes, the explained variance is small.
However, if these associations were causal, they would still be relevant, since executive functions are of enormous importance for an individual’s development (Diamond 2013). The comparison between hot and cold EFs revealed that the associations with hot EFs are independent of those with cold EFs. That only flexibility, and not inhibition, emerged as a significant cold-EF predictor in this model could be due to shared variance between these two components. In sum, our data suggest that flexibility and delay of gratification predict the total amount of music training. However, none of the EFs was a significant predictor of age at onset of music training.

In line with previous studies, we confirmed that music training in general is not related to decision-making abilities (Hou et al. 2017; Smayda et al. 2018). However, our results suggest that the age at which participants started music lessons was negatively associated with decision-making performance in the last block of the IGT. In other words, the earlier participants started music lessons, the more often they chose cards from the advantageous decks in the last block of the IGT. This result is consistent with the finding of Hou et al. (2017) that participants with early music training performed better on the IGT than participants with late or no music training, and it contrasts with the results of Smayda et al. (2018), who reported that late-trained musicians showed decision-making advantages compared with early-trained musicians or non-musicians. Based on our own results and those of Hou and colleagues, we therefore suggest that decision-making ability is associated with early musical training in young adults. This result also fits well with previous research showing that especially early music training can enhance sensorimotor abilities that persist into adulthood (Steele et al. 2013). This finding supports the hypothesis of a “critical period”, according to which introducing novel skills related to brain regions that are still undergoing development leads to long-lasting cross-domain abilities (Hensch 2005). Early music lessons might therefore affect decision-making abilities more than later music lessons. However, given the correlational design and the small size of the correlations, further research is needed to clarify this question.

Regarding cold EFs, the results of this study showed that in young adults, music training is positively related to inhibition and cognitive flexibility and is, moreover, a significant predictor of these outcomes. These associations are small, and music training accounts for only 6–7% of the variance in these measures. However, our findings support the results of previous studies that also found positive associations of music training with inhibition (Bialystok and DePape 2009; Amer et al. 2013) and cognitive flexibility (Bugos et al. 2007) in adulthood. Surprisingly, we did not find a relationship between the amount of music training and working memory in the adult sample, which contradicts the results of previous studies (e.g., Slevc et al. 2016; Okada and Slevc 2018). One possible explanation is that our sample consisted of psychology students, who are generally cognitively efficient and have very good working memory owing to extensive studying. In addition, only 20.5% of the sample were actively making music at the time of study participation, and only 6% were still receiving music lessons, so any advantages of playing music may already have declined.

6 Methods (Study 2)

This study was performed as part of a doctoral thesis. Research was conducted in accordance with the ethical guidelines of the ethics committee of the Faculty of Psychology and Sports Science at Giessen University. Informed consent was obtained for each participant prior to participation.

6.1 Participants

Study 2 involved 100 children (55 female) aged 9–12 years. Music training was assessed using the same questionnaire as in Study 1, which was filled out by the parents. According to this questionnaire, 75% of the children had received music lessons, with an average duration of M = 3.46 years (SD = 2.04 years). Of these children, 84% were still receiving music lessons at the time of study participation. In addition, 60% of the total sample had already played in a musical group, with an average duration of M = 3.38 years (SD = 3.45 years). Of these, 67% were still active in a musical group at the time of participation in the study. An overview of the sample’s demographics is presented in Table 1.

6.2 Measures

We used the same measures as in Study 1; they are therefore not described again here. Since the TAP does not provide age-normed scores for the entire age range of 9 to 12 years, raw scores were used for the tests of cognitive flexibility and working memory, and the children’s age was added as a control variable for these measures.

6.3 Procedure

Before the start of the study, informed consent to participate was obtained from the parents and the children. The parents filled out the background questionnaire and the questionnaire on their child’s personality. As in Study 1, the children completed an individual test session and a group test. As a reward for participating in the study, children received a personalized certificate as well as the present and snacks from the individual test session.

7 Results (Study 2)

All analyses were performed analogously to those in Study 1.

7.1 Preliminary analyses

The average IQ of the sample was M = 108.7 (SD = 12.0). Two children had low IQ scores (< 83) that deviated markedly from the rest of the sample; these children were therefore excluded from subsequent analyses. An overview of the descriptive statistics of the dependent variables is given in Table 6.

Table 6 Descriptive statistics of dependent measures Study 2 (children)

The correlation matrix of the independent and control variables is shown in Table 7. Music training was negatively correlated with the age at which children started music training. Furthermore, music training correlated positively with IQ but not with any of the other control variables (all ps > 0.05). In the following analyses, IQ is partialed out whenever it is also associated with the dependent variable.

Table 7 Correlations between independent variables and control variables Study 2 (children)

The intercorrelations between the dependent variables are shown in Table 8. The cold EF measures were intercorrelated, and there were isolated correlations between cold and hot EFs, whereas the hot EF measures were not intercorrelated. An overview of the correlations between independent and dependent variables is shown in Table 9.

Table 8 Correlations between dependent measures Study 2 (children)
Table 9 Correlations (and partial correlations in parentheses) between dependent and independent measures Study 2 (children)

7.2 Main analyses

7.2.1 Hot executive functions

We found no significant correlations between music training and either the total score of the IGT or its individual blocks (ps > 0.05). Analyses of the age at which children started music training and the individual IGT blocks showed a significant negative correlation for the third block, indicating that the earlier children started music training, the better they scored in the third block of the IGT. Additional regression analyses yielded a significant model, F(3,72) = 3.03, p = 0.035, with the age of onset of music training explaining 9% (∆R2) of the variance when age and IQ were held constant, t(73) = −2.59, p = 0.012.
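The ∆R2 values in this section come from hierarchical regressions in which the control variables enter first and the music variable is added afterwards. Assuming this standard set-up, the increment and its F test can be written as follows, where k is the number of predictors added in the second step, p the number of predictors in the full model, and n the sample size (notation introduced here for illustration, not taken from the original analyses).

```latex
\Delta R^2 = R^2_{\text{full}} - R^2_{\text{reduced}}, \qquad
F_{\text{change}} = \frac{\Delta R^2 / k}{\left(1 - R^2_{\text{full}}\right) / (n - p - 1)}, \qquad
F_{\text{change}} \sim F(k,\; n - p - 1)
```

When a single predictor is added (k = 1), as with age of onset or amount of music training here, F_change equals the square of that predictor's t statistic in the full model.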

The analyses of music training and the total score of the DoG‑A did not reveal a significant correlation (p > 0.05). Similarly, no significant correlation was found between music training and any of the DoG‑A subscales (ps > 0.05).

7.2.2 Cold executive functions

The analyses of music training and inhibition showed a significant positive correlation for the first condition of the Go/NoGo task, but no significant correlation between music training and the second condition (p > 0.05). Since none of the control variables were correlated with the first condition of the Go/NoGo task, there was no need to calculate partial correlations. Hierarchical regression analyses indicated a significant model, F(2,93) = 3.04, p = 0.050, showing that music training accounted for 6% (∆R2) of the variance in inhibition when IQ was held constant, t(95) = 2.46, p = 0.016.

Concerning working memory, the results showed a significant positive correlation between the amount of music training and correct answers, but no significant correlation between the amount of music training and reaction time (p > 0.05). Since none of the working memory parameters were correlated with the control variables, we refrained from calculating partial correlations. The regression analyses showed a significant model, F(3,92) = 3.88, p = 0.012, revealing that music training explained 10% (∆R2) of the variance in working memory when age and IQ were statistically controlled, t(95) = 3.12, p = 0.002.

The analyses for cognitive flexibility initially showed significant negative correlations between music training and reaction time in both the non-verbal and the verbal condition. Since both conditions were correlated with age at study participation and with IQ, partial correlations were calculated subsequently. The negative correlation between music training and the non-verbal condition decreased but remained significant, whereas the correlation between music training and the verbal condition disappeared once the control variables were partialed out (p > 0.05). The regression analyses showed a similar picture: for the non-verbal condition, we found a significant model, F(3,92) = 9.62, p < 0.001, indicating that music training accounted for 4% (∆R2) of the variance when age and IQ were held constant, t(95) = −2.28, p = 0.025; in the verbal condition, music training was not a significant predictor when age and IQ were held constant (p > 0.05).
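For readers less familiar with partialing, the textbook first-order formula for removing a single control variable z (for example IQ) from the correlation between x (music training) and y (an EF score) is sketched below. It is not necessarily the software routine used for the reported analyses; the function name `partial_corr` and the input values are hypothetical.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Hypothetical helper: first-order partial correlation r_xy.z, controlling for z."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


# Invented correlations for illustration, not values from Table 9.
# If z is unrelated to x and y, partialing it out changes nothing.
print(partial_corr(0.30, 0.0, 0.0))  # 0.3
# If r_xy equals r_xz * r_yz, the association is fully accounted for by z.
print(partial_corr(0.30, 0.50, 0.60))
```

With two control variables, as for cognitive flexibility (age and IQ), the same formula can be applied recursively to first-order partial correlations, or x and y can be residualized on both controls before correlating them.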

The additional regression analyses comparing hot and cold EFs in one model showed a picture similar to the previous results: hot and cold EFs together significantly predicted the total amount of music training, accounting for 16% (∆R2) of the variance, F(7,95) = 3.45, p = 0.003, with working memory as the only significant predictor within this model, t(95) = 2.58, p = 0.011, b* = 0.26. The model predicting the age of onset of music training was also significant, F(7,72) = 2.19, p = 0.046, with hot and cold EFs explaining 14% (∆R2) of the variance. Within this model, decision-making was the only significant EF predictor, t(72) = −3.02, p = 0.004, b* = −0.35; the other EFs did not significantly predict the age of onset of music training (p > 0.05).

8 Discussion (Study 2)

In Study 2, we investigated the association between music training and hot and cold EFs in 9- to 12-year-old children. To our knowledge, this is the first study examining this association in childhood. Our results allow initial statements about whether music training is related to measures of hot EFs in addition to cold EFs.

Regarding hot EFs, and comparable to Study 1, the results of Study 2 indicate that there is no association between the amount of music training and decision-making. Also consistent with Study 1, the results reveal that the earlier children started music training, the better they performed in decision-making, at least in the third block of the task. Moreover, the regression analyses indicate that the age at which children began music training predicts decision-making performance and accounts for 9% of the variance, with an earlier start predicting better performance. These results confirm both the results from the adult sample and those of earlier studies indicating that it is not music training per se but an early start of music training that is associated with more beneficial choices (Hou et al. 2017). In contrast to the adult sample in Study 1, we found the association between the age at which children started music training and beneficial choices only for the third block of the IGT, whereas in adults this association emerged only for the fifth block. A possible explanation is that adults and children generally show different behavioral patterns in the IGT: while the performance of adults increases from block to block, the performance of children aged 10 to 11 years improves only up to block 3 and is generally poor compared to that of older participants (Prencipe et al. 2011).

Concerning the ability to delay rewards, we found no relation to music training in the children’s sample. Since in the adults’ sample music training was related only to the present, but not to the other subscales of the DoG‑A, it is conceivable either that there is no such relation or that it only becomes apparent after many years of music training. Moreover, it is possible that the hypothetical items of the test do not provide a sufficiently strong incentive.

The results on cold EFs in Study 2 showed small associations with all three core components of EFs. For the subtests of inhibition and cognitive flexibility, the more music training the children had, the faster their reaction times were. For working memory, there was no correlation between music training and reaction time, but a significant correlation between music training and correct answers. Additionally, the regression analyses revealed that music training was a significant predictor for each of the three cold EF components, explaining 4–10% of the variance. Our results confirm previous studies showing that music training in childhood is associated with inhibition (Bugos and DeMarie 2017; Degé et al. 2011; Zuk et al. 2014), working memory (Degé et al. 2011; Schellenberg 2011), and cognitive flexibility (Degé et al. 2011; Holochwost et al. 2017). As in Study 1, it could not be shown that early music training is positively associated with cold EFs. Rather, our results indicate that music training in general is related to better performance in all three core components of cold EFs. Similar to Study 1, the associations found are only small but, as already stated, still relevant, since executive functions are important predictors of life success (Diamond 2013).

The comparison between hot and cold EFs suggests that cold EFs (especially working memory) are relevant predictors of the amount of music training, whereas they seem less relevant for the age of onset of music training. For the age of onset, decision-making, or more precisely taking less risky decisions, seems to be more relevant. The comparison thus suggests that children with good working memory are more likely to take music lessons for a longer time and that children with less risky decision behavior are more likely to begin lessons early. However, it should be noted that these associations are small and do not allow for causal conclusions. Moreover, since we used only a single test for each EF measure, our findings depend strongly on the tests administered.

9 General discussion

Our studies were designed to investigate the relationship between music training and hot and cold EFs in young adulthood and late childhood. To test these relationships, we chose a correlational design and measured hot and cold EFs in two samples. This allowed us to compare both constructs of EFs and to investigate two different age groups. Overall, our results show that the association between music training and hot EFs is not entirely clear, while a small association between music training and cold EFs was more apparent in both samples. Regarding decision-making, we found a different pattern in the children’s sample than in the adults’ sample. However, the direction of the findings was similar in both samples and confirms previous findings by indicating that it is not the number of years playing music but an early start that is associated with better decision-making. Regarding cold EFs, the relations found in the children’s sample are even stronger than in the adults’ sample, with relations to all three cold EF components and music training predicting up to 10% of the variance (working memory). This could be because most of the children with music training were still musically active during study participation, whereas most participants in the adults’ sample had stopped regular music practice several years earlier. However, since we still found positive relations between music training and some measures of EFs in the adults’ sample, these relations appear to be long-lasting. Schellenberg (2006) reported similar findings in a correlational study on the relationship between music lessons and IQ in adults and children. The comparison between hot and cold EFs revealed that in Study 1, both hot and cold EFs were independently significant predictors of music training. In Study 2, we found that working memory in particular predicts the amount of music training and that less risky decision-making predicts the age of onset of training.
In sum, these results suggest that the differences in findings between the individual EF measures persist and that the findings for hot EFs cannot be explained solely by cold EFs.

In summary, the results are in line with the current state of research on music training and cold EFs, showing similar associations. However, regarding hot EFs, the association seems less clear. This could indicate that hot and cold EFs are different constructs (Kerr and Zelazo 2004), which is underlined by the finding that measures of hot and cold EFs were not correlated in Study 1 and were in part negatively correlated in Study 2. On the other hand, the differences between hot and cold EFs could also arise because the administered tests are not comparable. As stated in the introduction, the main problem with the measurement of hot EFs is that the concept is still not fully understood and less studied than cold EFs (Peterson and Welsh 2014). For this reason, well-evaluated tests to capture hot EFs are lacking. To gain a better understanding of hot EFs and their relation to music training, we urgently need more well-evaluated measures. As Peterson and Welsh (2014) suggest, tasks for the measurement of hot EFs could be derived from cold EF tasks by adding an affective component such as a win-loss situation (e.g., when measuring reaction times as in the TAP, giving a reward only when participants respond correctly and faster than average, and a loss in all other cases). Besides improving comparability, this would also make it possible to change the “temperature” of the task gradually, for example by increasing the gain. Future studies might profit from such an approach, which would allow a better comparison between both constructs.
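The win-loss variant of a reaction-time task suggested above could be scored roughly as in the following sketch. It is a hypothetical rule, not an existing TAP feature: a correct response faster than a reference reaction time earns `gain` points, every other response loses the same amount, and raising `gain` is one way to turn up the task’s “temperature”. The choice of reference (for instance a participant’s own or a normative mean reaction time), the symmetric loss, and the names `score_trial` and `score_block` are all assumptions.

```python
from typing import Iterable, Tuple


def score_trial(correct: bool, rt_ms: float, reference_rt_ms: float, gain: float) -> float:
    """Hypothetical rule: win only for correct responses faster than the reference RT."""
    if correct and rt_ms < reference_rt_ms:
        return gain
    return -gain  # assumption: symmetric loss for slow or incorrect responses


def score_block(trials: Iterable[Tuple[bool, float]], reference_rt_ms: float, gain: float) -> float:
    """Sum the outcomes of (correct, rt_ms) trials; a larger gain makes the task 'hotter'."""
    return sum(score_trial(correct, rt, reference_rt_ms, gain) for correct, rt in trials)


# Invented trials for illustration: (correct, reaction time in ms).
trials = [(True, 410.0), (False, 350.0), (True, 455.0)]
print(score_block(trials, reference_rt_ms=500.0, gain=1.0))   # 1.0
print(score_block(trials, reference_rt_ms=500.0, gain=10.0))  # 10.0
```

Because the response rule stays the same and only the stakes change, performance at different gain levels could be compared directly with the cold version of the task.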

9.1 Limitations

It is important to mention that our studies do not allow for causal conclusions because of their correlational design. Although we controlled for confounding variables such as SES, IQ, and personality, we cannot exclude that the results are influenced by a third variable we did not assess. Furthermore, participants who took many music lessons or played in a musical group for several years might have had higher levels of EFs to begin with. It is possible that high-functioning participants were more likely to take many music lessons and to play in musical groups because their high EFs made it easier to keep playing. To make more far-reaching statements, further well-controlled experimental studies are needed. Another limitation of the study is the assessment of hot EFs. For the administered tests, reliabilities are either unavailable or low. Therefore, we cannot exclude that the different patterns found for hot and cold EFs are due to the tests of hot EFs, which are less reliable than the tests of cold EFs. However, as already mentioned, this is a general problem, since there are no well-evaluated tests for the assessment of hot EFs. It should also be noted, though, that the IGT is a widely used task to capture hot EFs. Additionally, because of the large number of tests and assessed variables, the results could be affected by an inflated Type I error rate. However, it was essential for us to include all important control variables in order to validate the results and to exclude systematic differences. Furthermore, it was crucial to assess hot and cold EFs in one sample in order to compare both constructs with each other and to ensure comparability with other studies. Accordingly, the large number of analyses increases the risk of Type I errors. With an alpha correction, all results with p < 0.007 would still be significant.
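The section does not state which alpha correction underlies the p < 0.007 criterion. One reading consistent with it is a Bonferroni correction of α = 0.05 over seven tests (0.05 / 7 ≈ 0.0071); the sketch below assumes that rule, and its function names, test labels, and p-values are hypothetical.

```python
from typing import Dict, List


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_bonferroni(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Labels of tests whose p-value stays below the Bonferroni-corrected level."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [label for label, p in p_values.items() if p < threshold]


print(round(bonferroni_threshold(0.05, 7), 4))  # 0.0071
# Hypothetical family of three tests, not the study's p-values.
print(significant_after_bonferroni({"test_a": 0.002, "test_b": 0.012, "test_c": 0.030}))
# ['test_a', 'test_b']
```

A step-down procedure such as Holm's controls the same familywise error rate while being less conservative; which correction the authors applied is not specified here.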

10 Conclusion

In summary, the results of the present studies confirm previous findings that music training is positively related to cold EFs. Moreover, the two studies suggest that early music training in particular could be beneficial for decision-making. Future studies should focus even more on the reliable assessment of hot EFs and possibly use more comparable tasks with a variable adjustment of the “task temperature”.