Introduction

Language and music acquisition have for some time been thought to have similar underlying cognitive processes; both systems have rules and are composed of basic elements that are grouped to form complex structures. There has now been substantial research demonstrating a relationship between music training and reading skills in children (Moreno, 2009; Moreno et al., 2011a, b; Slater et al., 2014; Gordon et al., 2015; Flaugnacco et al., 2015; Hallam, 2019). Additionally, research suggests that musical training assists the encoding of speech (Nan et al., 2018; Patel, 2011). There is also evidence that language and music share a common neural encoding in Broca’s area of the brain for processing structured sequences (Chiang et al., 2018). Speech and literacy are of course interrelated, and since music can influence the development of speech, it seems logical to infer that it can influence literacy.

This observed link has led to the notion that language processing and music processing have shared mechanisms, for example, a neural encoding mechanism for sampling speech and a similar neural encoding mechanism to sample and track music via the same route (Hallam, 2019). Listeners may develop a mental framework that allows them to extract meaningful categories from acoustic signals (Patel, 2008). There is evidence of some overlap, for example, in processing rhythmic structure, rhyme awareness, or in speech perception (Corriveau et al., 2010; Liu et al., 2010; Moreno & Besson, 2006). Evidence from neuroimaging studies has demonstrated a shared mechanism of neural encoding (Cheung et al., 2017; Stewart & Williamon, 2008; Stewart et al., 2003). In a study looking at the processing of speech regularities in children aged 8–13, reading and music aptitude had positive correlations with auditory working memory (Strait et al., 2011). The authors found common cognitive markers for music aptitude and reading. Their results showed that auditory working memory and attention were both related to musical ability. They conclude that there is a direct relationship between musical skill and literacy-related skills as a result of having a common encoding mechanism.

This evidence of a link between the processing of language and music has led to suggestions that specific music training can improve aspects of speech and language, such as the processing of pitch patterns (Kraus & Chandrasekaran, 2010). Swaminathan and Schellenberg (2020), working with a sample of 6–9-year-olds, found links between music and language but not from formal training in music. However, the musical training of their sample was based on pre-existing lessons and parental report, so it was not possible to exclude environmental factors. Research evidence from functional magnetic resonance imaging (fMRI) indicates that there are indeed changes in brain structure that result from musical training that may lead to an improved capacity to learn (Ragert et al., 2004; Stewart, 2003, 2008). These structural changes are visible in the auditory and motor areas of the brain and the area involved in the integration of sensory motor information with the limbic system, also suggesting some relation to emotional content (Hyde et al., 2009).

However, research exploring whether these structural changes in the brain have an observable impact on cognitive development, and if so how, has been mixed (Hille et al., 2011; Roden et al., 2012; Jaschke et al., 2018; Saarikivi et al., 2019). The nature of training for a musical instrument, which involves sustained attentional focussing, repetition, and precision, is more demanding than simply listening to ordinary speech; it is demanding on attention, short-term (working), and auditory memory. The most consistent evidence of the impact of musical training in children has been on working memory and verbal memory in particular (Ho et al., 2003; Hansen et al., 2012; Nutley et al., 2014; Talamani et al., 2016; Talamani et al., 2017; Guo et al., 2018; Saarikivi et al., 2019; Wilbiks & Hutchins, 2020; Kausel et al., 2020). Since working memory is what holds our attention, sustained attentional focussing and working memory are part of the same process: to retain information in working memory, we need to maintain attention, for example when reading text. What is not clear from the extant literature is what part learning to read music whilst learning an instrument, as opposed to playing by ear, plays in this observed improvement in working memory. The concept of working memory ‘binding’ cross-modal information (in this case audio and visual) may provide a clue to the involvement of working memory in the maintenance and learning of cross-modal associations (Garcia et al., 2019; Toffalini et al., 2019).

The reading of musical notation has indeed been found to show a positive impact on brain function (Stewart & Williamon, 2008; Stewart et al., 2003). Here again, there is the likelihood of a shared mechanism. Reading music requires perception of, and encoding of, visual symbols followed by the action of producing a musical response. Language reading requires perception of, and encoding of, visual symbols followed by the action of producing a verbal response. Musicians process music relationally, looking at pattern, distances between notes (intervals), and direction (up or down in pitch), and use ‘chunking’ for speed. In much the same way, fluent readers of written text will recognise whole words and familiar patterns, rather than reading individual phonemes, using a similar chunking process. Reading music shares similarities with reading text; the visual appearance of symbols on a stave bears as little resemblance to their musical sounds as letters do to spoken sounds. Stewart et al. (2003) hypothesise that music reading involves sensorimotor translation in which spatial characteristics of notation guide selection of the appropriate keys. Equally, we could hypothesise that word reading is a translation in which spatial characteristics of print are used to guide the appropriate verbal response. The parallels with learning to read words are striking. In a recent ten-week study, children were taught to read music notation with a focus on maintaining attention whilst reading notation and moving in time (Hallam, 2019). The author found significant differences between children in the intervention and the control group for reading accuracy and comprehension and suggested that the high quality of attention required to coordinate hands and body whilst concurrently reading music notation may have led to the observed changes.

Pianists have been the subject of particular interest, perhaps because the keyboard has a visually linear representation of spatial relationships between pitches (Piro & Ortiz, 2009). In Stewart’s study (2003) after only 3 months of training, novice pianists had started to process musical notation automatically. Scans revealed that particular areas of the brain were activated whilst reading musical notation that could not be detected in the control group. Simply seeing musical notes after a short period of time set in motion a sequence of neural events related to the learned musical response (Stewart & Williamon, 2008).

The research study presented here used a randomised controlled trial. None of the participants had previously had formal piano lessons or learned to read music. The study aimed to confirm the impact of piano lessons, in combination with learning to read music, on working memory (visual and auditory) and to explore the impact on word recognition.

Method

Participants

Children were recruited from two neighbouring small rural Church-of-England primary (elementary) schools situated within the same education pyramid and church diocese, with equivalent socio-economic backgrounds and ethnicity. Each group had similar numbers of children with special needs (4 boys in the control group, 1 girl and 2 boys in the intervention group), and a similar gender balance, see Table 1. The sample was the whole cohort in Year 3 from each school with equivalent mean chronological ages of 7 years and 8 months. Randomisation was effected at the school level; it was not possible to randomise at the individual level. In the intervention group, one girl and one boy had already had some keyboard lessons, but this had not yet included learning to read music or playing with both hands. The piano lessons were elective, and all the children elected to participate; there were no withdrawals. In the control group, none of the children had piano lessons, but seven (4 girls, 3 boys) were having cornet lessons at school and three (1 girl, 2 boys) were having guitar lessons. None of the children in either sample had been taught to play the piano with both hands or read music for treble and bass clef prior to the intervention.

Table 1 Distribution of gender

Measures

All measures were administered at two time points. Pre-test measures were administered one month before the intervention began, and the post-test measures were administered 3 months after the intervention had been completed to give an indication of stability of any observed changes. Given that there is evidence in the literature that working memory improves as a result of music training, we used the digit span test from the Aston Index, referred to as Auditory Sequential Memory Symbolic (Newton & Thompson, 1982). This is reported as two separate measures, digit span forward and digit span backwards, as this is how it has been reported in the extant literature. This is a measure of short-term auditory memory. For the forward test, children are required to recall sequences of digits in the correct order, with the sequence length increasing from 2 digits up to 6 digits. For the backward test, children must both recall the numbers and re-order them backwards, again beginning with 2 digits and increasing to 6; there are ten items for each. Digit span is a commonly used test to assess working memory (de Carvalho et al., 2014; Hilbert et al., 2015). Children with dyslexia have been found to have deficits in serial order for digit span when compared with typically developing children (Cowan et al., 2017).

We also used the visual memory test known as Visual Sequential Memory Symbolic from the same Index. This test consists of children being shown symbols in a sequence that they are required to remember and then replicate in the correct order. The sequences increase from two to six; there are ten items. This test gives an indication of the ability of the child to order symbols in an arbitrary sequence, requiring the perception, retention, and replication of symbols in an order.

Data were collected also for sound blending and sound discrimination, in response to the prevalence in England of teaching reading through synthetic phonics. The sound blending test was included to measure the ability to reconstruct a meaningful word from its constituent sounds; the sound discrimination test was included to measure the ability to distinguish between similar sounds in spoken words.

Word recognition was assessed using the Schonell Graded Word Reading test (1971), although it is noted that this is only a measure of single word identification; it is a widely recognised guide to a child’s level of acquired word recognition but does not measure comprehension.

Materials

Children were taught on a digital keyboard. The notation was colour coded and matched with corresponding-coloured stickers on the piano keys. Notes were presented in conventional form on a stave but without indication of time value. Well-known melodies were used for the right hand and three basic chords for the left hand (I, IV, V), also in colours. At the fourth lesson, bar lines were introduced with conventional letter names of notes concurrently with colour coding. The lessons were designed for the children to make sufficient progress without needing practice between lessons.

Procedure

The piano lessons were conducted on a one-to-one basis for the first 2 months and then in pairs for the final month in preparation for a short performance. Tunes were chosen which mostly needed just five notes in order to use a five-fingered position in C (not requiring the use of black keys). The notes were colour-coded, using five colours (one for each finger). These colours were matched with corresponding-coloured stickers on the piano keys. The colours were selected in a similar sequence to the colours of the rainbow as this seemed a natural choice (red, orange, yellow, green, and blue), in order of pitch, beginning with red at the lowest pitch (C) in both hands. The notation was presented in conventional form on a stave but without any indication of time value or rhythm, only pitch (as circles on lines or in spaces). Although colours were used instead of just black and white, the shape and position of notes were conventional so as not to conflict with or create confusion with any pre-existing or future music lessons. The first lessons began with a familiar tune ‘Twinkle, twinkle’ using just the right hand (see Fig. 1). Once this was mastered the left hand was added using the tonic chord (three notes played together) which was associated with specific colours in the right hand (red, yellow, and blue). The chord for the left hand was described in terms of patterns of colour (red, yellow, and blue at the same time). Once this was mastered, a second and subsequently a third chord was added. The chords were not given technical names but descriptive names, for example ‘the stretched little finger chord’. At the point of adding chords to the first tune, the children were introduced to the right hand of the next tune. At each lesson, the children revised their existing repertoire before moving on to a new tune and only did so if they were happy to.

Fig. 1 Music sheet for first lesson

At the fourth lesson, conventional music notation was introduced, and conventional letter names of notes were used concurrently with colour-coded notes (red = C; orange = D; yellow = E; green = F; blue = G). The conventional music was from a published beginner teaching book (Bastien, 1987). This book was chosen because the researcher was familiar with the teaching style and because its left-hand patterns were consistent with those being taught using the colour-coded method. Using a published book meant that the notes had extra features indicating timing, which the children were encouraged to ignore at this stage. They were, however, required to match the names of the notes to the colours and subsequently were able to locate the correct notes by using only the names of the notes in both clefs (for both hands). They continued to learn from standard notation written for both hands (both clefs), using both hands simultaneously. By using familiar tunes in the first instance, for example ‘Kum ba yah’, timing and rhythm were automatic. All the tunes used with the colour coding and chords were well-known familiar tunes. Lessons lasted between 10 and 15 min, depending on the child’s degree of focus, and took place weekly for approximately fifteen sessions (4 h in total). The order in which children had their lessons during the day was rotated over the course of the whole term so that each child missed very little from any one area of the curriculum.

For the final four lessons, the children were put into pairs to practise for a duet. These were performed at an end-of-term assembly (one boy was happy to perform solo since we had odd numbers). The duets were selected from the existing repertoire and rearranged for two to perform. This required the children to listen to the other person and adjust their timing in a real context. Although note values were not directly taught, children were encouraged to include rhythms of tunes they were familiar with and to listen for pauses whilst playing duets.

Results

All the children in the intervention condition learned to play between 4 and 6 of the initial colour-coded notation pieces. From the conventional piano teaching book, ten completed 12 pieces, and the rest completed between 6 and 8.

For each of the reported outcomes, we used between-groups ANCOVA, using pre-test scores as the covariate in order to control for pre-intervention differences, and reporting statistical significance (p values). We also report effect sizes (Cohen’s d) calculated using the adjusted means, to indicate the relative importance of an effect not shown by p values (Baker, 2016; Nuzzo, 2014). Pre-intervention and post-intervention raw scores for all reported measures are detailed in Table 2. Group means, standard deviation, ANCOVA significance, and effect sizes are shown. The dependent variables were the post-test scores.
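The analysis described above can be illustrated with a minimal sketch of a two-group ANCOVA with a single covariate, written with the Python standard library only. It is not the authors' analysis script: the function name `ancova_two_groups`, the toy scores in the usage lines, and the use of the pooled within-group post-test standard deviation as the denominator for Cohen's d are all assumptions, since the text states only that d was calculated from the adjusted means.

```python
from math import sqrt
from statistics import mean


def _sums(xs, ys):
    """Return (Sxx, Sxy, Syy): sums of squares and cross-products about the means."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy


def ancova_two_groups(pre_c, post_c, pre_i, post_i):
    """ANCOVA of post-test on condition, with pre-test as covariate.

    Hypothetical helper: *_c are control scores, *_i intervention scores.
    """
    pre_c, post_c = list(pre_c), list(post_c)
    pre_i, post_i = list(pre_i), list(post_i)
    n = len(pre_c) + len(pre_i)

    # Full model (condition + covariate): pooled within-group regression.
    sxx_c, sxy_c, syy_c = _sums(pre_c, post_c)
    sxx_i, sxy_i, syy_i = _sums(pre_i, post_i)
    sxx_w, sxy_w, syy_w = sxx_c + sxx_i, sxy_c + sxy_i, syy_c + syy_i
    slope = sxy_w / sxx_w
    sse_full = syy_w - sxy_w ** 2 / sxx_w

    # Reduced model (covariate only): one regression ignoring condition.
    sxx_t, sxy_t, syy_t = _sums(pre_c + pre_i, post_c + post_i)
    sse_reduced = syy_t - sxy_t ** 2 / sxx_t

    # Two groups and one covariate leave n - 3 error degrees of freedom.
    df_error = n - 3
    f_value = (sse_reduced - sse_full) / (sse_full / df_error)

    # Adjusted means: post-test means evaluated at the grand pre-test mean.
    grand_pre = mean(pre_c + pre_i)
    adj_c = mean(post_c) - slope * (mean(pre_c) - grand_pre)
    adj_i = mean(post_i) - slope * (mean(pre_i) - grand_pre)

    # ASSUMPTION: d uses the pooled within-group SD of the post-test scores.
    pooled_sd = sqrt(syy_w / (n - 2))
    return {
        "f_value": f_value,
        "df_error": df_error,
        "adjusted_control": adj_c,
        "adjusted_intervention": adj_i,
        "cohens_d": (adj_i - adj_c) / pooled_sd,
    }


# Toy scores for illustration only; these are not the study data.
result = ancova_two_groups([1, 2, 3], [1, 3, 2], [1, 2, 3], [3, 5, 4])
print(result)
```

A p value for `f_value` on (1, `df_error`) degrees of freedom would come from the F distribution, for example via `scipy.stats.f.sf(f_value, 1, df_error)` if SciPy is available. With two conditions and one covariate the error degrees of freedom are N − 3, so an error term of 30 would correspond to 33 children.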

Table 2 Raw scores for group means for intervention and control conditions

Analysis evaluating the homogeneity of regression assumption showed no significant effect of the covariate, and thus, ANCOVA could be run for all reported measures. There was a moderate effect size and significant effect of the condition, after controlling for the covariate, for the graded word reading: F(1,30) = 4.23, p = 0.048, d = 0.38 and for digit span forward: F(1,30) = 6.71, p = 0.014, d = 0.54. However, digit span backward showed no effect of condition. The visual sequential memory test showed no statistical significance of condition; however, the moderate effect size, calculated using the adjusted means, indicates a possible trend that suggests some effect from the intervention: F(1,30) = 1.71, p = 0.201, d = 0.50.
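The homogeneity-of-regression assumption is conventionally checked by testing whether the slope of post-test on pre-test differs between conditions (a condition × covariate interaction). The sketch below, assuming that this is the check referred to above, compares a model with one common slope against a model with a separate slope per group; the function name `slope_homogeneity_f` and the toy scores are hypothetical.

```python
from statistics import mean


def _sums(xs, ys):
    """Return (Sxx, Sxy, Syy): sums of squares and cross-products about the means."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy


def slope_homogeneity_f(groups):
    """F test for equal pre-test slopes across groups.

    `groups` is a list of (pre_scores, post_scores) pairs, one per condition.
    Returns (F, df_numerator, df_denominator).
    """
    k = len(groups)
    n = sum(len(pre) for pre, _ in groups)
    sxx_w = sxy_w = syy_w = 0.0
    sse_separate = 0.0
    for pre, post in groups:
        sxx, sxy, syy = _sums(pre, post)
        # Residual sum of squares when this group has its own slope.
        sse_separate += syy - sxy ** 2 / sxx
        sxx_w, sxy_w, syy_w = sxx_w + sxx, sxy_w + sxy, syy_w + syy
    # Residual sum of squares when all groups share one pooled slope.
    sse_common = syy_w - sxy_w ** 2 / sxx_w
    df_num = k - 1
    df_den = n - 2 * k  # one intercept and one slope estimated per group
    f_value = ((sse_common - sse_separate) / df_num) / (sse_separate / df_den)
    return f_value, df_num, df_den


# Toy scores for illustration only; these are not the study data.
control = ([1, 2, 3], [1, 3, 2])
intervention = ([1, 2, 3], [4, 5, 3])
print(slope_homogeneity_f([control, intervention]))
```

A small, non-significant F here is what licenses the common-slope ANCOVA; a large F would indicate that the covariate adjustment differs by condition and that adjusted means should be interpreted with caution.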

Although scores were recorded for sound blending and sound discrimination at time 2, results are not reported because so many of the time 1 scores were at ceiling in both conditions, rendering any results meaningless for the whole group. However, it is interesting to note that after removing the individuals who scored at ceiling in both conditions for the sound discrimination test, the difference in means of the remaining children suggests that for the children with poor scores the intervention may well have had some effect: Intervention N = 6, M t1 = 7, M t2 = 10; Non-intervention N = 4, M t1 = 8.29, M t2 = 9.62.
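The subgroup comparison above amounts to dropping children who were at ceiling at time 1 and comparing the remaining means. A minimal sketch of that arithmetic follows; the function names, the ceiling value, and the toy scores are hypothetical, and only the four means passed to `reported_gain` are taken from the text.

```python
from statistics import mean


def below_ceiling_summary(t1_scores, t2_scores, ceiling):
    """Drop children at ceiling at time 1; return (n, mean_t1, mean_t2, gain)."""
    kept = [(a, b) for a, b in zip(t1_scores, t2_scores) if a < ceiling]
    if not kept:
        raise ValueError("every child scored at ceiling at time 1")
    m1 = mean([a for a, _ in kept])
    m2 = mean([b for _, b in kept])
    return len(kept), m1, m2, m2 - m1


def reported_gain(mean_t1, mean_t2):
    """Mean gain from time 1 to time 2, rounded to two decimal places."""
    return round(mean_t2 - mean_t1, 2)


# Toy scores with an assumed ceiling of 10; these are not the study data.
print(below_ceiling_summary([10, 6, 8, 10], [10, 9, 10, 10], ceiling=10))

# Gains implied by the subgroup means reported in the text.
print(reported_gain(7, 10))       # intervention
print(reported_gain(8.29, 9.62))  # non-intervention
```

On the reported means, the intervention subgroup gained 3 points and the non-intervention subgroup 1.33 points, although with subgroups this small the difference is descriptive only.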

Discussion

Improvements in digit span as a result of musical training have been reported in studies elsewhere. We acknowledge that our sample was small; however, randomisation to intervention and the use of baseline pre-intervention scores as covariates would indicate that our results, also showing improved digit span forward and little difference in digit span backwards, are robust and support the findings in previous research. Additionally, these significant differences occurred after only one term of instruction and appeared stable over time. Digit span forward is believed to test short-term memory, whilst digit span backward tests working memory (Helland & Asbjørnsen, 2004). Since children with reading difficulty demonstrate a deficit in short-term memory and working memory (Swanson et al., 2009), we can conclude that an intervention which has a positive impact on short-term memory will also have a positive impact for children with reading difficulty.

The significant difference in word reading scores was something of a surprise as there is very little evidence of this in other studies. Reading, it could be argued, is not an auditory function but a function of visual (or tactile in the case of Braille) recognition of script (or hand signals). Thus, it is plausible that it was training in reading music notation that resulted in differences in word recognition scores. There is indeed a lack of evidence of improved word reading in studies of musical training where the focus has been on musicality rather than the notation. Although no improvements in word reading scores were found in the Slater et al. study (2014), the children who participated in music lessons maintained their age-expected levels whereas the non-intervention children showed a decline in age-related reading scores. Reading relies on cognitive functions such as working memory, and upon the ability to map visual symbols to specific responses. It is possible that learning to read music provides additional practice in mapping visual symbols to the production of sounds or strengthens the same cognitive processes involved in word reading or in recalling digit sequences, such as processing speed or temporal sequence processing. It is possible that it supports critical reading-related subskills such as selective attention, rhythm perception, phonological awareness, and auditory working memory including verbal memory encoding and recall. In our intervention, it is likely that there was a strong involvement of working memory in supporting the learning of visual-auditory cross-modal associations; cross-modal ‘binding’ may help to explain the observed effect in the word recognition task (Garcia et al., 2019; Toffalini et al., 2019).

In many of the studies exploring the effect of musical training, the types of instruments used have not been specified and it is often unclear whether any kind of music notation has been used (Ho et al., 2003; Moreno et al., 2011a, b; Hille et al., 2011; Roden et al., 2012; Hansen et al., 2012; Slater et al., 2014; Nutley et al., 2014; Gordon et al., 2015; Talamani et al., 2016; Cheung et al., 2017; Jaschke et al., 2018; Saarikivi et al., 2019; Wilbiks & Hutchins, 2020). In our intervention study, we were able to specify the instrument used (the piano), the amount of training time each child received, the type of music notation used, and that it included using two hands, reading the treble and bass clefs, with duets and an end-of-term performance. This study is thus easily replicable.

Although there was no evidence of decline in mean scores for any of our measures, we did find evidence that some individuals did indeed have lower scores at post-test, and the majority of these were in the control group. Given the evidence in other studies that children can demonstrate declining scores, as discussed earlier, we suggest that our results, in common with those of Slater et al. (2014), indicate that music training that includes learning to read music notation can reduce the risk of, or even prevent, such decline. We suggest that this is an important area for future research and will be of special interest to classroom practitioners. In such future research, it may be worth considering whether we need to disambiguate general music training from reading music in some way.

Overall, our results show evidence for a causal relationship between music learning and improvements in language/verbal skills. Despite the small sample size, the intervention procedure may be of interest to classroom practitioners and the analyses controlling for pre-test measures provide interesting evidence that future studies may explore.