The human nervous system makes use of sensory regularities to drive accurate perception, especially when confronted with challenging perceptual environments [1]. It is thought that the brain shapes perception according to predictions that are made based on regularities; this shaping is accomplished by comparing higher-level predictions with lower-level sensory encoding of an incoming stimulus via the corticofugal (i.e., top-down) system [2]. This is a common neural feature that spans sensory modalities and can be observed in neural responses to regularly-occurring, as opposed to unpredictably-occurring, stimuli [3-5]. The brain's ability to use sensory regularities is a fundamental feature of auditory processing, promoting even the most basic of auditory experiences such as language processing during infancy [6, 7] and speech comprehension amidst a competing conversational background [5]. Failure of the brain to utilize sensory regularities has been associated with neural dysfunction, such as schizophrenia [8] and language impairment (e.g., dyslexia) [5, 9-11].

The impact of stimulus regularity on auditory processing has been well established in the auditory cortex [1, 3] and was recently documented at and below the level of the brainstem [12-15]. Specifically, neural potentials to frequently-occurring sounds exhibit enhanced frequency tuning in both the primary auditory cortex [16] and in the auditory brainstem [5, 17]. This sensory fine-tuning occurs rapidly, does not require overt attention and may enable enhanced object discrimination [14, 18]. Although reference to the neural enhancement of a repeated speech sound might seem contradictory to the well-known repetition suppression of cortical evoked response magnitudes, the neural mechanisms underlying this effect remain debated. While some have proposed that stimulus repetition leads to overall decreased neuronal activity, others have suggested that repetition facilitates precision in neural representation by enhancing certain aspects of the neural response while inhibiting others (e.g., more precise inhibitory sidebands surrounding a facilitated response to the physical dimensions of a repeated stimulus) [4].

Human auditory brainstem responses (ABRs) to the pitch of predictably presented speech are enhanced relative to ABRs to speech presented in a variable context [5]. The extent of this subcortical enhancement of regularly-occurring speech relates to better performance on language-related tasks, such as reading and hearing speech in noise. This fine-tuning is thought to be driven by top-down cortical modulation of subcortical response properties [19] and its absence in poor readers is consistent with proposals that child reading impairment stems from the brain's inability to benefit from repetition in the sensory stream. Specifically, children with dyslexia fail to form perceptual anchors--a type of perceptual memory--based on repeating sounds [9, 11].

Although we have made gains in understanding the auditory processing of speech regularities in children with reading impairment (or lack thereof), we do not know how auditory expertise shapes these mechanisms. The auditory expertise engendered by musical training during childhood and into adulthood promotes the subcortical encoding of speech [20, 21] and may strengthen neural mechanisms that undergird child literacy [22-24]. Although the integrative nature of music and language abilities continues to be debated [25-27], a growing body of work supports shared abilities for music and reading, with music aptitude accounting for a substantial amount of the variance in child reading ability [28-30] even after controlling for nonverbal IQ and phonological awareness [31]. It is thought that strengthened top-down control, which is important for modulating lower-level neural responses, unfolds with expertise [32] and, more specifically, with musical training [33, 34].

In order to define relationships between musical skill and literacy-related aspects of auditory brainstem function, we assessed subcortical processing of speech regularities, music aptitude and reading abilities in school-aged children. Our overarching goal was to define common biological underpinnings for music and reading abilities. We anticipated that music aptitude and literacy abilities would positively correlate with subcortical spectral enhancement of repetitive speech cues. We also explored relationships between musical skill and literacy-related aspects of auditory cognitive function through working memory assessments [35, 36], which included an auditory attention component. We anticipated that music aptitude and literacy abilities would positively correlate with auditory working memory and attention performance. In order to delineate and quantify relationships among variables, we applied the data to Structural Equation Modeling (SEM). SEM relies on a variety of simultaneous statistical methods (e.g., factor analysis, multiple regressions and path analysis combined with structural equation relations) to evaluate a hypothesized model [37]. Although more traditional regression analyses are useful for delineating causal relationships among variables, SEM enables more efficient characterization of complex, real-world processes than can be achieved using correlation-based analyses [38]. Specific benefits of SEM include the simultaneous analysis of multiple interrelated variables, consideration of measurement error, and inherent control for multiple comparisons. We expected SEM to substantiate our hypothesis that music aptitude predicts much of the variance in literacy abilities by way of shared cognitive and neural mechanisms.

Materials and methods

Participants

Participants were 42 normal-hearing children between the ages of 8 and 13 years (M = 10.4, SD = 1.6; 26 males). Participants and their legal guardians provided informed assent and consent according to Northwestern University's Institutional Review Board. Because we aimed to evaluate neural function and music aptitude across a spectrum of readers, no literacy restrictions were applied, but all participants demonstrated normal audiometric thresholds (≤20 dB HL pure tone thresholds at octave frequencies from 125 to 8000 Hz) and IQ (≥85 score on the Wechsler Abbreviated Scale of Intelligence) [39]. Participants also had clinically normal ABRs to 80 dB SPL 100 μs click stimuli that were presented at 31.1 Hz.

Extent of extracurricular activity was assessed by a parent questionnaire (the Child Behavior Checklist [40]). Parents rated their child's current extracurricular activities according to the frequency of the child's involvement--less than average, average, or more than average; these scores were summed to produce a single extracurricular activity score.

Good (n = 8) and poor readers (n = 21) were differentiated based on reading ability (Test of Word Reading Efficiency; see Reading and working memory, below) [5]. Children with scores ≤90 were included in the poor reading group, while good readers had scores ≥110. 13 subjects did not meet the criteria for either group and were excluded from group analyses. Good and poor readers did not differ in age (Mann-Whitney U test; z = -0.223, p = 0.83), sex (Pearson Chi-Square χ2 = 0.12, p = 0.73), socioeconomic status as inferred by maternal education [41] (Pearson Chi-Square χ2 = 1.10, p = 0.59), years of musical training (Mann-Whitney U test; z = -0.231, p = 0.82), extent of extracurricular activity (Mann-Whitney U test; z = -1.202, p = 0.23) or nonverbal IQ (Mann-Whitney U test; z = -1.834, p = 0.07). With regard to musical training histories, 36 of the 42 children had undergone no to only a few months of musical training and were not currently involved in music activities. The other six children had participated in at least one year of musical training. One of these children was categorized as a poor reader, two were categorized as good readers and three were considered average readers (as such, these three were not included in either reading group).
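The grouping rule described above can be written as a short sketch. The function name `reading_group` and the constant names are hypothetical; only the two cutoffs (≤90 and ≥110) come from the text.

```python
from typing import Optional

POOR_MAX = 90    # TOWRE scores at or below this value fall in the poor-reader group
GOOD_MIN = 110   # TOWRE scores at or above this value fall in the good-reader group


def reading_group(towre_score: float) -> Optional[str]:
    """Return 'poor', 'good', or None for scores that meet neither criterion."""
    if towre_score <= POOR_MAX:
        return "poor"
    if towre_score >= GOOD_MIN:
        return "good"
    # Scores strictly between the cutoffs are left out of group analyses.
    return None
```

Children for whom the function returns `None` correspond to the subjects who were excluded from group comparisons but retained for the correlational analyses.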

Reading and working memory

Standardized literacy measures assessed oral (Test of Word Reading Efficiency, TOWRE) [42] and silent (Test of Silent Word Reading Fluency, TOSWRF) [43] reading speed. The TOWRE requires children to read aloud lists of real words (Sight subtest) and nonsense words (Phonemic Decoding subtest) while being timed. The two subscores are combined to form a composite score (here referred to as the TOWRE). The TOSWRF requires participants to quickly identify printed words by demarcating lines of letters into individual words while being timed. Participants are presented with rows of words that gradually increase in reading difficulty and they are asked to separate them (e.g., dimhowfigblue → dim/how/fig/blue). TOWRE ("reading efficiency") and TOSWRF ("reading fluency") age-normed scores were averaged in order to create a composite Reading variable for correlation analyses.

Auditory working memory was assessed using the Memory for Digits Forward subtest of the Comprehensive Test of Phonological Processing [44] and the Memory for Digits Reversed subtest of the Woodcock Johnson Test of Cognitive Abilities [45]. Digits forward and digits reversed age-normed scores were averaged in order to create a composite score for correlation analyses. In light of auditory attention's contribution to memory for digits forward [46], composite performance on both digits forward and reversed subtests is referred to as Auditory Working Memory and Attention (AWM/Attn).

Music aptitude

Music aptitude was assessed using Edwin E. Gordon's Intermediate Measures of Music Audiation (IMMA) [47], which measures children's abilities to internalize musical sound and compare two sequentially presented sound patterns. Tonal aptitude was assessed by the Tonal subtest, in which participants are presented with 40 pairs of musical excerpts that do not differ rhythmically but may differ melodically. Rhythm aptitude was assessed by the Rhythm subtest, in which participants are presented with 40 pairs of short excerpts that do not differ melodically but may differ rhythmically. For both subtests, participants indicate whether the two excerpts in each pair are the same or different. The subtest scores are combined to generate a composite music aptitude score. The rhythm, tonal and composite scores are normed by academic grade in order to produce percentile rankings.

Auditory brainstem measures

Brainstem responses to the speech sound /da/ were collected from Cz using Scan 4.3 (Compumedics, Charlotte, NC) under two conditions. Ag-AgCl electrodes were applied in a vertical, ipsilateral montage (i.e., FPz as ground, right earlobe as reference). Evoked potentials recorded with this electrode montage have been found to reflect activity from an ensemble of neural elements of central brainstem origin [48, 49]. In the predictable condition, the speech sound /da/ was presented at a probability of 100%, whereas in the variable condition /da/ was randomly interspersed in the context of seven other speech sounds at a probability of 13% (Figure 1). The seven speech sounds varied acoustically according to a variety of features, including formant structure (/ba/, /ga/, /du/), duration (a 163 ms /da/), voice-onset-time (/ta/) and F0 (250 Hz /da/, /da/ with a dipping pitch contour). The /da/ stimulus was a six-formant, 170 ms speech syllable synthesized in Klatt [50] with a 5 ms voice onset time and a level fundamental frequency (F0, 100 Hz). The first, second and third formants were dynamic over the first 50 ms (F1, 400-720 Hz; F2, 1700-1240 Hz; F3, 2580-2500 Hz) and then maintained frequency for the rest of the duration. The fourth, fifth and sixth formants were constant throughout the entire duration of the stimulus (F4, 3300 Hz; F5, 3750 Hz; F6, 4900 Hz). For a detailed description of the seven other speech sounds, see Chandrasekaran et al. (2009).

Figure 1

Auditory brainstem response recording conditions. We recorded ABRs to the same speech sound in two different conditions. For the predictable condition, /da/ was repeated at a probability of 100%. In the variable condition, /da/ was randomly interspersed in the context of seven other speech sounds. We trial-matched responses to compare ABRs recorded in the variable condition to those recorded in the predictable condition without the confound of presentation order or trial event.

The stimulus was presented to the right ear via insert earphones (ER-3; Etymotic Research, Elk Grove Village, IL) at 80 dB SPL and at a rate of 4.35 Hz. This fast presentation rate limits the contribution of cortical neurons, which are unable to phase-lock at such fast rates [49]. Furthermore, the stimulus was presented in alternating polarities and average responses to each polarity were subsequently summed in order to limit contamination of the neural recording by the cochlear microphonic [51]. During recording sessions, participants watched videos of their choice in order to maintain a still yet wakeful state with the soundtrack quietly playing from a speaker, audible through the nontest ear. Because auditory input from the soundtrack was not stimulus-locked and stimuli were presented directly to the right ear at a +40 dB signal-to-noise ratio, the soundtrack had no significant impact on the recorded responses [51].
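The logic of combining responses to alternating polarities can be illustrated with a toy simulation. Everything in this sketch is an assumption made for illustration: the signals are synthetic sinusoids, the function names are hypothetical, and the two polarity averages are combined by taking their mean (a sum scaled by one half) rather than a raw sum.

```python
import math
from typing import List

FS = 20000  # Hz; matches the sampling rate reported for the recordings


def simulate(polarity: int, n_samples: int = 400) -> List[float]:
    """Toy recording: a polarity-invariant part plus a part that flips with the stimulus."""
    samples = []
    for i in range(n_samples):
        t = i / FS
        # Component assumed to keep its sign regardless of stimulus polarity.
        invariant = math.sin(2 * math.pi * 200 * t)
        # Component assumed to follow the stimulus sign, as a microphonic would.
        inverting = polarity * 0.5 * math.sin(2 * math.pi * 1000 * t)
        samples.append(invariant + inverting)
    return samples


def combine(resp_pos: List[float], resp_neg: List[float]) -> List[float]:
    """Average the two polarity responses; sign-following components cancel."""
    return [(a + b) / 2 for a, b in zip(resp_pos, resp_neg)]
```

In this toy case the combined trace contains only the polarity-invariant component, which is the reason for presenting the stimulus in both polarities.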

Responses were digitally sampled at 20,000 Hz, offline filtered from 70 to 2000 Hz with a 12 dB roll-off and epoched from -40 to 190 ms (stimulus onset at time zero). Events with amplitudes beyond ± 35 μV were rejected as artifacts. Responses to 100 μs clicks were collected before and after each recording session in order to ensure consistency of wave V latencies, confirming no differences in recording parameters or subject variables.
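A minimal sketch of the epoching and artifact-rejection steps follows. It assumes the recording is already filtered and expressed in microvolts; the function names are hypothetical, and the filtering step itself is omitted. Only the sampling rate, the epoch limits and the ±35 μV criterion are taken from the description above.

```python
from typing import List, Sequence, Tuple

FS = 20000        # samples per second
PRE_MS = 40       # epoch starts 40 ms before stimulus onset
POST_MS = 190     # epoch ends 190 ms after stimulus onset
REJECT_UV = 35.0  # epochs exceeding this absolute amplitude are rejected


def epoch_bounds(onset_sample: int) -> Tuple[int, int]:
    """Return (start, stop) sample indices for the epoch around one stimulus onset."""
    start = onset_sample - PRE_MS * FS // 1000
    stop = onset_sample + POST_MS * FS // 1000
    return start, stop


def is_clean(epoch_uv: Sequence[float]) -> bool:
    """True when no sample lies beyond +/- 35 microvolts."""
    return max(abs(v) for v in epoch_uv) <= REJECT_UV


def average_clean(epochs: Sequence[Sequence[float]]) -> Tuple[List[float], int]:
    """Average the artifact-free epochs sample by sample; also return how many were kept."""
    kept = [e for e in epochs if is_clean(e)]
    if not kept:
        raise ValueError("no artifact-free epochs to average")
    n = len(kept)
    return [sum(column) / n for column in zip(*kept)], n
```

With these parameters each epoch spans 230 ms, or 4600 samples at 20,000 Hz.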

As in Chandrasekaran et al. [5], we compared the brainstem responses to /da/ recorded in the variable condition to trial-matched responses recorded to /da/ in the predictable condition (Figure 1). Specifically, neural responses in the predictable condition were averaged according to their occurrence relative to the order of presentation in the variable condition, resulting in 700 artifact-free responses for each condition.
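One way to picture the trial-matching step is sketched below. This is an illustrative reading of the procedure, not the original analysis code: it assumes that trials in both conditions are indexed by presentation order and that the predictable-condition response at each position where /da/ occurred in the variable sequence is the one retained. The function names and stimulus labels are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def target_positions(variable_sequence: Sequence[str], target: str = "da") -> List[int]:
    """Trial indices at which the target syllable occurred in the variable condition."""
    return [i for i, label in enumerate(variable_sequence) if label == target]


def trial_matched(predictable_trials: Sequence[T],
                  variable_sequence: Sequence[str],
                  target: str = "da") -> List[T]:
    """Select predictable-condition trials at the positions /da/ held in the variable run."""
    positions = target_positions(variable_sequence, target)
    # Positions beyond the end of the predictable run are skipped.
    return [predictable_trials[i] for i in positions if i < len(predictable_trials)]
```

Matching by position in this way equates the two averages for time-in-session and number of trials, so that any remaining difference is attributable to stimulus context.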

In accordance with Chandrasekaran et al., we examined the strength of the spectral encoding of the second and fourth harmonics (H2 and H4) in average responses for each participant over the formant transition of the stimulus (7-60 ms in the neural response) via fast Fourier transforms executed in Matlab 7.5.0 (The Mathworks, Natick, MA). Spectral magnitudes were calculated for 10 Hz-wide bins surrounding H2 and H4. The differences in the spectral amplitudes of H2 and H4 between the two conditions (predictable minus variable) were calculated for each participant and normalized through conversion to a z-score based on the group mean.
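The spectral measure can be sketched as follows. The original analysis used fast Fourier transforms in Matlab; this sketch instead evaluates a direct discrete Fourier sum at frequencies spanning each 10 Hz bin, which avoids any dependency beyond the Python standard library. The 1 Hz step within the bin, the averaging of magnitudes across the bin, the use of the sample standard deviation for the z-score and all function names are assumptions. H2 and H4 are taken as 200 and 400 Hz given the 100 Hz F0.

```python
import cmath
import math
import statistics
from typing import List, Sequence

FS = 20000            # sampling rate in Hz
EPOCH_START_MS = -40  # epochs begin 40 ms before stimulus onset
WINDOW_MS = (7, 60)   # formant-transition window in the response
F0_HZ = 100.0         # fundamental frequency of the /da/ stimulus


def transition_window(epoch: Sequence[float]) -> Sequence[float]:
    """Cut the 7-60 ms segment out of an epoch that starts at -40 ms."""
    first = (WINDOW_MS[0] - EPOCH_START_MS) * FS // 1000
    last = (WINDOW_MS[1] - EPOCH_START_MS) * FS // 1000
    return epoch[first:last]


def magnitude_at(segment: Sequence[float], freq_hz: float) -> float:
    """Amplitude-scaled magnitude of the discrete Fourier sum at one frequency."""
    n = len(segment)
    total = sum(v * cmath.exp(-2j * math.pi * freq_hz * i / FS)
                for i, v in enumerate(segment))
    return 2.0 * abs(total) / n


def bin_magnitude(epoch: Sequence[float], center_hz: float,
                  width_hz: float = 10.0, step_hz: float = 1.0) -> float:
    """Mean magnitude across a bin of width_hz centered on center_hz."""
    segment = transition_window(epoch)
    steps = int(round(width_hz / step_hz))
    freqs = [center_hz - width_hz / 2 + k * step_hz for k in range(steps + 1)]
    return sum(magnitude_at(segment, f) for f in freqs) / len(freqs)


def enhancement(predictable: Sequence[float], variable: Sequence[float],
                harmonic: int) -> float:
    """Predictable-minus-variable magnitude at the given harmonic of F0."""
    center = harmonic * F0_HZ
    return bin_magnitude(predictable, center) - bin_magnitude(variable, center)


def zscores(values: Sequence[float]) -> List[float]:
    """Normalize per-participant differences against the group mean and SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]
```

In use, `enhancement(pred_avg, var_avg, 2)` and `enhancement(pred_avg, var_avg, 4)` would be computed for each participant's averaged responses, and each list of differences passed to `zscores`.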

Statistical Analyses

The brainstem response z-scores were compared across conditions and groups using a Repeated Measures ANOVA and correlated with the reading and music aptitude measures using Pearson's correlations (SPSS Inc., Chicago, IL). RMANOVA outcomes were further defined in a post-hoc analysis using Mann-Whitney U-tests. All results reflect two-tailed values and normality for all data was confirmed using the Kolmogorov-Smirnov test for equality.

Structural Equation Modeling

We normalized all data through conversion to z-scores based on group means. Analysis of covariance matrix structures was conducted with Lisrel 8.8 (Scientific Software International Inc., Lincolnwood, IL) and solutions were generated based on maximum-likelihood estimation. We defined the model's directions of causality in accordance with our aims, being to define common biological and cognitive factors to account for the covariance in child reading and music abilities. We selected the Root Mean Square Error of Approximation (RMSEA) in order to evaluate the model's goodness of fit, with measurements below 0.08 indicative of good model fit [52]. Lisrel 8.8 also calculates the likelihood ratio (χ2), its degrees of freedom and probability whenever maximum likelihood ratios are computed. The χ2 test functions as a statistical method for evaluating structural models, describing and evaluating the residuals that result from fitting a model to the observed data. A χ2 probability value greater than 0.05 indicates a good model fit [52].
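For readers unfamiliar with the RMSEA, a common textbook form of the index is sketched below. It is offered only as an illustration of how the index depends on χ2, the degrees of freedom and the sample size. Software packages differ in details such as whether N or N - 1 appears in the denominator, so this should not be read as the exact computation performed by Lisrel. The function name is hypothetical.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root Mean Square Error of Approximation, using N - 1 in the denominator.

    The index is floored at zero when chi2 does not exceed its degrees of freedom.
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))
```

Because the numerator is the excess of χ2 over its degrees of freedom, a model whose χ2 is close to its degrees of freedom yields an RMSEA near zero, well under the 0.08 criterion.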

Results

The extent of subcortical enhancement of repetitive speech cues correlated with music aptitude and literacy abilities. Common variance among subcortical enhancement of repetitive speech cues, music aptitude and reading abilities was not accounted for by overarching factors such as socioeconomic status, extracurricular involvement or IQ.

SEM indicates that, by way of common neural (auditory brainstem) and cognitive (auditory working memory/attention) functions, music skill accounts for 38% of the variance in reading performance. The resulting statistical model delineates and quantifies relationships among auditory brainstem function, music aptitude, memory/attention and literacy.

Music aptitude correlates with reading performance

Music aptitude correlated with reading performance. These relationships were largely driven by performance on the Rhythm music aptitude subtest (Rhythm-TOWRE: r = 0.41, p < 0.01; Rhythm-TOSWRF: r = 0.31, p < 0.05; Tonal-TOWRE: r = 0.16, p = 0.32; Tonal-TOSWRF: r = 0.26, p = 0.09), although the relationships between music aptitude and reading performance were strongest when considering the composite music aptitude score, which considers both Tonal and Rhythm performance (Composite-TOWRE: r = 0.45, p < 0.005; Composite-TOSWRF: r = 0.39, p < 0.01).

Subcortical enhancement of predictable speech relates with reading and music abilities

Poor readers showed weaker subcortical enhancement of spectral components of speech sounds (2nd and 4th harmonics) presented in the predictable, contrasted with the variable, condition than good readers (Figure 2a). No other significant neural differences were observed between groups, such as for the subcortical enhancement of the F0 or other harmonics. A 2 (condition) × 2 (reading group) × 2 (harmonic) RMANOVA demonstrated an interaction between condition and reading group (F = 13.33, p < 0.001). Post-hoc Mann Whitney U-tests demonstrated that good readers have a greater enhancement of speech harmonics presented in the predictable condition than poor readers (H2: z = -2.25, p < 0.05; H4: z = -2.98, p < 0.005; Figure 2a).

Figure 2

Subcortical enhancement of predictable speech relates with music and reading abilities. (A) Good readers demonstrate greater enhancement of speech presented in the predictable condition, compared to the variable condition, than poor readers. (B) The amount of enhancement observed in the predictable condition positively correlates with reading ability and music aptitude.

The amount of enhancement observed in ABRs recorded in the predictable compared to the variable condition positively correlated with reading and music aptitude performance across all subjects. The reading composite score (produced by combining TOWRE and TOSWRF z-scores) correlated with the amount of brainstem enhancement for both H2 and H4 (H2: r = 0.44, p < 0.005; H4: r = 0.40, p < 0.01; Figure 2b). The music composite score also correlated with the amount of brainstem enhancement to both harmonics (H2: r = 0.33, p < 0.05; H4: r = 0.37, p < 0.01; Figure 2b).

Auditory working memory and attention relate with reading and music abilities

Reading and music aptitude positively correlated with performance on the auditory working memory tasks--memory for digits forward and digits reversed. Higher AWM/Attn correlated with better reading performance (TOWRE: r = 0.45, p < 0.005; TOSWRF: r = 0.38, p < 0.01). Likewise, higher AWM/Attn correlated with higher music aptitude (r = 0.44, p < 0.005). The relationship between AWM/Attn and music aptitude appeared to be largely driven by the rhythm subtest (Tonal: r = 0.203, p < 0.20; Rhythm: r = 0.49, p < 0.001; Figure 3).

Figure 3

Auditory working memory correlates with music aptitude. Higher rhythm, but not tonal, aptitude correlates with better auditory working memory and attention (AWM/Attn) performance.

Although AWM/Attn correlated with the amount of brainstem enhancement to both harmonics (r = 0.35, p < 0.05), the covariance between these measures could be accounted for by their relationships with music aptitude. Whereas partialing for AWM/Attn did not eliminate the common variance observed between music aptitude and repetitive harmonic enhancement (r = 0.32, p = 0.04), AWM/Attn and repetitive harmonic enhancement no longer covaried when partialing for music aptitude (r = 0.20, p = 0.20). This suggests that most of the covariance between AWM/Attn and repetitive harmonic enhancement can be explained by their shared variance with music aptitude.
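The partialing logic used here follows the standard first-order partial correlation, which removes from the correlation between two measures the portion attributable to a third. The sketch below computes it from the three pairwise correlations; the function name and argument names are hypothetical, and the values in the usage note are illustrative rather than taken from our data.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y after partialing out z."""
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denominator == 0.0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denominator
```

For example, if x and y each correlate 0.5 with z and 0.25 with each other, `partial_r(0.25, 0.5, 0.5)` is zero: their entire association is carried by z. This is the pattern suggested above for AWM/Attn and harmonic enhancement once music aptitude is held constant.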

Consideration of overarching factors

Common variance among subcortical enhancement of repetitive speech cues, music aptitude and reading abilities could not be accounted for by overarching factors such as IQ, socioeconomic status (SES) or extracurricular involvement (ExCurr). SES and ExCurr did not correlate with any of our observed variables (Table 1). IQ, on the other hand, accounted for a significant amount of the variance in our test variables (brainstem function: r = 0.37, p < 0.02; reading performance: r = 0.45, p < 0.02; auditory working memory: r = 0.37, p < 0.001). Although IQ did not correlate with overall music aptitude or the tonal aptitude subscore (composite: r = 0.25, p = 0.11; tonal: r = 0.02, p = 0.89), it correlated with the rhythm aptitude subscore (r = 0.38, p < 0.02). Given that covarying for IQ did not eliminate the correlations observed among our test variables (music × reading: r = 0.41, p = 0.03; music × memory/attention: r = 0.47, p = 0.01; music × subcortical function: r = 0.41, p = 0.03; reading × subcortical function: r = 0.52, p = 0.004; reading × memory/attention: r = 0.43, p = 0.04), we conclude that IQ did not account for the common variance reported among music aptitude, reading ability, working memory/attention and subcortical and cognitive function.

Table 1 Subjects' socioeconomic status (SES) and extracurricular activity involvement did not correlate with the test variables of music aptitude, auditory brainstem enhancement of repetitive speech cues, reading, or auditory memory/attention

Modeling relationships among music aptitude, reading ability and subcortical function

In order to more comprehensively examine relationships among music aptitude, subcortical processing of speech regularities and reading ability, we subjected these data to SEM [37]. SEM provides a mathematical method for evaluating relationships among independent and dependent variables in a model hypothesized a priori. Our hypothesized model, depicted in Figure 4, projected that music aptitude predicts reading ability by means of subcortical processing of speech regularities and AWM/Attn function.

Figure 4

Structural equation model (SEM) of music aptitude, reading, auditory working memory/attention and auditory brainstem function. Music aptitude accounts for 38% of the variability in reading ability through its impact on auditory working memory/attention and subcortical enhancement of predictable speech harmonics. The model demonstrates an excellent fit; values plotted represent squared correlation coefficients (r2). *p < 0.05; **p < 0.01; ***p < 0.001.

By means of subcortical enhancement of predictable speech harmonics and AWM, music aptitude accounted for 38% of the variability in reading ability (p < 0.01). The model demonstrated an excellent fit (χ2(18) = 17.64, p > 0.35; RMSEA = 0.05). All path coefficients were significant except for the path between Tonal Aptitude and Composite Music Aptitude (r2 = 0.03, p = 0.31). This model emphasizes the combined strength of relationships among rhythm aptitude, subcortical enhancement of predictable speech harmonics and AWM/Attn in predicting child reading ability.

Discussion

We observed correlations among music and literacy abilities with the extent of subcortical enhancement of predictable speech cues. As such, our data reveal common, objective neural markers for music aptitude and reading ability and suggest a model for the relationships that have been documented between music and literacy performance [28-31, 53].

Our data also reveal common cognitive markers for music aptitude and reading ability. Auditory working memory and attention are driving components of child literacy [35, 36], and relationships between auditory working memory and attention and musical skill have already been established [33, 54]. Not only do musicians demonstrate better verbal memory than nonmusicians, but this advantage can be seen with as little as one year of musical training [55]. Our results demonstrate a similar relationship between auditory working memory and attention and music aptitude in children, although this relationship is observed regardless of musical training backgrounds.

The role of the descending auditory system

As in Chandrasekaran et al., we observed subcortical enhancement of a predictable, contrasted with a variable, speech presentation [5]. This enhancement was specific for frequencies integral to the perception of pitch (H2 and H4). Similar repetition-induced frequency enhancement has been observed in the primary auditory cortex, where neurons exhibit sharpened acuity to stimulus frequency [16]. This tuning occurs without overt attention, is stimulus specific and develops rapidly [3, 56]. Not surprisingly, enhanced neural tuning with stimulus repetition has been proposed to relate with improved object discrimination [16, 18].

The ability of the sensory system to automatically modify neural response properties according to expectations in a dynamic and context-sensitive manner is thought to have evolved to infer and represent the causes of change in our environment [1, 57]. This modification may occur in a descending fashion, beginning in extra-sensory cortices where predictions are developed based on prior experience (such as with repetition) and sequentially tuning lower level response properties to heighten sensory acuity [2, 32, 57, 58]. The descending nature of this neural tuning is supported by observations from cortical work showing decreased onset latencies from 120 ms (after two repetitions) to 50 ms (after 30 repetitions) [56] and is thought to represent the strengthening of the stimulus-specific memory trace at earlier and earlier processing stages [3]. The correlations reported here between music aptitude and reading ability with subcortical fine-tuning to predictable speech sounds may indicate stronger top-down modulatory systems in individuals with better music aptitude and reading performance.

Musical experience boosts sensitivity to sound patterns

Our data demonstrate that diminished subcortical enhancement of predictable speech sounds relates with poor music aptitude. Similar observations have been made in poor readers, in addition to children with poor perception of speech presented in background noise [5]; we extend these findings to the domain of music. This relationship is not surprising given the importance of sound repetition and sequencing for music perception. Specifically, repetition and regularity contribute to the perception of tonality [59], rhythm and meter [60, 61] and the structural use of musical themes. Deviations from predicted patterns result in impaired music production and perception [62-64] and can be flagged by the auditory cortex in both musically trained and untrained individuals, as measured by auditory evoked potentials [65-67]. Increased sensitivity to deviations from patterns in musical sound is thought to reflect enhanced sensory memory and discrimination abilities as well as more firmly established categorical boundaries [68].

It is not surprising that we observed correlations between music aptitude and subcortical spectral enhancement of predictable speech sounds given that musical expertise increases one's sensitivity to sound patterns not only in music, but also in speech [34, 69]. Although the argument can be made for a genetic contributor to musicians' enhanced sound processing, this increased sensitivity can be modulated, at least in part, by one's method of musical practice and training [70]. Furthermore, diverse methodological approaches consistently reveal correlations between the extent of structural and functional neural enhancement observed in musicians and their years of musical practice or age of practice onset [71-74]. Such observations suggest the substantial contribution of experience-induced neuroplasticity to musicians' enhanced sound processing and may be attributed to the strength of top-down contributors to auditory processing [33, 69].

Subcortical enhancement of predictable speech: implications for reading impairment

Due to its multisensory nature, attentional demands and reliance on rapid audio-motor feedback, music is a powerful tool for engendering neural plasticity, particularly for auditory processing [34, 75-78]. This plasticity is not constrained to the brain's music networks but applies more generally to auditory functions [27, 69, 72, 79-82]. Clinicians and researchers involved in the treatment and assessment of reading dysfunction have long held interest in the potential for musical training to strengthen neural networks for reading. Wisbey was one of the first to formally propose that music, by facilitating the development of multisensory awareness and auditory acuity, could promote reading in impaired children [83]. This proposal has been verified by a number of experiments [84, 85] (cf. Morais et al., 2010 [25]), with relationships between music and reading abilities observed in many more [28-30, 53, 86].

Definition and characterization of common neural mechanisms for music and reading skills may enable the development of a biological assessment of reading impairment and improve the efficacy of remedial attempts. Reading performance is known to rely on a chorus of multifaceted and complex processes that have proven difficult to disentangle; here, we find that subcortical function serves as a significant and accessible factor in reading impairment, accounting for 44% of the variance in child reading ability. The use of auditory brainstem measurements to assess learning and reading impairment has emerged in recent years [21, 87, 88], is being adapted for the clinic and can provide an objective index of the success of auditory [89, 90] and music training [21]. In light of the high test-retest reliability of the speech-evoked ABR [91], individual responses are highly replicable and can be meaningfully compared to group means or established norms. Identification of common neural markers for music and reading skill, such as those reported here, may lead to the biological assessment of music-associated learning abilities in children and encourage the employment of music as a technique for literacy remediation.

Musical training during early childhood may be particularly important for the advancement of music and reading aptitude. Although the music test employed here is thought to measure music aptitude, being one's inherent ability for music, the creator of this measure, Edwin E. Gordon, has long emphasized the impact of music education during early childhood on music aptitude scores. Gordon makes this claim in light of his extensive longitudinal work showing that music aptitude can improve with musical training, particularly during early childhood [92]. The importance of an early onset of music activities is more directly supported by outcomes from neuroscientific research, in which many of the neuroplastic changes associated with musical training are more extensive in individuals who began training earlier in their lifetimes [71, 72, 93-96]. With regard to auditory brainstem processing, we found that ABRs in young adult musicians who began musical training prior to age 7 were distinct from those in musicians who began training between the ages of 7-13 [72, 93]. Whereas musicians who began training prior to age 7 demonstrated enhanced ABRs to the spectral components of communication sounds compared to nonmusicians, those who began later in life did not. Observations such as this reflect a critical period for musical training-associated neural plasticity [97] and may speak to the importance of initiating musical training during early childhood for bringing about the greatest impact on music aptitude or, we propose, reading ability.

It remains undetermined whether reading abilities are impacted alongside music aptitude with musical training during childhood or whether the neural mechanism reported here is affected by musical training. Also undetermined is whether relationships between music and reading work in reverse, with language-based literacy remediation leading to improved music aptitude. More work (notably, longitudinal work) is necessary in order to define relationships between music aptitude, literacy and the auditory brainstem response to speech as well as to determine the impact of formal training, the efficacy of specific training approaches and/or literacy remediation programs.

Conclusions

Reading relies on a complex and multifaceted combination of processes that have proven difficult to disentangle. In light of correlational and structural modeling analyses, we conclude that subcortical function serves as a significant and accessible factor underlying reading ability and impairment, predicting 44% of the variance in reading ability. Further outcomes reveal direct relationships between musical skill and literacy-related aspects of auditory brainstem and memory/attention function, revealing common neural and cognitive mechanisms for reading and music abilities that may operate, at least in part, via corticofugal shaping of sensory function. By way of auditory brainstem spectral enhancement of predictable speech and auditory working memory/attention, music skill predicts approximately 40% of the variance in reading performance. Definition of common neural and cognitive mechanisms for music and reading skills may support the usefulness of music for promoting child literacy, with the potential to improve the efficacy of remedial attempts.

Appendix A

Grouping according to good and poor music aptitude

The extent of brainstem enhancement of predictable speech in subjects with high (IMMA ≥70th percentile; n = 18) and low (IMMA ≤30th percentile; n = 9) music aptitude patterned with the results observed when subjects were divided into good and poor readers. A 2 (condition) × 2 (music group) × 2 (harmonic) RMANOVA demonstrated an interaction between condition and music group (F = 6.17, p < 0.02). Post-hoc Mann Whitney U-tests demonstrated that subjects with high music aptitude have a greater enhancement of the second harmonic of speech presented in the predictable condition compared to the variable condition than subjects with low music aptitude (H2: z = -1.96, p < 0.05; H4: z = -1.29, p = 0.19).