Introduction

The role of phonological awareness (PA) in reading is well known in the literature, although some studies show that the PA-reading relationship is complex, bidirectional, and varies depending on many factors, particularly the orthographic complexity of the language (Landerl et al., 2019). In addition to PA, recent studies have considered prosodic sensitivity (PS; i.e., sensitivity to prosodic cues in spoken language) as an additional predictor of reading skills (for an overview, see Thomson & Jarmulowicz, 2016; Wade-Woolley et al., 2022). One crucial question is whether PS predicts reading over and above PA. Empirical results on this issue are contradictory, probably due to the use of different methodological approaches to measure PA, PS and reading (Wade-Woolley & Heggie, 2016). The present study investigates the effects of both PS and PA on basic reading skills in a shallow orthography (German), where PA seems to play a less important role than in deep orthographies (Landerl & Wimmer, 2008; Schabmann et al., 2009). This study also examines whether the effect of PA/PS on reading is influenced by the use of different reading measures (word-level, less semantically demanding tests vs. sentence-level, more semantically demanding tests).

Prosodic sensitivity and reading

PS can be defined as the sensitivity to rhythmic patterns in spoken language. There are three different components of prosody (Holliman et al., 2014): stress, intonation and timing. Stress can be further divided into metrical and lexical stress, although the two overlap in some cases (Gutiérrez-Palma et al., 2016). Metrical stress captures the rhythmic pattern at the level of phrases and sentences (Goodman et al., 2010; Holliman, 2016). Lexical stress has a lexical contrastive function. For example, in English, lexical stress is important for distinguishing grammatical categories (REcord vs. reCORD; capital letters denote stressed syllables) (Critten et al., 2021; Lin et al., 2018) and semantic meanings (e.g., in English, DEsert vs. deSERT; in German, MOdern [to molder] vs. moDERN [modern]; Sauter et al., 2012). Additionally, stress at the level of phrases/sentences marks the important information within a sentence (e.g., HE is in the house vs. He is in the HOUSE) (Holliman et al., 2010a). Intonation helps distinguish between questions (ending with a rise in intonation), statements (ending with a fall in intonation) and sarcasm (Holliman et al., 2010b). The prosodic cue of timing is necessary to distinguish whether word strings are compound words, noun phrases or adjective-noun pairs (bedroom vs. bed, room; blackboard vs. black board) (Kitzen, 2001).

There is evidence that PS shares common variance with word reading from both group comparison and correlational studies. In group comparisons, individuals with dyslexia have been shown to have weaker PS compared to individuals without dyslexia (Cuetos et al., 2018; Goswami et al., 2010, 2013; Leong et al., 2011). English-speaking dyslexic children (Goswami et al., 2010, 2013) as well as dyslexic adults (Leong et al., 2011) exhibited lower performance than non-dyslexics on various PS tasks at the word level (e.g., decide whether two words have the same stress) as well as at the phrase level, e.g., the DEEdee task, in which participants heard sequences of stressed and unstressed ‘dee’ syllables and had to decide which DEEdee sequence of stress patterns matched the target phrase (e.g., “Bob the Builder” matches [DEE dee DEEdee]; Whalley & Hansen, 2006). Studies in more transparent orthographies than English have reported similar results. For example, Cuetos et al. (2018) compared dyslexic Spanish-speaking children with a group of age-matched (11 years old) and a group of reading-matched controls. The dyslexic group exhibited lower performance not only in PA but also in PS compared to age-matched controls. In one PS subtest, they also performed worse than the reading-matched children.

Of particular interest are correlational studies examining the effect of PS after controlling for PA. In their review, Wade-Woolley and Heggie (2016) summarized the findings of 10 studies. In five out of seven studies in which isolated word reading was used as the reading outcome, PS remained a predictor of word reading after PA was included in the regression equation. This was also the case in one out of three studies in which nonword reading was used as the reading outcome. Moreover, numerous studies published after Wade-Woolley and Heggie’s (2016) review showed that PS predicted word reading after controlling for PA (Arciuli, 2017; Chan & Wade-Woolley, 2018; Critten et al., 2021; Enderby et al., 2021; Holliman et al., 2017; Lin et al., 2018; Wade-Woolley, 2016).

There is also evidence that individual differences in PS are related to reading in more transparent orthographies than English (Spanish: Calet et al., 2015, 2020; Defior et al., 2012; Gutiérrez-Fresneda et al., 2021; Gutiérrez-Palma et al., 2009; Greek: Anastasiou & Protopapas, 2015; Italian: Caccia et al., 2019; German: Obergfell et al., 2021, 2022). Gutiérrez-Palma et al. (2009) found a relationship between stress sensitivity and text reading, but not word and pseudoword reading. In contrast, Defior et al. (2012) found that stress awareness explained variance in word reading. Calet et al. (2015) examined different types of stress (lexical and metrical) in a longitudinal study. Lexical stress at the beginning of Grade 1 predicted reading at the end of Grade 1. Metrical stress at the end of Grade 1 predicted reading at the beginning of Grade 2, and metrical stress at the beginning of Grade 2 predicted reading at the end of Grade 2. The authors argued that metrical stress is more important when reading becomes more fluent. This argument is in line with findings by Miller and Schwanenflugel (2008) and Gutiérrez-Palma et al. (2009).

Unique and common effects of PA and PS

As our primary interest in this paper is the effect of PS on basic reading skills, we focus on the well-known dual route model of reading (Coltheart, 2005). This model distinguishes between a lexical route via which words are retrieved as a whole from a mental lexicon (which not only has entries about the written form of a word and its phonology, but also about the word meaning) and a non-lexical route via the phonological recoding of words. Words are connected with semantic meanings, while nonwords are not. In our working model (Fig. 1), we assume that PS helps to identify words via its lexical contrastive function based on stress patterns stored in the lexicon, which help to distinguish grammatical categories and semantic meanings. In fact, there is evidence of such a direct effect of PS on reading (Wade-Woolley & Heggie, 2016) when PA is taken into account (Arciuli, 2017; Calet et al., 2015; Gutiérrez-Palma, Defior, et al., 2016a, 2016b; Holliman et al., 2017; Lin et al., 2018; Obergfell et al., 2021; Wade-Woolley, 2016; Wood, 2006a). This direct effect concerns both fluency and reading comprehension (Kuhn & Stahl, 2003; Schwanenflugel & Benjamin, 2017). One explanation for this effect is that lexical stress patterns are stored in the mental lexicon. Although little is known about the role of PS in the mental lexicon, some authors argue that sensitivity to the suprasegmental properties of words (stress) is an important prerequisite for their later representation in the mental lexicon (Mendonça Alves et al., 2015). Accordingly, making a correct stress assignment should make it easier to retrieve words quickly and accurately from the mental lexicon (Lin et al., 2018; Lindfield et al., 1999), because mismatching words with the same beginning but a different stress pattern can be eliminated (Protopapas, 2016).

Fig. 1 Working model of (common and unique) effects of PS and PA on word reading

The other way PS might influence reading is indirectly via a common effect with PA. Wood (2006b) argued that PS may be an underlying process of PA because PS explains a significant amount of variance in PA. PS influences rhyme awareness. In speech, vowels are marked with a loudness peak (Goswami, 2003; Wood & Terrell, 1998). Sensitivity to vowel occurrence facilitates the detection of onset-rime boundaries, which is crucial for reading (Goswami et al., 2002; Holliman et al., 2012; Wood et al., 2009). Onset-rime is more regular in English than grapheme-phoneme correspondence (Ziegler & Goswami, 2005); therefore, awareness of onset-rime contributes to reading development by facilitating the use of phonemic similarities/analogies across words (Holliman et al., 2017; Wood et al., 2009). In particular, recognition of prosodic cues (such as duration and loudness) in stressed syllables might be of some importance for phoneme identification. It might be easier to identify phonemes (especially vowels) in stressed rather than in unstressed syllables, which often contain vowels with reduced quality, such as schwa (Holliman, 2016; Holliman et al., 2012; Wood et al., 2009).

PS and PA in German

The model presented above should, in principle, also hold for German’s regular orthography. However, some methodological issues must be considered. In most studies examining the PS-reading relationship, PS was measured at the word level (e.g., Chan & Wade-Woolley, 2018; Defior et al., 2012; Goodman et al., 2010; Gutiérrez-Palma, Defior, et al., 2016a, 2016b; Holliman et al., 2010a, 2010b; Lin et al., 2018; Wade-Woolley, 2016) or the short phrase level (i.e., DEE-dee task; e.g., Enderby et al., 2021; Goswami et al., 2010; Whalley & Hansen, 2006). In all of these studies, reading was measured using accuracy scores (i.e., number of words read correctly); two studies (Chan et al., 2020; Goswami et al., 2010) also used combined accuracy-fluency measures. The latter is also the case in some studies in more regular orthographies, which likewise used (combined) fluency measures (e.g., Caccia et al., 2019; Calet et al., 2015, 2020; Obergfell et al., 2022).

These combinations of word-level PS and reading accuracy might be of limited relevance in German. First, in German, the effect of PS should be at the sentence level rather than at the word level. There are few stress variations at the word level in German. About 86% of bisyllabic nouns are trochaic (first syllable stressed) and only 14% have a final stress (Beyermann, 2013). Furthermore, German has rules determining whether a syllable is stressed or unstressed (e.g., for compound words or prefixes). For instance, the rule for compound nouns (Féry, 1996) alone leads to an accurate pattern in 99.5% of bisyllabic German noun compounds in the CELEX database (Beyermann, 2013). These regularities in German suggest that measuring PS at the word level would be too simple and therefore not informative. Furthermore, since German is a stress-timed language (stressed syllables appear approximately periodically), it has relatively high prosodic variability in its structure of syllables (Sauter et al., 2012). Within a sentence, certain syllables are longer and louder at almost identical intervals, and alternate with a variable number of unstressed syllables. Consequently, measuring PS at the sentence level captures not only perception of stressed syllables but also perception of the complete rhythm of a sentence (Schmidt et al., 2016).

Second, the indirect path via PA might be of minor importance in German third-graders. PA is a better predictor of reading accuracy than of reading fluency (Landerl et al., 2019), and it is well-established that PA is only a weak predictor of reading in this age group (Landerl & Wimmer, 2008). Due to German’s very transparent grapheme-phoneme correspondences (GPC), children tend to make relatively few reading accuracy errors in this language, and reading problems tend to concern reading fluency more than reading accuracy, making reading fluency the more useful measure in German (Klicpera & Schabmann, 1993; Landerl & Wimmer, 2008; Wimmer & Schurz, 2010). For instance, in Klicpera and Schabmann’s (1993) study, German-speaking children made fewer than 5% reading errors on average one year after the onset of instruction, and even the weakest readers in Grade 2 (those performing below the 5th percentile) read eight out of ten words correctly.

Current study and research questions

The purpose of the present study was to investigate the relationship between PS and word, nonword and sentence reading. We aimed to answer the following research questions:

  1. Does PS explain unique variance in reading fluency after controlling for PA?

  2. How much of PS’s effects on reading fluency are shared with PA?

  3. Do the effects of PS vary depending on the type of reading outcome (sentences, word lists, nonword lists)?

Regarding the first and second research questions, our working model (Fig. 1) predicts that PS affects word reading via its contrastive function and as a prerequisite for lexical representation. Furthermore, based on the model, we expected shared variance in PS and PA due to the facilitation of sublexical parts of the word.

Turning to the third research question, we expected that the effect of PS (metrical stress) on reading after controlling for PA should be stronger for a reading test involving evaluating sentences and to a lesser extent for a reading test encompassing real word lists than for a nonword reading test. As stated in the model, assigning stress correctly facilitates quick access to mental representations of words, which improves reading. Stress is also related to semantically meaningful content, which might help to activate the word in the lexicon (during reading). Semantics are present to the greatest extent when reading is tested on the sentence level, to a lesser extent when real word lists are used, and not at all when nonwords are used.

Methods

Participants

The sample comprised 207 third graders (49.8% male). They ranged in age from eight to ten years old (M = 8.62, SD = 0.56). They came from middle-class areas in and surrounding Cologne (Germany). Regarding the parents’ highest level of education, 5% had completed lower secondary education, 29.7% a non-university-preparatory high school, 26.7% a university-preparatory high school, and 38.1% university. All children were native German speakers and had not been diagnosed with any impairment (e.g., a specific language or hearing impairment). The participants were recruited from 12 primary schools that have cooperated with our lab for many years. School principals were informed about the study and asked if they would be interested in participating. In addition, each child’s parents gave their written consent for their child to participate.

Measures

Intelligence

Nonverbal intelligence was measured using the standardized Grundintelligenztest Skala 1-R, a German version of the Culture Fair Test (CFT 1-R; Weiß & Osterland, 2012). This language-free test measures fluid intelligence in the sense of Cattell (1950) and consists of six subtests: substitution, i.e., matching pictograms to shapes (e.g., a watch is a circle), mazes (children had to find their way through a maze drawing), similarities (one stimulus had to be matched to another from among five options; e.g., a rectangle of the same size), sequences (children had to find out rules to complete a sequence of stimuli; e.g., cones of increasing size), classification (children had to identify the stimulus that differed from four others; e.g., a fish compared to four birds), and matrices (children had to select which one out of five figures completes a 2 × 2 matrix). Internal consistency ranged from 0.94 to 0.97. The test has high factorial validity in that all subtests load on a single factor (g-factor sensu Cattell). Its correlation with the HAWIK intelligence test (Wechsler, 1956), a widely used intelligence test in Germany, ranges from 0.60 to 0.75.

Phonological awareness

PA was tested using a deletion task developed by Klicpera et al. (1993). Children had to delete one or more phonemes from one- to three-syllable words and nonwords (37 words, 20 nonwords) and then say what was left. All words were part of the basic German vocabulary for primary school children (Plickat, 1983). Cronbach’s alpha for the test was α = 0.90. It had a correlation of r = 0.36 with other measures of word synthesis and r = 0.42 with different tests of phoneme deletion in a sample of young adults (Schmidt et al., 2016).

Prosodic sensitivity

PS was measured using the piano task, which was developed by Sauter et al. (2012). The piano task aims to examine how well stress patterns are recognized. The piano rhythm encompasses patterns of strong (more prominent in length and loudness) and weak beats. This pattern matches the stress patterns of one of three written target sentences. The piano sequence was played three times (students could ask for the sequence to be played additional times) from computer speakers, after which the students had to select the prosodically-matching sentence out of the three presented ones. Twelve piano sequences were presented. The target sentences were read by the instructors with normal speech rhythm (without special emphasis) and also presented in written form. Two practice items were given, and feedback was provided. Both the target sentences and the distractors had the same number of syllables. Therefore, the solution could not be identified simply by counting the number of syllables. Cronbach’s alpha for the test was α = 0.70.
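The internal consistency values reported for the PA and PS tasks are Cronbach's alpha coefficients. As a minimal sketch of how such a coefficient is obtained from item-level scores, the following Python example applies the standard formula to simulated responses. The data, the function name `cronbach_alpha`, and the simulated item difficulties are illustrative assumptions and do not reproduce the study's data or software.

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for an (n_persons x n_items) score matrix."""
    scores = np.asarray(item_scores, dtype=float)
    n_items = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of the sum scores
    return n_items / (n_items - 1) * (1.0 - item_variances.sum() / total_variance)

# Hypothetical data: 207 children answering 12 dichotomous items (1 = correct).
rng = np.random.default_rng(1)
ability = rng.normal(size=(207, 1))       # simulated person ability
difficulty = rng.normal(size=(1, 12))     # simulated item difficulty
noise = rng.normal(size=(207, 12))
responses = (ability - difficulty + noise > 0).astype(int)

alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

Alpha rises when items covary strongly relative to their individual variances, which is why a short test with 12 items typically yields a lower value than a longer one.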

Reading (sentences)

Sentence reading was measured with the 3-min Salzburg Reading Screening (Salzburger Lesescreening, SLS 2–9; Wimmer & Mayringer, 2014), in which children read sentences silently and evaluate whether the meaning of each sentence is true or false. Retest reliability for this test ranges from 0.90 to 0.92. In terms of validity, the SLS’ correlation with reading aloud lists of words ranges from 0.80 to 0.90; low SLS scores are associated with slow reading and a high number of long fixations.

Reading (words/nonwords)

Single word reading was measured with the reading section of the Salzburg Reading and Spelling Test II (Salzburger Lese- und Rechtschreibtest II, SLRT II; Moll & Landerl, 2014). Two lists of words and nonwords had to be read aloud as quickly and correctly as possible within one minute. Scores on this test reflect the number of words that were correctly read and are thus composite scores of reading fluency and accuracy. Retest reliability for this test ranges from 0.90 to 0.98. Correlations of the SLRT-II with other reading tests range from 0.69 to 0.92.

Procedure

The children were tested at their respective schools over three sessions. One session took place in a classroom setting (intelligence, sentence reading test). The word reading test, the phonological awareness test and the piano task were administered in two individual testing sessions. The children had the opportunity to stop the test at any time. If a child made five consecutive errors, testing was discontinued by the instructor so that the children would not become frustrated by tasks that were too difficult for them. The tests were administered by one of the authors as well as by trained master’s degree students.

Statistical analysis

For the statistical analyses, we performed hierarchical linear regression analyses as well as commonality analyses. Commonality analysis makes it possible to decompose the total explained variance (R²) of a regression model into components that are either unique to a given independent variable or common to two or more independent variables (Nimon & Reio, 2011; Ray-Mukherjee et al., 2014). Statistical analysis was performed using the R package “yhat” (Nimon et al., 2008). Due to a lack of normality and indications of residual autocorrelations, we checked the results using bootstrapping and robust Newey-West standard errors (Newey & West, 1987), which are given in Table 2 along with Durbin-Watson statistics and the variance inflation factors. The only missing data were the SLS scores of one student. This child was excluded from the respective analyses.
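For two predictors, the decomposition performed by a commonality analysis can be written out by hand from three R² values. The following Python sketch illustrates that logic on simulated data. It is not the "yhat" routine used in the study; the function names, effect sizes, and data are assumptions made only for illustration.

```python
import numpy as np

def r_squared(predictors, y):
    """R^2 of an ordinary least-squares fit of y on the predictors (with intercept)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(predictors, dtype=float).reshape(len(y), -1)
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return 1.0 - float(residuals @ residuals) / float(((y - y.mean()) ** 2).sum())

def commonality_two_predictors(x1, x2, y):
    """Split the full-model R^2 into unique(x1), unique(x2) and common parts."""
    r2_full = r_squared(np.column_stack([x1, x2]), y)
    r2_x1 = r_squared(x1, y)
    r2_x2 = r_squared(x2, y)
    return {
        "unique_x1": r2_full - r2_x2,        # what x1 adds beyond x2
        "unique_x2": r2_full - r2_x1,        # what x2 adds beyond x1
        "common": r2_x1 + r2_x2 - r2_full,   # variance explained jointly
        "total": r2_full,
    }

# Simulated (hypothetical) scores for 207 children; effect sizes are arbitrary.
rng = np.random.default_rng(0)
pa = rng.normal(size=207)
ps = 0.4 * pa + rng.normal(size=207)
reading = 0.3 * pa + 0.15 * ps + rng.normal(size=207)

parts = commonality_two_predictors(ps, pa, reading)
for name, value in parts.items():
    print(f"{name}: {value:.3f}")
```

By construction, the two unique components and the common component sum to the full-model R². The common component can become negative when one predictor acts as a suppressor, so it should be read as a partition term rather than as a proportion in the strict sense.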

Results

Descriptive statistics and correlations of measures

Table 1 shows means, standard deviations, zero-order and partial correlations (controlling for IQ) for all measures. No variables were significantly correlated with age. We found significant correlations between IQ and PS, IQ and PA and between IQ and sentence reading. However, controlling for IQ only marginally changed the coefficients of the subsequent analyses. Overall, the correlations between PS, PA and reading were low, but statistically significant in all cases except for nonwords.

Table 1 Means, standard deviations, and zero-order correlations for PS, PA, and reading measures (with theoretical minimum and maximum if appropriate)
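A partial correlation controlling for IQ can be obtained from the three zero-order correlations. The Python sketch below shows this formula on simulated data and cross-checks it against the equivalent approach of correlating regression residuals. All variable names and values are hypothetical and do not correspond to the entries of Table 1.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear influence of z."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def residualize(v, z):
    """Residuals of v after regressing it on z (with intercept)."""
    design = np.column_stack([np.ones(len(z)), z])
    beta, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ beta

# Simulated (hypothetical) scores; effect sizes are arbitrary.
rng = np.random.default_rng(2)
iq = rng.normal(size=207)
ps = 0.3 * iq + rng.normal(size=207)
reading = 0.2 * iq + 0.2 * ps + rng.normal(size=207)

r_partial = partial_corr(ps, reading, iq)
r_check = np.corrcoef(residualize(ps, iq), residualize(reading, iq))[0, 1]
print(f"partial r = {r_partial:.3f}, residual-based r = {r_check:.3f}")
```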

Research Question 1: Does PS account for unique variance in reading fluency after controlling for PA?

To determine the specific contribution of PS to reading fluency, we conducted hierarchical linear regression analyses with two models for each measure of reading fluency (sentences, words, nonwords). Model 1 sought to confirm that PS makes a significant contribution to reading, so PS was the only variable entered. In Model 2, we entered PA in Step 1 and PS in Step 2 to see whether PS accounts for unique variance in reading after controlling for PA. The results of the regression analyses showed that only 8% (word reading) to 12% (sentences) of reading variance could be explained by PS and PA. PA was a significant predictor of all reading measures, whereas PS made a unique contribution only when predicting sentence reading (Table 2).

Table 2 Hierarchical regression results for the influence of PA and PS on reading
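The logic of the second model (PA entered first, PS second) amounts to testing the change in R² when PS is added. The following Python sketch computes this change and the corresponding F statistic on simulated data. It uses ordinary least squares only and does not include the bootstrapping or Newey-West corrections applied in the study; the function names and data are illustrative assumptions.

```python
import numpy as np

def ols_r2(predictors, y):
    """R^2 of an ordinary least-squares fit of y on the predictors (with intercept)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(predictors, dtype=float).reshape(len(y), -1)
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return 1.0 - float(residuals @ residuals) / float(((y - y.mean()) ** 2).sum())

def r2_change(step1, step2, y):
    """R^2 change and F statistic when step2 predictors enter after step1."""
    n = len(y)
    step1 = np.asarray(step1, dtype=float).reshape(n, -1)
    step2 = np.asarray(step2, dtype=float).reshape(n, -1)
    full = np.column_stack([step1, step2])
    r2_step1 = ols_r2(step1, y)
    r2_full = ols_r2(full, y)
    df1 = step2.shape[1]              # number of predictors added in Step 2
    df2 = n - full.shape[1] - 1       # residual degrees of freedom of the full model
    f_stat = ((r2_full - r2_step1) / df1) / ((1.0 - r2_full) / df2)
    return r2_full - r2_step1, f_stat, (df1, df2)

# Simulated (hypothetical) scores for 207 children; effect sizes are arbitrary.
rng = np.random.default_rng(3)
pa = rng.normal(size=207)
ps = 0.4 * pa + rng.normal(size=207)
reading = 0.3 * pa + 0.15 * ps + rng.normal(size=207)

delta, f_stat, dfs = r2_change(pa, ps, reading)
print(f"Delta R^2 = {delta:.3f}, F{dfs} = {f_stat:.2f}")
```

With a single predictor added in Step 2, the R² change equals that predictor's unique component in a commonality analysis of the same model.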

Research Question 2: How much of PS’s effects on reading fluency are shared with PA?

In a second step, we performed a commonality analysis to determine how much of the unique and common variance in sentence reading could be attributed to PS and PA. Table 3 presents the commonality coefficients and respective structure coefficients. The squared structure coefficients indicate each predictor’s contribution to the model in terms of percentage (Thompson, 2006).

Table 3 Commonality coefficients (partitioned R²) of the regression model for sentence reading

The results showed that PS’ unique contribution to predicting variance in sentence reading was 0.02 (Table 3). PA’s unique contribution to sentence reading was 0.07. The shared components (PS and PA together) explained 0.03 of the variance in sentence reading.
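Commonality coefficients can also be expressed as shares of the total explained variance by dividing each component by their sum. The short Python sketch below does this for the three rounded values given above. Because these inputs are rounded to two decimals, shares computed this way can differ by a few percentage points from shares computed on unrounded coefficients. The function and dictionary names are illustrative assumptions.

```python
def shares_of_explained_variance(components):
    """Express each commonality component as a percentage of their sum."""
    total = sum(components.values())
    return {name: 100.0 * value / total for name, value in components.items()}

# Rounded coefficients for sentence reading as stated in the text above.
sentence_reading = {"unique_PS": 0.02, "unique_PA": 0.07, "common": 0.03}

shares = shares_of_explained_variance(sentence_reading)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% of explained variance")
```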

Discussion

The aim of this study was to examine the relationship between prosodic sensitivity and reading in German over and above phonological awareness. We compared a sentence-based reading measure (3-min reading test) with measures based on lists of words and nonwords. The difference between these two types of reading measures is that the 3-min reading test, while not a reading comprehension test, requires semantic processing in order to evaluate the sentences, while reading from lists does not require semantic processing to the same extent (or not at all in the case of nonword lists). Therefore, we hypothesized that PS might be more strongly related to reading when a task requires (low-level) semantic processing, and that (metrical stress-related) PS would have a stronger effect on sentence reading than on reading words or nonwords from lists.

Overall, the influence of PA and PS on reading was low. Eight (word level) to 13 percent (sentence) of the variance in reading could be explained by PS and PA in our sample. Regarding research question 1, after controlling for PA, PS accounted for unique variance only in sentence reading. This is in accord with other studies finding that the effect of PS on reading varies across different reading tasks. In particular, nonword reading might not be associated strongly with PS (e.g., Whalley & Hansen, 2006; for a review, see also Wade-Woolley & Heggie, 2016).

For PA, our results are in line with findings showing that PA might be of minor importance in shallow orthographies like German due to the transparency of the grapheme-phoneme mapping system (Defior et al., 2012; Landerl & Wimmer, 2008; Landerl et al., 2019; Schabmann et al., 2009; Verhagen et al., 2008). PA is only important at the beginning of the reading process (Defior et al., 2012; Wimmer et al., 1991), i.e., during the first few months of reading instruction, in which children have to decode most words via the non-lexical route (Landerl & Wimmer, 2008). Later on, including the age from which we drew our sample, most German-speaking children have no problems with grapheme-phoneme mapping (Klicpera & Schabmann, 1993). Therefore, reading problems among German-speaking children are primarily related to slow reading fluency and a poorly automatized reading process.

The use of commonality analysis allowed us to gain insight into unique and common predictors explaining variance in sentence reading. Although the overall amount of explained variance was relatively small, as stated above, the relative effects of PS and PA on sentence reading were largely confirmed. PS was uniquely responsible for about 18% of the total explained variance in sentence reading, while PA was uniquely responsible for 60% of the total explained variance. The shared component of explained variance was around 22%. These results support the assumption that both PA and PS make unique contributions to sentence reading (Wade-Woolley & Heggie, 2016), despite being related constructs, and despite the fact that the effects in this study were considerably lower than those reported for English-speaking children. For instance, in Holliman et al. (2017), PA accounted for 7.3% and PS 3.8% of the total explained variance after controlling for other variables.

The pattern in our results is consistent with the idea presented at the beginning of this article that PS primarily influences reading by helping to activate a semantic word representation in the mental lexicon; i.e., facilitating quick access to the lexicon. However, our data show that this might not be only a matter of understanding sentences via their prosody, as a total of approximately 34% of the explained variance in word list reading can be attributed to PS (commonly: 21%; uniquely: 13%), even though the regression coefficient was not significant. Our explanation is that in the case of reading word lists, the semantic properties of the word are activated more or less automatically, even though this is not necessary for the task. For nonwords, no unique variance was explained by PS. This is not surprising because nonwords cannot be directly retrieved from the mental lexicon (Zaric & Nagler, 2021).

Consistent with our results is also the idea that PA might (in part) depend on PS and the effect of PS on reading might be mediated by PA. PS might help to activate sublexical elements in the reading system and therefore trigger the further development of PA. PS having an indirect effect on reading via PA might explain why in German, the total effect of PS on reading is not very pronounced, as the effect of PA is low after initial reading instruction in languages with regular orthographies (Georgiou et al., 2008; Landerl et al., 2019). For sentence reading, on the other hand, the effect is somewhat higher and significant due to the semantic components discussed above.

Limitations and further directions

Some limitations of our study should be reported. First, one might argue that the higher effect of PS on reading sentences is simply an artefact of the similarity of the tasks. In both the PS and the 3-min reading task, children had to read sentences; therefore, reading might have had an influence on the results. However, in the PS task, the instructor first read the sentence aloud to minimize the influence of reading on the piano task.

Furthermore, in both the PS and the 3-min reading task, children had to hold the sentences in memory for a short time; therefore, working memory might be a common source of variance. We did not test working memory. However, the CFT 1-R intelligence test is correlated with (verbal) working memory (Vock, 2004). Therefore, we ran regression analyses analogous to those reported above: in a first model, we entered intelligence and then PS; in a second model, we entered intelligence, then PA, and then PS (not reported in detail here). The results remained essentially the same: For sentence reading, PS had a small but statistically significant effect (2.2% explained variance) over and above intelligence and PA. This was not the case for word and nonword reading.

For further research, it would be interesting to examine whether PS is more important in multisyllabic words (as compared to the mono- and bisyllabic words that were the primary focus of the German word reading test). At least for English, there is some evidence that PS is more important in reading multisyllabic words (Enderby et al., 2021; Holliman et al., 2017).

Conclusion

Our conclusion is that in a highly transparent orthography with relatively clear rules for the assignment of word stress (as is the case in German), the effect of PS on reading is relatively small and only relevant if the retrieval of semantic information via PS facilitates quick access to words. Thus, the results are similar to those concerning the effect of PA on reading, which might be minor in shallow as compared to opaque orthographies.

In addition, our study highlights the importance of the respective measure used. If a reading measure is used that requires (or automatically induces) semantic processing, PS is more important. If only decoding is required, PS is less important. This goes somewhat beyond Wade-Woolley’s (2016) differentiation of reading measures into reading comprehension and decoding. In order to obtain a better understanding of the role of PS/PA, there is a need to investigate the implications of specific reading tests in more detail.