Most models of English word reading development emphasize the role of segmental phonological awareness, or children’s ability to perceive and manipulate the segments of spoken words, such as phonemes and syllables (e.g., Ziegler & Goswami, 2005). This makes sense given widespread evidence of the role of segmental phonological awareness in English word reading development (e.g., Dickinson, McCabe, Anastasopoulos, Peisner-Feinberg, & Poe, 2003; Ehri, 2005; Perfetti, 2011; Seymour, 1999). And yet these models overlook the potential impact of prosodic/suprasegmental sensitivity, or the ability to perceive and manipulate phonetic distinctions realized through pitch, duration, or amplitude (Arciuli, Monaghan, & Seva, 2010). Recent studies show that children’s prosodic sensitivity accounts for substantial variance in gains in English word reading (e.g., Goswami et al., 2013; Holliman, Wood, & Sheehy, 2010). Intriguingly, there is also emerging evidence that prosodic sensitivity transfers to reading across languages; Mandarin/Cantonese tone sensitivity contributes unique variance to English word reading in Mandarin/Cantonese–English bilingual children (e.g., Wang, Perfetti, & Liu, 2005; Wang, Yang, & Cheng, 2009; Zhang & McBride-Chang, 2014). These findings are surprising because the two languages are prosodically distinct, with lexical tone in Mandarin/Cantonese and lexical stress in English serving as “the two principle methods by which languages use prosodic features to distinguish one word from another” (Cutler & Chen, 1997, p. 165). Our study is designed to test two prominent hypotheses as to how Cantonese tone sensitivity transfers to English word reading (Wang et al., 2005; Zhang & McBride-Chang, 2014).

Understanding the basis of tone transfer: two hypotheses

Our study builds on the findings of a pioneering study, in which Wang et al. (2005) demonstrated that 8-year-old Mandarin–English bilingual children’s Mandarin Chinese tone sensitivity accounted for 8 % of unique variance in English pseudoword reading, after controlling for English phoneme awareness. A subsequent study with children ages 6 and 7 years demonstrated that this relationship remained beyond the variance explained by English vocabulary, phoneme awareness, orthographic processing, and morphological awareness (Wang et al., 2009). More recently, Zhang and McBride-Chang (2014) demonstrated that there is a direct path from Cantonese tone sensitivity to English word reading in children 6 to 10 years of age, even after taking into account the effects of Cantonese segmental phonological awareness and general auditory processing. As such, the transfer of Mandarin/Cantonese tone sensitivity to English word reading appears to be robust, even after controlling for a diverse set of variables in studies of children ages 6 to 10 years.

There are two plausible theoretical explanations for these findings: the prosodic transfer hypothesis (Wang et al., 2005) and the segmental phonological awareness transfer hypothesis (Anderson & Wong, 2012; Tong, Tong, & McBride-Chang, 2015). The prosodic transfer hypothesis argues that “sensitivity to prosodic features of languages may be responsible for contribution of Chinese tone sensitivity to English word reading” (Wang et al., 2009, p. 308). This hypothesis makes the testable prediction that Cantonese tone sensitivity facilitates English stress sensitivity which, in turn, facilitates English word reading (see Model A in Fig. 1). In contrast, the segmental phonological awareness transfer hypothesis proposes that the transfer of Cantonese tone sensitivity to English word reading is mediated through the shared segmental processes involved in both Chinese tone and English phonemic processing (see Model B in Fig. 1). A testable prediction derived from this hypothesis is that Cantonese tone sensitivity contributes to Cantonese segmental phonological awareness (McBride-Chang et al., 2008), which, in turn, supports English segmental phonological awareness (e.g., McBride-Chang & Kail, 2002), a key contributor to English word reading (e.g., for a review, see Perfetti, 2011). Although there is no direct test of prosodic transfer and segmental transfer hypotheses to date, several lines of theoretical and empirical evidence make these compelling hypotheses particularly plausible and worthy of investigation.

Fig. 1

Three assumed models. (a) A model testing prosody hypothesis; (b) a model testing segmental transfer hypothesis; (c) an integrated model that incorporates prosody hypothesis (depicted by bolded lines) and segmental transfer hypothesis (depicted by dashed lines). Chinese SegPA = Chinese segmental phonological awareness; English SegPA = English segmental phonological awareness

A theoretical foundation of the prosodic transfer hypothesis lies in the prosodic bootstrapping hypothesis, which assumes that young children rely on prosodic cues to segment words from a continuous speech stream (Gleitman, Gleitman, Landau, & Wanner, 1988; Gleitman & Wanner, 1982; Morgan, 1986), and to bootstrap their way into grammatical, morphological, and semantic analyses of a language (Morgan, 1986). According to the prosodic bootstrapping hypothesis, sensitivity to the acoustic saliency of prosodic patterns, such as English lexical stress and Cantonese lexical tones, serves as the foundation for the development of language skills that are crucial to word reading. In addition, the structural and functional similarities between Cantonese lexical tones and English lexical stress make the tone–stress association possible. As shown in Fig. 2, the phonetic realizations of Cantonese lexical tones and English lexical stress both involve the modulation of voice pitch (i.e., fundamental frequency, F0). Functionally, both Cantonese lexical tones and English lexical stress can distinguish minimally contrastive forms (e.g., /fu2/ 虎 (tiger) versus /fu6/ 父 (father) for tones, and PERmit /ˈpɜr-mɪt/ “licence” versus perMIT /pər-ˈmɪt/ “to grant” for stress); however, far more words are distinguished by tone in Cantonese than by stress in English. There is also empirical evidence that tone language speakers tend to “tonalize” English: they treat stress lexically and perceive it as an essential component of phonological representation (Archibald, 1997; Chen, 2013; Luke, 2000; Lai, 2003). Thus, it is reasonable to hypothesize that sensitivity to Cantonese lexical tone transfers to English lexical stress and then to English word reading.

Fig. 2

a. The normalized F0 trace of six Cantonese tones of the syllable /fu/ produced by a female Cantonese–English bilingual speaker. b. The normalized F0 trace of two English stress patterns produced by a female Cantonese–English bilingual speaker

Indeed, our hypothesis of a tone–stress association is supported by a recent study showing that Cantonese–English bilingual children used a common cue (i.e., F0) to distinguish Cantonese lexical tones and English lexical stress, and that their performance on Cantonese tone perception and English stress perception was significantly correlated (r = .73) (Tong et al., 2015). Choi, Tong, and Cain (2016) obtained a similar finding: sensitivity to Cantonese tones predicted Cantonese–English bilingual children’s sensitivity to English lexical stress. There is also compelling evidence for an association between English stress sensitivity and English word reading. For example, Whalley and Hansen (2006) found that 8- to 10-year-old English children’s sensitivity to stress patterns at the phrase level predicted concurrent English word reading. A more recent study confirmed this relationship in 6-year-old children (Holliman et al., 2014). It also seems that early stress sensitivity is related to later reading success; Holliman et al. (2010) found that 5- to 8-year-old children’s English stress sensitivity was uniquely related to English word reading one year later after taking segmental phonological awareness, age, and vocabulary into account (see also Goswami et al., 2013, with dyslexic children).

Together, theoretical and empirical research lend support to our first testable hypothesis that the transfer of Cantonese tone sensitivity to English word reading may be, at least in part, due to the transfer of prosodic sensitivity across languages. Specifically, Cantonese tone sensitivity might facilitate English stress sensitivity which, in turn, supports English word reading, as depicted in Model A in Fig. 1. This is the first model that we test in this study.

The segmental transfer hypothesis also derives from both theoretical and empirical evidence. According to a recent model of speech perception (i.e., TTRACE; Tong, McBride, & Burnham, 2014), tonemes and phonemes are essential structural elements of the phonological representation of a tonal syllable, and they are represented in an integrated manner. The lexical access of a tonal syllable involves the mutual activation of three phonological units (i.e., syllable onset, rime, and tone). This is intuitively sensible given that a tone cannot stand alone without a segment to carry it (i.e., the vowel). It is also supported by a recent study showing that varying segmental information affects Cantonese tone perception (Tong et al., 2014). Thus, it is very likely that Cantonese tone perception contributes to Cantonese segmental phonological awareness, and then contributes to English word reading through English segmental phonological awareness.

Our hypothesis regarding the connection between suprasegmental and segmental phonological awareness has been reported in recent reviews and empirical studies. For English, several prominent reviews suggest that suprasegmental sensitivity serves as a basis for the development of segmental phonological awareness for both monolingual English speakers and English language learners (e.g., Kuhl, 2004; Wade-Woolley & Wood, 2006). For Chinese, recent studies suggest a similar relationship within Mandarin Chinese for Mandarin–English bilingual children. There is a moderate correlation between Mandarin lexical tone sensitivity and Mandarin segmental phonological awareness (e.g., Wang et al., 2005; Wang et al., 2009; see also Tong et al., 2015, with Cantonese children with reading difficulties).

In addition, there is a concurrent association between Cantonese and English segmental phonological awareness and English word reading (e.g., Bialystok, McBride-Chang, & Luk, 2005; McBride-Chang, Bialystok, Chong, & Li, 2004). Cantonese segmental phonological awareness has been found to be related both to English segmental phonological awareness (Wang et al., 2005) and to English word reading (McBride-Chang & Kail, 2002; McBride-Chang et al., 2008) for Cantonese–English bilingual children. For example, McBride-Chang and colleagues (2004) showed that Cantonese segmental phonological awareness contributed to English segmental phonological awareness and that English segmental phonological awareness contributed to English word reading in Cantonese–English bilingual children.

Taken together, these recent findings support the hypothesis that the transfer of Cantonese tone sensitivity to English word reading may be due, at least in part, to the transfer of segmental phonological awareness across languages among Cantonese–English bilingual children. That is, Cantonese tone sensitivity is associated with Cantonese segmental phonological awareness, and Cantonese segmental phonological awareness facilitates English segmental phonological awareness which, in turn, supports English word reading, as depicted in Model B in Fig. 1. This is the second model that we test in this study.

As we test these two hypotheses, we are cognizant that they are not mutually exclusive (see Model C in Fig. 1). Specifically, Cantonese tone sensitivity might predict English stress sensitivity (a key component of the prosodic transfer hypothesis), and English stress sensitivity might, in turn, be related to English word reading. At the same time, Cantonese tone sensitivity might be related to Cantonese segmental phonological awareness that might, in turn, be predictive of English segmental phonological awareness and, hence, to English word reading. As such, these two pathways could coexist. This possibility is supported by suggestions that word reading is a complex process involving both segmental and suprasegmental levels of phonological mappings with orthography (e.g., Arciuli et al., 2010; Goswami et al., 2013). Given this speculation, we test a model that integrates these two hypotheses; this has the additional advantage of evaluating the relative importance of one possible route from Cantonese tone sensitivity to English word reading while taking into account the variance of the other route.

The present study: testing three models to explain phonological transfer

We test the prosodic transfer hypothesis and the segmental phonological awareness transfer hypothesis separately as well as together in three models, using latent variable structural equation modeling. This approach enables us to identify the best fitting model and to compare the standardized estimates of path weights, which indicate the relative unique contribution of each predictor factor to the outcome factor (Bentler, 2006). It also allows us to evaluate the relative effect of the two possible pathways in explaining the transfer of Cantonese tone sensitivity to English word reading. Finally, we compare the integrated model with the two models established separately for the prosodic transfer and segmental transfer hypotheses.

We conducted this study with Cantonese–English bilingual children who are learning to read Chinese and English in parallel in Hong Kong. We tested the children at the age of 7–8 years and again at the age of 8–9 years. Measures of Cantonese tone sensitivity, Cantonese segmental phonological awareness, and nonverbal ability were administered at the age of 7–8 years because empirical evidence indicates that 7–8-year-old Cantonese children are able to identify six different lexical tones (e.g., Tong et al., 2014). English measures were administered at the age of 8–9 years because Cantonese–English bilingual children showed stress sensitivity at this age (e.g., Choi, Tong, & Cain, 2016; Tong et al., 2015). We measured our outcome variable at the age of 8–9 years to capture the progress in English word reading skill up to this point.

Method

Participants

A total of 123 second-grade children (59 girls) from four Hong Kong elementary schools participated in this 1-year longitudinal study. The mean ages at the first and second testing points were 7.75 years (SD = 13 months) and 8.75 years (SD = 15 months), respectively. According to parent report, all children were native Cantonese speakers and learned English as a second language, beginning on average at 3.5 years of age (SD = 1.7 years). Parents reported the children’s frequency of use of English at home as 1.93 on a 5-point Likert scale ranging from least to most frequent. All children were enrolled in Cantonese–English bilingual instruction at school from 6 years of age. In this instruction, most classes were taught in Cantonese, with the exception of eight English classes (40 minutes each) per week. All children were from families in which average monthly family income ranged from HKD 20,000 to 30,000, which is considered to be a medium family income according to the Hong Kong Census and Statistics Department (2011). All children passed a pure-tone hearing screen (250, 500, 1000, 2000, and 4000 Hz at 20 dB HL).

Measures

Measures of nonverbal ability, Cantonese tone sensitivity, and Cantonese segmental phonological awareness were administered at Time 1, while English lexical stress sensitivity, English segmental phonological awareness, and English word reading were administered at Time 2.

Nonverbal ability

The Block Design subtest of the Wechsler Abbreviated Scale of Intelligence (Wechsler, 1999) was administered to all children in Cantonese and assessed nonverbal ability. Children were asked to assemble blocks to match a design. The number of blocks varied from two to nine, depending on the difficulty level. There were 13 items; testing was halted following two consecutive incorrect responses.

Cantonese tone sensitivity tasks

Two tasks successfully used in previous research (e.g., McBride-Chang et al., 2008; Tong & McBride-Chang, 2010) were selected to assess children’s ability to distinguish Cantonese tones: tone identification and tone discrimination.

The tone identification task was comprised of 48 test items and three practice items. The 48 test items were created with two Cantonese syllables (/ji/ and /fu/). Combining each syllable with one of six tones resulted in 12 different words: /ji1/衣 (clothing), /ji2/ 椅 (chair), /ji3/ 意 (the first character of spaghetti in Cantonese), /ji4/兒 (son), /ji5/耳 (ear), and /ji6/ 二 (two); /fu1/ 膚(skin), /fu2/ 虎 (tiger), /fu3/ 褲(trousers), /fu4/ 符 (symbol), /fu5/ 婦 (woman), and /fu6/ 父 (father). There were eight minimal pair tonal contrasts: T3–T6, T2–T5, T1–T3, T1–T6, T5–T6, T4–T6, T4–T5, and T1–T2 carried by the syllables /ji/ and /fu/, respectively. Each tone contrast was repeated three times, for a total of 48 items (2 syllables /ji/ and /fu/ × 8 tone contrasts × 3 repetitions). In the testing, children were auditorily presented with a target tonal syllable via headphones, along with two pictures presenting a tone contrast that differed in tone only—for example, /ji1/衣 (clothing) vs. /ji2/椅 (chair). The children were asked to identify the picture that matched the target tonal syllable—for example, /ji2/椅 (chair). The maximum possible score on this task was 48.
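The item count for the tone identification task follows directly from crossing syllables, contrasts, and repetitions. The short Python sketch below enumerates that design; the variable names and the dictionary layout are illustrative assumptions, and the sketch does not encode which member of each contrast served as the spoken target, because the description above does not specify it.

```python
from itertools import product

SYLLABLES = ["ji", "fu"]
# The eight minimal-pair tone contrasts listed in the task description.
CONTRASTS = [(3, 6), (2, 5), (1, 3), (1, 6), (5, 6), (4, 6), (4, 5), (1, 2)]
REPETITIONS = 3


def build_trials():
    """Return one dict per test item: carrier syllable, pictured tone pair, repetition."""
    return [
        {"syllable": syllable, "tones": pair, "repetition": rep}
        for syllable, pair, rep in product(
            SYLLABLES, CONTRASTS, range(1, REPETITIONS + 1)
        )
    ]


trials = build_trials()
# 2 syllables x 8 contrasts x 3 repetitions
print(len(trials))
```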

The tone discrimination task was a 24-item oddity task that has been successfully used to assess tone sensitivity in same-age children in previous research (Tong & McBride-Chang, 2010; Tong et al., 2014). The task was built on the same eight minimal tone contrasts (T3–T6, T2–T5, T1–T3, T1–T6, T5–T6, T4–T6, T4–T5, and T1–T2), each repeated three times. In each test item, children were presented with four tonal syllables. Three of these syllables carried the same tone, and one carried a minimally different tone. In testing, the four prerecorded tonal syllables were presented to children auditorily, such as /gɐu2/ 狗 (dog), /tsɐu2/ 酒 (wine), /lɐu5/ 柳 (willow), and /hɐu2/ 口 (mouth), along with a picture to illustrate each. Children were first asked to name the pictures to ensure that they knew the word represented by each picture. They were then asked to select the picture of the word that differed in tone from the other three; the correct answer for this example is /lɐu5/ 柳 (willow). The four words for each item were commonly used words that represent objects encountered in daily life. The maximum possible score on this task was 24.

Cantonese segmental phonological awareness tasks

To assess children’s segmental phonological awareness, we used a syllable deletion task and a phoneme-onset deletion task that have been successfully used with Cantonese–English bilingual children of the same age (McBride-Chang et al., 2004). These were created in the spirit and format of the subtests of the same names from the Comprehensive Test of Phonological Processing (CTOPP; Wagner, Torgesen, & Rashotte, 1999).

In the syllable deletion task, children were asked to delete one syllable from a three-syllable sequence. For example, the child was asked to “say /kʰy1/ /pʰɐk1/ /jɐŋ4/. Now say /kʰy1/ /pʰɐk1/ /jɐŋ4/ without /pʰɐk1/,” with the correct answer being /kʰy1/ /jɐŋ4/. There were 29 items (15 real words and 14 nonwords) in this task.

In the phoneme-onset deletion task, each child was asked to produce a spoken utterance without the initial sound, for example, “say /tʰa:p3/ (which translates as tower). Now say /tʰa:p3/ without /t/,” with the correct answer: /a:p3/ (which translates as duck). There were 22 items, with 10 real words and 12 nonwords in this task. After deleting the initial sound for each real-word item, the remaining word was also a real word. Similarly, for the nonwords, the word left after deleting the initial sound was also a nonword.

We used a basal-ceiling testing procedure on the basis of normative performance from the data of a large number of children in Hong Kong who have completed these tasks from kindergarten to the fifth grade (McBride-Chang et al., 2008). In both tasks, the basal rule was that testing continued until children made five correct responses for a given block. The ceiling rule was that testing stopped following six consecutive errors in syllable deletion and four consecutive errors in phoneme onset deletion.
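The ceiling rule described above can be sketched as a simple scoring loop. This is a minimal Python illustration rather than the authors' scoring procedure: the function name `score_with_ceiling` is hypothetical, responses are assumed to be coded as booleans in administration order, and the basal rule is not modeled.

```python
def score_with_ceiling(responses, ceiling):
    """Count correct responses, stopping once `ceiling` consecutive errors occur.

    `responses` is a sequence of booleans (True = correct) in administration
    order. Items after the stopping point are treated as not administered.
    """
    score = 0
    consecutive_errors = 0
    for correct in responses:
        if correct:
            score += 1
            consecutive_errors = 0
        else:
            consecutive_errors += 1
            if consecutive_errors == ceiling:
                break  # ceiling reached: testing stops here
    return score


example = [True, True, False, False, False, False, True]
# Phoneme-onset deletion stops after four consecutive errors.
print(score_with_ceiling(example, ceiling=4))
# Syllable deletion stops after six consecutive errors, so the last item counts.
print(score_with_ceiling(example, ceiling=6))
```

With the same response pattern, the stricter four-error rule ends testing before the final correct item is reached, whereas the six-error rule does not.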

English stress sensitivity tasks

Children’s sensitivity to English stress patterning at the word level was assessed with a revised stress mispronunciation task adapted from Holliman et al. (2010). This task consisted of 19 bisyllabic words, with one practice item and 18 test items.

To ensure that these words would be known to the children, we chose commonly used words from the English textbook for students in Grades 1–3 and the Wordlists for the Primary English Language Curriculum for Hong Kong Cantonese–English bilingual children (HKSAR Education Bureau, 2009). Following Holliman et al. (2010), all items were administered in both baseline and experimental conditions. The baseline condition served as a control for the effect of vocabulary knowledge for the items in the task, which was particularly important given the English-as-a-second-language status of the sample. In the baseline condition, each trial consisted of a prerecorded, correctly pronounced bisyllabic word (e.g., /'spaɪdə/) presented with four colorful pictures denoting common words (e.g., spider, swinging, snowman, sandwich). One picture was the target, and the other three were distracters that shared the same initial phoneme as the target and matched it in frequency (p = .969). Children were then asked to identify which of the four pictures best represented the word they heard.

In the experimental condition, children heard a mispronunciation of each target word in which the stress pattern was reversed. For example, for the word /'spaɪdə/, the mispronunciation /spə'dɜ/ changes both the lexical stress (i.e., from strong-weak to weak-strong stress patterning) and the quality of the two vowels (i.e., reduction of the first vowel and full articulation of the second vowel). Children were told that they would hear some words that were not properly spoken, and they were then asked to identify the referent of each mispronunciation by pointing to one of four visually presented pictures. One point was credited for each correct response in both the baseline and experimental conditions. The maximum possible score for this task was 18. The baseline score was used to check whether children knew the real words underlying these items.

To ensure that children did not remember the items from the baseline condition, the baseline and experimental conditions were administered at least 1 month apart, and the order was counterbalanced.

English segmental phonological awareness tasks

The Elision and Blending Words subtests of the CTOPP (Wagner et al., 1999) were used to measure children’s segmental phonological awareness in English. In the Elision subtest, the child was asked to repeat a word and then say the word that would be left after taking away a specific syllable (e.g., “Say spider. Now say spider without der”; correct answer: spy) or phoneme (e.g., “Say farm. Now say farm without /f/”; correct answer: arm). The Blending Words subtest asked the children to combine two sounds into a word (e.g., “What word do the sounds /s/ and /ʌn/ make?”; correct answer: sun). There were 20 items in each subtest, with a maximum possible score of 20 for each. Testing stopped following three consecutive errors in each task.

English word reading

The English Word Reading task and Word Identification subtest of the Woodcock Reading Mastery Test–Revised (WRMT-R; Woodcock, 1998) were used to assess children’s ability to accurately pronounce single words when presented in print. The English Word Reading task consisted of 60 items, and this task has been used successfully to assess Cantonese–English children’s word reading ability in previous studies (e.g., Choi et al., 2016). The Word Identification task was administered according to the manual’s instructions. As such, testing discontinued following four consecutive errors.

Procedures

Children were tested individually in a quiet room in their schools. Each child was assessed in a single session at Time 1 that included nonverbal abilities, Cantonese tone sensitivity, and Cantonese segmental phonological awareness, and in two separate sessions at Time 2 that included the baseline condition of English stress sensitivity in the first session, and English stress sensitivity, English segmental phonological awareness, and English word reading in the second session. The testing was conducted by proficient Cantonese–English bilingual research assistants. Cantonese instruction was provided for Cantonese tasks while English instruction was given for English tasks.

Results

The descriptive statistics and correlations among all variables are presented in Table 1. We see moderate correlations between Cantonese tone sensitivity measures and English word reading, and between English stress sensitivity and English word reading. Moreover, Cantonese tone sensitivity measures were significantly correlated with English stress sensitivity measures, as well as with Cantonese syllable deletion and English segmental phonological awareness measures. The overall pattern of the correlations suggests that the measures of Cantonese tone sensitivity, English stress sensitivity, segmental phonological awareness in both Cantonese and English, and English word reading are all closely related, and that they share common variance for the structural equation modeling analyses.

Table 1 Means, standard deviations, reliabilities, and correlations of all variables

Our structural equation modeling analyses were specifically designed to evaluate the two hypotheses (i.e., prosodic transfer hypothesis and segmental transfer hypothesis) as to the prediction of Cantonese tone sensitivity to English word reading ability (see Fig. 1). To do so, the structure and the interrelationships among the latent variables as shown in Fig. 1 were estimated and compared in three different models. These three models reflect the prosodic transfer route model (Model A), the segmental transfer route model (Model B), and the integrated model incorporating both the prosodic transfer and segmental transfer routes (Model C).

According to Bollen and Long (1993), the measurement model and structural model should both be emphasized in evaluating the theoretical significance and explanatory power of any model when inferring structural relations. Thus, we conducted latent variable structural equation modeling (SEM) of the covariance matrix with the LISREL 8.80 program (Jöreskog & Sörbom, 2007). This approach estimates both the structural relationships among latent variables and the relationships between observed variables and latent variables. Several indices were used to evaluate the goodness of fit of the data to the model: chi-square, comparative fit index (CFI), nonnormed fit index (NNFI), and root-mean-square error of approximation (RMSEA). A well-fitting model should have CFI and NNFI values above .95 and an RMSEA that is close to or less than .06 (Hu & Bentler, 1999).

In the measurement model, as depicted in Fig. 3, each latent factor was modeled on the basis of the covariation of two indicators for the majority of our variables of interest. For example, Cantonese tone identification and Cantonese tone discrimination were loaded on the latent factor underlying Cantonese tone sensitivity. Note that there was only one measure of English stress sensitivity (i.e., the revised stress mispronunciation task), and so English stress sensitivity was modeled with this task only. Nonverbal ability was included as a control variable in the model. The results of the test of the measurement model showed that our measurement model fit the data well, χ²(22, N = 123) = 40.29, CFI = .96, NNFI = .92, RMSEA = .08. The significant standardized estimates of factor loadings of each measure on its associated latent variable, as shown in Fig. 3, further confirmed that the measurement model was very strong.

Fig. 3

Structural equation analysis showing the prosody route and segmental transfer route of the transfer of Chinese tone sensitivity to English word reading for Chinese–English bilingual children. The oval shapes represent the latent variables of Chinese tone sensitivity, Chinese segmental phonological awareness (Chinese SegPA), English stress sensitivity, English segmental phonological awareness (English SegPA), English word reading, and the control variable of nonverbal IQ. The rectangles represent the observed variables of tone identification (Tone ID), tone discrimination (Tone DI), revised mispronounced stress sensitivity (MSS), syllable deletion task (Syllable), phoneme-onset deletion task (Phoneme), elision (Elision), blending words (Blending), Block Design, English word identification (Word ID), and English Word Reading test (EWRT). Note. a = fixed at 1 because there is only one indicator. *p < .05. ***p < .001

Testing the basis for tone transfer: model comparisons

We posited three possible accounts of the transfer from Cantonese tone sensitivity to English word reading, each reflected in a model (A, B, and C, depicted in Fig. 1). For the prosodic transfer model, we posited that Cantonese tone sensitivity would longitudinally predict English stress sensitivity, which would predict English word reading. This model was tested by controlling for segmental phonological awareness in Cantonese and English (see Model A). Our analyses showed that the goodness-of-fit indices for Model A were χ²(26, N = 123) = 44.68, CFI = .96, NNFI = .94, RMSEA = .08.

For the segmental transfer model, we posited that Cantonese tone sensitivity would facilitate Cantonese segmental phonological awareness, which would longitudinally contribute to English segmental phonological awareness and, in turn, English word reading. This model was tested by controlling suprasegmental phonological awareness in English (i.e., English stress sensitivity; see Model B).

Our analyses showed that the goodness-of-fit indices for Model B were χ²(29, N = 123) = 46.59, CFI = .96, NNFI = .94, RMSEA = .07.

Finally, we tested the integrated model by evaluating the contributions of the prosodic transfer pathway and the segmental transfer pathway within a single model (see Model C). Our analyses showed that the goodness-of-fit indices for Model C were χ²(30, N = 123) = 48.49, CFI = .96, NNFI = .95, RMSEA = .07.
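As a cross-check on the reported fit indices, the RMSEA point estimate can be recomputed from each chi-square value, its degrees of freedom, and the sample size. The Python sketch below assumes the common formula sqrt(max(χ² − df, 0) / (df × (N − 1))); whether LISREL 8.80 uses N or N − 1 in the denominator is an assumption here, and with this sample size both choices round to the same two-decimal values.

```python
import math


def rmsea(chi_square, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


N = 123
# (chi-square, df, RMSEA as reported in the text)
REPORTED = {
    "Measurement model": (40.29, 22, 0.08),
    "Model A": (44.68, 26, 0.08),
    "Model B": (46.59, 29, 0.07),
    "Model C": (48.49, 30, 0.07),
}

for name, (chi_square, df, published) in REPORTED.items():
    value = rmsea(chi_square, df, N)
    print(f"{name}: RMSEA = {value:.2f} (reported {published:.2f})")
```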

Given that these three models were not nested in each other, we employed the Akaike information criterion (AIC) approach commonly used to select the best model among nonnested models (Akaike, 1973, 1974, 1983; Burnham & Anderson, 2002; Kail & Ferrer, 2007; Utsumi, 2011; Wagenmakers & Farrell, 2004). Under the AIC model selection approach, several values are computed to evaluate and compare model fit. The first was the version of AIC corrected for small sample size, AICc (Hurvich & Tsai, 1989; Sugiura, 1978); a smaller AICc value indicates better model fit. The second was Δi(AICc), which directly measures the difference in AICc between a given Model i and the model with the minimum AICc (Akaike, 1978; Burnham & Anderson, 2002; Wagenmakers & Farrell, 2004). The third was the Akaike weight, wi(AIC), or the model probability, which measures the chance of Model i being the best fitting model among all candidates given the data (Burnham, Anderson, & Huyvaert, 2011). The fourth was the evidence ratio, which gives the ratio of the probability of Model i to that of Model j. We provide the computation formulas for these four values in Appendix 1. These four values for our three models (Models A, B, and C) are shown in Table 2.

Table 2 Tests of integrated model, single prosodic transfer route model, and single segmental transfer route model

As shown in Table 2, Model C had the smallest AIC (= 98.49) and AICc (= 111.89) as compared with Model A (AIC = 102.68, AICc = 121.39) and Model B (AIC = 98.59, AICc = 113.22), indicating that Model C was the best fitting model. This was also confirmed by the Akaike weights. According to the Akaike weights, the probability of Model C being the best model among the three models given the data was .656; it was 1.94 times more likely to be the best model than the next-best model, Model B (the probabilities of Model A and Model B were .006 and .338, respectively). Finally, the evidence ratio of Model C versus Model A reached 115.45, suggesting that the probability of Model C being the best fitting model was 115.45 times that of Model A. These results consistently point to Model C, the integrated model, as better fitting than the two single-pathway models.
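The model-comparison quantities can be reproduced with a short sketch. It assumes the standard definitions (which may differ in detail from Appendix 1): AICc = AIC + 2k(k + 1)/(n − k − 1), Δi = AICc_i − min AICc, wi = exp(−Δi/2) / Σ exp(−Δj/2), and the evidence ratio wi/wj. The parameter counts k are not stated in the text; they are inferred here on the assumption that AIC = χ² + 2k. The sketch uses AICc = 111.89 for Model C, which is the value that this formula yields and that reproduces the reported Akaike weights.

```python
import math

N = 123  # sample size reported in the text

def aicc_from_aic(aic, k, n):
    """Small-sample corrected AIC (standard formula; k is an inferred assumption)."""
    return aic + (2 * k * (k + 1)) / (n - k - 1)

def akaike_weights(scores):
    """Akaike weights computed from a dict of AICc values."""
    best = min(scores.values())
    rel = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: v / total for m, v in rel.items()}

# k inferred as (AIC - chi-square) / 2 for the two models whose
# chi-square values appear in the text (an assumption, not a reported value).
k_b = round((98.59 - 46.59) / 2)  # 26
k_c = round((98.49 - 48.49) / 2)  # 25
aicc_b = aicc_from_aic(98.59, k_b, N)
aicc_c = aicc_from_aic(98.49, k_c, N)

# AICc values for Models A, B, and C as used in the comparison.
aicc = {"A": 121.39, "B": 113.22, "C": 111.89}
weights = akaike_weights(aicc)
evidence_c_vs_b = weights["C"] / weights["B"]
evidence_c_vs_a = weights["C"] / weights["A"]

for model, w in sorted(weights.items()):
    print(f"Model {model}: Akaike weight = {w:.3f}")
print(f"Evidence ratio C vs. B = {evidence_c_vs_b:.2f}")
print(f"Evidence ratio C vs. A = {evidence_c_vs_a:.2f}")
```

With these inputs, the weights come out near .006, .338, and .657 for Models A, B, and C, and the C versus A evidence ratio is roughly 115.6; the small gap from the reported 115.45 presumably reflects rounding of the inputs.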

The integrated model provided a good fit to the current data, χ²(30, N = 123) = 48.49, CFI = .96, NNFI = .95, RMSEA = .07. The model accounted for 71 % of the variance in the Cantonese children’s English word reading. Figure 3 shows the standardized estimates of path weights. With relevance to the primary research questions, we evaluated the significance of the structural paths (see Footnote 1). We did so using the standardized estimates of the path weights and the z values associated with their unstandardized estimates (when z > 1.96, the path was statistically significant). The standardized estimate of a path weight indicates the relative unique contribution of the predictor factor to the outcome factor (Byrne, 2006).

In the integrated model, there were significant paths from Cantonese tone sensitivity to English stress sensitivity (z = 2.48, p < .05) and from English stress sensitivity to English word reading (z = 6.82, p < .001). These paths indicate a significant indirect effect of Cantonese tone sensitivity on English word reading via English stress sensitivity. This path is a key piece of evidence in support of the prosodic transfer hypothesis.

Meanwhile, the path from Cantonese tone sensitivity to Cantonese segmental phonological awareness was significant (z = 4.14, p < .001). There were significant paths from Cantonese segmental phonological awareness to English segmental phonological awareness (z = 3.88, p < .001) and from English segmental phonological awareness to English word reading (z = 5.12, p < .001). These results indicate that paths key to the segmental transfer hypothesis are also significant.
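The z criterion used for these paths can be illustrated with a small sketch that converts each reported z value into a two-tailed p value under a standard normal reference distribution. The path labels and the helper name `two_tailed_p` are illustrative assumptions; only the z values come from the text.

```python
import math

def two_tailed_p(z):
    """Two-tailed p value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# z values reported for the key structural paths in the integrated model.
paths = {
    "Cantonese tone -> English stress": 2.48,
    "English stress -> English word reading": 6.82,
    "Cantonese tone -> Cantonese segmental PA": 4.14,
    "Cantonese segmental PA -> English segmental PA": 3.88,
    "English segmental PA -> English word reading": 5.12,
}
p_values = {label: two_tailed_p(z) for label, z in paths.items()}

for label, p in p_values.items():
    print(f"{label}: z = {paths[label]:.2f}, p = {p:.4g}")
```

The cutoff z = 1.96 corresponds to a two-tailed p of about .05, which is why paths exceeding it are treated as statistically significant.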

Collectively, the integrated model showed that the longitudinal prediction from Cantonese tone sensitivity to English word reading passed through two parallel routes of transfer, namely, prosodic transfer and segmental transfer.

Discussion

This study set out to examine the pathways through which Cantonese tone sensitivity is related to English word reading in Cantonese–English bilingual children. Our study was designed to clarify the mechanism by which Mandarin/Cantonese tone sensitivity transfers to English word reading (Wang et al., 2005; Wang et al., 2008; Zhang & McBride-Chang, 2014). To do so, we included comparable measures of both segmental and suprasegmental sensitivity in both English and Cantonese in a longitudinal study of Cantonese–English bilingual children. We tested an integrated model incorporating the prosodic transfer and segmental transfer hypotheses against two models that each represented one of these hypotheses alone. Structural equation modeling results revealed that the integrated model fit the data best. Specifically, Cantonese tone sensitivity predicted English stress sensitivity, which, in turn, contributed to English word reading. In parallel, the path between Cantonese tone sensitivity and Cantonese segmental phonological awareness was significant. Cantonese segmental phonological awareness predicted English segmental phonological awareness, which, in turn, contributed to English word reading. In each case, these paths take into account the other set of phonological variables, as well as nonverbal abilities. Taken together, our results indicate that the transfer of Cantonese tone sensitivity to English word reading is mediated through both prosodic and segmental transfer pathways.

Bilinguals’ tone transfer to English word reading: an integrated model

Our results support two competing speculations as to the basis of the transfer of Mandarin/Cantonese tone sensitivity to English word reading and further suggest that the transfer of Chinese lexical tone sensitivity to English word reading operates through the joint contribution of segmental and suprasegmental connections between Chinese and English. As we noted in the introduction, prior studies have provided evidence for the components of each of these pathways (e.g., Choi et al., 2016; Holliman et al., 2010; McBride-Chang et al., 2014; Tong et al., 2015; Wade-Woolley & Wood, 2006; Whalley & Hansen, 2006). In our study, we have provided a comprehensive test of the significance of each of these paths in light of the others. Certainly, the existence of these two paths can be interpreted in light of the original speculations: These are two routes by which transfer can occur, and they are clearly not mutually exclusive, as demonstrated in this study.

We also think that the coexistence of these two pathways between Cantonese tone sensitivity and English word reading reflects the multilevel interactive processes involved in bilingual phonological processing, with mutual influence across segmental and suprasegmental pathways and across languages. In terms of segmental phonological awareness, for example, Wang et al. (2005) showed that Mandarin Chinese segmental phonological awareness was associated with English segmental phonological awareness in Mandarin–English bilingual children. In terms of suprasegmental phonological awareness, Choi et al. (2016) found a moderate association between Cantonese tone sensitivity and English stress sensitivity in 6- to 8-year-old Cantonese–English bilingual children. As we explore these interconnections, we note that several studies, including our own, have shown that, despite the associations across tasks and languages, there are separate contributions to English word reading (e.g., Wang, Anderson, Cheng, Park, & Thomson, 2008). Thus, the integrated model explicitly acknowledges the existence of segmental and suprasegmental pathways, while it also emphasizes the shared and unique contributions of these two pathways to English word reading.

In addition, our integrated model suggests that prosodic and phonemic processing skills are both responsible for the transfer of Cantonese tone sensitivity to English word reading. This further implies that tone processing operates across both segmental and suprasegmental levels, which is consistent with the TTRACE model postulating the interdependence between tonemes and phonemes (Tong et al., 2014). The tone–stress association may arise partly because Cantonese–English bilingual speakers tend to tonalize English and perceive English lexical stress as high tones (Chan, 2007; Lai, 2003; Luke, 2000). Our integrated model offers an explanation of why Chinese tone processing skills can be used to distinguish native English-speaking children with and without reading difficulties (Anderson & Wong, 2002). Our integrated model, which integrates the segmental and suprasegmental levels, further suggests that tone processing skills may involve much more than prosodic or phonemic information alone.

Theoretical implications for models of English word reading

Our results suggest that suprasegmental phonological awareness has a role in English word reading, just as segmental phonological awareness does. Models of English word reading have long focused on the role of segmental phonology and have paid little attention to suprasegmental phonology or prosody (e.g., Perfetti, Liu, & Tan, 2005; Seidenberg & McClelland, 1989; Ziegler & Goswami, 2005). Our findings, along with data from English monolingual readers (e.g., Holliman et al., 2010; Wang et al., 2005; Wood et al., 2009), suggest the necessity of adding prosody as a structural component to these models in order to adequately account for the data showing the role of prosody in English word reading.

Furthermore, our findings are informative for the current bilingual word recognition model (i.e., the bilingual interactive activation model, BIA+; Dijkstra & van Heuven, 2002). The BIA+ model assumes a nonselective and integrated mental lexicon for bilingual word recognition. According to the BIA+ model, the features of the visual input, such as letter position in English, first activate letters containing the same features; this then leads to the activation of words having letters in the same position across the two languages; finally, language nodes become activated due to the activation of word nodes. The language nodes inhibit words activated in the other language. In the BIA+ model, a nonselective word recognition process in two languages thus becomes selective through its filter, the language node.

Our findings provide additional support for the BIA+ model, in which bilingual word recognition involves the interaction between L1 and L2 systems, even at the suprasegmental level. Moreover, the coexistence of prosodic transfer and segmental transfer routes suggests that interaction between L1 and L2 occurs across different phonological levels. However, the cross-boundary facilitation (Cantonese lexical tone and English lexical stress) and the cross-level integration (suprasegmental and segmental phonological awareness) found in the present study may also suggest, for the BIA+ model, that there is no clear-cut nonselective or selective process in bilingual word recognition and that the interaction between L1 and L2 can occur at every level. In particular, the BIA+ model was developed on the basis of alphabetic language pairs that both involve the use of letters or letter clusters as orthographic units. This is strikingly different from nonalphabetic–alphabetic language pairs, such as Chinese–English, in which the visual features of word units are very different. It is therefore possible that selective processing occurs as the visual input unfolds (e.g., Chinese uses square-shaped characters, whereas English uses letters). Thus, there is a need for the BIA+ model to revisit the nonselective process hypothesis.

Also, like other English monolingual word recognition models, the BIA+ model assumes that “word recognition is the retrieval of orthographic representation from mental lexicon corresponding to the input letter string” (Dijkstra & van Heuven, 2002, p. 176), and it does not specify how orthographically unmarked features, such as Cantonese lexical tone and English lexical stress, become activated. In addition, the BIA+ model was developed on the basis of adult bilingual language users, and it remains unclear how developmental factors affect the process of bilingual word recognition. Thus, it is important for BIA+ to revisit its nonselective hypothesis and further clarify the role of lexical prosody in bilingual word recognition.

Our findings also advance our understanding of language-general skills involved in word reading development. Segmental phonological awareness has been considered to be one such language-general skill, based on the evidence that segmental phonological awareness transfers to reading across diverse language pairings, such as French and English (e.g., Comeau, Cormier, Grandmaison, & Lacroix, 1999), Spanish and English (e.g., Durgunoğlu, Nagy, & Hancin-Bhatt, 1993), and Mandarin Chinese and English (Gottardo, Yan, Siegel, & Wade-Woolley, 2001). These findings of transfer to reading across languages, especially those that are quite distant, lead to the conclusion that “phonological awareness is a general (not language-specific) cognitive mechanism” (Comeau et al., 1999, p. 39). Our findings of transfer of suprasegmental phonological awareness to reading across languages represented by entirely different orthographic systems support the nomination of suprasegmental phonological awareness as another such language-general skill in reading development. Given the wide differences in the manifestation of suprasegmental information in English and Cantonese/Mandarin, the transfer of suprasegmental phonological awareness is likely to operate at the level of a language-general skill; it seems that there is little, if any, specific knowledge common to the function of distinguishing lexical items that is fulfilled by lexical tone or lexical stress. In particular, the scope of the use of lexical prosody to distinguish word meaning is very different in Cantonese and English: Cantonese lexical tones distinguish a large set of words, such as /seoi55/ 需 (need), /seoi25/ 水 (water), /seoi33/ 碎 (break), /seoi21/ 谁 (who), /seoi23/ 绪 (the end of a thread), and /seoi22/ 睡 (to sleep), whereas only a small number of English words, such as CONtent /ˈkɒntɛnt/ (components) versus conTENT /kənˈtɛnt/ (satisfied), can be distinguished by lexical stress.
However, a recent perceptual study has shown that Cantonese–English bilingual children’s perception of Cantonese lexical tones and English lexical stress relies on one common acoustic cue, i.e., F0 (Tong et al., 2015). Thus, it seems likely that Cantonese–English bilingual children’s perceptual system becomes more finely attuned to prosodic distinctions that are common to both languages, such as pitch (F0) in Cantonese/Mandarin and English, thereby facilitating the acquisition of prosodic distinctions in the second language, which could, in turn, support second language reading development.

Limitations and future directions

As we consider the implications of our study, we also need to weigh these in light of its limitations. First, we chose our measure of suprasegmental sensitivity in English on the basis of prior work; this task has been successfully used to assess children of the same age in multiple previous studies (e.g., Holliman et al., 2010; Holliman et al., 2012). That said, it is a single measure, and we included only bisyllabic words with strong-weak (trochaic) patterns (e.g., SPIder /ˈspaɪdə/). We did so on the basis of pilot testing of Cantonese–English bilingual readers, which suggested that they showed floor effects with weak-strong (iambic) bisyllabic words (e.g., perMIT /pəˈmɪt/). This restricted range of bisyllabic items limits the representativeness of our measurement of suprasegmental sensitivity in English. Future research may consider using multiple measures of English stress sensitivity, each including both trochaic and iambic items. Second, despite our inclusion of a wide set of controls, we did not test for general auditory processing or working memory. Choi et al. (2016) recently found that the association between Chinese lexical tone and English stress was independent of general auditory processing and working memory (but see Wang et al., 2008). Furthermore, Wang and colleagues’ (2009) results suggested that the general auditory processing account alone cannot adequately explain the transfer of Chinese tone sensitivity to English real-word reading. Nevertheless, it would be worthwhile for future research to include these additional measures in evaluating the integrated model in Chinese–English bilingual readers.

In addition, it should be noted that this study examined the route of tone transfer to English word reading in Cantonese–English bilingual children, whose native language has a more complex tonal system than Mandarin. Unlike Cantonese, Mandarin has four lexical tones (i.e., high level, high rising, low dipping, and high falling), and the degree of tonal crowding in its tone space is lower than in Cantonese (Barry & Blamey, 2004). Previous studies with Mandarin–English bilingual children have also reported an association between Mandarin Chinese tone sensitivity and English word reading. Thus, future research may set out to compare the extent to which the integrated model proposed in the present study accounts for tonal transfer in Mandarin–English and Cantonese–English bilingual children. Such a study may shed light on whether differences in tonal systems affect the degree of transfer of lexical tone to English word reading.

Finally, although our study provides evidence that the transfer of Cantonese lexical tone to English word reading is in part due to lexical prosody transfer between Cantonese and English, the design of this study is correlational, and we cannot make any causal inference about the mechanism that drives the transfer between Cantonese lexical tone and English lexical stress. In particular, the Cantonese and English measures were administered 1 year apart, and a more stringent longitudinal design would include both Cantonese and English measures at each time point. Thus, it would be valuable for future research to explore the causes underlying lexical prosody transfer in Cantonese–English bilingual children.

To conclude, this study tested an integrated model hypothesizing parallel segmental and suprasegmental pathways underlying the transfer of Cantonese tone sensitivity to English word reading against two competing single-pathway (i.e., suprasegmental and segmental) models. Structural equation modeling (SEM) was used to test these three models with a group of 7- to 8-year-old Cantonese–English bilingual readers. The integrated model was the best fitting model, indicating that the relation between Cantonese tone sensitivity and English word reading was mediated through two parallel pathways (i.e., suprasegmental and segmental pathways). These results add to the body of work showing that suprasegmental and segmental phonological processing both play a role in English word reading for Cantonese–English bilingual readers. The transfer of Cantonese tone sensitivity to English word reading reflects multilevel interactive phonological processing in bilinguals.