1 Background

The tonal inventory of Middle Chinese (600-900) is standardly reconstructed (E. Pulleyblank 1984 a.o.) with four categories: level (píng), rise (shǎng), fall (qù), and entering (rù). The entering tone is restricted to ‘checked’ syllables with a coda stop. In its evolution from Middle Chinese (MC), the Wu variety split the pitch space into upper (yīn) and lower (yáng) regions based on the laryngeal category of the onset consonant of the syllable bearing the tone. Voiced consonants (especially obstruents) were associated with the lower register and voiceless consonants (either aspirated or unaspirated) with the upper register; sonorants could occur in either register. But in many varieties the voiced obstruents have merged (possibly through an intermediate stage of breathy voicing, Pulleyblank 1978), primarily with voiceless unaspirated obstruents, thereby phonologizing the register distinction. As seen in Table 1, some contemporary Wu dialects such as Wǔyì preserve the eight-way shape plus register distinction; but others have neutralized one or more of the contrasts, especially in the lower register. One of our goals in this paper is to determine where the Lóngyóu tonal inventory fits in this typology. Middle Chinese coda stops have merged to a glottal stop in Wu dialects.

Table 1 Tonal Inventories of Several Wu Dialects

The four-way distinction in tonal shape inherited from Middle Chinese suggests that modern Sinitic tones can be characterized as two endpoints in the five-point pitch scale introduced by Chao (1930). In this transcription system, 5 and 1 indicate the highest and lowest pitch points, respectively. Yet many Chinese languages have developed more complex tones with fall-rise (concave) and rise-fall (convex) trajectories. For example, Standard Beijing Mandarin Tone 3 is commonly transcribed as [214]. But in sandhi contexts such complex tones often simplify to a rise or fall as most famously illustrated by Standard Mandarin Tone 3: it appears as a [21] fall in Taiwanese Mandarin and as a [21] contextual variant in Beijing Mandarin.

Tonal typology has uncovered two common differences in the phonetic realization of rising versus falling tones that can impact their phonological behavior and distribution. First, a rise in F0 often requires more time to execute than a fall (Sundberg 1971; Xu and Wang 2001), as exemplified by Standard Mandarin Tone 2 [35] versus Tone 4 [51] (Xu 1997). Second, the peak in an HL fall is frequently realized at a higher F0 value compared to an H followed by another H or in isolation (Xu 1997). Complementarily, the peak of an LH rise is often realized at a lower F0 value compared to an H preceded by another H or in isolation and is grammaticalized as downstep in many tonal and pitch accent languages such as Yoruba and Japanese.

The location of the tone in the phonological word and phrase can be critical in determining its overall distribution as well as its behavior in sandhi. Based on a cross-linguistic survey of 187 languages, Zhang (2002, 2007) showed that complex tones are biased towards the right end of a phrase—a context typically associated with prepausal phonetic lengthening that allows more time to realize pitch changes. The phrase-final restriction of Beijing Mandarin Tone 3, mentioned above, is a prime example of this generalization. In another important study, Zhang (2008) connected the distinction between left versus right-dominant sandhis to tone spreading versus tonal neutralization or paradigmatic substitution. Exemplars of the former are the Shànghǎi broad sandhi (Duanmu 1995) where all but the first tone in the sandhi domain is deleted and the initial tone is then decomposed into simple noncontour tones that are mapped to the first two syllables of the domain. Wǔyì (Fu 1984) and the Amoy Hōkkièn tone circle (e.g. Chen 2000) exemplify the right-dominant sandhi patterns: the former with neutralization of tonal contrasts in non-final syllables and the latter with paradigmatic tonal substitutions in these positions. Duanmu (2008) and others have connected the right versus left-dominant sandhi to an abstract prominence/stress distinction with the sandhi change targeting the weak/nonprominent position in a metrical foot.

The yīn-yáng division of the pitch space has motivated the incorporation of register as a distinct dimension in the representation of the contrasts among Chinese tones in models such as those proposed by Bao (1990, 1999), Meredith (1990), Yip (1980, 1989), and others. These studies show that some sandhis modify or neutralize distinctions in tonal shape while preserving the upper versus lower-register contrast, while other sandhis do the opposite, preserving a rising or falling contour but shifting the tone to the upper or lower region of the pitch space.

Finally, some tonal alternations can be characterized as dissimilations in which adjacent tones alter their shape or their register in order to avoid successive instances of the same phonological element (Leben’s 1973 Obligatory Contour Principle, OCP). The ban on successive high tones is found in Bantu languages such as Shona (Odden 1980, 1986) and in the tonal polarity of many West African languages where an affix takes the opposite value (high or low) from the tone of the adjacent stem (Pulleyblank 1986; Kenstowicz et al. 1989). In the Sinitic context, such OCP-motivated changes can provide crucial evidence for tonal representation (Hsiao 2008).

In this paper, we document the tones of Lóngyóu and then discuss the evidence they provide concerning the typological and theoretical issues just reviewed. Our paper is organized as follows. Section 2 briefly reviews a prior description of Lóngyóu’s segmental and tonal categories. In section 3 we describe our corpus of data and the correspondences of the Lóngyóu tones with Middle Chinese. Section 4 documents the phonetic characteristics of the Lóngyóu tones exhibited by our two speakers: their F0 contours and durations. It then compares them with a couple of earlier descriptions. In section 5 we describe the sandhi changes found in the compound nouns comprising our corpus. Section 6 provides an OT analysis of these sandhis. Section 7 is a discussion and summary of the main findings of our study.

2 Lóngyóu dialect: background

Prior research on Lóngyóu is minimal. The Qúzhōu Government Record (QGR 1994) provides a chart of the segmental phonemes and the tonal inventory, shown in Table 2.

Table 2 Lóngyóu Segmental and Tonal Inventories (QGR)

Lóngyóu’s maximal syllable template CGVX expands the rhyme into a medial on-glide, an obligatory vocalic nucleus, and a single coda consonant or off-glide. The glides contrast as front [i̯] versus back [u̯] while consonantal codas are restricted to a dorsal nasal or glottal stop. The nasal coda is typically realized as nasalization of a preceding low vowel. Examples of these syllable types appear in Table 3 with the QGR tonal transcriptions.

Table 3 Attested Lóngyóu Syllable Inventory

3 Data for the study

3.1 Our corpus

In order to investigate the existence of word-level tone sandhi in Lóngyóu, we constructed a corpus of 8*8 disyllabic nominal compounds comprising all possible tonal combinations. We selected our data so that each element of the compound also occurs as a free-standing lexical item whose citation tonal category could be determined. See the Appendixes for the list of compounds (Appendix 2) and their constituents (Appendix 1). Although we did not systematically vary the tones as a function of syllable type in constructing the set of monosyllables, most of the tones occurred with multiple syllabic structures (except for the entering tones, which are restricted to checked syllables with a coda glottal stop). The frequency of the syllable shapes appears to vary as a function of syllable complexity, as shown by the counts in Table 4. CV is the most frequent, followed by syllables with two rhymal positions. Rhymes with both a medial glide and a coda are the least represented in our corpus.
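The 8*8 design can be made concrete with a short enumeration. This is only an illustrative sketch: the traditional category names stand in for the eight tones, and the variable names are hypothetical rather than taken from our materials.

```python
from itertools import product

# The eight tonal categories: four MC shapes crossed with two registers.
# (Traditional labels are used here purely for illustration.)
TONES = [
    "yīnpíng", "yīnshǎng", "yīnqù", "yīnrù",
    "yángpíng", "yángshǎng", "yángqù", "yángrù",
]

# Every ordered (A, B) pairing of first and second compound member.
combinations = list(product(TONES, repeat=2))

print(len(combinations))  # number of tonal combinations in the design
```

Each ordered pair corresponds to one cell of the design, with the first member in the A position and the second in the B position of the compound.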

Table 4 Syllable Types and Counts in our Corpus

3.2 Middle Chinese correspondences

The lexical items composing our corpus correspond regularly with the Middle-Chinese (MC) tonal categories reconstructed by Baxter and Sagart (2014). Table 5 provides a sample of our data along with their Standard Mandarin and Cantonese cognates (see Appendix 1 for the full set). In this table the tones are transcribed in accordance with the QGR system in Table 2.

Table 5 Lóngyóu, Mandarin and Cantonese Cognates Illustrating the Eight Middle-Chinese Tones

The following observations can be made concerning the correspondences. The MC píng (level) category is reflected in the T445 and T211 tones of Lóngyóu. As suggested by the triple-digit QGR transcriptions, these tones are realized as relatively long (see 4.1 below). Lóngyóu T5 and T2 regularly correspond to MC entering tones with a coda stop realized as [ʔ]; cf. the Cantonese cognates where MC stop consonants are preserved in the coda. As suggested by the single-digit transcriptions, they are realized as short (see 4.1). The Lóngyóu upper-register T45 rise versus T53 fall contrast cleanly aligns with the corresponding shǎng rise versus qù fall categories of Middle Chinese. But in the lower register, Lóngyóu T13 and T31 have partially overlapping cognates in MC. As we shall see, these are the tones that are the most unstable in Lóngyóu and participate in sandhi changes. In addition, they have particular durational and laryngeal characteristics (discussed in 4.1) and are the tones that have fallen together in the speech of younger Níngbō speakers (Lyu 2019) as well as in Shànghǎi, as seen in Table 1.

3.3 Onset and register

As indicated earlier, the four Middle Chinese tonal categories have split into upper and lower-register variants in Lóngyóu. This division of the pitch space is customarily attributed to a contrast in the voicing of the MC onset consonant. For our Lóngyóu data there is a very strong correlation between the register specification of the tone and the onset category in the MC reconstruction of Baxter and Sagart (2014). For example, there are 15 items in our corpus whose MC cognate is reconstructed with a voiced obstruent onset. All appear with a lower-register Lóngyóu tone: T211 (N = 6), T13 (N = 3), T31 (N = 3), and T2 (N = 3). There are seven items in our corpus that are reconstructed with voiceless aspirated onsets in MC. All appear with upper-register Lóngyóu tones: T445 (N = 1), T45 (N = 2), T53 (N = 3), and T5 (N = 1). In our recordings (see below) the lower-register tones that are reconstructed with MC voiced-obstruent (stop) onsets are never pronounced with closure voicing, even when they occupy the second position of the compound and hence are normally intervocalic. This point diverges from what is indicated in Liu (2020: 190) based on data from Cao (2002) where voicing is reported, although whether it is closure voicing or some other phonetic reflex of phonological voicing is not stated. These lower-register onset stops differ from the onset stops followed by rhymes in the upper tonal register in terms of VOT and in many cases in voice quality as well (Footnote 1). Further study of the phonetic correlates of the Lóngyóu register distinction, with more data, is clearly called for.

4 Phonetic correlates of the Lóngyóu tones

Our study examined the phonetic correlates of the Lóngyóu tones based on two separate recordings, each provided by two female native speakers. Our first speaker (Sp1) is c. 30 years of age and the second (Sp2) is c. 50 years of age. The first recording consists of the 77 individual components of the compound structures (see Appendix 1). In this file, each of the eight Lóngyóu tones is represented by between eight and twelve specimens: T334 (N = 10), T45 (N = 10), T53 (N = 8), T5 (N = 9), T211 (N = 12), T313 (N = 10), T131 (N = 8), and T2 (N = 10). These items were randomized and recorded with the speakers pronouncing each word in isolation and then in a frame sentence ‘e ge ze X’ (Mandarin zhè gè shì X) ‘this is X’. This procedure was then repeated, yielding four data points for each of the 77 lexical items for a total of 308 items per speaker. The 64 disyllabic compounds (see Appendix 2) were also randomized, and each compound was elicited in citation form followed by the same framing sentence. The compounds were repeated as well to yield four data points for each compound, for a total of 256 items per speaker. Sp1 was recorded in a sound-insulated booth with a Shure SM10A Unidirectional Head-Worn Dynamic Microphone and a USB Pre 2 Preamp at a sampling rate of 44.1 kHz, 16 bits. Sp2 was recorded in a quiet room in Lóngyóu city with a Samson Q2U microphone. The recordings were analyzed with Praat scripts based on manually constructed and labeled textgrids.

4.1 Citation forms

In order to get a sense of the accuracy of the traditional tone labels with the Chao numbering in Table 2, we show in Fig. 1 a plot of the pitch contours for the eight citation tones obtained from the first of the two data sets comprising our corpus. The data are pooled from both speakers. The F0 measurements were normalized with logs and then averaged across the two repetitions for each lexical item and each speaker and are displayed with the help of R’s (version 4.1.3) ggplot function.

Fig. 1 F0 Contours of Eight Lóngyóu Tones (Citation Form)

The following observations can be made concerning this plot. First, the eight tones separate into the upper and lower regions of the pitch space rather cleanly. The only exception is the rise-fall yángqù tone whose peak rises above the sagging yīnpíng and penetrates the upper zone. The checked-tone syllables are short with yángrù showing a rise, perhaps reflecting a tightening of the vocal folds for the coda glottal stop. The yīnpíng and yángshǎng tones are the longest. The latter exhibits an irregularity at the 3/4 point and more generally some jitter throughout the dipping portion reflecting the laryngealization that was found in both speakers’ pronunciation of this tone. Yīnshǎng is a rise and yīnqù an early fall. The yángpíng is a gradually falling tone with some laryngealization at its bottom end.

Our results are largely congruent with the transcriptions reported in three previous studies of the Lóngyóu tones. With respect to the QGR triple-digit transcriptions of the yīnpíng [445] and yángpíng [211] tones seen in Table 2, we also find these tones to be relatively long, especially yángpíng. The only difference is that our speakers’ yīnpíng is at the lower end of the upper pitch space suggesting a [334] transcription rather than QGR’s [445].

Table 6 shows the transcriptions reported for a speaker from Lóngyóu county in Cao’s (2002: 100) monograph on Wu dialects. Here the yīnpíng [434] is transcribed with three digits while yángpíng is shorter [21], a difference that more closely aligns with our results compared to the QGR. Also, the lower-register checked tone [ʔ23] is a rise while its upper-register counterpart [ʔ5] is a single-digit flat tone at the ceiling of the pitch space. These transcriptions closely approximate the trajectories seen in the plot of our data in Fig. 1. In addition, the yángshǎng and yángqù tones are transcribed with internal turning points [213] and [231] that agree with our speakers’ concave and convex trajectories.

Table 6 Lóngyóu (龙游) Tonal Inventory (Cao 2002: 100)

Finally, Fig. 2 reproduces the citation form tones reported in Rose (2021: 37) based on the recording of a Lóngyóu county speaker made by William Ballard in the 1960s. For this speaker, the yīnpíng and yángshǎng tones lack the extra duration reported in the studies mentioned above. However, the peak of the yángqù (‘low convex’) tone rises above the midpoint of the yīnpíng (‘mid level’) tone, comparable to our results.

Fig. 2 F0 Contours of Eight Lóngyóu Tones from Rose (2021: 37). The Wu tonation template exemplified with acoustics of eight citation tones from a speaker of Longyou. Vertical axis = fundamental frequency (Hz). Horizontal axis = duration (csec.)

In sum, our findings largely agree with what is reported in earlier literature. Given that our study and the Rose-Ballard one are supported by phonetic instrumentation and are largely congruent with the transcriptions reported in Cao (2002), and to a lesser extent the QGR, we can place some confidence in their overall accuracy.

A couple of additional points are worth making concerning our data. First, Table 7 shows the average F0 values across the entire syllable rhyme for each of the citation tones. By this measure as well, the Lóngyóu tones clearly divide the pitch space into the two registers.

Table 7 F0 (Hz) Means for the Syllable Rhymes of the Eight Lóngyóu Tones

Second, in their evolution from Middle Chinese, the Lóngyóu lower-register reflexes of the MC shǎng rising and falling tones have added an initial polar tone (13 > 313 and 31 > 131) that can be viewed as an enhancement strategy to help distinguish these tones from their level counterparts as well as directing attention to the turning point that defines the onsets of the rise and the fall of these two contour tones. See Evans et al. (2018) for discussion of initial polar tones in Punjabi and Pittayaporn (2018: 259) for ‘initial-drop’ as an enhancing mechanism for a rising tone in Bangkok Thai. Thus, for Lóngyóu, the MC lower-register rising and falling tones have enhanced their saliency with the addition of a polar onset at the cost of creating a more complex articulation with an internal turning point. On the other hand, the Shànghǎi dialect followed a different evolution by merging these tones with the yángpíng category and thus simplified the overall Wu tonal inventory from eight to five elements (Table 1).

Table 8 shows the values of the eight Lóngyóu tones in the five-point Chao scale (averaged citation form) for each speaker (Footnote 2).

Table 8 Chao-scale transcriptions of Lóngyóu tones

Compared to Speaker 2, Speaker 1 has one-point higher values for most of the tones. The F0 ranges for the two speakers are quite close, however: Sp1 386 Hz max, 156 Hz min; Sp2 404 Hz max, 164 Hz min. Most importantly, the gross shapes of the tones are the same for both speakers.
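To make the relation between a speaker's F0 range and the five-point scale concrete, the following sketch maps an F0 value onto Chao levels by dividing the speaker's log-F0 range into five equal bands. This is one common conversion and is offered only as an assumption for illustration; it is not necessarily the procedure behind Table 8. The function name `chao_level` is hypothetical, and the only empirical inputs are the range endpoints reported above.

```python
import math

def chao_level(f0_hz: float, f_min: float, f_max: float) -> int:
    """Map an F0 value (Hz) to a 1-5 level on a log-scaled pitch range."""
    # Relative position of the value within the speaker's log-F0 range (0 to 1).
    rel = (math.log(f0_hz) - math.log(f_min)) / (math.log(f_max) - math.log(f_min))
    # Five equal bands; clamp so that the range endpoints map to 1 and 5.
    return min(5, max(1, int(rel * 5) + 1))

# Speaker ranges as reported in the text (Hz).
SP1_RANGE = (156.0, 386.0)
SP2_RANGE = (164.0, 404.0)

print(chao_level(245.0, *SP1_RANGE))  # a value near the middle of Sp1's log range
```

Because the bands are defined relative to each speaker's own range, the same Hz value can receive different Chao digits for different speakers.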

Figure 3 shows the mean durations of the eight Lóngyóu tones. This data pools the scores of both speakers across the isolation and sentence-frame contexts for the noncompound, citation tones. The data indicate that the Lóngyóu tonal inventory aligns with the cross-linguistic tendency for rising tones to be longer than falling tones: cf. yīnpíng T334 versus yángpíng T211 and yīnshǎng T45 versus yīnqù T53. This generalization extends to the checked-tone syllables if the T2 yángrù is counted as a phonetic rise. It also holds for the concave T313 yángshǎng versus convex T131 yángqù tones if they are classified as rise versus fall based on the medial and final points of their trajectories that correspond to their MC sources as shǎng (rise) versus qù (fall) tones, possibly motivating, in turn, a right-branching 3(13) and 1(31) representation in the synchronic grammar. Indeed, the dipping T313 is the longest of the eight Lóngyóu tones and parallels, in this respect, the fall-rise citation form [214] of Standard Mandarin Tone 3. Lóngyóu T313 also parallels Beijing Mandarin Tone 3 in having significant medial laryngealization, an enhancement strategy accompanying the dip in F0.

Fig. 3 Mean Durations of the Eight Lóngyóu Tones (Citation form); error bars are standard errors

A mixed-effects linear regression in R (version 4.1.3), with duration as the dependent variable and Rise (yes for T334, T45, T313 versus no for T211, T53, T131) as a predictor, found the rising tones to be significantly longer (beta = 90.07, t = 6.08).

With regard to the relation between tone and syllable rhyme duration, seen in Fig. 3, one might wonder whether syllable shape could be a confounding factor. To pursue this point, mixed-effects linear regression tests were run over models that included both tone and syllable shape (CV, CVN, CVG, etc.) as predictors of rhyme duration vis-à-vis a model with just tone. An anova test revealed that adding syllable shape as a factor did not significantly improve model fit (Chisq = 0.61 for the model that included the rù-tone syllables and Chisq = 0.77 for a model that excluded the rù-tone syllables). More generally, sandhi rules and constraints in Sinitic languages are typically expressed with tonal categories that abstract away from syllable shape. For example, the Standard Mandarin Tone 3 sandhi rule banning successive Tone 3s does not care whether a given Tone 3 is drawn from a CV versus CVN versus CGV syllable. In the few cases where syllable shape seems to matter, such as in the checked syllables, syllable shape is often treated as a tonal category (e.g., ‘entering tone’).

In sum, Lóngyóu is conservative and has abstained from the tonal splits and mergers that are found in the citation forms in some other (northern) Wu dialects. The upper-register reflexes of the four MC categories preserve the level, rise, and falling shapes commonly reconstructed for these tones. The lower register has been restructured such that the MC shǎng (rise) tone now has a two-part falling and then rising trajectory that we transcribe as T313 while the MC qù (fall) tone has acquired a convex rise-fall shape whose apex crosses into the upper pitch space and is henceforth transcribed as T131. For these two lower-register tones, the original MC shape is preserved in the second part of the contour while the initial portion is an innovation with a falling or rising trajectory that polarizes with respect to the second part of the tone.

5 Sandhi tones

Our data reveal three category-changing tonal alternations that target the first (A) position in the compound. In addition, there are some clippings as well as spillovers of the A tone that we treat as co-articulatory phonetic changes. Finally, our data reveal a lexically-determined metatony replacing the tone of the second member of the compound with the T53 yīnqù. We briefly illustrate the second and third of these changes before focusing on the first type.

5.1 Minor changes

The final rises seen in the citation forms of the yīnpíng [334] and yángrù [2] tones were missing when the noun appeared in the first position of the compound. The screenshots in Fig. 4 of the Praat spectrograms for the isolation forms of [than334] ‘soup’ and [ni̯eʔ23] ‘hot’ and their flattened contours in [than33-su̯i45] ‘soup-water’ and [ni̯eʔ2-than334] ‘hot soup’ illustrate this difference. It is not clear whether the flat variant seen in the sandhi position should be considered the basic tone with the prepausal rise added in the citation context or, alternatively, as the product of truncation of the final rise in the sandhi position. The former option reflects the presumed level shape of the MC source. Following customary practice in the description of Chinese tone sandhi, our OT analysis in 6.2 takes the citation form as basic and treats the flat shape as derived by truncation.

Fig. 4 Praat Screenshots of yīnpíng [than] ‘soup’ and yángrù [ni̯eʔ] ‘hot’ in Citation and Sandhi Positions

In our corpus the metatony alternation was regularly exhibited by the yángpíng word nen [211] ‘person’; it substitutes the [53] yīnqù tone in the compounds lɔ-nen [21-53] ‘old person’, bin-nen [21-53] ‘sick person’, and di̯e-nen [21-53] ‘enemy person’. The screenshots for nen [211] and lɔ-nen [21-53] in Fig. 5 illustrate this alternation. Other items in the corpus with this tonal substitution include di̯e-nɔ [21-53] ‘electric-brain’ (‘computer’, cf. T313 ‘brain’), leʔ-ni̯eʔ [2-53] ‘sixth-month, June’ (cf. T2 ni̯eʔ ‘moon, month’), and bin-si [21-53] ‘sick-history’ (cf. T45 si ‘history’). This tonal change can probably be traced back to the chiuhsheng metatony of Ancient Chinese discussed by Downer (1959) based on observations from the earliest descriptions of Classical Chinese which posit a suffixal -s in the derivational morphology. The fact that the Lóngyóu metatony targets the B member of the compound might reflect the suffixal nature of the original derivation. When a checked-tone syllable like ni̯eʔ ‘moon, month’ undergoes this tonal change it appears to retain the coda glottal stop, suggesting that the tone and the syllable structure of the rù category are separate phonological dimensions. More data is needed to investigate this phenomenon.

Fig. 5 Praat Screenshot of the T53 Metatony for nen ‘person’

5.2 Initial sandhi 1

The first regular Lóngyóu sandhi truncates the final rise from the T313 yángshǎng dipping tone when it appears in the first (A) position of the compound. Table 9 summarizes this sandhi for the data in our corpus.

Table 9 Lóngyóu Tone Sandhi 1: 313 → 31/ ____ T

The Praat F0 tracings in Fig. 6, for the citation and sandhi forms of ‘old’, from Sp1, illustrate this alternation. In the isolation form of this T313, the dip in pitch is accompanied by laryngealization leading to the gap in the F0 tracing. The truncated T31 sandhi form lacks this feature, as seen in lɔ-ti̯e ‘old store’. The pitch tracings are transcribed in pinyin.

Fig. 6 Praat F0 Tracings for lɔ ‘old’ in the Citation and Sandhi Contexts

This Lóngyóu sandhi is comparable to the so-called ‘half-third sandhi’ in Standard Mandarin where the [214] isolation form of Tone-3 appears without the terminal rise in the non-final position of the phrase. According to Yang and Xu (2019), truncation of the final component of a complex tone is one of the most frequent sandhi processes found in East-Asian languages.

5.3 Initial sandhi 2

The dome-shaped yángqù T131 appears regularly in the citation form and the second position of a compound, but in the initial (A) position it takes on a shorter form that coincides with its MC source as a falling tone. The data in Table 10 summarize this sandhi.

Table 10 Lóngyóu Tone Sandhi 2: 131 → 31 / ___ T

The pitch tracings in Fig. 7 from Sp1 show the F0 contours for T131 fan ‘rice’. The rise-fall trajectory appears in the citation fan and prepausal second position (B) of the compound beʔ-fan ‘white rice’ while the initial rise is truncated in the first half of the compound fan-ti̯e ‘rice store’.

Fig. 7 Praat Tracings for T131 fan ‘rice’ in Citation and Compound Positions

The Lóngyóu T313 and T131 sandhis converge on a lower-register fall output that coincides with the T211 yángpíng tone leading to a neutralization in the A sandhi position of the compound that mimics the lower-register mergers seen in the Shànghǎi inventory of Table 1. Both of the T313 and T131 sandhis are reductions of complex tones with an internal turning point that are typically motivated by their duration requirements (see Yip 1989 for a comparable case from Wúxī). In order to check this point, we measured the durations of the Lóngyóu tones in our entire corpus as a function of three positions: citation, first (A) position in the compound, and second (B) position in the compound. As shown in Fig. 8, the A position is in fact the shortest overall. A mixed-effects linear regression over the pooled duration data with speaker and word as random intercepts found significant differences between the base-line sandhi A position and the sandhi B position (t = 3.18 for raw data and 3.85 for log-transformed data) and between the base-line sandhi A position and the citation form (t = 18.02 for raw data and 16.05 for log-transformed data). This sandhi behavior contrasts with the Northern Wu dialects of Shànghǎi and Níngbō where it is noninitial tones that are suppressed or modified while initial position is the site of contrast.

Fig. 8 Duration (msecs) of Lóngyóu Tones (Syllable Rhymes) by Speaker and Context for Three Critical Positions (error bars are standard errors)

Both the dipping T313 and the dome-shaped T131 converge on a shorter falling tone, T31, that is not found in the inventory of citation tones. For the latter case, this output corresponds to the etymological source as an MC qù tone; but for the dipping T313, it does not. It is reasonable to suppose that the sandhi processes producing these tonal alternations are motivated by simplification under time pressure. The reduction of both the complex fall-rise [313] and rise-fall [131] tones to a falling [31] sandhi contour, rather than to a [13] rising contour, can be attributed to the relative markedness of rising tones vis-à-vis falling tones, especially when duration is minimized: *Rise » *Fall (see OT analysis below). The reduction of both T313 and T131 to T31 does not consistently involve suppression of the initial (enhancing) tone that was added in Lóngyóu’s historical evolution from Middle Chinese. While the T131 → T31 process would be consistent with Hirayama’s (1998) generalization that the sandhi tone reveals the etymological source of a tone, the T313 → T31 sandhi does not accord with this principle. Nor do the two sandhis jointly fall under Yang and Xu’s (2019) generalization that sandhi truncation targets the final portion of the tone. Rather, both align with the output-based *Rise » *Fall markedness preference (Footnote 3).
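The effect of the *Rise » *Fall ranking on the two complex tones can be illustrated with a toy evaluation. The sketch below is not the OT analysis of Section 6; it assumes, purely for illustration, that the candidate set consists of the two contours obtained by dropping either the first or the last pitch target, and all function names are hypothetical.

```python
def contour_marks(tone: str) -> tuple:
    """Return (*Rise violations, *Fall violations) for a Chao-digit string."""
    digits = [int(d) for d in tone]
    steps = list(zip(digits, digits[1:]))
    rises = sum(b > a for a, b in steps)
    falls = sum(b < a for a, b in steps)
    return rises, falls

def truncation_candidates(tone: str) -> list:
    """Assumed candidate set: drop the final target or drop the initial target."""
    return [tone[:-1], tone[1:]]

def optimal(tone: str) -> str:
    # Tuples compare left to right, so (rises, falls) encodes the strict
    # ranking *Rise >> *Fall: any rise is worse than any number of falls.
    return min(truncation_candidates(tone), key=contour_marks)

for citation in ("313", "131"):
    print(citation, "->", optimal(citation))
```

For T313 the winner is obtained by dropping the final target, whereas for T131 it is obtained by dropping the initial one, so the choice of which edge to truncate follows from the output preference rather than from a fixed direction of truncation.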

5.4 Initial sandhi 3

The third Lóngyóu sandhi found in our data changes the upper-register rising T45 to a lower-register fall [31] when the following syllable contains an upper-register tone. This sandhi process was quite consistent for both speakers and is summarized in Table 11.

Table 11 Lóngyóu Tone Sandhi 3: 45 → 31 /____ H

The Praat F0 tracings in Fig. 9, for the isolation and sandhi forms for T45 khou ‘mouth’ from Sp1, illustrate this sandhi. The T45 tshɔ of tshɔ-pin ‘meadow’ does not change, since pin ‘plain’ is in the lower register. The tracing for Sp2 shows the same rising contour for this compound.

Fig. 9 Praat F0 Tracings for yīnshǎng T45 of khou ‘mouth’ and tshɔ ‘grass’
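The three category-changing sandhis described in Sections 5.2 to 5.4 can be stated together as a small function over the tone of the A position. This is a descriptive sketch only: tones are identified by the Chao transcriptions used for our speakers, the function and set names are hypothetical, and the phonetic clippings and the lexically determined metatony of Section 5.1 are deliberately left out.

```python
# Upper-register (yīn) citation tones, by Chao transcription.
UPPER = {"334", "45", "53", "5"}
# Lower-register (yáng) citation tones.
LOWER = {"211", "313", "131", "2"}

def sandhi_a(tone_a: str, tone_b: str) -> str:
    """Return the sandhi form of the first (A) tone of a disyllabic compound."""
    # Sandhi 1 and 2: the complex lower-register tones reduce to a fall
    # before any following tone.
    if tone_a in ("313", "131"):
        return "31"
    # Sandhi 3: the upper-register rise becomes a lower-register fall,
    # but only before an upper-register tone.
    if tone_a == "45" and tone_b in UPPER:
        return "31"
    # All other A tones keep their category.
    return tone_a

print(sandhi_a("45", "334"))  # T45 before an upper-register tone
print(sandhi_a("45", "211"))  # T45 before a lower-register tone
```

Only the third rule needs to inspect the B tone, which is what makes it a register-sensitive process rather than a simple positional reduction.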

5.5 A Taiwanese parallel

A yīnpíng sandhi analyzed by Hsiao (2008), for the Sixian dialect of Taiwanese Hakka, bears a striking resemblance to the Lóngyóu T45 process that is worth mentioning here. According to Hsiao, Sixian has the six citation tones shown in Table 12. The MH rising tone is changed to a lower-register low tone when followed by one of the upper-register tones MH, H, or Hʔ. The fact that the Sixian process applies to a different etymological class from the Lóngyóu T45 sandhi (Sixian yīnpíng and Lóngyóu yīnshǎng) indicates that these are separate developments—not to mention their different geographic loci. Both the Hakka and the Lóngyóu sandhis change an upper-register rise to a lower-register tone before an upper-register tone. They demonstrate the relevance of the register distinction for defining the context as well as the ‘repair’ in an OCP-motivated regressive dissimilation for the upper-register in the synchronic phonology of the two languages.

Table 12 Sixian (Hakka) Sandhi of Rising Tone (Hsiao 2008)

Gloss                Form           Tones (citation > sandhi)
‘pig liver’          tsu-kon        MH MH > L MH
‘go to the class’    song ko        MH H > L MH
‘returned’           kong fuk       MH Hʔ > L Hʔ
‘desk’               su-tsok        MH Mʔ > MH Mʔ
‘cloth closet’       sam-tshu       MH L > MH L
‘returned’           sien-tshau     MH ML > MH ML

5.6 Interim summary

The plot in Fig. 10 summarizes the F0 trajectories for all eight of the underlying Lóngyóu tones in the sandhi (A) position of the compounds in our corpus. It is based on time-normalized measurements derived with the help of Prosody-Pro (Xu 2018), averaged over both speakers. The display of the yīnshǎng T45 rise is split between its realization before underlying upper versus lower-register tones in the second position of the compound. Before a lower-register (B) tone, the underlying rising trajectory of yīnshǎng is maintained in the upper region of the pitch space (grey line). But before an upper-register (B) tone (orange line), yīnshǎng T45 drops to the lower register where it merges with the other lower-register tones in a falling shape transcribed here as [21]. The chart reveals that the yángrù tone has a flat trajectory while the yángpíng, yángshǎng and yángqù tones converge on a falling shape that mimics a shortened version of the yángpíng. The sandhi tone derived from the upper-register yīnshǎng is slightly higher than the other sandhi tones. More data is needed to tell if this is a real difference. The upper-register tones remain distinct in the A position; the yīnpíng tone has lost its final rise, while yīnshǎng-L and yīnqù retain their rising versus falling trajectories.

Fig. 10

Time-normalized F0 Contours in Sandhi (A) Position
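The time normalization behind such plots can be illustrated with a minimal sketch. It assumes that each token is an F0 track with time stamps, that every track is resampled to the same number of equally spaced points by linear interpolation, and that the resampled tracks are then averaged point by point. The function names `time_normalize` and `mean_contour` and the default of ten points are hypothetical choices for illustration; this is not the ProsodyPro implementation.

```python
from bisect import bisect_right


def time_normalize(times, f0, n_points=10):
    """Resample one F0 track to n_points equally spaced samples.

    times must be strictly increasing; values between measured
    samples are obtained by linear interpolation.
    """
    if len(times) != len(f0) or len(times) < 2 or n_points < 2:
        raise ValueError("need matching tracks with at least two samples")
    start, end = times[0], times[-1]
    resampled = []
    for i in range(n_points):
        t = start + (end - start) * i / (n_points - 1)
        # Index of the measured sample at or just before t, clamped so
        # that a right-hand neighbour always exists.
        j = min(max(bisect_right(times, t) - 1, 0), len(times) - 2)
        weight = (t - times[j]) / (times[j + 1] - times[j])
        resampled.append(f0[j] + weight * (f0[j + 1] - f0[j]))
    return resampled


def mean_contour(contours):
    """Average several time-normalized contours point by point."""
    return [sum(points) / len(points) for points in zip(*contours)]
```

Because every token ends up with the same number of points, tokens of different duration can be averaged directly, which is what makes contours from two speakers and many compounds comparable in a single plot.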

The chart in Fig. 11 summarizes the duration of the tones in the sandhi (A) position. The checked-tone rù syllables remain distinctly shorter, while the other tones converge on a duration of intermediate length that is consistent with the two-digit transcriptions indicated.

Fig. 11

Duration of Sandhi (A) position Syllable Rhymes

Finally, Fig. 12 shows the time-normalized F0 contours of the eight Lóngyóu tones in the second (B) position of the compound.

Fig. 12

Time-normalized F0 Contours in (B) Position

Compared to the A (sandhi) position in Fig. 10, the Lóngyóu tones in the second (B) position of the compound largely preserve the eight distinct shapes seen in the citation forms in Fig. 1. In particular, the lower-register yángpíng, yángshǎng, and yángqù tones remain distinct. Yángshǎng has the dipping contour with the bottom of the curve reflecting laryngealization. Yángqù preserves its dome shape whose apex passes through the sagging yīnpíng. One anomaly is that the yángpíng tone, while maintaining its falling contour, has raised its starting-point, presumably reflecting carry-over co-articulation from the preceding A tone. In Fig. 12, the yīnqù tone is segregated as a function of whether it is the allomorphic substitute (yīnqù-D) or the realization of the underlying tone of the second syllable. Both have the same early fall trajectory, but the derived tone is slightly lower in the pitch space. Many of these cases were instantiated by the morpheme -nen ‘person’. The lower F0 might reflect a difference in register-phonation. More data are needed to pursue this question.

6 OT analysis

6.1 Tonal representation

For our OT analysis, we adopt the phonological representation for Chinese tone introduced by Bao (1990, 1999). This scheme formally distinguishes register from tonal shape: Tone = Register + Contour. The register specification divides the pitch space into two zones and is designated here by upper-case H and L, which may be thought of as comparable to Yip’s (1980, 1989) [± upper]. The Contour node is an ordered sequence of higher and lower pitch points within each register indicated with lower-case h and l, respectively. The combination of Register {H,L} and Contour {h,l} generates four possible tone levels, which is sufficient to characterize most (Chinese) tonal languages, such as Cantonese. Given this framework, the eight contrasting citation tones in Lóngyóu are represented as in Fig. 13. We also include the [31] sandhi tone. In this representational scheme, the Register (R) and Contour (C) nodes are constituted by the [± upper] and [± high tone] features while the dominating root node T plays more of an organizing role comparable to prosodic categories such as the mora or syllable. The Register and Contour nodes are not ordered with respect to one another but rather define two separate dimensions of contrast mirroring, in this respect, the relation between the manner and place of articulation nodes in classical models of feature geometry, such as Halle (1983) or Clements (1985).

Fig. 13

Lóngyóu tones in Bao (1990, 1999) notation
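The way Register and Contour jointly yield four tone levels can be sketched as a small data structure. The class name `Tone` and the numeric coding (0 to 3, with the register contributing a base value and each terminal an offset) are assumptions made for illustration only; the sketch does not claim a particular correspondence with Chao digits or with the specific representations in Fig. 13.

```python
from dataclasses import dataclass

# Hypothetical numeric coding: the register selects a zone of the
# pitch space, the terminal selects the higher or lower point in it.
REGISTER_BASE = {"L": 0, "H": 2}
TERMINAL_OFFSET = {"l": 0, "h": 1}


@dataclass(frozen=True)
class Tone:
    register: str  # "H" or "L"
    contour: str   # ordered terminals, e.g. "lh", "hl", "hlh"

    def levels(self):
        """Abstract pitch levels (0 = lowest, 3 = highest) per terminal."""
        base = REGISTER_BASE[self.register]
        return [base + TERMINAL_OFFSET[t] for t in self.contour]
```

For example, `Tone("H", "lh").levels()` gives `[2, 3]`, a rise in the upper zone, while `Tone("L", "hl").levels()` gives `[1, 0]`, a fall in the lower zone. Register and contour vary independently, which is the property the OCP-H analysis below relies on.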

We adopt, in part, the transcriptions of the QGR report in which the MC píng (level) tones have lengthened to take on three terminals in Lóngyóu. However, we transcribe the yīnpíng as [334], rather than [445], reflecting the lower position of this tone in our speakers’ pitch space seen in Fig. 1. Recall also, from Fig. 3, that the píng tones are the longest in the citation inventory aside from the dipping T313. The Lóngyóu Contour node thus permits a sequence of l’s but no successive h’s, suggesting the OCP ranking *hh » *ll.

A few additional points concerning the overall analysis are worth mentioning before considering the description of the sandhi changes. First, the initial syllable in a compound is the position where the systematic tonal changes occur. For our data, the citation form and the second position of the compound both precede pause, a context typically associated with phonetic lengthening and the preservation of more complex tones (Zhang 2002, among others). Phrase-final, prepausal position is thus the prosodic analogue of word and phrase-initial position, which typically licenses more phonemic contrasts in the segmental domain. As suggested in Fig. 3, these contexts are associated with greater duration compared to the A compound position. Accordingly, we assume an undominated positional faithfulness constraint (Beckman 1998) protecting the Contour tone’s terminals from deletion, insertion, reordering, or feature change when the tone occupies a prepausal position. (The yīnqù [53] tonal allomorphy takes place in the lexicon.) Second, we assume that Ident-t is undominated. As a result, the only options to repair a markedness violation are the deletion, insertion, or reordering of the Contour node terminals. And given that the initial polar tones of T313 and T131 are part of the input, none of the sandhis involve the insertion of a tone and so Dep-t is also undominated. The possible repairs are thus restricted to deletion (violating Max-t) and reordering (violating Linearity-t). Finally, we assume an undominated constraint restricting rù tone syllables with the glottal stop coda to just one terminal tone.

6.2 Complex tone simplification

Putting aside the yīnshǎng sandhi of section 5.3 for the moment, the remaining tone sandhis are summarized in Table 13.

Table 13 Lóngyóu Sandhi Alternations

Each change involves a reduction of the Contour node from three terminals to two, motivated by the reduced time available in the first (unstressed) position of the compound. The choice of which terminal to delete is determined by the markedness dispreference for rising tones *lh and, in the case of yángpíng, an OCP-t dispreference for successive l tones: *ll.

The constraints at play in our analysis and their rankings are shown in Table 14.

Table 14 OT Constraints and Rankings for Lóngyóu Tone Sandhis

The OT tableaux in Table 15 show the input-output mappings for the yángshǎng /hlh/, yángqù /lhl/, yángpíng /hll/, and yīnpíng /llh/ sandhis. The *C-3 constraint eliminates the fully faithful candidates, forcing the deletion of one of the terminals of the Contour node. The remaining candidates tie on Max-t and so the markedness constraints penalizing a rising tone or a sequence of identical tones eliminate the remaining competitors. The mapping of /llh/ to ll for the yīnpíng tone motivates the *lh » *ll ranking: avoidance of a rising tone is preferred to a succession of l tones.

Table 15 OT tableaux for tritonal sandhis
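The logic of these tableaux can be checked with a minimal sketch of strict-domination evaluation. It assumes, following the text, that only deletion is available as a repair (Ident-t and Dep-t being undominated), so the candidate set consists of the input and its subsequences. The total order in `RANKING` is an assumption consistent with the rankings mentioned in the text (*C-3 » Max-t, *hh » *ll, *lh » *ll), not a reproduction of Table 14; the helper names are hypothetical.

```python
from itertools import combinations

# Assumed total order, highest-ranked first (illustrative only).
RANKING = ["*C-3", "Max-t", "*hh", "*lh", "*ll"]


def candidates(inp):
    """The input plus every non-empty output obtained by deleting terminals."""
    found = []
    for size in range(len(inp), 0, -1):
        for keep in combinations(range(len(inp)), size):
            cand = "".join(inp[i] for i in keep)
            if cand not in found:
                found.append(cand)
    return found


def count_pair(seq, pair):
    """Number of adjacent terminal pairs in seq equal to pair."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == pair)


def violations(inp, out):
    """Violation profile of out, ordered by RANKING."""
    marks = {
        "*C-3": int(len(out) >= 3),      # no three-terminal contour
        "Max-t": len(inp) - len(out),    # one mark per deleted terminal
        "*hh": count_pair(out, "hh"),
        "*lh": count_pair(out, "lh"),
        "*ll": count_pair(out, "ll"),
    }
    return tuple(marks[c] for c in RANKING)


def optimal(inp):
    """Winner under strict domination: lexicographically smallest profile."""
    return min(candidates(inp), key=lambda out: violations(inp, out))
```

Under these assumptions `optimal` maps /hlh/, /lhl/, and /hll/ to hl and /llh/ to ll, matching the mappings described above. Comparing violation tuples lexicographically is a standard way to model strict domination: a single mark on a higher-ranked constraint outweighs any number of marks lower down.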

More intricate is the register-changing yīnshǎng sandhi of khou-su̯i ‘mouth-water’ [45]-[45] → [21]-[45]. This sandhi inverts a lh rise to a hl fall, but only in the context of a following upper-register tone that has triggered an OCP-H dissimilation of the first tone to the lower register. Outside of this narrow context, the upper-register rising tone T45 is faithfully realized in the first position of the compound: recall tshɔ-pin ‘grass-plain’ [45-211] from Fig. 9. The distinction between khou-su̯i and tshɔ-pin can be drawn by subcategorizing the Linearity-t faithfulness constraint for register, as stated in Table 16.

Table 16 Additional OT constraint and ranking

The tableaux in Table 17 show how the yīnpíng llh tone of /than334/ truncates in [than33-su̯i45] ‘soup-water’ while the yīnshǎng lh of /tshɔ45/ is faithfully realized in the A position of the compound tshɔ45-pin211 ‘grass-plain’.

Table 17 OT Tableaux showing different mappings of an upper-register rise in tri- and bi-tonal sequences

6.3 OCP and constraint conjunction

We now consider in more detail the third Lóngyóu sandhi which changes the upper-register yīnshǎng rise T45 into a lower register [31] fall in the sandhi position, as in khou-su̯i ‘mouth-water’ [45]-[45] → [31]-[45] and tshɔ-weʔ ‘grass-cottage’ [45]-[5] → [31]-[5]. This alternation differs from the sandhis of 6.2 in several respects. First, rather than deleting a Contour-node terminal, it inverts the lh rise to an hl fall, and thus violates Linearity-t rather than Max-t. Second, while the yīnshǎng sandhi also changes the first element of a compound, it is not motivated by a reduction of three terminals to two. Rather, the hl falling shape is being preferred to the lh rise. However, duration is still arguably a motivating factor when we recall, from Fig. 3, that rising tones are longer than corresponding falling tones in Lóngyóu. If the Lóngyóu compounds are assumed to have a weak-strong iambic stress contour, then the switch from lh to hl is also optimizing the tonal contour by changing from the more costly rise to a fall in the unstressed syllable, where duration is independently at a premium. Third, the yīnshǎng sandhi changes the register specification of the tone from H to L and so aligns with the cross-linguistic affinity between stressed versus unstressed syllables and high versus low tone (de Lacy 1999). Fourth, the yīnshǎng change of register from H to L only occurs when the second syllable of the compound is also in the upper register. Recall, again, the contrast between khou-su̯i ‘mouth-water’ [45]-[45] → [21]-[45] vis-à-vis tshɔ-pin ‘grass-plain’ [45-211] → [45-211] in Fig. 9. The H-H to L-H register change thus instantiates the OCP-H phenomenon found in many (African) tonal languages.

The Lóngyóu yīnshǎng sandhi can be expressed as the conjunction of two markedness dispreferences: a rising tone in an unstressed syllable () and two successive upper-register tones (*H-H). These two markedness factors join forces in a locally-conjoined constraint (Smolensky 1995) to ban “the worst of the worst”: a rising contour in the unstressed position of two successive H-register tones. In the OT model, a locally-conjoined constraint schema typically consists of two markedness constraints that are individually held in check by faithfulness (F1 » M1 and F2 » M2). But when conjoined, the markedness constraint ensemble may rank above F1 and F2 (M1&M2 » F1, F2) to induce a change. A violation is assessed only when a candidate violates both conjuncts of the conjoined constraint. Furthermore, constraint conjunction requires specification of the domain over which the conjoined constraint is defined. In our case, the domain is the compound. Figure 14 shows the tonal configuration that is being modified in this sandhi.

Fig. 14

Tonal configuration modified by the Lóngyóu yīnshǎng sandhi

Table 18 contains the ingredients of the proposed analysis. *H-H (18a) and (18c) are the two markedness constraints at play. Both are individually dominated by faithfulness: *H-H by Ident-H (18e) and by the Max-t, Dep-t constraints that penalize deletion and insertion of the terminals of the Contour node (18g). In (18d) we state the conjoined markedness constraint. (18f) and (18h) show the ranking of the conjoined constraint over the relevant faithfulness constraints that allow a penalty to be assigned in the compound.

Table 18 OT constraints for Lóngyóu yīnshǎng sandhi

We now turn to the input–output mappings constituting the Lóngyóu OCP sandhis seen in khou-su̯i [45]-[45] ‘mouth-water’ → [31]-[45] and tshɔ-weʔ [45]-[5] ‘grass-cottage’ → [31]-[5] and the analytic challenges they present. The first observation is that the sandhi change again targets the first position in the compound. We assume that undominated faithfulness in pause protects the second (B) position of the compound from either a change of register or a change of pitch shape: Ident-H/pause, Max-t/pause, Dep-t/pause, Linearity-t/pause » *H-H & . The tableau below, in Table 19, shows how the attested output for khou-su̯i beats the fully faithful candidate (19b) that violates the conjoined constraint as well as alternatives that alter the register specification (19c) or the contour shape (19d) of the second element of the compound.

Table 19 /khou-su̯i/ [45]-[45] ‘mouth-water’ → [21]-[45] (second position)

We now turn to the behavior of T45 in the first position in the compound. The tableau in Table 20 shows the evaluation of possible candidate outputs by the relevant constraints in our analysis. The fully faithful candidate (20a) violates the conjoined markedness constraint. Candidate (20d) violates the Linearity-t constraint protecting the Contour node in the upper register. Candidates (20b) and (20c) are not penalized by this constraint since their initial syllables are in the lower register. They tie on faithfulness to register (Ident-H). So long as *lh dominates Linearity-t, the candidate that inverts its contour from lh to hl will beat the candidate that preserves the rising tone.

Table 20 /khou-su̯i/ [45]-[45] ‘mouth-water’ → [21]-[45] (initial position)

Finally, the tableau in Table 21 shows the behavior of the T45 before a lower-register tone. Recall that, in this context, no modification of either tonal register or tonal shape is found. The tableau shows that the fully faithful candidate (21a) is optimal. It is not penalized by the conjoined markedness constraint, since it does not contain two successive upper-register tones. And faithfulness to contour in the upper-register (Linearity-t/H) blocks the change to a falling tone in candidate (21c).

Table 21 tshɔ-pin ‘grass-flat’ [45-211] → [45-211]
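The interaction of the conjoined constraint with Linearity-t/H, as evaluated in Tables 20 and 21, can be illustrated with a minimal sketch. The second (B) tone is treated as fixed, in line with the assumption of undominated prepausal faithfulness, so only its register enters the evaluation. The candidate set is limited to the four combinations of register and lh/hl contour for the first tone. The label `conj` stands for the conjoined markedness constraint, and the total order in `RANKING` (in particular Ident-H » *lh) is an assumption chosen to be consistent with the rankings stated in the text; it is not a reproduction of the paper's tables.

```python
from itertools import product

# Assumed total order, highest-ranked first (illustrative only).
RANKING = ["conj", "Linearity-t/H", "Ident-H", "*lh", "Linearity-t"]


def violations(inp, out, next_register):
    """Violation profile for the first (A) tone of a compound.

    inp and out are (register, contour) pairs; next_register is the
    register of the second (B) tone, which is held constant.
    """
    in_reg, in_contour = inp
    out_reg, out_contour = out
    rising = out_contour == "lh"
    reordered = out_contour != in_contour
    marks = {
        # Conjoined constraint: violated only if BOTH conjuncts are,
        # i.e. successive H registers and a rise in the first syllable.
        "conj": int(out_reg == "H" and next_register == "H" and rising),
        # Linearity protected specifically in the upper register.
        "Linearity-t/H": int(reordered and out_reg == "H"),
        "Ident-H": int(out_reg != in_reg),
        "*lh": int(rising),
        "Linearity-t": int(reordered),
    }
    return tuple(marks[c] for c in RANKING)


def optimal(inp, next_register):
    """Winner under strict domination among the four candidates."""
    cands = product("HL", ("lh", "hl"))
    return min(cands, key=lambda out: violations(inp, out, next_register))
```

With an upper-register rise as input, `optimal(("H", "lh"), "H")` selects the lower-register fall, as in khou-su̯i, while `optimal(("H", "lh"), "L")` selects the faithful candidate, as in tshɔ-pin. The sketch also makes explicit that the conjoined constraint is inert before a lower-register tone, so that faithfulness decides the outcome there.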

6.4 Final rankings

The Hasse diagram in Fig. 15 summarizes some of the critical rankings in our OT analysis of the Lóngyóu sandhis. The markedness constraint *C-3 penalizing a tonal contour node with three terminals dominates the Max-t faithfulness constraint; this ranking instigates the simplification of the sandhi tone. The *lh » *ll ranking of markedness constraints implements the dispreference for a rising tone and truncates the yīnpíng llh to ll, instead of lh. The conjoined constraint *H-H & dominates Ident-H permitting the register change of an H-H compound when it contains a rise in the unstressed first position. The rising tone in the initial syllable of the compound that has descended to the lower register, in virtue of this ranking, is inverted to a fall in compliance with *lh at the cost of violating Linearity-t. The undominated Linearity-t/H blocks any tonal inversion in the upper register, accounting for the /khou-su̯i/ [45]-[45] ‘mouth-water’ → [21]-[45] versus tshɔ-pin ‘grass-flat’ [45-211] → [45-211] contrast.

Fig. 15

Critical OT Constraint Rankings for Lóngyóu Sandhi

7 Summary

This paper has reported the results of an investigation of the tonal inventory and compound tone sandhi in the southern Wu dialect of Lóngyóu (Zhèjiāng Province, China). Our discussion is based on a corpus of recorded data from two female speakers analyzed with Praat scripts. The most significant findings concerning the tonal inventory, sandhi changes, and OT analyses are as follows.

With regard to its inventory of tones, Lóngyóu is conservative. First, in comparison to some northern Wu varieties, it preserves the eight-way tonal contrast formed by crossing the Middle Chinese level, rising, falling, and checked-syllable tones with the upper versus lower division of the pitch space that is characteristic of Wu. Second, for our data, the correspondences between the Middle Chinese reconstructions in Baxter and Sagart (2014) and the Lóngyóu tones are systematic. Third, Lóngyóu has developed dipping and peaked versions of the lower-register yángshǎng rising and yángqù falling tones of Middle Chinese by adding an initial polarizing pitch point. Fourth, the Lóngyóu tones conform to the generalization that rising contours are associated with longer duration compared to falling tones. This generalization extends to the dipping [313] versus convex [131] tones of the lower register, if they are analyzed as right branching.

Our research documented three tone sandhis that target the first position of the compound. The notable results are as follows. First, Lóngyóu parallels Wǔyì and differs from Northern Wu varieties, such as Shànghǎi, that neutralize contrasts in the second syllable of the compound. Second, duration measurements uncovered a citation > second-syllable > first-syllable hierarchy that elucidates the loci of the sandhi neutralizations as positions associated with the shortest durations. Third, the sandhis that target the initial syllable of the compound converge on a lower register falling contour that approximates the yángpíng citation tone. Fourth, one of the sandhis is defined over successive high-register tones. This OCP-H restriction motivates a formal representation of register as distinct from contour in the synchronic system of tonal contrasts. Finally, Lóngyóu compounds manifest a lexically determined metatony in which the yīnqù upper-register fall replaces the lexical tone of the second element.

The Lóngyóu sandhis were analyzed with ranked markedness and faithfulness constraints of an OT grammar. The main points of the analysis can be summarized as follows. First, the dispreference for rising tones relative to falling tones played a key role in the sandhis and was attributed to a *lh markedness constraint on the tone’s contour node. Second, the loci of the sandhi changes align with the cross-linguistic duration hierarchy proposed by Zhang (2002) and target positions associated with relatively shorter duration. Third, the 45 → 31 sandhi change before a high-register tone presented some analytic challenges. We proposed a conjoined-constraint analysis combining OCP-H (dissimilation for register) with the markedness dispreference for a rising tone in the initial unstressed position of the compound where phonetic duration is minimized.

It is noteworthy that the three sandhis targeting the lower register recapitulate the tonal mergers that reduced the eight-way MC inventory to just five tones in the Shànghǎi variety of Wu. Is this just a coincidence? If not, does Lóngyóu represent an intermediate stage in the evolution of Shànghǎi? In northern dialects the initial syllable is preserved while the second (more generally, noninitial) syllable is the site of neutralization in compounds. Is this dialect difference most properly characterized as a change from a weak-strong to strong-weak stress contour comparable to English compounds? How could the stressed syllable of a compound become the base-form context from which the tonal inventory is projected? Since the Lóngyóu tonal mergers occur in the lower register, is the tonal space of this zone more compact compared to the upper register? Or does the lower register recruit phonation as a supplementary dimension of contrast and hence deploy tones with more complex phonological structure and phonetic articulation that make them prone to simplification and merger? These are some of the questions our results raise for future research.