1 Introduction

This paper shows that in the ergative-marking, purportedly non-tonal language Samoan, high edge tones reliably co-occur with absolutive arguments,Footnote 1 which have previously been thought to be unmarked (Chung 1978:54–56; Ochs 1982:649; Collins 2014:94). To illustrate: consider (1) from Collins’s (2016) paper providing a VP-fronting account of verb-initiality in Samoan. My empirical claim entails that a high edge tone (annotated as Habs) reliably appears preceding the object in a basic VSO transitive clause, e.g., preceding [le maile ula] ‘the mischievous dog’ in (1a). An Habs also reliably appears preceding the subject in [VO]S word order, e.g., preceding [le teine] ‘the girl’ in (1b). No high edge tone appears before the bare NP object [maile ula] ‘mischievous dogs’ in (1b).

(1) Absolutive H in VSO/[VO]S word order alternation under a VP-fronting account of VSO (Collins 2016: (3))Footnote 2

The first part of this paper supports my empirical claim of an Habs in Samoan with converging distributional evidence from the phonetic and phonological analysis of intonational patterns in the spoken utterances of a systematically varied set of syntactic structures. The second part of this paper argues that the analysis of the Habs that best fits the currently available data is that it is a tonal case marker inserted in spellout as a reflex of the structural configuration of absolutive case. Neither the empirical claim of an Habs nor the theoretical claim that it is a tonal case marker is new: both were first proposed in Yu (2011). And since Yu (2011) was published, Calhoun (2015), Yu and Özyıldız (2016), Calhoun (2017), and Yu and Stabler (2017) have continued to address and discuss both claims.

What isn’t under discussion is the implicit claim that tone can mark case: tonal morphemes are not uncommon in natural language and can signal morphosyntactic relationships such as tense/aspect, gender, number, and case (Hyman 2011b). In particular, tonal case markers have been reported in Somali (Saeed 1993:148); in Maban and Nilotic languages, e.g., Maasai and Shilluk (Dryer 2013); in Tibeto-Burman languages, e.g., Loloish/Yi and Kuki-Chin languages (Henderson 1967; Sun 1996; 2010); and in languages of West Africa, e.g., Igbo (Hyman 2011b:203–204). (See the supplementary material in the OSF repository for details.) Rather, the discussion in the literature on Samoan prosodic interfaces has centered on two related questions that probe the nature of the syntax-prosody interface and the relation between tone and intonation: (i) what is the relation between the Habs and the other case markers in Samoan, all of which are segmental? and (ii) what is the relation between the Habs and the other high edge tones in Samoan, which co-occur with fronted expressions (Hfront) and coordination (Hcoord)?

This paper builds on Yu and Özyıldız (2016) and Yu and Stabler (2017) in proposing that (i) Habs is a tonal case marker that may be related to an apparently moribund stressed, segmental absolutive particle [ˈia], and (ii) Habs, Hcoord, and Hfront are all syntactically determined, each inserted in the spellout of a distinct syntactic configuration (absolutive case, fronted expressions, coordination). What this paper contributes beyond Yu and Özyıldız (2016) and Yu and Stabler (2017) is, first, the empirical evidence that a high edge tone co-occurs with absolutive arguments, which those papers take for granted (unpublished earlier versions of this paper are cited as Yu (2016) in Yu and Özyıldız (2016) and as Yu (2017) in Yu and Stabler (2017)), and second, a detailed argument that the proposal here fits the current data better than the alternatives. These alternatives include proposals from Calhoun (2015, 2017) that (i) there is no relation between Habs and the other case markers, because Habs is not a case marker; and (ii) instead, Habs, Hfront, and Hcoord are among the many sentence-medial prosodic boundary tones (‘H-’), unified by their association to the right edge of phonological phrases. Calhoun (2015) proposes that these phonological phrases result from the mapping between syntactic and prosodic constituents, while Calhoun (2017) updates the proposal, implying that they are mapped from information structure rather than syntax.

In adjudicating between these proposals, I challenge common working assumptions about the syntax-prosody interface: that (i) edge tones are invariably prosodic boundary tones that come into the syntax-phonology interface via the mapping between syntactic and prosodic domains, and (ii) a unified analysis of edge tones is prima facie preferred.

First, I point out that an edge tone is not necessarily triggered by a prosodic domain edge, although it may happen to appear at the periphery of a prosodic domain, such as the edge of a prosodic word. The term ‘edge tone’ is often taken to be interchangeable with ‘(prosodic) boundary tone’ and associated with a “major prosodic boundary,” i.e., one above the level of the prosodic word, such as a phonological phrase or intonational phrase boundary (Ladd 2008:44, 47, 100). In this paper, however, I use ‘edge tone’ purely descriptively, to classify tones by what conditions their position in the surface realization: the term roughly distinguishes tones whose phonetic alignment is determined by stress position (pitch accents) from tones whose alignment is determined by morphosyntactic or prosodic word edges (edge tones) (Bruggeman et al. 2017, Sect. 1.1; see Sect. 4.6). Once we drop the assumption that edge tones are necessarily triggered by prosodic domains, we are no longer obliged to force a very general, unified characterization of the prosodic and/or syntactic environments where high edge tones appear, e.g., “high edge tones mark phonological phrase edges” or “high edge tones mark XP edges, which in turn are mapped to phonological phrase edges.”

Second, this paper points out that relations between syntactic and prosodic domains are not the whole of the syntax-phonology interface. A classic divide between theories of syntax-prosody mapping is between ‘direct reference’ and ‘indirect reference’ theories. The key point of contention is whether prosodic structure mediates the effect of syntactic structure on phonological processes; the key point of consensus is that category-specific information is not passed from syntax to phonology. But the question of how Samoan high edge tones fit into the syntax-phonology interface is not limited to deciding whether they can be accounted for by a ‘direct reference’ or an ‘indirect reference’ theory. The interface has other aspects. As stated in Selkirk (2011:435): “Two further core aspects… [are] the phonological realization (spell-out) of the morphosyntactic feature bundles of morphemes and lexical items that form part of syntactic representation and the linearization of syntactic representation which produces the surface word order of the sentence as actually pronounced.”

My current analysis situates the syntactically determined high edge tones of Samoan in these “further core aspects” of the syntax-phonology interface. The observation that there are multiple places in the interface where the high edge tones might be situated leads to the most general point I make in this paper: if different factors underlie high edge tones in Samoan, then these factors need to be recognized before the phenomenon can be properly understood. I show that a fine-grained analysis fits the current data. Syntactically determined high edge tones in Samoan are inserted in the spellout of specific, distinct configurations: absolutive case marking (Habs), coordination (Hcoord), and fronting (Hfront). Crucially, they are not inserted by the phonological grammar as tones marking prosodic domains; therefore, their presence and placement are determined entirely by syntactic factors and are not conditioned by prosodic factors.

There is also evidence for sentence-medial prosodic boundary tones in Samoan. Calhoun (2017) and Yu and Stabler (2017) found high (and low) edge tones that appear variably, in a range of syntactic environments, and that typically co-occur with pauses. I hypothesize that these tones, annotated here as H% and L%, enter the syntax-phonology interface not in spellout but as markers of prosodic constituent domains. To keep the different kinds of high edge tones in my proposal distinct, I do not use ‘H-’ as a unified annotation for all high edge tones in this paper, unlike previous literature on Samoan prosody, although I retain the ‘H-’ notation when citing that literature. Instead, I use a bare ‘H’ for a high edge tone (so ‘Hs’ refers to high edge tones), subscripting the syntactic configuration proposed to condition the tone where relevant, e.g., Habs. Prosodic boundary tones that typically occur with pauses are annotated with the ‘%’ diacritic, which is commonly used in prosody to denote intonational phrase tones.

The rest of this paper is organized as follows. Sect. 2 presents background on the Samoan language and Samoan prosody, and Sect. 3 describes the design and procedure of the elicitations and the data analysis. Sect. 4 then presents distributional data showing that an Habs co-occurs with absolutive arguments in a variety of syntactic structures: transitive and intransitive sentence frames with varying word orders, in which the absolutive arguments tested include singular and plural, specific and non-specific nominals, pronouns, and arguments internal to nominalizations. I also present evidence that Habs, Hcoord, and Hfront are edge tones. In Sect. 5, I lay out and defend my proposal that Habs, Hcoord, and Hfront are inserted in the spellout of specific syntactic configurations, and show that it fits the current data. In Sect. 6, I discuss alternative analyses of high edge tones in Samoan and show that they fit the current data less well than my analysis. Sect. 7 concludes. Supplementary materials for this paper can all be found at the OSF repository: https://osf.io/8cvg5/?view_only=e9be8cb15097493897b826f53487e345.

2 Language background

Samoan is an Austronesian language spoken in the Independent State of Samoa and the (U.S.) Territory of American Samoa, with about 413,000 speakers across all countries (Lewis et al. 2014). It belongs to the Samoic-Outlier branch of the Polynesian subgroup (Pawley 1966, 1967), which includes a number of ergative-marking languages, Samoan among them.

2.1 Segmental phonology and word stress

All Samoan examples in this paper are given using IPA symbols and appear in square brackets when in-line in the text. In-line in the text, I occasionally use Samoan orthography (always italicized), where [ŋ] is written as g, length as a macron, e.g., ā, and [ʔ] as ˋ.

The inventory of phonotactically licit syllable shapes in Samoan is limited to those in which every consonant is followed by a vowel: monomoraic [(C)V], and bimoraic [(C)Vː] and [(C)VV]. The basic footing pattern, as observed in monomorphemes, consists of a moraic trochee at the right edge of the word (Zuraw et al. 2014). Primary stress is on the final vowel if it’s long, and otherwise on the penultimate vowel. Further details on Samoan stress assignment are in Zuraw et al. (2014).
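
The stress rule just stated can be sketched as a small procedure. The Python below is a minimal illustration, not part of the released materials. It assumes broad IPA input over the vowels a, e, i, o, u, with ‘ː’ marking length and no word-final consonants. It treats each vowel, short or long, as one stress-bearing unit, so that a VV sequence such as [ue] counts as two units. The helper names (`syllabify`, `mark_primary_stress`) are hypothetical, and only the basic pattern stated above is covered; the fuller treatment is in Zuraw et al. (2014).

```python
from typing import List, Tuple

# Minimal sketch of the basic primary-stress rule stated above:
# stress the final vowel if it is long, otherwise the penultimate vowel.
# Assumptions (not from the source): broad IPA input, vowels a/e/i/o/u,
# length marked with "ː", open syllables only; helper names are hypothetical.

VOWELS = set("aeiou")
LENGTH = "ː"
STRESS = "ˈ"


def syllabify(word: str) -> List[Tuple[str, str]]:
    """Split a word into (onset, nucleus) units; a long vowel is one unit."""
    units: List[Tuple[str, str]] = []
    onset = ""
    i = 0
    while i < len(word):
        ch = word[i]
        if ch in VOWELS:
            nucleus = ch
            if i + 1 < len(word) and word[i + 1] == LENGTH:
                nucleus += LENGTH
                i += 1  # skip the length mark
            units.append((onset, nucleus))
            onset = ""
        else:
            onset += ch  # consonants accumulate as the next onset
        i += 1
    if onset:
        # Every consonant must be followed by a vowel, so reject codas.
        raise ValueError(f"word-final consonant in {word!r}")
    return units


def mark_primary_stress(word: str) -> str:
    """Insert the IPA stress mark before the unit bearing primary stress."""
    units = syllabify(word)
    if not units:
        raise ValueError("no vowel found")
    if units[-1][1].endswith(LENGTH):
        target = len(units) - 1          # long final vowel
    else:
        target = max(len(units) - 2, 0)  # penultimate vowel
    return "".join(
        (STRESS if k == target else "") + onset + nucleus
        for k, (onset, nucleus) in enumerate(units)
    )


for w in ["lalaŋa", "ŋalue", "malini", "mamanu"]:
    print(mark_primary_stress(w))
```

Run as is, the loop prints laˈlaŋa, ŋaˈlue, maˈlini, and maˈmanu, matching the stress placement in the transcriptions given for (2).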

Primary stress is associated with a pitch accent, which is consistently realized phonetically with increased relative amplitude, longer duration, and a rise in fundamental frequency (f0). Pitch accents associated with secondary stress, however, are present only inconsistently. In this paper, I use the terms ‘mora’ and ‘syllable’ interchangeably; when I refer to syllables in the context of figures (where I often annotate syllables, e.g., ‘S1’ for the first syllable), I always mean light syllables.

2.2 Case-marking and word order

Samoan has default VSO word order and marks ergative case on the subject of a verb-initial transitive sentence with the preposition [e], as exemplified in the transitive sentence in (2a). ‘Absolutive’ case on the direct object of a transitive sentence and the subject of an intransitive sentence has been said to be unmarked (Chung 1978:54–56; Ochs 1982:649; Collins 2014:94), but in (2), I indicate where the absolutive H appears. The intransitive sentence (2b)Footnote 3 also illustrates the prepositional element [i] as a marker of oblique case. This preposition marks stative agents (see Chung 1978:29), indirect objects, locatives, temporal expressions, sources, and goals (Mosel and Hovdhaugen 1992:144). Before pronouns and proper names, [jaː] rather than [i] marks oblique case. Figure 1 displays f0 contours for these sentences. The intonational annotations are explained in Sect. 2.3.1.

(2) Case-marking in transitive and intransitive sentencesFootnote 4
Fig. 1

F0 contours in the basic declaratives from m01: transitive clause (2a) [na laˈlaŋa e le maˈlini le maˈmanu] ‘The marine wove the design’ and intransitive clause (2b) [na ŋaˈlue le maˈlini i le maˈmanu] ‘The marine worked on the design.’ Pitch accent rises (LH*) occur over primary stressed syllables. An Habs  occurs before the absolutive object in Fig. 1a and before the absolutive subject in Fig. 1b. As in all individual f0 contours shown in this paper, the dashed lines overlaying the f0 contour mark syllable divisions given in the second tier of the textual transcriptions at the bottom of the figures

Case-marking exponence in Samoan is affected by register and word order. Samoan is well known for having two distinct registers: tautala lelei ‘good language’, used in literary contexts and in Westernized institutional contexts such as church and school, as well as with foreigners; and tautala leaga ‘bad language’, used in traditional ceremonies and meetings, as well as among family members and between friends (Shore 1977, 1980; Duranti 1981:165–168; Ochs 1988:196; Duranti 1990:4–5; Mosel and Hovdhaugen 1992:7–11). One of the most striking contrasts between the two registers is segmental: in going from tautala lelei to tautala leaga, /t/ and /k/ merge as [k], and /n/ and /ŋ/ merge as [ŋ].
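
The register contrast just described amounts to two segmental mergers. The hypothetical Python function below applies them to a broad IPA form in tautala lelei. It is a sketch covering only the two mergers stated here; other register differences, and any lexical or morphological conditioning, are not modeled.

```python
# Hypothetical sketch: map a broad IPA tautala lelei form to a tautala leaga
# form using only the two mergers named in the text (t -> k, n -> ŋ).
# Existing /k/ and /ŋ/ are left unchanged, so the contrasts neutralize.
LELEI_TO_LEAGA = str.maketrans({"t": "k", "n": "ŋ"})


def to_leaga(lelei_form: str) -> str:
    """Apply the t -> k and n -> ŋ substitutions; other segments pass through."""
    return lelei_form.translate(LELEI_TO_LEAGA)


print(to_leaga("tautala"))   # kaukala
print(to_leaga("maˈlini"))   # maˈliŋi
```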

The segmental ergative case marker e is rarely used in tautala leaga (Mosel and Hovdhaugen 1992:9). Ochs (1982) found that the frequency of use of the ergative case marker e is quite variable across social contexts: in utterances with postverbal agents, in a corpus of adult Samoan speech, the presence of e ranged from 20% between family members to 75% in informal interactions between male non-family members and in discussion between titled men in formal village meetings (Ochs 1982: Table 1). Ochs (1982) also found substantial variability in word order choices in adult speech: 34.7% of the utterances were VSO order, 36.0% VOS order, 20.0% SVO order, and 9.3% OVS order (see Ochs 1982: Table 12).Footnote 5
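
The word-order figures from Ochs (1982: Table 12) cited above can be checked with simple arithmetic: they sum to 100%, and the two verb-initial orders (VSO and VOS) together account for 70.7% of the utterances, leaving 29.3% with a preverbal argument. The grouping into verb-initial and non-verb-initial orders is added here for illustration and is not part of Ochs’s table.

```python
# Word-order distribution in adult Samoan speech, in percent of utterances,
# as reported by Ochs (1982: Table 12) and cited in the text above.
word_orders = {"VSO": 34.7, "VOS": 36.0, "SVO": 20.0, "OVS": 9.3}

total = sum(word_orders.values())
# Illustrative grouping (not in Ochs's table): verb-initial orders start with "V".
verb_initial = sum(p for order, p in word_orders.items() if order.startswith("V"))
non_verb_initial = total - verb_initial

print(f"total: {total:.1f}%, verb-initial: {verb_initial:.1f}%, "
      f"non-verb-initial: {non_verb_initial:.1f}%")
# total: 100.0%, verb-initial: 70.7%, non-verb-initial: 29.3%
```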

In non-verb-initial word orders, no ergative, absolutive, or oblique case marking occurs on fronted nominals, as shown in (3); instead, fronted nominals are preceded by [ʔo], which I gloss as topic. See Fig. 7 in Yu and Stabler (2017) for a representative f0 contour of (3a).

(3) No case marking on fronted DPs in non-verb-initial word orderFootnote 6

2.3 Overview of intonational system

In this section, I build on the background presented thus far on Samoan word-level prosody and morphosyntax to introduce the aspects of the Samoan intonational system most relevant to this paper (see Orfitelli and Yu 2009; Zuraw et al. 2014; Calhoun 2015, 2017; Yu and Stabler 2017; Howard 2018 for other aspects of Samoan intonation not covered here). First, I explicate the intonation of the basic declaratives in (2) in Sect. 2.3.1. Then I introduce the aspect of the intonational system that is the focus of this paper, the existence of sentence-medial high edge tones, in Sect. 2.3.2.

2.3.1 A first example

Figure 1 is a side-by-side comparison of the f0 contours and intonational transcriptions for representative utterances of the transitive and intransitive declaratives in (2). Three types of intonational events are annotated. An ‘LH*’ is realized over the verb [laˈlaŋa] ‘weave’ in Fig. 1a and [ŋaˈlue] ‘work’ in Fig. 1b, as well as over [maˈlini] ‘marine’ and [maˈmanu] ‘design’ in both figures. An ‘L%’ utterance-final fall occurs at the end of both declaratives. Finally, an Habs is annotated at the right edge of [maˈlini] in Fig. 1a and of [ŋaˈlue] in Fig. 1b. The Habs is discussed in detail in the following section.

I use ‘LH*’Footnote 7 to annotate a rising pitch accent, where ‘*’ is the diacritic used in autosegmental-metrical theory (see Ladd 2008 for an overview) to indicate pitch accents, and ‘L’ stands for a low f0 target. LH* pitch accents are associated to the primary-stressed, penultimate syllables of [laˈlaŋa], [ŋaˈlue], [maˈlini], and [maˈmanu] (see Sect. 2.1 for a description of stress assignment in Samoan).Footnote 8 The low target ‘L’ typically appears to be aligned to the beginning of the stressed mora. The high peak of the pitch accent is typically aligned at the right edge of the syllable with which it is associated, or in the following syllable. This phenomenon of peak delay is observed cross-linguistically (Silverman and Pierrehumbert 1990; Xu 1999, 2001; Myers 2003) and can be seen in many other f0 contours in this paper, e.g., Figs. 7a, b, c and 10b, c.
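
Peak delay of this kind can be checked directly on an f0 track. The sketch below is a hypothetical illustration, not the R code released in the OSF repository. Given f0 samples as (time, Hz) pairs and the boundaries of the stressed and following syllables, it finds the time of the f0 maximum over both syllables and reports whether the peak lies within the stressed syllable, at its right edge, or in the following syllable. The function names, the search window, the 10 ms edge tolerance, and the toy values are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, Optional[float]]  # (time in s, f0 in Hz or None if unvoiced)


@dataclass
class Interval:
    start: float  # seconds
    end: float    # seconds


def peak_time(samples: Sequence[Sample], window: Interval) -> float:
    """Time of the highest voiced f0 sample inside the window."""
    voiced = [(t, f0) for t, f0 in samples
              if f0 is not None and window.start <= t <= window.end]
    if not voiced:
        raise ValueError("no voiced samples in window")
    return max(voiced, key=lambda s: s[1])[0]


def classify_peak(samples: Sequence[Sample], stressed: Interval,
                  following: Interval, edge_tol: float = 0.01) -> str:
    """Locate the LH* peak relative to the stressed syllable (hypothetical helper)."""
    t = peak_time(samples, Interval(stressed.start, following.end))
    if abs(t - stressed.end) <= edge_tol:
        return "right edge of stressed syllable"
    if t < stressed.end:
        return "within stressed syllable"
    return "following syllable (peak delay)"


# Invented toy contour: the peak at 0.25 s lies after the stressed
# syllable, which ends at 0.18 s.
toy = [(0.00, 110.0), (0.05, 105.0), (0.10, 120.0), (0.15, 135.0),
       (0.20, 150.0), (0.25, 158.0), (0.30, 150.0)]
print(classify_peak(toy, Interval(0.05, 0.18), Interval(0.18, 0.32)))
# following syllable (peak delay)
```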

The ‘L%’ utterance-final fall is not of central importance for this paper.Footnote 9 I do, however, introduce and discuss sentence-medial falls to a low tone in Sects. 5.2 and 6.3.1, deferring them until they can be put into context together with the sentence-medial high tones.

2.3.2 A first encounter with sentence-medial high edge tones

Looking at the f0 contours and their intonational transcriptions in Fig. 1, the reader may wonder how one could reliably and confidently transcribe Hs like Habs, since the rises in the f0 contours for LH*s and Habs look quite similar. I address this issue further in the discussion of methods of analysis in Sect. 3.3.1. For now, I point out that although the rises for LH*s and Hs may look similar in Fig. 1, they show systematic differences in phonetic realization. This becomes apparent if we zoom in on the f0 contour over [malini] ‘marine’ in Fig. 1, as shown in Fig. 2. Let me emphasize that the H at the right edge of [malini] in Fig. 2b is not due to [malini] itself, which in (2a) is the ergative subject. Rather, the H is triggered by the immediately following absolutive argument [le mamanu] ‘det design’. That is, the Habs is realized at the right edge of the word preceding the absolutive argument. This adds to the puzzles raised by the proposal of an absolutive tonal case marker in Sect. 1: while any reasonable syntactic theory would group the absolutive case head with the following DP, the Habs appears to be phrased to the left. I address this potential ‘boundary paradox’ between syntactic and prosodic constituency in Sect. 5.3.

Fig. 2

Phonetic realization of a sentence-medial high edge tone (annotated H) at the right edge of [maˈlini] ‘marine’ in the f0 contour, from m01. This is demonstrated by contrasting: (a) when an H is absent at the right edge of intransitive subject malini in VS-PP sentence (2b), vs. (b) when an H is present at the right edge of transitive subject malini in the VSO sentence (2a). In both figures, malini receives an LH* pitch accent associated to the stressed penultimate mora

As Fig. 2 shows, if the tonal event immediately following the pitch accent on malini ‘marine’ is another pitch accent, e.g., the LH* on mamanu ‘design’ in (2b), then the f0 contour over malini falls after the high f0 peak over the last syllable, towards the low target (L) of the next pitch accent (Fig. 2a). If, however, an Habs is present, as in (2a), the f0 contour continues to rise over the last syllable of malini (Fig. 2b). The f0 data throughout this paper also show that in the presence of an H, f0 often stays high even into the following syllable (e.g., over [le] ‘det’ following malini in (2a)).Footnote 10
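
The diagnostic contrast just described, a fall versus a continued rise over the final syllable of [malini], can be expressed as the sign of the f0 slope over that syllable. The sketch below fits an ordinary least-squares line to the voiced f0 samples in the final syllable. The function names, the 0 Hz/s threshold, and the toy contours are assumptions for illustration. In this paper, Hs are diagnosed by comparison within minimal sets rather than by a fixed threshold (Sect. 3.3.1).

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, Optional[float]]  # (time in s, f0 in Hz or None if unvoiced)


def ols_slope(times: Sequence[float], values: Sequence[float]) -> float:
    """Ordinary least-squares slope of f0 on time, in Hz per second."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two voiced samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("samples must span a non-zero time range")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def final_syllable_shape(samples: Sequence[Sample], start: float, end: float,
                         threshold: float = 0.0) -> str:
    """Label f0 over [start, end] as 'rise' if the slope exceeds threshold, else 'fall'."""
    voiced = [(t, f0) for t, f0 in samples if f0 is not None and start <= t <= end]
    times = [t for t, _ in voiced]
    values = [f0 for _, f0 in voiced]
    return "rise" if ols_slope(times, values) > threshold else "fall"


# Invented toy contours over the last syllable of a word like [malini]:
no_h = [(0.00, 150.0), (0.02, 147.0), (0.04, 142.0), (0.06, 136.0)]
with_h = [(0.00, 150.0), (0.02, 156.0), (0.04, 163.0), (0.06, 170.0)]
print(final_syllable_shape(no_h, 0.0, 0.06), final_syllable_shape(with_h, 0.0, 0.06))
# fall rise
```

A fixed slope threshold is the crudest possible criterion; it is shown only to make the contour-shape contrast in Fig. 2 explicit.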

It is this phonetic contrast in f0 contour shape that I use to diagnose sentence-medial Hs in Samoan. Three syntactic configurations have been found that reliably trigger Hs: fronting (i.e., non verb-initial word orders), coordination, and absolutive case (Orfitelli and Yu 2009; Yu 2011; Calhoun 2015, 2017; Yu and Stabler 2017). These configurations are all exemplified in the f0 track of a representative utterance of (4) shown in Fig. 3. The Hs in (4) are bolded for clarity. The position of these Hs in the f0 contour is indicated in Fig. 3 by the time-alignment of the H annotations to the f0 contour. For example, the annotation ‘H front’ is time-aligned to the f0 contour at the right edge of the ‘o-marked fronted argument [ʔo le malini] ‘top det marine’.

(4)
Fig. 3

An f0 contour demonstrating Hs appearing in three different syntactic configurations within the same sentence (4): at the right edge of a fronted argument (Hfront), at the right edge of the first conjunct (Hcoord) in verbal and nominal coordinated structures, and at the right edge of the word preceding an absolutive argument (Habs), from m01. As exemplified for the Hcoord tones, f0 often continues to stay high even into the following syllable in the presence of an H (see also Sect. 2.3.2)

The three syntactic configurations that trigger Hs, as exemplified in (4) and Fig. 3, are summarized in (5). The positions at which the different configurations trigger Hs never overlap: e.g., an Habs never occurs in a position where an Hcoord or Hfront could occur. Representative f0 contours illustrating Hs in fronting and coordination can be found in Yu and Stabler (2017: Sect. 4).

(5) Syntactic configurations that trigger Hs, exemplified in (4) and Fig. 3

  a. Fronting. An Hfront occurs in non-verb-initial sentences, e.g., SVO word order: in (4), at the right edge of the ‘o-marked fronted argument [ʔo le malini] ‘top det marine’, immediately preceding the predicate [na lalaŋa-ina …] ‘past weave-ina …’.

  b. Coordination. An Hcoord occurs at the right edge of the first conjunct in a coordinated structure, immediately preceding the conjunction [ma]. In (4), an Hcoord occurs at the right edge of [lalaŋa-ina] ‘weave-ina’, the first conjunct in a VP coordination, as well as at the right edge of [le mamanu] ‘det design’, the first conjunct in a nominal coordination.

  c. Absolutive. An Habs occurs at the right edge of the word immediately preceding an absolutive argument. In (4), an Habs occurs at the right edge of the verb [fufulu-ina] ‘wash-ina’, which immediately precedes the (coordinated, postverbal) absolutive argument [le mamanu ma le ato] ‘det design conj det basket’.

Besides using Fig. 3 to introduce the syntactic triggers of Hs in Samoan, I have also chosen it to demonstrate some challenges of intonational transcription and to illustrate how transcription is analysis, not data. The reader may notice that the f0 contour actually falls in the last mora of [malini], [ni], at the right edge of the fronted argument. This does not look like the f0 contour shape over [malini] in the presence of an H shown in Fig. 2b; it looks more like the shape when the H is absent, in Fig. 2a. Why, then, am I transcribing an H? The same worry arises for the ‘H abs’ transcribed at the right edge of [fufulu-ina] ‘wash-ina’ in Fig. 3: there is a sharp dip in f0 in the last mora [na], right before the absolutive argument [le mamanu ma le ato] ‘det design conj det basket’. It may not be very satisfying for the reader if I say that I am nevertheless confident that there are Hs in these locations, based on experience from listening to and prosodically analyzing thousands of Samoan utterances: these sound like Hs to me. I can, however, attempt to reconstruct from the acoustic signal why I perceive Hs here. First, the topline (the line connecting the peaks in the f0 contour) stays high at these points (cf. the downtrend in f0 in Fig. 1). Second, f0 shoots very high, and although it does not stay high over the last mora ([ni] in [malini], [na] in [fufulu-ina]), it falls only towards the very end of that mora, not over the whole mora.Footnote 11 At the right edge of [fufulu-ina], there also seems to be some segmental perturbation of the f0 contour due to the articulation of the lateral: in this particular utterance, the lateral in the le immediately following [fufulu-ina] is pronounced as an alveolar lateral tap or flap, [ɺ] (an allophonic variant of /l/ I have observed in Samoan speakers), which could be contributing to the f0 dip seen in the realization of the Habs.
These kinds of observations and tricky (sometimes even ineffable) judgment calls underlie every single H (and any other tonal event) transcribed, and of course different transcribers might make different judgments. The way I set up phonetic analyses in this paper to support my claims about the distribution of Hs is designed to circumvent these transcriptional issues as much as possible, as I explain in Sect. 3.3.1.

3 Materials and methods

All data referred to in this paper were elicited and recorded from my consultants’ speech. Information about the consultants is given in Sect. 3.1. Information about elicitation procedures is provided in Sect. 3.2, and the methods used for phonetic and phonological analysis of the data are explicated in Sect. 3.3. All data and analyses can be found at the OSF repository: https://osf.io/8cvg5/?view_only=e9be8cb15097493897b826f53487e345. The repository follows the same organization as the paper. For each data set discussed, the repository includes the full stimulus set, recordings, annotated TextGrid files, extracted f0 estimates, and R code to quantitatively analyze the f0 contours and produce the figures.

3.1 Consultants

Data were collected in the Los Angeles area in one- to two-hour sessions from September 2007 to December 2014 and in July 2016 with one main consultant (m01), who was aged 19 when I started working with him, was born and raised in Upolu, and had moved to the Los Angeles area four years previously. Data were also elicited and recorded from four consultants in Apia, Samoa, in November 2011 (s13, s18, s19, s20), and from an additional woman in her 50s in the Los Angeles area in January 2012 (s22). This additional Los Angeles consultant had been in the United States for 27 years but regularly spent an extended part of the year in Samoa. The consultants in Samoa were three men aged 21 to 23 (s18–s20) and one woman aged 46 (s13), from the capital city of Apia and other areas of Upolu.Footnote 12 Data were also elicited and recorded in Auckland, New Zealand, in July 2015 from two additional women. One (f03) was 48, had grown up in Apia, and had moved from there to New Zealand in 2009; the other (f05) was 19, had grown up in Savai’i, and had been in New Zealand since age 10. All of the consultants spoke primarily Samoan in daily life and were literate in Samoan, but also spoke English as a second language with some fluency. English was used as the contact language. For one consultant, s22, recordings were made both in the style one would use in church (s22c) and in the style one would use with a sibling (s22s); since no differences in prosodic patterns were detected between these two styles, the data from the two styles were combined for this consultant.

3.2 Elicitation procedures

Procedures for elicitations with the primary consultant are described below in Sect. 3.2.1, and procedures for the other consultants are described in Sect. 3.2.2. Technical details of recording are in Sect. 3.2.3.

3.2.1 Primary consultant

Elicitation sessions with the primary consultant m01, and also with f03 in Auckland, involved either (i) developing and/or checking words and sentences to be recorded, or (ii) recording. In sessions devoted to developing stimuli, the consultant was asked to help construct Samoan sentences either from a starting scenario or from an English sentence, to judge whether Samoan sentences from the literature or constructed by the author were licit, and to provide alternative ways of constructing the sentences, if any. During recording sessions, elicitation items were presented one at a time, written on slides on a computer screen, in randomized order. Different constructions were included in each elicitation session, so that one construction served as a filler for another; this prevented minimally different sentences from being presented adjacently. The consultant was asked to read each sentence twice. All data in this paper from the primary consultant were elicited in tautala lelei. No systematic discourse context was provided in recording sessions: sentences were elicited “out of the blue” unless a pronoun or pro-drop was present, in which case a context supplying a referent was provided.Footnote 13

3.2.2 Other consultants

Since time with the other consultants was limited, and since these consultants were not used to the fieldwork elicitation context, the elicitation procedure necessarily differed from that used with the primary consultant. The stimuli consisted mostly of sonorant sounds, sometimes at the expense of the plausibility of the sentences. Recording sessions were therefore preceded by an explanation that some of the sentences might be strange, like something out of a fairy tale (e.g., stories about different animals living together in a house). Consultants were also given the opportunity to skim through the sentences before recording, for familiarization, and were asked to flag any sentences that they thought sounded not like Samoan but like a foreigner trying to speak Samoan. Finally, consultants were asked to speak as if they were speaking to a friend, to avoid heavily phrased, dictation-style reading (see Sect. 3.1 and the OSF repository). One speaker responded to this instruction by speaking with segmental characteristics of tautala leaga, with [t] → [k] and [n] → [ŋ]. If a consultant flagged a sentence, the consultant was reminded that some sentences might make sense only in a fairy tale, and sometimes a richer background context for the sentence was provided. If the consultant still found the sentence problematic, they were asked to repair it, and a note was made that the sentence was not licit for that consultant. This happened in particular with verb-initial, non-VSO word order for one consultant, who repaired such sentences by putting them in VSO word order. The consultants’ understanding of sentence meaning was also often checked during recording, especially for more complex sentences. Otherwise, elicitation sessions were the same as for the primary consultant, i.e., different constructions served as fillers for one another and were presented in randomized order, two fluent repetitions were elicited per stimulus, etc.

3.2.3 Recordings

Recordings in Samoa and in Los Angeles before 2015 were made directly to a computer through a head-mounted microphone (Shure SM10A), whose signal ran through a Shure X2u pre-amplifier and A-D converter; recordings in Auckland and in Los Angeles from 2015 onward were made with a Marantz PMD661 MKII. Recordings were made at a sampling rate of 22,050 Hz with 16-bit precision. Recording sessions in Los Angeles took place in either a sound-attenuated booth or a quiet room, and recordings in Auckland were made in a quiet room. Recordings in Apia, Samoa, were also made in a quiet room insofar as possible, although sudden torrential downpours sometimes produced substantial background noise.

3.3 Analysis

3.3.1 Minimal comparisons as a strategy for diagnosing Hs

While I did produce intonational transcriptions for the data, my main strategy for diagnosing Hs in f0 contours was to make phonetic comparisons of f0 contours within minimal sets (Yu 2014), a classic methodological strategy exemplified by Bruce’s (1977) foundational study of Swedish intonation (see also Yu and Stabler 2017 and Fig. 3 in Clemens and Coon 2016 for additional examples of comparisons of this type). Each minimal set was designed, as far as possible, to vary systematically in only a single factor of interest while holding other factors constant. Hs were not diagnosed on an utterance-by-utterance basis, by analyzing each utterance in isolation. For example, in Sect. 2.3.2, I diagnosed the H in the f0 contour in Fig. 2b only by comparison with the f0 contour in Fig. 2a. Note that while I designed elicitations using minimal sets, the members of a set were not elicited together: as stated in Sect. 3.2, sentences were presented in randomized order in elicitation sessions.

An advantage of using minimal comparisons to diagnose Hs is that the comparisons help control for allophonic variation in the realization of Hs. As an example, one phonetic factor conditioning allophonic variation is tonal crowding, which occurs when neighboring tonal events are closely spaced (Bruce 1977; Pierrehumbert 1980; Gordon 2000; Arvaniti et al. 2006; Gordon 2014, et seq.). In some cases, tonal crowding can even neutralize tonal distinctions that would be present if more segmental material were available between the crowded tones (Pierrehumbert 1980:112-113). With minimal comparisons, one may still be able to diagnose an Habs even in sentences with substantial tonal crowding around the site of the H, provided there is a distinct contrast in f0 contour shape between sentences where case is systematically varied. When the f0 contour of a single utterance is examined in isolation, however, judging the presence or absence of an H can be quite difficult and subjective, as demonstrated in the discussion of Fig. 3 in Sect. 2.3.2.

In addition, comparing f0 contours is advantageous because it stays close to the raw phonetic data, and all the choices made in processing the f0 data are transparent and reproducible if the analysis code is released. In contrast, transcriptional analysis is well known to vary between transcribers, as measured in studies of intertranscriber reliability (Ostendorf et al. 1995; Gut and Bayerl 2004; Yoon et al. 2004; Cole et al. 2010; Breen et al. 2012). The comparative approach I take here prevents the transcriber from imposing subjective biases through transcription, and it frees the transcriber from making difficult judgment calls about transcriptional labels.

But this comparative, phonetic approach is only possible when enough is known about the basic atoms of the intonational system and what conditions them so that the researcher can design structured, targeted elicitations to home in on how some particular factor conditions these basic atoms. And initial discovery of these basic atoms is facilitated by the challenge of labeling them in transcription. That is to say, the phonetic, comparative approach taken here doesn’t replace intonational transcription, but complements it and relies on insights from it. In this paper, I focus on analyzing f0 data local to sites in the sentence where I am testing for potential Hs. The transcriptional analysis in Calhoun (2015, 2017) complements this local, phonetic approach by analyzing whole f0 contours over utterances with intonational transcription to study non-local intonational patterns.

3.3.2 Data processing and analysis

All sound files were segmented and annotated using Praat (Boersma and Weenink 2012). Utterances were segmented by word and syllable and transcribed intonationally by the author. F0 extraction was performed using Praat’s autocorrelation algorithm, as implemented in VoiceSauce v1.19 (Shue et al. 2011), software for automatic voice quality analysis, with the floor and ceiling for candidate f0 values set to 40 Hz and 300 Hz, respectively, and default settings for other parameters. For the f0 contours plotted throughout the paper, f0 values were averaged over each of 10 time slices uniformly dividing each syllable of each utterance, e.g., the first f0 value for a syllable was the average f0 over the first tenth of that syllable. Converting the time scale from absolute time in seconds to time in syllables allowed trends in the shape of f0 contours to be captured without the noise introduced by variable speech rates.
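The time normalization described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the VoiceSauce or R code used for the paper: it assumes f0 frames arrive as (time in seconds, f0 in Hz) pairs with None for unvoiced frames, and syllable boundaries as (start, end) pairs taken from the Praat segmentation; the function names are hypothetical.

```python
from statistics import mean

N_SLICES = 10  # slices per syllable, as described in the text


def slice_means(samples, start, end, n_slices=N_SLICES):
    """Mean f0 in each of n_slices equal-duration slices of [start, end).

    samples: list of (time_s, f0_hz) pairs; f0_hz is None for unvoiced frames
    (an assumed input format). A slice with no voiced samples yields None.
    """
    width = (end - start) / n_slices
    means = []
    for i in range(n_slices):
        lo = start + i * width
        hi = lo + width
        voiced = [f0 for t, f0 in samples if lo <= t < hi and f0 is not None]
        means.append(mean(voiced) if voiced else None)
    return means


def syllable_normalized_contour(samples, syllables, n_slices=N_SLICES):
    """Concatenate slice means syllable by syllable, so that the time axis
    is measured in syllables rather than in seconds."""
    contour = []
    for start, end in syllables:
        contour.extend(slice_means(samples, start, end, n_slices))
    return contour


# Usage with synthetic data: 20 frames, f0 rising by 1 Hz per 10 ms frame,
# split into two hypothetical syllables of 100 ms each.
frames = [(0.005 + 0.01 * i, 100 + i) for i in range(20)]
print(syllable_normalized_contour(frames, [(0.0, 0.1), (0.1, 0.2)]))
```

Because every syllable contributes the same number of values regardless of its duration, contours from fast and slow tokens line up slice by slice and can be averaged directly.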

All further data processing and analysis were performed in R (R Core Team 2014). For data sets from multiple speakers, f0 values were z-score normalized so that values would be comparable across speakers with different f0 ranges. This normalization scales each speaker’s f0 values to have a mean of 0 and a standard deviation of 1. Trends in f0 contour shape were visualized by plotting means and standard errors computed over f0 data aggregated across sentences and/or across speakers. As a representative example: in Fig. 4, the thick solid black lines show mean f0 contours computed over VOS sentences, and the gray ribbons flanking the lines visualize variability in the f0 contours over these sentences by showing ±1SE (1 standard error). The wider the gray ribbons are over some time span, the greater the variability in f0 values about the mean in that time span.
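The per-speaker z-score, z = (f0 − mean_s) / sd_s for speaker s, and the mean ±1 SE ribbons can likewise be sketched. The sketch assumes, hypothetically, that each token is a dict with a speaker label and a time-normalized contour, and that all contours being aggregated have the same length. It pools each speaker's voiced values to estimate mean_s and sd_s and uses the sample standard deviation; the paper does not specify these details, and the actual analysis was done in R. Each speaker must contribute at least two distinct f0 values.

```python
from math import sqrt
from statistics import mean, stdev


def zscore_by_speaker(tokens):
    """Return copies of tokens with f0 z-scored within each speaker.

    tokens: list of dicts with keys 'speaker' and 'contour' (f0 in Hz,
    None where unvoiced). Mean and sample SD are pooled over all of a
    speaker's voiced values (an assumption).
    """
    pooled = {}
    for tok in tokens:
        voiced = [v for v in tok["contour"] if v is not None]
        pooled.setdefault(tok["speaker"], []).extend(voiced)
    params = {spk: (mean(vals), stdev(vals)) for spk, vals in pooled.items()}
    result = []
    for tok in tokens:
        mu, sd = params[tok["speaker"]]
        z = [None if v is None else (v - mu) / sd for v in tok["contour"]]
        result.append({**tok, "contour": z})
    return result


def mean_and_se(contours):
    """For each time point, return (mean, standard error) across contours.

    None values are skipped; a time point with a single value gets SE 0.0,
    and a time point with no values gets (None, None).
    """
    summary = []
    for column in zip(*contours):
        vals = [v for v in column if v is not None]
        if not vals:
            summary.append((None, None))
            continue
        se = stdev(vals) / sqrt(len(vals)) if len(vals) > 1 else 0.0
        summary.append((mean(vals), se))
    return summary


# Usage with made-up values for two hypothetical speakers, A and B.
tokens = [
    {"speaker": "A", "contour": [100, 200]},
    {"speaker": "A", "contour": [150, None]},
    {"speaker": "B", "contour": [10, 30]},
]
z_tokens = zscore_by_speaker(tokens)
for m, se in mean_and_se([t["contour"] for t in z_tokens]):
    print(m - se, m + se)  # lower and upper edges of a ±1 SE ribbon
```

Normalizing before averaging keeps a speaker with a wide f0 range from dominating the pooled mean contour.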

Fig. 4

Comparison of m01’s mean f0 contours (37 tokens) for words in transitive sentences for VSO order (ergative subject first, (6a)) vs. VOS order (absolutive object first, (6b)). The large jumps in the f0 contour over the labeled penultimate syllables are due to segmental perturbations from obstruents

All plots were created using the ggplot2 package (Wickham 2009). In every plot showing f0 contours over a string that includes segmental case markers, these case markers are included as part of the final syllable of the preceding word, because these monomoraic, vocalic case markers were very difficult to segment from the preceding vowel. (One might then worry that Habs is an artifact of including segmental case markers as part of the preceding word in plots of f0 contours. But Yu and Stabler (2017: Sect. 2.4) analyzed data from utterances elicited in tautala leaga with case markers dropped and found that the Habs was still present.)

4 Evidence for the absolutive H

This section presents distributional evidence that a high edge tone co-occurs with absolutive arguments. Section 4.1 shows that in verb-initial sentences, the high tone reliably occurs before the object in transitive sentences, and Sect. 4.2 shows that it reliably occurs before the subject in intransitive sentences. Section 4.3 shows that this distribution of the high tone also holds for a range of nominal phrases: specific or non-specific, common or proper, pronominal, and arguments within nominalizations. Section 4.4 shows that an H always occurs before the subject in pseudo noun incorporation, whereas no H appears before the pseudo-incorporated object. Section 4.5 presents evidence that the presence of the Habs is insensitive to word order in ditransitives and to systematic manipulations of discourse context. Finally, I close the section by presenting evidence that the Habs, Hcoord, and Hfront are edge tones rather than pitch accents (Sect. 4.6).

4.1 Transitive sentences

In transitive sentences, an Habs reliably precedes the absolutive argument, whether word order is VSO or VOS. I present evidence for this distribution from a set of transitive sentences recorded from m01 in which I manipulated word order (VSO, VOS); (6a) exemplifies the set with VSO order, and (6b) gives its VOS counterpart. In VSO order, the first argument takes ergative case; in VOS order, it takes absolutive case.

  1. (6)
    figure h

I also varied whether or not the “transitive” -Cia suffix, in its form -ina, was present on the verb. Cook (1999) states that this suffix may be present if word order in a transitive sentence is inverted, with the absolutive object first, while Chung (1978:55) states that VOS and VSO order are about equally frequent when the -Cia suffix is present. My primary consultant readily suffixed a transitive verb with -ina regardless of the word order in the transitive sentence. I included the -ina suffix to add (sonorant) segmental material before the first argument, making the phonetic contrast between the presence and absence of an Habs there easier to discern.

Figure 4 summarizes the effect of word order on the f0 contour over the verb and the first argument for the set of sentences exemplified in (6a).Footnote 14 These f0 contour data show that an H always appears before the absolutive object and never before the ergative subject, regardless of word order and regardless of whether or not -ina is present.Footnote 15 Figures 4a and 4b show the contrast in f0 contours over the verb induced by the case of the first argument. Figure 4a shows the f0 contour over the last two syllables of the (unsuffixed) verb and the ergative case marker if present, e.g., tala (e) for tatala ‘open’, and the determiner le in the first argument. Figure 4b shows the f0 contour over the stem-final vowel of the verb and the -ina suffix (and the ergative case marker, if present), e.g., a-ina (e) for tatala-ina, and the determiner le in the first argument. Whether or not the verb stem was suffixed with -ina, f0 over the final syllable of the verb was 20-30 Hz higher if the first argument was absolutive (VOS order). This f0 difference persisted into the determiner le of the first argument. Figure 4c shows how the case of the second argument affects the f0 contours over the last two syllables of the first argument and the determiner in the second argument, e.g., tama le ‘boy det’ for (6a). F0 on the ultima of the first argument was also about 20 Hz higher when the second argument was absolutive (VSO) rather than ergative (VOS); this f0 difference persisted into the determiner le of the second argument as well.

4.2 Intransitive sentences

In intransitive sentences, an H reliably precedes the absolutive subject, as I demonstrate below with data from m01. Section 4.1 already demonstrated that an H does not occur between the verb and an immediately following ergative subject in a transitive sentence. Thus, the f0 contour over the verb in a VSO transitive sentence can serve as a baseline for how the f0 contour looks without an H present, compared to when the H is present in an intransitive sentence (7).

  1. (7)

    Is there an H between the verb and the immediately following argument?

    1. a.

      Transitive baseline: Verb [erg Subject] [H Object]

    2. b.

      Intransitive  : Verb [ Subject] [obl DP]

I compared the f0 contours on the verb between intransitive sentences like those in (8) and their nearly string-identical transitive baseline counterparts. These transitive counterparts replaced the intransitive verb [manoŋi] ‘to be smelly/fragrant’ with the transitive verb [laŋona] ‘to hear’, changed the absolutive subject to an ergative subject, and changed the oblique object to an absolutive object, e.g., (9) is the transitive counterpart to (8).

  1. (8)
    figure j
  2. (9)
    figure k

Figure 5 shows a clear difference between the mean f0 contour over transitive verb [laŋona] ‘hear’ and the mean f0 contour over intransitive verb [manoŋi] ‘smelly/fragrant’. The f0 contour rises over the stressed second syllable (labeled S2) of both verbs. However, the f0 contour over [laŋona] drops in the third syllable (labeled S3), while the f0 contour over [manoŋi] continues to rise and stay high. Thus, Fig. 5 shows that, unlike verbs before ergative subjects, verbs before absolutive subjects have an H realized over the last syllable. In Sect. 4.5.1, I show that Habs shows up on oblique PPs before absolutive arguments, too.

Fig. 5

A comparison of mean f0 contours over the verb for intransitive sentences from m01 (54 tokens), like [manoŋi] ‘smelly’ in (8) vs. their transitive counterparts, like [laŋona] ‘hear’ in (9). When the subject immediately following the verb is absolutive, the f0 contour rises in the 3rd syllable ‘S3’ to the absolutive high. When the subject immediately following the verb is ergative, the f0 contour falls in the 3rd syllable ‘S3’

4.3 Other types of nominal phrases

Thus far, I have only presented distributional data for the Habs with specific and common nominal phrases that are singular or plural, such as le manu ‘det.spec.sg bird’ or manu ‘det.spec.pl birds’ (det.spec.pl is ∅). If the H under discussion really marks absolutive case, then it should appear with all kinds of absolutive nominal phrases. Unless the Habs is shown to be insensitive to nominal phrase type, the possibility remains that the H marks something more restricted than absolutives. As a case in point, Niuean, a Polynesian language related to Samoan, case-marks different types of nominal phrases differently (Massam 2001:156, (2)):

  1. (10)

    Niuean case marking (Massam 2001:156, (2))

    |                | erg | abs |
    |----------------|-----|-----|
    | Proper/pronoun | e   | a   |
    | Common         | he  | e   |

In this section, I provide data on the distribution of the Habs in a variety of nominal phrase types described in Mosel and Hovdhaugen (1992: Ch. 6). I show that the presence of the Habs is insensitive to whether the nominal phrase is specific or non-specific, or proper or common (Sect. 4.3.1), pronominal or non-pronominal (Sect. 4.3.2), or internal to a nominalization (Sect. 4.3.3).

4.3.1 Specificity

In data from f03 and f05, I found that the absolutive high appears before both specific (11, 12) and nonspecific nominals (13, 14). I established the presence of the Habs by comparing pitch tracks between sentences in which I systematically varied the specificity of objects in transitives and of PP objects in intransitives. I compared absolutive objects with PP objects in order to control prosodic position in the sentence for the minimal comparisons of f0 contours (Sect. 3.3.1). The data set was recorded from these two consultants in Auckland, who were provided a context for each sentence (shown below; contexts for the intransitive sentences are analogous and given in the OSF repository). In the examples below, (11-14), the object is always underlined, and Habs tones preceding the object are bolded. The presence of an Habs before proper names in the intransitive sentences also shows that Habs tones occur before proper as well as common nouns.

  1. (11)

    Specific, singular le

    figure m
  2. (12)

    Specific, plural ∅

    figure n
  3. (13)

    Nonspecific, singular se

    figure o
  4. (14)

    Nonspecific, plural ni

    figure p

4.3.2 Pronouns

In this section, I show that absolutive postverbal pronouns (which are free-standing) must be preceded by an Habs, e.g., (15c). In addition, I show that a postverbal pronoun can host an Habs marking an immediately following absolutive argument, e.g., (15a).Footnote 16

From s13, s18, s19, s20, m01, and s22c/22s, I elicited simple VSO and VOS declaratives with the pronominal form as the first argument, as in (15), or as the second argument, and with malini ‘the marines’ as the other argument. For both pronominal and non-pronominal DPs, I varied the case over all three possibilities for subject, direct object, and indirect object (ergative, absolutive, or oblique), resulting in 12 configurations: 3 cases (erg, abs, obl) × 2 arguments (pronoun, malini) × 2 orders (VSO, VOS). (While the verb momoli ‘to take, deliver, drop off’ is ditransitive and some of the sentences may involve pro drop, I show later in Sect. 5.1.1 that pro drop has no special effects on the distribution of Habs tones: only overt arguments affect the presence of Habs tones.) I used a ditransitive verb so that I could construct sentences contrasting ergative, absolutive, and oblique case for a given argument in a single, controlled data set. A scenario was introduced for each sentence to give a referent for pro drop, e.g., the scenario that ‘we two delivered the fish to the marines’ for eliciting ‘We two delivered (pro) to the marines.’
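The 12 configurations are the cross product of the three factors just listed. The following minimal sketch enumerates them; the label strings are illustrative, with "pronoun" standing in for the pronominal form, and the VSO-only filter mirrors how data from the consultant who rejected VOS order were handled.

```python
from itertools import product

# Factor levels as listed in the text; "pronoun" is an illustrative label.
CASES = ("erg", "abs", "obl")
ARGUMENTS = ("pronoun", "malini")
ORDERS = ("VSO", "VOS")

# 3 x 2 x 2 = 12 configurations.
configurations = list(product(CASES, ARGUMENTS, ORDERS))

# A consultant who rejects VOS contributes only the VSO subset.
vso_only = [c for c in configurations if c[2] == "VSO"]

for config in configurations:
    print(*config)
```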

  1. (15)

    Examples: pronoun as first (overt) argument

    figure q

The elicitation of this set of sentences yielded data from 6 consultants in total. From the consultant who rejected VOS word order, only the subset of VSO sentences was included. One consultant produced many fluent utterances including prosodic junctures with silence, i.e., H% and L% tones. Since these larger prosodic junctures obscure the presence of Habs tones, this consultant was asked to repeat the sentences at a faster speech rate when this occurred (a faster speech rate did not result in Habs tones disappearing, consistent with data from manipulating speech rate in Yu and Stabler 2017). For the other 4 consultants, no more than a handful of items were discarded due to speech errors or obvious prosodic junctures; see (50) for details.

Figure 6 illustrates the effect of case on mean z-score normalized f0 contours over the verb and pronominal argument in the elicited set of sentences, including sentences in (15), and shows that an H occurred only before (overt) absolutive pronouns. Figure 6a shows that an Habs occurred at the right edge of the verb [momoli] ‘deliver’ when it was immediately followed by the absolutive pronoun [] ‘1.du.exc’, but not when it was followed by an ergative or oblique one. Note that the final rise in the f0 contour before absolutive [] cannot be attributed to a pitch accent due to secondary stress on []. If there were a pitch accent on [], then we’d expect to see f0 rises into [], regardless of case. Figure 6b shows that this H persisted into the first syllable of the absolutive pronoun [] ‘1.du.exc’. Figure 6c shows that an H occurred at the right edge of [] only when it was immediately followed by an absolutive argument. Note that the f0 rise to the Habs clearly occurred later than the f0 rise due to the pitch accent on the stressed penultimate mora [u] in []. Figure 6d shows that the high f0 from this Habs persisted from the first argument into the first syllable of the second argument, absolutive []. Altogether, Figs. 6a, b, and d show that absolutive postverbal pronouns are preceded by an H; non-absolutive postverbal pronouns are not. Figure 6c shows that the postverbal pronoun itself can also bear an Habs when it precedes an absolutive argument.

Fig. 6

Mean z-score standardized f0 contours for [momoli] ‘deliver’ and [] ‘1.du.exc’ in sentences with postverbal pronouns, e.g., (15), from s13, s18, s19, s20, m01, and s22c/22s (6 speakers, 94 tokens for (a)-(c), 89 tokens for (d)). The large dips in the f0 contour at the boundary between the first and second syllables in the pronoun [] (b,c,d) are due to the glottal stop, which was typically realized as some laryngealization rather than a full glottal stop. In Figs. 6b, c, and d, [] is partitioned into intervals as [maː] in ‘S1’ and [ua] in ‘S2’

4.3.3 Case internal to nominalizations

This section shows using data from m01 that the distribution of segmental case marking and Hs for arguments internal to nominalizations is also consistent with the existence of an Habs. Examples of nominalizations are given in (16) and (17).Footnote 17 Bracketed syntactic schema are given, where [N V] stands for the nominalized verb. The agent in a nominalized transitive predicate may either maintain ergative marking (16c) or be marked with the alienable genitive a (16a) (Mosel and Hovdhaugen 1992:545). The patient in a nominalized transitive predicate may either be marked with the inalienable genitive marker o (16b) or (appear to be) unmarked (16c) (Mosel and Hovdhaugen 1992:546; Collins 2014).

Figure 7a shows the mean f0 contour over the final three syllables of the word preceding: (i) the a-marked agent, e.g., mamanu ‘design’ preceding a malini ‘gen marine’ in (16a), (ii) the o-marked patient, e.g., momoli-ina ‘deliver-ina’ preceding o le malala ‘gen det charcoal’ in (16b), or (iii) the unmarked patient, e.g., liona ‘lion’ preceding le manini ‘det fish’ in (16c); this word is annotated in the figure as “verb” for short. Figure 7b shows the mean f0 contours over the unmarked or o-marked patient, or the a-marked agent, not including the determiner le, if present in the sentence. Together, the figures show that when an argument internal to a nominalization is a- or o-marked, e.g., malini ‘marine’ in (16a) or mamanu ‘design’ in (17), it isn’t preceded by an H; however, if the argument is not preceded by a segmental case marker, e.g., le manini ‘det fish’ in (16c), it is preceded by an H.

Fig. 7

Mean f0 contours from m01 (56 tokens) for nominalizations in (16) and in (17). Figures 7a, b show that an H precedes the argument internal to the nominalization only when it’s segmentally unmarked (absolutive) and not if it’s an a-marked agent or o-marked patient

In Fig. 7a, the contrast between mean f0 contours may be hard to discern at first glance. In the last syllable, the dotted o-poss f0 contour is as high as the solid absolutive one, and the absolutive f0 contour also falls slightly at the end. However, a closer look shows that: (i) the rise to the o-poss f0 peak in syllable 3 is smaller than the rise to the absolutive peak, since the o-poss f0 contour starts close to 10 Hz higher than the absolutive one in syllable 2; (ii) the fall over syllable 3 is clearly sharper for the a- and o-poss contours than for the absolutive contour; and (iii) Fig. 7b shows that the absolutive high at the right edge of the word in Fig. 7a is maintained into the absolutive patient in the nominalization, while the f0 contours over the a- and o-marked arguments are clearly lower. This is an example of where minimal comparisons between f0 contours are important (Sect. 3.3.1).

  1. (16)

    Arguments internal to absolutive nominalizations

    figure t
  1. (17)

    Example of argument internal to oblique nominalization, V Habs  Agt abs [obl det [N V] Objgen ] PP

    figure u

In summary, within a nominalization, arguments that are segmentally case-marked as genitive are not preceded by an H, but arguments that are not segmentally case-marked, i.e., absolutive arguments, are.

4.4 Pseudo noun incorporation

I complete the description of the distribution of the Habs with pseudo noun incorporation (PNI) (Massam 2001). An example of the [VO]S / VSO alternation is shown in (18). With PNI (18a), the order is Verb-Object-Adverb-Subject, cf. the default Verb-Adverb-Subject-Object transitive order in (18b). The placement of the adverb shows whether a construction is PNI or not (Collins 2014, 2016). In addition, PNI objects must be non-specific, and the agent Manogi is (segmentally) unmarked in PNI (18a) but marked with ergative case in (18b). Using minimal comparisons between non-PNI and PNI sentences with m01, I found that the Habs always appears before postverbal subjects in PNI constructions, and never before the pseudo-incorporated object. I used two sets of minimal comparisons: (i) one set contrasting the members of the [VO]S / VSO alternation, as shown in Fig. 8, and (ii) one set contrasting [VO]S / VOS, as shown in Fig. 9. I include both because, while [VO]S / VOS provides a better minimal comparison in some ways, the acceptability of VOS has been highly variable among my consultants and also among the consultants in Calhoun (2015, 2017). (My primary consultant, whose data are used in this section, readily produces VOS word order in a wide variety of contexts.) Thus, I also include comparisons within the [VO]S / VSO alternation.

  1. (18)

    PNI [VO]S / VSO alternation example, with adverb placement diagnosing PNI

    figure v
Fig. 8

The effect of pseudo noun incorporation on the presence of the H at the right edge of the verb and the object in sentences from m01, 26 tokens for (a), 34 tokens for (b). The comparison is between [V Opni] Sabs PP (PNI), e.g., (18a) and V Serg Oabs PP (non-PNI) e.g., (18b) plus [Spronoun V Oabs] PP (non-PNI object, pre-verbal pronominal subject)

Fig. 9

The effect of (pseudo) noun incorporation on the presence of the H at the right edge of the word preceding the object and the word preceding the subject in the sentences in (19), from m01 (58 tokens). The comparison is between the PNI construction [V Opni] (Adv) H Sabs, e.g., (19a), and the non-PNI construction V (Adv) H Oabs Serg, e.g., (19b)

For the [VO]S / VSO alternation comparison, I elicited four minimal sets of sentences like (18) with and without PNI from my primary consultant, m01.Footnote 18 I did not include adverbs in the sentences, but elicited the sentences in contexts where object specificity was clear, and checked the contexts with adverbial placement.

Mean f0 contours over the verb (e.g., fufulu ‘wash’ in (18))Footnote 19 and the last word in the object (e.g., [leaŋa] ‘bad’ in (18)) are shown in Fig. 8. Figure 8a shows that in neither PNI nor non-PNI constructions with postverbal subjects does the f0 contour over the verb rise to a high peak. In these constructions, the verb is immediately followed by either the PNI object or the ergative subject. However, an H does appear at the end of the verb when it is followed by a non-PNI, absolutive object in constructions with preverbal pronominal subjects. This contrast suggests that the PNI object is not absolutive. Figure 8b is consistent with Fig. 8a in showing the absence of an H before the PNI object: as the f0 contour continues through the first syllable of the object immediately after the verb, it continues to drop to the L valley of the pitch accent on S2 in both the PNI and the non-PNI object. However, Fig. 8b shows that an H appears at the end of the PNI object when it is immediately followed by a subject, while no H appears at the end of an object immediately followed by an oblique PP. This contrast suggests that the subject in PNI constructions is absolutive.

For the [VO]S / VOS comparison, I elicited from m01 a set of “design-weaving” sentences in which the context varied the specificity of the object, as shown in (19) and Fig. 9. While the object in VOS could take specific and non-specific singular and plural determiners, the PNI object in [V O] S could not take any determiner. With a specific plural object, VOS word order can be string-identical to [VO]S word order. My primary consultant describes the meaning contrast as follows: [VO]S describes a context where the marine’s job is to weave designs, whereas VOS could refer to a single event of weaving the designs.

  1. (19)

    [VO]S / VOS alternation comparison

    figure z

Mean f0 contours over the word preceding the object (the verb [lalaŋa] ‘weave’ in (19a) and the verb [lalaŋa] or adverb [leaŋa] ‘badly’ in (19b)) and the word preceding the subject (the object [mamanu] ‘design’ or the adverb [leaŋa] ‘badly’ in (19a) and the object [mamanu] ‘design’ in (19b)) are given in Fig. 9. Figure 9a shows that an H appears before the absolutive object in VOS, but not before the PNI object in [V O] S. Figure 9b shows that an H appears before the subject in [V O] S, but not before the ergative subject in VOS. Figure 9b also shows the H on the absolutive object in VOS, at the left edge in syllable 1 (S1).

In summary, in PNI constructions, an H appears before postverbal subjects. However, no H appears before the pseudo-incorporated object—whether the subject is pronominal, or whether the subject is postverbal or preverbal.

4.5 Other word orders

In this section, I present evidence that the presence of the Habs is insensitive to argument order in ditransitives (Sect. 4.5.1) and changes in word order in different discourse contexts (Sect. 4.5.2).

4.5.1 The presence of segmental case markers and Habs is insensitive to argument order in ditransitives

There is no reason to expect segmental case markers to fail to surface in ditransitives. Suppose, though, that the Habs marked the right edge of a constituent preceding the absolutive under some syntax-prosody mapping. Then the presence of the Habs might depend on argument order in the ditransitive, since syntactic constituency would certainly be sensitive to argument order. But in fact, the presence of the Habs, as well as the presence of the segmental case markers, is insensitive to argument order in ditransitives. I show this using f0 data from a set of ditransitive sentences derived from (20), where I permuted the location of the case markers in all 3! = 6 ways, producing the argument orders schematized in Table 1. For instance, the first column in the table shows word orders where the absolutive object occurs first, i.e., na momoli le liona e le nunua i le toloa (abs erg obl) and na momoli le liona i le nunua e le toloa (abs obl erg), and the first row in the second column has the order given in (20), (erg abs obl). The consultants that provided the data were: s13, s18, s19, s20, m01, and s22c/s.Footnote 20 I found that an H occurred immediately preceding the absolutive argument, regardless of the word order, as indicated in Table 1.

  1. (20)
    figure aa
Table 1 Permutations of word order among arguments in a ditransitive sentence, grouped by the location of the absolutive argument
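The six argument orders summarized in Table 1 can be enumerated and grouped by the position of the absolutive argument with a few lines of Python. The second function restates the generalization reported in this section (an Habs at the right edge of the word immediately preceding the absolutive argument) as a prediction for each order. This is an illustrative sketch; the function names and labels are hypothetical.

```python
from itertools import permutations

CASES = ("abs", "erg", "obl")


def orders_by_abs_position():
    """Group the 3! = 6 argument orders by the position (1-3) of the absolutive."""
    groups = {1: [], 2: [], 3: []}
    for order in permutations(CASES):
        groups[order.index("abs") + 1].append(order)
    return groups


def expected_h_site(order):
    """Word whose right edge should carry Habs under the reported
    generalization: the word immediately preceding the absolutive."""
    i = order.index("abs")
    return "verb" if i == 0 else f"argument {i}"


for pos, orders in orders_by_abs_position().items():
    for order in orders:
        print(pos, " ".join(order), "-> Habs after", expected_h_site(order))
```

The grouping reproduces the layout described in the text: the group with the absolutive first contains abs erg obl and abs obl erg, and the order in (20), erg abs obl, falls in the second group.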

Figure 10 shows the mean f0 contours over the verb and the first two arguments in the ditransitive sentences. (The third argument is shown in the OSF repository.) Each plot shows three mean f0 contours, one for each position of the absolutive argument: first, second, or third. In each plot, the only f0 contour shape that shows a final rise is the one for which the absolutive argument immediately follows the word shown in the plot. For instance, the only f0 contour shape with a word-final rise over the verb momoli ‘deliver’ (Fig. 10a) occurs when the first argument liona ‘lion’ is absolutive. Figures 10b, c also show that high f0 from the Habs carries over into the beginning of the absolutive argument. For example, the f0 contour over the second argument nunua ‘dolphin’ (Fig. 10c) begins almost 20 Hz higher on average when it is absolutive than when it is ergative or oblique.

Fig. 10

Mean z-score standardized f0 contours for the verb momoli ‘deliver’ and the first two arguments liona ‘lion’ and nunua ‘dolphin’ in the ditransitive sentence set based on permuting the location of the case markers in (20). Data from s13, s18, s19, m01, and s22c/s (5 speakers, 71 tokens)

In summary, an Habs appears before the absolutive direct object in verb-initial ditransitive sentences, regardless of how the arguments in the sentence are ordered. Likewise, the ergative e appears before the ergative subject, and oblique i before the indirect object in ditransitives, irrespective of argument order. In this sense, the distribution of the Habs patterns like that of the segmental case markers. But I treat these ditransitive results with caution, since discourse context was not explicitly controlled. Nevertheless, the distributional patterns are consistent with the Habs behaving as an absolutive tonal case marker.

4.5.2 Habs is not sensitive to discourse context

In the corpus of data for this paper, elicitations were typically done under out-of-the-blue focus, with the exception of specifying referents for sentences with pronouns and pro drop. This raises the potential concern that Habs might actually be marking some systematic information-structural property not identical with absolutive case. (If some information-structural property exactly coincided with absolutive case marking in syntax, we would never be able to distinguish them.) It has been observed cross-linguistically, independent of case alignment, that new information preferentially appears in the S (intransitive subject) or O (object) roles, but not in the A (agent) role, e.g., DuBois (1987). Thus, if speakers are given no context for sentences elicited ‘out of the blue’, they might construct a context in their heads, opting for one that aligns with typical frequencies, i.e., one where the absolutive argument also happens to introduce new information. However, the tautala lelei and tautala leaga data sets in Yu and Stabler (2017) provide evidence from systematic manipulation of discourse contexts suggesting that Habs does not in fact mark new information, focused information, or given material. Four sets of question-answer pairs manipulating discourse conditions were elicited: two with transitive verbs ([lalaŋa] ‘weave’, taking an inanimate object; [laŋona] ‘hear’, taking an animate object), and two with intransitive verbs ([malaŋa] ‘journey’, taking an inanimate PP object; [leaŋa] ‘be bad’, taking an animate PP object).

Whether an argument was given, new, or under contrastive focus in the answers to the questions had no effect on the appearance of Habs (H- in intonational transcriptions): an H always appeared before the absolutive argument, and never before the ergative argument or the oblique object. This result is consistent with Calhoun’s (2015) intonational transcriptions of sentences elicited under broad focus (‘What happened earlier’), question focus on the agent or direct object, and contrastive focus on the agent or direct object; that study also found no evidence that the H preceding the absolutive was sensitive to discourse structure.

4.6 Evidence that Hs are edge tones

Having shown that a high tone appears before the absolutive argument, I now turn to the question of whether the alignment of the high tone is determined by the position of heads (prominent, stress-bearing units) and/or edges (Ladd 2008: Ch. 2-4). What’s the evidence that the Hs discussed thus far track edges? Or is there evidence that they track prominent, stress-bearing morae? For instance, could the absolutive high be a trailing upstepped high in a pitch accent?

A classic method to diagnose whether a tone tracks a stress-bearing unit or an edge is to vary the position of stress and the number of syllables/morae in words, and to observe whether the alignment of the tone correlates with stress position (the signature of a pitch accent) and/or with word length (the signature of an edge tone) (Jun and Fletcher 2014). However, in native Samoan words, primary stress can fall no further from the right edge of a prosodic word than the penultimate mora (Zuraw et al. 2014). Thus, the position of primary stress cannot be shifted far enough away from the right word edge in Samoan to clearly determine whether Hs track stress or edges. One way to circumvent the closeness of stress to the right edge in Samoan is to turn to code-switching. Code-switching between Samoan and English is a common everyday occurrence for the speakers I’ve worked with, especially in California and New Zealand, and English words can have antepenultimate primary stress, e.g., Melanie [ˈmɛləni]. If the H tracks the pitch accent, then the H peak should occur earlier with antepenultimate stress than with penultimate stress. If the H tracks the right edge, then the H peak should stay at the right edge even when stress is antepenultimate rather than penultimate. (It is also possible that the H tracks both the pitch accent and the edge.)
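The two competing predictions of this diagnostic can be stated as a small Python sketch. It is an idealization rather than a model of the f0 data: it assumes that the names have three morae, places a pitch-accent H directly on the stressed mora (ignoring peak delay and the L of the LH*), and places an edge tone on the final mora. The function name `predicted_h_peak` is hypothetical.

```python
# Idealized Bach-test predictions: which mora of a proper name should carry
# the H peak under each hypothesis. Morae are indexed 0..n-1 from the left.
# Assumption: peak delay and the L of the pitch accent are ignored.


def predicted_h_peak(n_morae: int, stressed_mora: int, hypothesis: str) -> int:
    """Return the index of the mora predicted to carry the H peak.

    hypothesis:
      "pitch_accent": H aligned with the stressed mora
      "edge":         H aligned with the final mora (right word edge)
    """
    if not 0 <= stressed_mora < n_morae:
        raise ValueError("stressed_mora must index a mora of the word")
    if hypothesis == "pitch_accent":
        return stressed_mora
    if hypothesis == "edge":
        return n_morae - 1
    raise ValueError(f"unknown hypothesis: {hypothesis!r}")


# Hypothetical three-mora parses: Melani (penultimate stress, mora 1)
# vs. Melanie (antepenultimate stress, mora 0).
for name, stress in [("Melani", 1), ("Melanie", 0)]:
    accent = predicted_h_peak(3, stress, "pitch_accent")
    edge = predicted_h_peak(3, stress, "edge")
    print(f"{name}: pitch accent -> mora {accent}, edge tone -> mora {edge}")
```

Only the pitch-accent hypothesis predicts that moving stress from the penultimate to the antepenultimate mora moves the peak leftward; the edge-tone hypothesis predicts the same final-mora peak for both names.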

As in Yu and Stabler (2017), I performed a Bach test (Halle 1978:301), using English proper names with stress patterns non-native to Samoan. In a production experiment with m01, I manipulated stress position in proper names to be penultimate (Lorena, Marina, Melani) or antepenultimate (Emily, Helena, Melanie)Footnote 21 in syntactic configurations where I had previously found that an H would be reliably present or absent at the right edge of the proper name (fronting, coordination, default word order, and VOS/V-PP-S transitives and intransitives). Example question-answer pairs elicited for antepenultimate stress in Melanie are given in (21), for constructions where an H is reliably present at the right edge of the first name, and in (22), for constructions where an H is reliably absent there. All of these Q/A pairs were set up to provide a context for polarity focus, so the questions and answers were string-identical except for the affirmative ‘yes’ initiating the answer. For example, the question for the fronted expression in (21a), [ʔo liona a Melanie na taŋi i mauŋa?], meant ‘Was it Melanie’s lions that cried on the mountain?’. To control for prosodic length, sentences were constructed to have the same number of morae preceding the H test site (8) and following it (6).

  1. (21)

    Example question/answer pairs for Bach test with proper names, H site present at right edge of first name

    figure ac
  2. (22)

    Example question/answer pairs for Bach test with proper names, H site absent at right edge of first name

    figure ad

Figure 11 shows the effect of primary stress position on mean f0 contours over the first name and surrounding words when an H is present at the right edge. The L valley for names with penultimate stress aligns with the onset of the penultimate mora of the name, while the L valley for names with antepenultimate stress aligns with the onset of the antepenultimate (i.e., initial) mora. This shift in L valley alignment is expected and shows that the L valley tracks stress position. How the high f0 peak tracks is less clear. While the high peak with penultimate stress is clearly in the final mora, with antepenultimate stress there appear to be either two high peaks or a high plateau that starts in the antepenultimate mora and extends through the final mora. This f0 contour behavior is consistent with two distinct high tonal targets: one high peak tracking antepenultimate stress (the high target of the LH*) and another high tone tracking the right edge (an H). The f0 contour does not begin to fall after reaching the high peak in the penultimate mora of the first name, as we would expect if the only high tones tracked stress and none tracked the right edge. Instead, the maintenance of high f0 at the right edge even with antepenultimate stress suggests that the three syntactically conditioned Hs are aligned to the right edge.

Fig. 11
figure 11

The effect of primary stress position on the f0 realization of an H at the right edge of the first name in sentences like those given in (21) and (22). When the first name has antepenultimate stress, e.g., in Melanie, there is still a high f0 peak at the right edge of the first name, indicating that the Hs track with the right edge rather than with stress position. Data from m01, 208 tokens

Together, these observations support the hypothesis that Habs, Hcoord, and Hfront are edge tones and not (part of) pitch accents.Footnote 22 This raises another question. Taking Habs as an example: if Habs is an edge tone, what’s the evidence that it tracks the right edge of the phonological material preceding the absolutive argument, rather than the left edge of the absolutive argument? After all, the realization of the absolutive high persists into the syllable preceding the first primary stress in the absolutive argument. But if the absolutive high tracked the left edge of the absolutive argument, it would be strange for the f0 peak to be realized in anticipation of the segmental material of the absolutive argument. If anything, peak delay would lead us to expect the f0 peak to be realized after the syllable the tone is associated with.

5 What is Habs? Habs  as the spellout of absolutive case

I’ve reviewed in Sect. 2.3.2 and shown in Sect. 4 that a sentence-medial high edge tone surfaces with the following three distinct syntactic constructions: absolutive arguments, coordination, and fronted expressions. One further point to emphasize is that Habs, Hcoord, and Hfront are reliably triggered by their respective syntactic configurations: up to some small degree of noise, Hs always appear when expected and never appear when unexpected. In the plots of mean f0 contours in this paper, the gray ribbons visualizing variability across uttered tokens (see Sect. 3.3.2) all have the same shape as the mean f0 contours. This reflects that the distribution of Hs in each utterance that contributed to the phonetic data in this paper was consistent across repetitions, items, and speakers, as the reader can verify in the OSF repository. The frequency count data of transcribed Hs in Yu and Stabler (2017, Sect. 6) also show the same degree of reliability, modulo the appearance of “overriding” prosodic boundary tones that can be attributed to separate and unrelated factors discussed in Sect. 5.2. Since this frequency data shows that the distribution of Hs is entirely predictable from the three different syntactic configurations, no statistics are needed to quantify the degree of reliability. Moreover, Yu and Stabler (2017, Sect. 2) also provides evidence that the appearance of Habs is insensitive to prosodic length, speech rate, and register.

Given this reliability, I propose that Habs, as well as Hcoord and Hfront, are syntactically triggered. For the purposes of having a concrete proposal to refer to in grappling with the syntax-prosody interface, I assume, following Yu and Stabler (2017), that Habs is introduced postsyntactically as a pronounced reflex of the structural configuration of absolutive case (see e.g., Marantz 1991, Bobaljik 2008). Yu and Stabler (2017) assumes a syntactic analysis of Samoan inspired by Collins (2014, 2015, 2016, to appear); see also the derived tree in (30). But whether Habs is (part of) a pronounced morpheme in the lexicon that is then concatenated with other lexical items in the syntactic derivation, or whether it is inserted postsyntactically as a reflex of a syntactic configuration stated over bundles of abstract features (see Yu and Stabler 2017: Sect. 7.2), doesn’t matter for the basic claim I defend here: Habs, Hcoord, and Hfront are introduced in the spellout of syntactic structure. I also remain agnostic as to whether Hcoord and Hfront are (part of) lexical items or are inserted postsyntactically, but see Yu and Stabler (2017) for one proposal about how postsyntactic tonal insertion of Hcoord and Hfront might be formalized. When I use the term ‘morpheme’ in this paper, it is a descriptive term that refers to something spelled out in syntactic structure, without reference to whether this happens in lexical insertion or postsyntactically.

An immediate puzzle raised by this proposal is why these particular syntactic configurations would trigger Hs. A natural step might be to propose that some property shared by the three syntactic configurations is in fact what triggers Hs. In this section, I present empirical evidence showing that such a move towards unification is not supported. Rather, the evidence demonstrates that if there is anything more general underlying absolutive case that triggers an H, it is case: the distribution of Habs patterns with the distribution of segmental case markers. (Sect. 6 continues the argument that Habs is a case marker in terms of theories of the syntax-prosody interface.) First, I show that the distribution of Habs shares properties with the distribution of segmental case markers: Habs is illicit when other segmental case markers are illicit (Sect. 5.1). This section also addresses an apparent challenge to the Habs case-marking proposal raised by Calhoun (2014, 2017), namely that Hs are absent before post-verbal absolutives under focus-sensitive na‘o ‘only’. I show that in fact no case markers, whether segmental or tonal, can surface under na‘o. Then in Sect. 5.2, I show that high (and low) edge tones that variably appear across various syntactic environments (Calhoun 2017; Yu and Stabler 2017) can be attributed to factors other than syntax, namely, prosodic phonology. This addresses a further apparent challenge from Calhoun (2017): that Habs does not in fact occur reliably with absolutives or fronting, and that it also occurs in disparate, additional syntactic environments. Finally, in Sect. 5.3, I provide an explanation for why the absolutive case marker is tonal rather than segmental like the other case markers in Samoan, and summarize my overall proposal.

5.1 Habs is illicit where the segmental case markers are illicit

This section demonstrates that Habs is illicit when segmental case markers are illicit: that is, the distributional behavior of Habs patterns like that of segmental case markers. Section 5.1.1 shows that ergative and absolutive case markers both fail to surface with argument traces in pro-drop and extraction out of relative clauses. Section 5.1.2 points out several constructions where not only Habs but also segmental case markers are illicit: in fronted arguments, on preverbal pronominal clitics, and in focus-sensitive na‘o constructions.

5.1.1 Case markers do not surface with argument traces

Under pro-drop or extraction of an ergative argument, the ergative case marker is not expected to remain behind with the trace. But suppose that Habs were not inserted in spellout, but instead marked the right edge of a constituent preceding the absolutive under some syntax-prosody mapping. As Calhoun (2015:21) points out, “prosodic phrasing is standardly held to align only with overt syntactic structure”; see, e.g., Elfner (2012: Ch. 5) for a recent argument for this position based on evidence from Connemara Irish, as well as Selkirk and Lee (2015:9). So if Habs marked the left edge of some phonological constituent corresponding to a syntactic constituent initiated by the absolutive, Habs might be expected to fail to surface under pro-drop or extraction of the absolutive. But if Habs marked a right prosodic edge, then even though the absolutive argument itself would not be overt under pro-drop or extraction, the phonological material in the immediately preceding prosodic phrase marked by Habs would be, predicting that Habs might remain behind. In this section, I show that this is not the case: like the ergative case marker e, Habs does not surface with traces.

pro-drop

In the sentences in (23) and (24) from four speakers (m01, s13, s18, s22c/s), I found that Habs was absent under pro-drop of the absolutive. However, Habs was present under pro-drop of the ergative, as long as an overt absolutive argument was present. That is, for the pro-drop sentences in (23) and (24), Habs appeared as long as [malini] was absolutive. Figure 12a shows the mean f0 contours over [lau] ‘make.fun’ and the first syllable of [malini] for the sentences in (23). Even though the realization of the H on [lau] is difficult to discern because of the final stress on [lau], the realization of the H is quite clear at the onset of the absolutive argument [malini] in the f0 contour. This is another example of how minimal comparisons are useful (Sect. 3.3.1). Figure 12b shows the same results for pro-drop in (24): Habs is absent if the absolutive pronoun has been pro-dropped, but present whenever the absolutive argument is overt.

  1. (23)
    figure ag
  2. (24)
    figure ah
Fig. 12
figure 12

Mean z-score standardized f0 contours for pro-drop sentences in (23) and (24) from m01, s13, s18, and s22c/s (4 speakers, 15 tokens for (a), 24 for (b)). Habs is absent if the absolutive pronoun is pro-dropped, but is present under ergative pro-drop as long as the overt argument [malini] is absolutive
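Several figures in this section plot mean z-score standardized f0 contours pooled across speakers. The Python sketch below illustrates one common way to compute such contours. It assumes, as a hypothesis about the procedure (which is described in Sect. 3.3.2 and may differ), that f0 is standardized per speaker against that speaker's own mean and standard deviation, and that each token has already been time-normalized to the same number of sample points. The function names and the toy Hz values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_by_speaker(tokens):
    """Standardize f0 per speaker (assumed procedure, for illustration).

    tokens: list of (speaker_id, contour) pairs, where contour is a list of
    f0 values in Hz. Returns the same structure with z-scored contours.
    """
    # Pool every f0 sample produced by each speaker.
    pooled = defaultdict(list)
    for speaker, contour in tokens:
        pooled[speaker].extend(contour)

    # Estimate each speaker's mean and (population) standard deviation.
    stats = {}
    for speaker, values in pooled.items():
        sd = pstdev(values)
        if sd == 0:
            raise ValueError(f"speaker {speaker!r} has no f0 variation")
        stats[speaker] = (mean(values), sd)

    # Express each sample as standard deviations from that speaker's mean.
    return [
        (speaker, [(f0 - stats[speaker][0]) / stats[speaker][1] for f0 in contour])
        for speaker, contour in tokens
    ]


def mean_contour(z_tokens):
    """Average time-normalized z-scored contours point by point."""
    lengths = {len(contour) for _, contour in z_tokens}
    if len(lengths) != 1:
        raise ValueError("contours must be time-normalized to equal length")
    n = lengths.pop()
    return [mean(contour[i] for _, contour in z_tokens) for i in range(n)]


# Toy data: a lower-pitched and a higher-pitched speaker with the same rise.
tokens = [("m01", [100.0, 120.0, 140.0]), ("s13", [200.0, 240.0, 280.0])]
z = zscore_by_speaker(tokens)
print([round(v, 3) for v in mean_contour(z)])  # prints [-1.225, 0.0, 1.225]
```

Standardizing within each speaker means that speakers with different pitch ranges (here, a 100 to 140 Hz rise and a 200 to 280 Hz rise) contribute the same shape to the mean, so an H shared across speakers surfaces as a shared rise rather than being dominated by the higher-pitched voice.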

Syntactic extraction in relative clauses

Habs  also does not appear in a relative clause when an absolutive argument has been extracted out of it. I show this with two data sets: one comparing extraction of the ergative subject vs. the absolutive object out of transitive embedded clauses (from s13, s18, m01, s22) and one comparing extraction of ergative subjects out of transitive clauses vs. extraction of absolutive subjects out of intransitive clauses (from m01).

I elicited (25) as a minimal pair comparing extraction of the ergative subject vs. the absolutive object out of a relative clause; see Sect. A.7 for an additional minimal pair elicited with a transitive matrix clause. The embedded verb is underlined.

  1. (25)

    Extraction of ergative subject vs. absolutive object, intransitive matrix clauseFootnote 23

    figure ai

Figure 13a shows mean f0 contours from four speakers over the last three syllables of the embedded verbs in (25) (e.g., [ŋa-i.na] from [la.laŋa-i.na]). Habs does not appear at the right edge of the embedded verb if the absolutive object has been extracted. However, Habs does appear on the embedded verb if the ergative subject has been extracted and the absolutive object remains in the embedded clause.

Fig. 13
figure 13

Mean contours over embedded verb in relative clause. (a) z-scored f0 contours for ergative subject vs. absolutive object extraction, e.g., (25) from m01, s13, s18, s22c/s (4 speakers, 36 tokens); (b) f0 contours for ergative vs. absolutive subject extraction e.g., (26) from m01, 13 tokens. Habs  appears only if the ergative, rather than the absolutive argument, is extracted

To confirm that the H distribution shown in Fig. 13a is not an effect of subject vs. object extraction, I also elicited six additional sentences from my primary consultant with only subject extraction out of a relative clause, where the ergative or absolutive subject was extracted to object position in the matrix clause. Two examples are given in (26). These sentences were also “easier” extractions: subject extraction out of the final constituent in the matrix clause. Figure 13b shows that with these extractions, too, Habs did not appear on the embedded verb if the absolutive subject was extracted, but did appear if the ergative subject was extracted and the absolutive object was still present in the embedded clause. For this set of sentences, f0 on the embedded verb happened to be globally higher when the absolutive subject was extracted than when the ergative subject was extracted. However, the contrast between the presence and the absence of Habs is still clear: the f0 contour is falling in the final syllable of the verb when the absolutive subject is extracted, but rising when the ergative subject is extracted.

  1. (26)

    Examples of ergative vs. absolutive subject extraction

    figure ak

5.1.2 Genuinely unmarked bare NPs: Habs  is illicit where other case markers are also illicit

Up to this point in the paper, I have documented cases where Hs are not observed before bare NPs that are independently expected not to be case-marked, i.e., before pseudo-incorporated objects (Sect. 4.4). Yu and Özyıldız (2016:399, Sect. 3.4.2) documents further constructions of this kind. I have also shown syntactic environments where not only Habs but also other case markers are illicit: preceding fronted arguments (3), preceding preverbal pronominal clitics (Sect. 4.3.2), and preceding argument traces (Sect. 5.1.1).

Additionally, I show here that no case markers, including Habs, can co-occur with focus-sensitive na‘o ‘only’.Footnote 24 Calhoun (2014, 2017) first noted that Habs does not co-occur with absolutive arguments under na‘o and argued that this data challenges the claim that there is a tonal absolutive case marker.Footnote 25 But Calhoun (2017:15) did not elicit ergative or oblique nominals under na‘o, reporting instead the elicitation of constructions with “na‘o modifying a noun phrase in all of the syntactic positions in which it is known to be grammatical in Samoan”: preceding the verb phrase, a fronted noun phrase, or a postverbal absolutive argument (Calhoun 2017:11, (16)-(18)).

Mosel and Hovdhaugen (1992:272-273) says that “noun phrases combined with na‘o are always unmarked for case:…They occur in the function of fronted noun phrases, absolutive arguments in verbal clauses, predicates in nominal clauses, and predicative noun phrases in semi-verbal clauses” and Mosel and Hovdhaugen (1992:526) refers to “the absolutive noun phrase of na‘o.” I suspect either that the language has changed since Mosel and Hovdhaugen’s (1992) work in a way that admits non-absolutive arguments under na‘o (although the Auckland consultants I elicited na‘o constructions from spanned a wide age range: 19, 23, and 48 years old), or that Mosel and Hovdhaugen (1992) conflated segmentally unmarked case with absolutive case.

I show here that case-marking data in na‘o constructions collected from my Auckland consultants (f03, f05) in fact supports the analysis of Habs  as a tonal case marker. The examples in (27) show na‘o combining with nominals bearing different cases. Case markers are shown to be ungrammatical in positions preceding and following na‘o.Footnote 26 Plots are given in the OSF repository.

  1. (27)

    Case marking cannot co-occur with na‘o

    figure al

The same is true if the argument under na‘o is fronted, e.g., in na‘o le liona na lagona e Melina. ‘only det lion past hear erg Melina’, the fronted counterpart of (27b): the only H that appears is the expected Hfront at the right edge of fronted na‘o le liona ‘only the lion’. Thus, Habs patterns together with segmental case markers in being illicit under na‘o, supporting a unified analysis of Habs and segmental case markers in Samoan. That case marking is illicit under na‘o is intriguing in light of Hohaus and Howell’s (2015) analysis of na‘o as a special case of ‘o-marked constructions. More generally, Brown and Koch (2016: §4.1) analyzes focus-sensitive ‘only’ expressions in Polynesian, such as ko in Tokelauan, as an association between a focus-sensitive na particle and *ko-marked nominals. Perhaps there is a connection between the apparent absence of case marking under na‘o and its absence in fronted DPs; recall that fronted DPs are preceded by ‘o. Together, this data raises the hypothesis that ‘o interacts with case marking and might block case assignment, possibly even because it itself assigns case.

5.2 High prosodic boundary tones are distinct from Hs inserted in spellout

In the course of demonstrating in the preceding section that Habs is illicit where other (segmental) case markers are also illicit, I showed that Calhoun’s (2017) transcriptional data, in which H- (almost) never appears before post-verbal absolutives under na‘o, is not problematic for analyzing Habs as the spellout of absolutive case. In this section, I address another challenge Calhoun (2017) raises for the analysis of Habs as a tonal case marker: variability in the appearance of high edge tones before post-verbal absolutives under na‘o and between nominals in equative copular clauses (Mosel and Hovdhaugen 1992: Sect. 11.3).

Yu and Stabler (2017) hypothesizes that Samoan has high edge tones that, unlike Habs, Hcoord, and Hfront, are not inserted in spellout. Yu and Stabler (2017) found positions where none of these three would be expected, but where both high and low edge tones occurred, often alongside a pause. It was hypothesized that these sporadically appearing tones were high and low prosodic boundary tones. Recall that by ‘prosodic boundary tone,’ I mean a tone that marks the edge of a prosodic constituent above the level of the prosodic word. Such tones are, by definition, subject to the vagaries of the speaker’s prosodic phrasing choices. While prosodic boundary tones can be indirectly conditioned by syntax, crucially, they are also conditioned by non-syntactic, prosodic factors, e.g., ‘prosodic markedness’ restrictions on size and eurhythmy, and speech rate (see Yu and Stabler 2017: Sect. 2.2). Based on the work presented in Yu and Stabler (2017) and in Calhoun (2017), I raise the possibility that the H-s Calhoun (2017) transcribed as preceding absolutives under na‘o are prosodic boundary tones rather than Habs tones, and that positions where Calhoun (2017) transcribed both L-s and H-s are positions where low and high prosodic boundary tones (L% and H%) variably occur.

5.2.1 Prosodic boundary tones in na‘o constructions in Calhoun (2017)

In discussing the absence of Habs under na‘o in Sect. 5.1.2, I abstracted away from the fact that Calhoun (2017) did report some high edge tones before absolutives under na‘o. I address this observation here. There are three na‘o construction types in Calhoun (2017) with post-verbal absolutives under na‘o (Calhoun 2017, (41), (42), (44)). Among these three, Calhoun (2017:22, (41)) reports transcribing an H- preceding na‘o absolutive objects only in sentences with V na‘o OS order, and even then only 25% of the time.

If no case marking occurs under na‘o, as I hypothesized in Sect. 5.1.2, how can these sporadic H-s be accounted for? Fig. 6 in Calhoun (2017) compares f0 tracks for two renditions of the same V na‘o OS sentence, one with an H- transcribed before na‘o and one without. Calhoun’s (2017) transcriptional data doesn’t distinguish between edge tones that occur with pauses and those that do not. But as Calhoun (2017:22) notes, a pause (longer than the duration of na‘o in the utterance) follows the rise in f0 to the high edge tone at the right edge of the verb in Fig. 6. If a pause typically co-occurs with the H- here, then the H-s reported under na‘o 25% of the time may be neither Habs tones nor H-s marking the right edge of a phonological phrase (as proposed by Calhoun), but high prosodic boundary tones marking the right edges of higher-level prosodic constituents such as intonational phrases, i.e., what would often be transcribed as H% rather than H- in the conventions used in Orfitelli and Yu (2009), Yu (2011), Calhoun (2015), Yu and Stabler (2017), and Calhoun (2017).

Calhoun (2017) also reports H-s in other positions in na‘o constructions. These all follow a fronted argument (which may or may not be under na‘o), and variably occur alongside low edge tones transcribed as L-s. Some of these H-s could be Hfront tones, but others could be prosodic boundary tones. Figures 3 and 5 in Calhoun (2017), which are described as typical realizations, show pauses as long as two syllables following the transcribed L-, suggesting that sentence-medial low edge tones, too, might be prosodic boundary tones marking the right edge of a higher-level domain such as an intonational phrase, i.e., an L%.

5.2.2 Prosodic boundary tones in equative copular constructions in Calhoun (2017)

As in the na‘o constructions, Calhoun (2017) finds many instances of high edge tones in places where Habs would not be expected, as well as low edge tones. In all cases, the transcribed edge tones occur at the left or right edges of ‘o-marked nominals and/or at the left edges of PPs or clausal conjuncts. Some of the transcribed H-s might be tones syntactically conditioned by fronting; see Collins (2016:8, Sect. 2.2) for an XP-fronting account of predicate-initial ordering. But all representative f0 contours for copular constructions shown in Calhoun (2017) exhibit phonetic signatures of higher-level prosodic phrase breaks. Figures 7, 8, and 10 in Calhoun (2017) show pauses following every transcribed H- and L-, some as long as a three-syllable word in the utterance, and Fig. 9 shows a stretch of glottalization spanning three syllables at the right edge of the transcribed L-. (Glottalization has been found to be more frequent at the edges of higher-level prosodic domains; see, e.g., Dilley and Shattuck-Hufnagel 1996.)

In summary, the copular and na‘o construction data in Calhoun (2017) is not problematic for the analysis of the absolutive high as a tonal case marker if the low and high edge tones that reliably co-occur with pauses are in fact boundary tones marking a higher-level prosodic constituent such as the intonational phrase.

The presence of an audible pause is often taken to be a phonetic signature of a prosodic domain edge high in the prosodic hierarchy, e.g., the intonational phrase. Pauses indicate strong prosodic boundaries and result from a slowdown of the articulators (see Krivokapić 2014 for a review); in this sense, (fluent) pauses at the end of intonational phrases can be seen as extreme lengthening. As a rule of thumb, pauses have been used to diagnose intonational phrase boundaries; see, e.g., Selkirk (1978/1981:135), Pierrehumbert (1980:19), Ladd (1986:315-317), Nespor and Vogel (1986:188), Krivokapić (2007:163), Jun and Fletcher (2014:501-502). (For instance: “It is between intonational phrases (and only between them, we would claim) that one finds potential pauses.” (Selkirk 1978/1981:135)) This convention, together with the fact that sentence-medial low edge tones in Calhoun (2017) and Yu and Stabler (2017) reliably co-occur with a following pause, raises the possibility that low edge tones are licit only at the ends of intonational phrases.

An analysis of edge tones with pauses as intonational phrase boundary tones provides a unified account of both the variability in the appearance of an edge tone and the alternation between low and high edge tones; these properties are typical of prosodic boundary tones across languages (see, e.g., Jun 1998, 2005, 2014). In Sect. 6.3.2, I further discuss the hypothesis that there are high (and low) edge tones that are prosodic boundary tones, and that these tones are distinct from Habs, Hcoord, and Hfront.

5.3 On the improbability of a tonal morpheme in a “non-tonal” language

This section (Sect. 5) has explicated my proposal about what Habs is. I close the section by addressing perhaps the most surprisingFootnote 27 aspect of it: positing high tones inserted in the spellout of three disparate syntactic configurations in a “non-tonal” language in a “non-tonal” language family.Footnote 28 I address two issues: (i) whether a language in which tone appears in so few and such disparate syntactic configurations could plausibly exist, and (ii) whether there is any plausible explanation for a tonal case marker alongside segmental ones.

Languages where tone appears in few and disparate syntactic configurations do exist. Hyman (2018: (1), 2011a: (1.17), 2011b, et seq.) defines a tone language as follows (the wording varies slightly across papers): “A language with tone is one in which an indication of pitch is lexically affiliated with at least some morphemes.” The motivation for this definition is illustrated with Chimwiini. Chimwiini is described by Kisseberth and Abasheikh (2011) as having an obligatory high tone (“accent”) in the final word of every phonological phrase. The high tone is phrase-penultimate by default, but phrase-final in some syntactic configurations. Tone placement alone carries the contrast between first and second vs. third person in the past and present when there is no overt subject. Hyman (2011a:130, 135) points out that, by his definition, this single tonal contrast qualifies Chimwiini as a tone language (i) even if “very sparsely so,” and (ii) even though the high tones demarcate phonological phrases (they are still “lexically affiliated”: they carry morphemic contrast). Similarly, Habs carries a morphemic contrast between absolutive and other cases in Samoan and, when segmental case markers are dropped, can be the only signal of this contrast. Thus, Samoan is a tone language, even if very sparsely so.

Other, disparate syntactic configurations also reliably trigger phrase-final rather than phrase-penultimate high tones in Chimwiini: relative clauses, negative imperatives, the ka-conditional tense, and the conjunction na (Kisseberth and Abasheikh 2011: Sect. 4.1). In all of these cases, unlike the first and second vs. third person contrast, the syntactic configuration also triggers segmental changes. For example, a relative verb is marked not only by a final high tone but also by a final -a or -o vowel (Kisseberth and Abasheikh 2011: Sect. 4.1.2). And coordination is marked not only by a final high tone on the coordinated phrase, but also by the conjunction na. Samoan looks like this, too: disparate syntactic configurations are spelled out with segmental as well as tonal material. Coordination is spelled out not only with Hcoord, but also with a coordinator such as the conjunction ma. Fronting is spelled out not only with Hfront, but also with a change in word order and the appearance of ‘o. But one particular configuration, absolutive case, may be spelled out only with tonal material.

Other languages in which tone-morphology interactions are very sparse include Chickasaw (Gordon 2005: Sect. 11.2.3) and Uspanteko (Bennett and Henderson 2013: Sect. 2.3), both described as accentual systems. In Chickasaw, some verbs appear with a high tone (and possibly segmental changes) to express aspectual contrast. In Uspanteko, a plural marker, a VP focus clitic, possessive prefixes, and a “phrase final status suffix” are associated with the introduction of a high tone. Other languages in which particular tones reliably appear in specific sets of syntactic constructions include Dogon languages (Heath and McPherson 2013; McPherson and Heath 2016) and Naxi. In Naxi, the distribution of rising tones is restricted. In connected or less formal speech, synchronic reduction and deletion of a small set of high-frequency H-toned enclitics can result in a rising contour on the preceding syllable when the “orphaned” tone reassociates (Michaud 2006). These enclitics include topicalizers, focus markers, classifiers, exclamative particles, and ‘or.’ Moreover, Michaud (2006, Sect. 1) traces a small set of lexical items and constructions that always appear with rising tones back to the orphaned H tone from an “earlier H-toned possessive.”

The Naxi example of tonal reassociation under reduction or deletion of segmental material bears similarities to a possible explanation of why the absolutive case marker might be tonal rather than segmental. Yu and Özyıldız (2016) discusses a segmental absolutive particle ia (denoted here as \(ia_{\text{abs}}\)), which may be a source of Habs. Empirical data on \(ia_{\text{abs}}\) and a proposal about the emergence of Habs from \(ia_{\text{abs}}\) are presented in Yu and Özyıldız (2016); I sketch the outline here and refer the reader to that paper for details. Whether similar processes might be in play for Hcoord or Hfront remains an open question.

Only a few sources in the literature remark that absolutive arguments are preceded by \(ia_{\text{abs}}\) (Hovdhaugen 1987:154–155; Mosel and Hovdhaugen 1992:51, 143; Vonen 1988:38–39). Mosel and Hovdhaugen (1992:143) states that \(ia_{\text{abs}}\) is always optional, is mostly used before proper nouns, and is seldom used in literary texts. Vonen (1988:38–39) also states that \(ia_{\text{abs}}\) is always optional, and that it can be followed by an article, especially after hesitation. Other than a few in Hovdhaugen (1987), no actual examples of elicited utterances with \(ia_{\text{abs}}\) have appeared in the literature. Yu and Özyıldız (2016) reports that speakers never spontaneously produced \(ia_{\text{abs}}\) in elicitation, but were metalinguistically aware of it (in the case of one younger speaker, prescriptively, from grammar exercises in school) and had systematic intuitions about its distribution. In other words, \(ia_{\text{abs}}\) appears to be moribund. Yu and Özyıldız (2016) describes the distribution of \(ia_{\text{abs}}\) in the same set of syntactic structures as those included in this paper, showing that \(ia_{\text{abs}}\) is licit before absolutives, but not before ergatives or obliques. Moreover, like the segmental case markers, and patterning with the distribution of Habs, \(ia_{\text{abs}}\) is illicit under fronted expressions and na‘o (Yu and Özyıldız 2016: Sect. 3.4.1). And \(ia_{\text{abs}}\) is illicit before Hcoord and Hfront.

The distribution of \(ia_{\text{abs}}\) thus provides additional evidence that Habs is a case marker. In addition, the coinciding distributions of Habs and \(ia_{\text{abs}}\) offer an explanation for why the absolutive case marker in contemporary Samoan is tonal. Namely, following Yu and Özyıldız (2016, Sect. 4.1), the absolutive high could originate in leftward reassociation of the high tone of the pitch accent on \(ia_{\text{abs}}\), upon deletion of the segmental material of \(ia_{\text{abs}}\). At a high level, the segmental deletion and tonal relinking involved in this proposed origin of Habs are typical of tonal behavior in natural language, e.g., in the formation of Naxi rising tones. A characteristic property of tone, both synchronically and diachronically, is its “stability”: even if the segmental material hosting a tone is deleted, the tone remains and is reassociated to the remaining segmental material (Yip 2002:67; Hyman 2011b:210). In the case of Samoan, the tone on \(ia_{\text{abs}}\) arises from stress, so this is in fact an example of “stress stability” (Kaisse 1982, Sect. 2.1), as discussed in Kaisse’s (1982, 1977) analysis of hiatus resolution in varieties of Modern Greek. For example, at a fast speech rate, γríγora érxome ‘quickly I come’ → γrìγorá ’rxome: the stressed é is deleted, but its stress appears on the previously unstressed, immediately preceding a (Kaisse 1982, (13a)). The difference in Samoan is that the Habs does not appear to be associated with stress (Sect. 4.6).

One last thing I’ll mention here is that the origin of Habs from \(ia_{\text{abs}}\) also potentially offers another way in which absolutive case marking patterns with ergative and oblique case marking. One puzzling property of Habs is that it appears to be realized on phonological material preceding the absolutive argument, rather than on the argument itself. (Yu and Özyıldız 2016, Sect. 4.1 provides additional reasons, based on typological generalizations about tonal association, for why leftward rather than rightward association of the H of \(ia_{\text{abs}}\) to phonological material preceding the absolutive would be expected.) This property is less puzzling if we assume that all case markers in Samoan are phonologically left-leaning (e.g., enclitic), even if they are syntactically right-leaning. Then absolutive \(ia_{\text{abs}}\), too, might be phonologically phrased to the left, resulting in leftward association of its high tone. This type of syntax-prosody “mismatch” is exceedingly common cross-linguistically (Klavans 1985; Himmelmann 2014). Moreover, the contemporary utterances with absolutive \(ia_{\text{abs}}\) that I have elicited include renditions in which the segments of \(ia_{\text{abs}}\) are highly reduced (and possibly stressless), but an Habs is still easily detected.

The next section, Sect. 6, shows that alternative analyses do not fit the current empirical data as well as my proposal. For ease of comparison, I give a high-level summary of my proposal and how it fits the current data below.

As shown in Fig. 14, I assume that semantic and information structure are encoded in syntactic structure (Kiss 1995). I propose that two distinct components of the grammar trigger high edge tones in Samoan: (i) spellout, where Hs appear as phonological reflexes of structural configurations in syntax, and (ii) the phonological grammar, where Hs arise in processes conditioned on prosodic domains. Hs inserted in spellout (Habs, Hcoord, Hfront, and perhaps more to be uncovered) have no access to prosodic domains. Nor is there a general property underlying absolutive case, coordination, and fronting that triggers Hs. Hs and Ls inserted in the phonological grammar (H% and L%), i.e., ‘prosodic boundary tones,’ have no direct access to syntactic domains. They are conditioned on prosodic domains, which, in turn, are conditioned jointly by syntactic structure and prosodic markedness restrictions (see Sect. 5.2).

Fig. 14

Diagram schematizing my proposal about two kinds of Samoan edge tones (cf. the diagram in Fig. 16 schematizing Calhoun’s (2017) proposal)

The two types of Hs, from spellout and from the phonological grammar, can be distinguished by the following criteria:

  1. Distributional facts

    • Hs inserted in spellout reliably appear in a small, restricted set of distinct syntactic configurations (absolutive case, coordination, fronting; more may be found with additional fieldwork). Moreover, the distribution of Habs patterns with the distribution of segmental case markers rather than with the distribution of Hcoord and Hfront.

    • Hs (and Ls) inserted by the phonological grammar, denoted as H% and L%, don’t reliably appear in particular syntactic configurations. Instead, they are triggered by prosodic domains, which are sensitive to both syntactic structure and prosodic factors. Thus, these Hs appear variably and sporadically across a range of syntactic configurations. These include the configurations that trigger the spellout of an H, but also others, e.g., ergative case and oblique modifiers and objects.

  2. Phonetic realization

    • Hs inserted in spellout do not co-occur with a pause.

    • H%s (and L%s) inserted in the phonological grammar reliably co-occur with a pause.

  3. Sensitivity to prosodic factors

    • The presence/absence of Hs inserted in spellout is insensitive to prosodic factors, e.g., speech rate and prosodic length.

    • H%s (and L%s) inserted in the phonological grammar are sensitive to prosodic factors.

To facilitate future work, I state the criteria for distinguishing the different kinds of Hs as strongly as possible, to make my proposal easily testable and falsifiable. For example, I would be surprised if the distinction in phonetic realization turned out to be as stark as stated. Tonal and rhythmic signatures of prosodic domains can be “mismatched” (as in Break Index 2 in ToBI transcription; Beckman et al. 2005), so that, e.g., a rushed H% might not co-occur with a pause.
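As a minimal sketch, assuming each criterion can be reduced to a yes/no judgment about a single observed H, the three criteria can be applied mechanically. The class `HProfile`, its field names, and the "indeterminate" outcome are hypothetical illustrations introduced here, not part of the proposal or of any existing tool. The "indeterminate" case leaves room for mismatches such as a rushed H% without a pause.

```python
from dataclasses import dataclass


@dataclass
class HProfile:
    """Hypothetical yes/no judgments about one observed high edge tone."""
    reliable_in_config: bool   # criterion 1: reliably present in a specific syntactic configuration
    with_pause: bool           # criterion 2: co-occurs with a pause
    prosody_sensitive: bool    # criterion 3: presence varies with speech rate or prosodic length


def classify(h: HProfile) -> str:
    """Label an H only when all three criteria point the same way."""
    spellout_votes = sum([
        h.reliable_in_config,      # spellout Hs track syntactic configurations
        not h.with_pause,          # spellout Hs do not co-occur with a pause
        not h.prosody_sensitive,   # spellout Hs ignore prosodic factors
    ])
    if spellout_votes == 3:
        return "spellout H"
    if spellout_votes == 0:
        return "boundary tone (H%)"
    return "indeterminate"         # mixed evidence, e.g. a rushed H% with no pause


# Usage with hypothetical profiles (illustrations, not measurements):
print(classify(HProfile(reliable_in_config=True, with_pause=False, prosody_sensitive=False)))  # -> spellout H
print(classify(HProfile(reliable_in_config=False, with_pause=False, prosody_sensitive=True)))  # -> indeterminate
```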

6 Alternative analyses of high edge tones in Samoan: Syntax-prosody mapping and information structure

In this section, I present possible alternative analyses of high edge tones in Samoan, including the ones proposed in Calhoun (2015, 2017), and show that they do not fit the current data as well as mine. Given that syntactic structure appears to condition at least some of the high edge tones in some way, and that the pronounced elements at hand are tones, it is only natural to consider an analysis of the Hs based on what is known about the syntax-prosody interface. All but one of the alternative analyses discussed here, including Calhoun (2015), fall within the syntax-prosody interface. The exception is Calhoun’s (2017) proposal of a syntax-less information structure-prosody interface, which I discuss in Sect. 6.4. Unlike my proposal, both Calhoun (2015) and Calhoun (2017) pursue analyses in which all high edge tones in Samoan are unified as having a single, shared source. It may be worth reiterating from Sect. 1 that my proposal in Sect. 5 also falls within the syntax-prosody interface: some high edge tones in Samoan are introduced in spellout, and some are conditioned by prosodic domains, which may in turn be conditioned by syntactic domains.

The rest of this section discusses alternative analyses of high edge tones in Samoan within the syntax-prosody interface. First, I give derived syntactic tree structures after the spellout of the absolutive, coordination, and fronting Hs (Sect. 6.1). The purpose of providing these trees is not to argue that they must be the “right” ones for Samoan, but to allow me to engage concretely with alternative analyses of Hs that fall within the syntax-prosody interface. I then consider analyses within the bounds of ‘direct reference’ theories, e.g., Kaisse (1985), Odden (1987), Pak (2008), which allow the domain of a phonological process to be defined directly in terms of syntactic relations (Sect. 6.2). In Sect. 6.3, I discuss analyses that fall within ‘indirect reference’ theories, e.g., Nespor and Vogel (1986), Selkirk (1986), Hayes (1989), Inkelas (1989), Truckenbrodt (1999), Selkirk (2011). These theories assume that the domains of phonological processes are defined in terms of prosodic constituents rather than directly in terms of syntactic relations and structures: phonological processes may reference syntactic structure only indirectly, via systematic (but not necessarily transparent) relations between prosodic and syntactic constituents. I close the section by briefly considering some other recent ideas about the syntax-prosody interface, including analyses that refer to syntactic phases, e.g., Ishihara (2004), Kratzer and Selkirk (2007), Dobashi (2009), Downing (2010), Cheng and Downing (2016).

6.1 A working proposal for the spellout of Habs

The proper syntactic treatment of case in Samoan and other ergative languages remains controversial, e.g., Chung (1978), Legate (2008), Koopman (2012), Collins (2014, 2016), but those structural issues were largely orthogonal to the presentation of empirical data in Sect. 4. However, grappling with any theory of the syntax-prosody interface demands, as a prerequisite, some syntactic analysis and some prosodic analysis of the phenomenon at hand. Here, we provide a working proposal for syntactic structure and spellout in Samoan for Habs. We adopt the proposals of Yu and Stabler (2017), which follow a syntactic analysis of Samoan based on Collins (2016, 2015, 2014, to appear). This syntactic analysis includes a VP-fronting account of [VO]S/VSO order, as schematized in (28) and (29). Due to space constraints, we show only a derived tree for the transitive clause (2a), in (30). See Yu and Stabler (2017) for syntactic analyses and derived trees for coordination and fronting.Footnote 29

  (28)

    [VO]S order: VP containing object fronts (Woolford 2015:495, (12))

    a. Base order S [\(_{VP}\) V O]

    b. Order after VP-fronting [\(_{VP}\) V O] S

  (29)

    VSO order: object moves out of VP before VP fronts (Woolford 2015:495, (13))

    a. Base order S [\(_{VP}\) V O]

    b. Order after object shift S O [\(_{VP}\) V t]

    c. Order after VP-fronting [\(_{VP}\) V t] S O

  (30)

    Derived tree for transitive declarative (2a)

    [Figure: derived tree for (2a)]
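The linear effects of the derivations in (28) and (29) can be mimicked with simple list operations. The Python sketch below is schematic and assumes placeholder constituents "S", "V", and "O", a VP encoded as a nested list, and "t" for the trace left by object shift. The function names are hypothetical, and the sketch models only word order, not the syntactic analysis of Collins (2016) or Yu and Stabler (2017).

```python
from typing import List, Union

Item = Union[str, List[str]]   # a bare string is a constituent; a list is the VP


def object_shift(clause: List[Item]) -> List[Item]:
    """(29b): move O out of the VP to its left, leaving the trace 't'."""
    out: List[Item] = []
    for x in clause:
        if isinstance(x, list):
            out.append("O")
            out.append(["t" if w == "O" else w for w in x])
        else:
            out.append(x)
    return out


def vp_front(clause: List[Item]) -> List[Item]:
    """(28b)/(29c): move the VP to clause-initial position."""
    vps = [x for x in clause if isinstance(x, list)]
    rest = [x for x in clause if not isinstance(x, list)]
    return vps + rest


def bracketed(clause: List[Item]) -> str:
    """Show VP constituency, e.g. '[VP V t] S O'."""
    return " ".join("[VP " + " ".join(x) + "]" if isinstance(x, list) else x for x in clause)


def surface(clause: List[Item]) -> str:
    """Pronounced word order: flatten the VP and drop traces."""
    words: List[str] = []
    for x in clause:
        words.extend(x if isinstance(x, list) else [x])
    return " ".join(w for w in words if w != "t")


base: List[Item] = ["S", ["V", "O"]]          # (28a) = (29a)
vo_s = vp_front(base)                         # (28b)
vso = vp_front(object_shift(base))            # (29b), then (29c)
print(bracketed(vo_s), "|", surface(vo_s))    # [VP V O] S | V O S
print(bracketed(vso), "|", surface(vso))      # [VP V t] S O | V S O
```

The sketch does not encode the restriction that the object in [VO]S order is a bare NP, as in (1b).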

6.2 Direct reference theories in syntax-prosody mapping

Direct reference theories allow phonology to “see” syntax directly, but my proposal that Habs, Hcoord, and Hfront are introduced by particular syntactic configurations does not fit within these theories. This is because direct reference theories allow phonological processes to access only “category-neutral, label-neutral, c-command relationships and edge conditions existing among syntactic terminal nodes, as determined by θ-theoretic hierarchical structure” (Elordieta 2008:225). In fact, as pointed out in Kaisse and Zwicky (1987:7), the major point of agreement between direct and indirect reference theories (other than that syntax can influence phonological patterns) is that syntactic “category membership is generally irrelevant (cross-categorial behavior being the rule…)”. As far as I can tell, such cross-categorial syntactic relations cannot unify the syntactic configurations for absolutive case, coordination, and fronting; see the OSF repository for more details.

6.3 Indirect reference theories in syntax-prosody mapping

Proposing an indirect reference theory demands an additional kind of analysis not required for proposing a direct reference theory: an analysis of prosodic constituency for the phenomenon at hand. This is, of course, true of any theory of the Samoan Hs that relies on prosodic constituency, whether or not it falls under the category of indirect reference.

The necessary motivation for introducing prosodic constituency to describe a phenomenon is the same as for syntactic constituency: positing constituents captures generalizations in the observed patterns of natural language; see, e.g., Nespor and Vogel (1986:58–59). When constellations of phonological processes consistently target or refer to a particular chunk of phonological material, that is reason to identify that chunk as a prosodic domain in the grammar (Selkirk 1978/1981; McCarthy and Prince 1986/1996; Nespor and Vogel 1986; Pierrehumbert and Beckman 1988; Hayes 1995; Jun 1996, 1998; Selkirk 2011; Myrberg and Riad 2015). For instance, Jun’s (1996, 1998) evidence from a constellation of phonological processes for a prosodic domain termed the ‘accentual phrase’ in Korean includes post-obstruent tensing, vowel shortening, lenis stop voicing, and tonal insertion. Identifying prosodic domains as categories and referring to them in a prosodic grammar (Selkirk 1978/1981) then presumably allows the grammar to be more succinct than if prosodic domains were not recognized.

A commonly assumed finite, ordered set of categories in a prosodic grammar is the enumeration in (31) (Selkirk 2011, (1)). Given such an enumeration, prosodic trees are built from these categories, with the category names used as node labels. I assume this enumeration for the discussion of prosodic trees and constituency in this paper.

  (31)

    Enumeration of categories in a prosodic grammar (Selkirk 2011, (1))

    a. Intonational phrase (ι)

    b. Phonological phrase (ϕ)

    c. Prosodic word (ω)

    d. Foot (Ft)

    e. Syllable (σ)

Suppose we compare a syntactic tree from an independently motivated syntactic analysis of the phenomenon at hand (call this S) with a prosodic tree motivated by generalizations over the domains of phonological processes (call this P). We might then discover systematic relations between constituents in the two trees. For instance, in Match theory (Selkirk 2009, 2011), the following correspondence relations, stated as violable constraints, are predicted to hold between syntactic and prosodic constituents:

  (32)

    Definition of syntax-prosody Match constraints (Bennett et al. 2016:187, (34))

    a. Match-Word: Prosodic words correspond to the heads from which phrases are projected in the syntax (heads that will often have a complex internal structure determined by head movement).

    b. Match-Phrase: Phonological phrases correspond to maximal projections in the syntax.

    c. Match-Clause: Intonational phrases correspond to those clausal projections that have the potential to express illocutionary force (assertoric or interrogative force, for instance).

A syntactic tree S can be transduced into a predicted prosodic tree \(P_{S}\) respecting these constraints. Following Elfner (2012) and Bennett et al. (2016), I assume that this transduction proceeds in two steps. First, S is transduced to a flattened syntactic tree \(S'\): any phonologically empty terminals are deleted, and any two syntactic nodes that exhaustively dominate the same set of remaining terminals are merged. Then \(S'\) is transduced into \(P_{S}\) following the constraints in (32). For instance, (33a) gives the syntactic tree S for (2a), and (33b) the prosodic tree into which it is transduced under the constraints in (32). The prosodic tree in (33b) is the predicted prosodic tree if prosodic and syntactic constituency “match” in the sense of Match-Phrase in (32).Footnote 30 See Yu (2019) for details.

  (33)

    a. The syntactic tree S for (2a)

    [Figure: syntactic tree S]

    b. The predicted prosodic tree \(P_{S}\) for S

    [Figure: predicted prosodic tree \(P_{S}\)]
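The two-step transduction from S to \(S'\) to \(P_{S}\) described above can be sketched in Python. The assumptions are loud ones: the toy tree is hypothetical, with placeholder terminals rather than the actual Samoan words of (2a), and it is not a reproduction of (30) or (33). Match-Clause is simplified to mapping the root clause to ι, Match-Phrase maps every other phrasal node to ϕ, and Match-Word maps every pronounced terminal to ω. I also assume, as a simplification, that a phrase is merged only with a sole phrasal daughter and never with its own head, so a phrase left with a single pronounced head yields a unary ϕ over ω.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                                   # syntactic category, e.g. "TP", "VP", "V"
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                   # terminal string; "" = phonologically empty

    @property
    def is_terminal(self) -> bool:
        return not self.children


def flatten(node: Node) -> Optional[Node]:
    """S -> S': delete empty terminals, then merge a phrase with a sole phrasal
    daughter, since both exhaustively dominate the same terminals."""
    if node.is_terminal:
        return node if node.word else None
    kids = [k for k in (flatten(c) for c in node.children) if k is not None]
    if not kids:
        return None                              # the phrase contained only empty material
    if len(kids) == 1 and not kids[0].is_terminal:
        return Node(f"{node.label}/{kids[0].label}", kids[0].children)
    return Node(node.label, kids)


def to_prosody(node: Node, root: bool = True) -> str:
    """S' -> P_S (simplified Match): root clause -> ι, other phrases -> ϕ, terminals -> ω."""
    if node.is_terminal:
        return f"(ω {node.word})"
    cat = "ι" if root else "ϕ"
    return f"({cat} " + " ".join(to_prosody(c, root=False) for c in node.children) + ")"


# Hypothetical VSO clause [VP V t] S O with placeholder words (not Samoan data).
s = Node("TP", [
    Node("vP", [Node("VP", [Node("V", word="verb"), Node("DP", word="")])]),  # object trace is empty
    Node("KP", [Node("K", word="erg"),                                       # ergative subject
                Node("DP", [Node("D", word="det1"), Node("NP", [Node("N", word="noun1")])])]),
    Node("DP", [Node("D", word="det2"), Node("NP", [Node("N", word="noun2")])]),  # absolutive object
])
s_prime = flatten(s)
print(to_prosody(s_prime))
# (ι (ϕ (ω verb)) (ϕ (ω erg) (ϕ (ω det1) (ϕ (ω noun1)))) (ϕ (ω det2) (ϕ (ω noun2))))
```

In this toy output, the fronted VP surfaces as a unary ϕ over a single ω because its object trace was deleted, which is the kind of configuration that violates a Binarity-Minimum constraint.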

We can then compare the prosodic tree P, as determined by generalizations over domains of phonological processes, to the predicted prosodic tree transduced from the syntactic tree, \(P_{S}\). Note, crucially, that the prosodic trees P and \(P_{S}\) are arrived at independently: \(P_{S}\) from a hypothesis about how syntactic trees map to prosodic ones, and P from a hypothesis about generalizations over phonological processes (Ladd 2008:288–290). Even if these two prosodic trees are not identical, there might still be a systematic, though not immediately apparent, relation between syntactic and prosodic trees. One way to reveal this systematic relation is the approach taken in Match theory: any deviations between P and \(P_{S}\) are explained away as the result of the prosodic tree’s adherence to phonological well-formedness constraints ranked above the interface constraints in (32) (Selkirk 2011).Footnote 31

Using this approach, I explicate alternative analyses that fall within indirect reference theories. In Sect. 6.3.1, I point out that there is little evidence from the domains of phonological processes to help diagnose prosodic constituency in Samoan. Perhaps the main evidence available is the positioning of edge tones, so in Sect. 6.3.2, I take an in-depth look at the evidence that edge tones in Samoan are prosodic boundary tones. My conclusion from these two sections is that there is in fact not particularly strong evidence that sentence-medial Hs in absolutive, coordination, and fronting constructions mark the right edges of phonological phrases, as assumed in Calhoun (2015, 2017). Thus, any theory of Hs in Samoan that relies on a particular analysis of prosodic constituency faces the challenge of finding additional phonetic and phonological evidence for its assumptions. Finally, Sect. 6.3.3 shows that, even considering the different options for approaching mismatches, no systematic relation between syntactic and prosodic trees is apparent, because the syntactic configurations in which Hs occur defy generalization (as we have already seen from one perspective in the discussion of direct reference theories in Sect. 6.2). This point is also made in Calhoun (2017), contra Calhoun (2015). In summary, the current evidence for an indirect reference theory of Hs in Samoan is not strong.

6.3.1 Lack of evidence bearing on prosodic constituency in Samoan

An attractive possibility for unifying the configurations where Hs occur is to posit that all Hs are tones that demarcate the edges of a particular kind of prosodic constituent. This is the approach that Calhoun (2015, 2017) takes: sentence-medial high (and low) edge tones mark the right edge of phonological phrases (ϕ-phrases). As reviewed in Sect. 6.3, we have reason to posit a phonological constituent when we can show that a constellation of phonological processes consistently target or refer to it. What phonological processes refer to the putative ϕ-phrase?

One phonological process that is commonly bounded by a particular prosodic domain is f0 range scaling. Calhoun (2015:219) found evidence that “H- tones trigger ‘accent suppression’,” but Calhoun (2017:26) found counter-evidence to this. Another phonological process that commonly occurs at prosodic domain edges is pre-boundary lengthening (Sect. 5.2). However, Yu (2011), Calhoun (2015:216), and Calhoun (2017:20) all state that if there is any (non-pausal) pre-boundary lengthening where sentence-medial Hs occur, it is subtle. Moreover, there is a pre-boundary lengthening process that targets a subset of the putative ϕ-phrases, namely those that co-occur with pauses, as discussed in Sect. 5.2. There, I raised the possibility that high and low edge tones that co-occur with pauses are prosodic boundary tones, perhaps intonational phrase tones, and are distinct from the absolutive, coordination, and fronting Hs.

Thus, it appears that the only phonological process we know of that targets a putative ϕ-phrase in Samoan is tonal insertion of an edge tone. In the next section, I point out that tonal insertion alone is not necessarily strong evidence for proposing a particular prosodic constituent.

6.3.2 Not all edge tones are triggered by prosodic boundaries

In this paper, I have been using the term ‘edge tone’ descriptively to refer to a tone that seems to track with a (morphosyntactic) word edge. In intonational analysis, it is common practice to assume that a tone with this behavior marks a prosodic constituent edge; see, e.g., Jun (1998:221) and Jun and Fletcher (2014). (This doesn’t necessarily preclude it from also being associated to a stress-bearing unit, e.g., Prieto et al. 2005, Grice et al. 2015.) But a tonally marked domain does not necessarily imply a particular prosodic constituent category, nor vice versa (Gussenhoven 1990). First, a prosodic constituent need not be marked by a tone (Bennett 2015:346); it might instead be marked by f0 scaling, or by the application of segmental phonological processes, e.g., devoicing (Hyman and Monaka 2011). More generally, prosodic constituency determined by the placement of tones is not necessarily the same as prosodic constituency determined by the application of other phonological processes (Jun 1998, Sect. 4.2). For example, Gussenhoven (1990) illustrates how the prosodic structure of English reporting clauses, vocatives, and constant polarity tags challenges the assumption that tonal association domains are the same as domains defined by durational properties such as pausing.

Second, a tone that tracks with word edges is not invariably triggered by phonological constituency. Himmelmann and Ladd (2008:255) points out: “… not all lexical tone languages use intonational boundary tones; for example, some West African tone languages appear not to have them, so that in these languages the pitch contour of an utterance is almost completely determined by the string of lexical tones.” That is, an edge tone might be lexically specified. For instance, in Bole, a Chadic language of Nigeria, an example of a completely tonally specified utterance is ànín lálá méŋgò ‘the owners of the spider came back’, where acute accents mark high tones and grave accents mark low tones. Of course, some of the lexically specified tones will fall at word edges. An interesting example of lexical edge tones comes from Pittayaporn’s (2007) description and analysis of final particles in Thai. Pittayaporn (2007) defines final particles as grammatical morphemes that occur at the end of (intonational) phrases; some are specified for tone and others are not, and intonational boundary tones surface when there is a toneless final particle. But if the intonational phrase ends in a tonally specified lexical word or final particle, then the boundary tone does not surface: lexical words and tonally specified particles “override” boundary tones. Pittayaporn (2007) thus shows cases where edge tones might be either tones introduced in lexical insertion (in lexical words) or tones introduced in the spellout of some syntactic configuration (in particles). Note that the latter case, edge tones introduced in the spellout of some syntactic configuration, is exactly what I am proposing for the Samoan Hs.

Finally, recent case studies of variably aligned tones again highlight a dissociation between edge tones and prosodic boundary tones. Maskikit-Essed and Gussenhoven (2016) analyzes Intonational Phrase-final high boundary tones in Ambonese Malay as “boundary tones that remain floating,” because the alignment of the H peak does not systematically correlate with any segmental anchor within the word. This is an example of prosodic boundary tones that are not aligned to edges, and indeed perhaps not even associated to any prosodic structure. Bruggeman et al. (2017) finds variable alignment of an H tonal target within focused question words (‘qwords,’ e.g., ‘what’) and question phrases (e.g., ‘which pineapple’) in Tashlhiyt Berber. The qword tones in question phrases are consequently analyzed as being associated directly to the focused qword/phrase rather than to some specified prosodic constituent (see Bruggeman et al. 2017, Fig. 13), even though the L tone is often aligned at or near the right edge of the qword. This qword example could be seen as another case of tones reliably appearing in some syntactic configuration, analyzed as being introduced in the spellout of that configuration. Additional examples of variable alignment will likely be discovered as fieldwork coverage of the world’s prosodic systems continues to grow.

In summary, the fact that a tone appears at or tracks with a word edge does not imply that it was inserted by the grammar as a consequence of phonological constituency. I have presented examples from a variety of languages in which an edge tone can instead be analyzed as introduced in lexical insertion or spellout. In such cases, a tone aligns to an edge because lexical insertion or spellout placed it there. I have also shown that there is precedent for positing that tonally marked domains are not necessarily particular constituents in the prosodic hierarchy. Thus, the presence of high edge tones alone is not enough to warrant the assumption that some or all of them mark a particular category in the prosodic hierarchy, such as the phonological phrase, as assumed in Calhoun (2015, 2017).

6.3.3 Lack of unified syntactic environments where Hs appear

Any alternative analysis of Hs that falls within indirect reference theories assumes that there is evidence that the Hs mark prosodic domain edges. In Sect. 6.3.1 and Sect. 6.3.2, I cautioned that it is far from clear that Hs in Samoan are prosodic boundary tones, or that we understand how to parse Samoan utterances into phonological phrases: that is, the key assumption underlying any indirect reference theory of Samoan Hs is not well supported. From the viewpoint of syntactic structure, the support for an indirect reference theory fares no better, as also pointed out in Calhoun (2017). In Sect. 6.2, I already showed that no reasonable category-free, general syntactic configuration can unify the absolutive, fronting, and coordination configurations where Hs reliably appear. Thus, there is no impetus to unify a collection of syntactic configurations via prosodic constituency. I reinforce this point here by showing that the assumption that Hs mark ϕ-phrases fits neither the prosodic tree \(P_{S}\) predicted by Match-Phrase operating on the syntactic tree S shown in (33), nor the [\(_{VP}\) V t] S O and [\(_{VP}\) V O] S constituency implied by a VP-fronting account of Samoan syntax (Collins 2016).

Let us assume that Hs mark the right edge of ϕ-phrases. (If Hs marked prosodic words, we would expect to see many more of them, coinciding with the domain of footing; if Hs marked intonational phrases, we would expect to see far fewer of them sentence-medially.) Consider the syntactic/prosodic tree pair shown in (33) for a VSO transitive sentence (with the terminal [iaH] deleted in \(P_{S}\)). The predicted prosodic tree \(P_{S}\) incurs no violation of Match-Phrase. If Hs marked the right edge of ϕ-phrases, we would expect to see Hs not only immediately preceding the absolutive object, but also preceding the ergative subject. This does not fit the data: Hs do not reliably appear before the ergative subject in VSO sentences. But at the beginning of Sect. 6.3, I offered a number of approaches to rescue an analysis involving prosodic constituency. Here, we might allow for ‘mismatch’: suppose that some prosodic markedness constraints are ranked higher than Match-Phrase. Then a reasonable prosodic tree P might be the one shown in Fig. 15. Here, I have assumed that the ergative case marker e is phrased to the left to satisfy StrongStart (Selkirk 2011:470, (38)), which militates against weak prosodic elements at left edges. This rephrasing of e also removes the violation of a Binarity-Minimum constraint (see Elfner 2012:153, (4), and references therein) incurred in the prosodic tree \(P_{S}\) in (33b) by the unary branching leading to the terminal [lalaŋa] ‘weave’. But even allowing for the mismatch in Fig. 15, the assumption that Hs mark the right edge of ϕ-phrases would still predict an H before the ergative DP, which does not fit the data.

Fig. 15

The mapping from S in (33a) to this prosodic tree P violates Match-Phrase to satisfy higher-ranked prosodic markedness constraints
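The prediction discussed above can be checked mechanically. The Python sketch below lists the words that would carry a sentence-medial H if H marked the right edge of every ϕ-phrase. Prosodic trees are encoded as nested tuples (category first, ω-level words as plain strings), and words are assumed to be unique. Both trees are hypothetical toys with placeholder words: the first has the shape of a Match-compliant tree for [\(_{VP}\) V t] S O, and the second phrases the ergative marker leftward in the spirit of the mismatch just discussed. Neither is a reproduction of (33b) or Fig. 15.

```python
from typing import List


def last_word(tree) -> str:
    """Rightmost ω (a plain string) in a tuple-encoded prosodic tree."""
    while not isinstance(tree, str):
        tree = tree[-1]
    return tree


def phi_right_edges(tree) -> List[str]:
    """Final word of every ϕ constituent, collected left to right (post-order)."""
    if isinstance(tree, str):
        return []
    label, *kids = tree
    edges: List[str] = []
    for k in kids:
        edges.extend(phi_right_edges(k))
    if label == "ϕ":
        edges.append(last_word(tree))
    return edges


def predicted_h_sites(tree) -> List[str]:
    """Words predicted to carry a sentence-medial H if H marks every ϕ right edge."""
    final = last_word(tree)
    sites: List[str] = []
    for w in phi_right_edges(tree):
        if w != final and w not in sites:    # the utterance-final edge is not sentence-medial
            sites.append(w)
    return sites


# Hypothetical toy trees for V ERG-S ABS-O (placeholder words, not Samoan data).
matched = ("ι", ("ϕ", "verb"),
                ("ϕ", "erg", ("ϕ", "det1", ("ϕ", "noun1"))),
                ("ϕ", "det2", ("ϕ", "noun2")))
mismatched = ("ι", ("ϕ", "verb", "erg"),
                   ("ϕ", "det1", ("ϕ", "noun1")),
                   ("ϕ", "det2", ("ϕ", "noun2")))
print(predicted_h_sites(matched))     # ['verb', 'noun1']
print(predicted_h_sites(mismatched))  # ['erg', 'noun1']
```

In both toy trees, an H is predicted before the absolutive object (after "noun1") and also before the ergative subject or DP (after "verb" or "erg"); only the former is reliably attested.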

More generally, consider the constituency in the VP-fronting account of Samoan assumed here (Collins 2016): the default word order in a transitive clause, [\(_{VP}\) V t] S O, and the word order in pseudo noun incorporation, [\(_{VP}\) V O] S. In Sect. 4.4, I showed that an H precedes the subject in [\(_{VP}\) V O] S, i.e., [\(_{VP}\) V O] H S. But again, it is not the case that an H reliably appears before the ergative subject, as in [\(_{VP}\) V t] H S O. This basic asymmetry in the distribution of syntactically conditioned Hs in Samoan does not fit the predictions of an indirect reference theory in which Hs are associated with ϕ-phrases, given the working proposal of the syntax-prosody interface assumed here.Footnote 32 To rescue an indirect theory in which Hs mark ϕ-phrases, we could abandon the well-established VP-fronting account in favor of some syntactic analysis that places a major syntactic constituent boundary before the subject in pseudo noun incorporation, but not before the subject in transitive clauses. I invite the reader to consider possible alternative, independently motivated syntactic analyses meeting this criterion. Another approach would be to refer to syntactic phases rather than constituents in syntax-prosody mapping; but doing so would still require accounting for this asymmetry, and would require stipulating phases to fit both these data and the distribution of the syntactically conditioned Hs other than the absolutive high.

6.4 Calhoun (2017) proposal: Hs and information structure

Calhoun (2017: Sect. 2.2) argues that an indirect theory of Samoan edge tones is not well supported because the current evidence is against Hs marking XPs. Instead, Calhoun (2017:36–37) proposes that sentence-medial high (and low) edge tones in Samoan are all phonological phrase boundary tones and states the following: the results “suggest it is important to consider information structure in tandem with syntactic influences on phrasing, so information structural effects are not mistaken for syntactic ones (cf. Schultze-Berndt and Simard 2012). The data presented here rather support the view that word ordering and prosodic structure in Samoan are strongly influenced by information structure.” The proposal is quoted in (34), and a strong version of the proposal, with a syntax-less mapping from information structure to prosodic structure, is schematized in Fig. 16; cf. my proposal in Fig. 14.

  1. (34)

    Summary of Calhoun’s (2017) proposal (Calhoun 2017:37)

    1. a.

      The default ordering of information in Samoan is rheme-theme. In this order, the rheme is normally phrased separately to the theme.

    2. b.

      If the theme contains a focus, it should normally precede the rheme, a focused theme following the rheme is dispreferred. In theme-rheme order, a prosodic boundary between the constituents is optional.

    3. c.

      H- phrase tones mark an information unit as incomplete. Typically, this marks the end of a rheme with a following theme. However, H- tones can also mark coordinated information units.

    4. d.

      L- phrase tones mark a completed information unit.

    5. e.

      A weak ((!)H*) or no accent on a constituent marks it as backgrounded.

Fig. 16
figure 16

Diagram schematizing a strong version of Calhoun’s (2017) proposal with a syntax-less information structure-to-prosody mapping, cf. diagram in Fig. 14 of my proposal for comparison

In principle, this proposal can account for the data in this paper, i.e., where edge tones do and do not appear. But as currently stated, the proposal could account for an extremely wide range of edge-tone distributions, because it is not readily falsifiable. First, specific information structural configurations are only ever proposed to variably trigger the presence of edge tones: the proposal states that an H- “typically” marks the end of a rheme with a following theme; that in theme-rheme order, a prosodic boundary between the constituents is “optional”; and that H- tones “can” mark coordinated information units. Second, it is challenging to determine the information structure of a sentence, in particular, how to identify and motivate the choice of theme and rheme for any given sentence. Identifying the theme and rheme requires a precise theory of information structure, including a specification of the context relevant for determining the information structure. And as Calhoun (2017:8) points out, establishing the context in a linguistic elicitation can be quite tricky. A further challenge to testing the proposal is that it relies on categorical assessments of tonal events for diagnosing information structure, e.g., whether or not an accent is present. Segmental perturbations in the f0 contour and allophonic variation due to tonal crowding and other factors can make it difficult to decide whether a pitch accent is present and whether an L target is present; see, e.g., the discussion of ‘echo accents’ in Pierrehumbert (1980:223). Finally, it is unclear how Hcoord fits into the theme/rheme analysis.

Despite these issues, Calhoun’s (2017) proposal clearly informs the challenge of understanding the distribution of edge tones in Samoan. While Calhoun (2017:37) emphasizes a separation between information structural and syntactic influences on prosodic phrasing, there are two potential ways to link our proposals. Under the VP-fronting analysis of Samoan, what determines whether the subject gets ergative case is whether object shift occurs before VP-fronting. But what determines whether object shift occurs? (In the terms of Calhoun 2017:38, this question is cast as: why do post-verbal absolutive arguments consistently mark the beginning of the theme?) In Niuean, it is specificity (Massam 2000, 2001), but this does not seem to be the case in Samoan, since the presence of Habs is unaffected by specificity (Sect. 4.3.1). However, in Dyirbal and Nez Perce, it is topicality that determines whether object shift occurs (Woolford 2015, based on Dixon 1972 and Rude 1988). If the object is topical, then the subject is ergative; under a VP-fronting account, a topical object shifts out of the VP before VP-fronting. If the object is nontopical, it does not shift out of the VP before VP-fronting, and the subject is nominative. Under Calhoun’s (2017) theme-rheme analysis, Habs occurs because absolutives (typically) mark the beginning of the theme. If topicality determined whether object shift occurs in Samoan, then an Habs might appear because the absolutive is topical and has undergone object shift.Footnote 33 In addition, an information structure account like Calhoun’s (2017) is a starting point for uncovering sources of variability in the distribution of prosodic boundary tones, e.g., H% and L%.

7 Conclusion

The main empirical contribution of this paper has been to show that high edge tones reliably co-occur with absolutive arguments. The presence of this Habs is insensitive to the syntactic nature (subject of intransitive, object of transitive predicates, proper names, pronouns, and arguments internal to nominalizations), certain semantic properties (specificity and number), and certain aspects of pragmatic context (word order, informational/contrastive focus) of the marked nominal. Moreover, Habs does not appear where bare NPs are independently expected not to be case-marked (pseudo noun incorporation; Massam 2001) or where ergative and oblique case marking are also banned (under focus-sensitive na‘o and before ‘o-marked fronted nominals). Yu and Özyıldız (2016) also show that an optional segmental absolutive particle \(ia_{\text{abs}}\) is licit in syntactic configurations where Habs appears. However, \(ia_{\text{abs}}\) is illicit where Habs does not surface, as well as where Hcoord and Hfront appear. Taken together, this body of distributional evidence indicates that Habs is a case marker that is inserted in spellout as a reflex of the structural configuration of absolutive case.

There remains, of course, much more empirical work to be done. I’ll highlight one such strand of future work here. Studying the prosodic realization of further syntactic constructions, in concert with independent syntactic tests, could help refine hypotheses about syntactic triggers for Hs and perhaps even inform syntactic theory. For example, I have found that an H also occurs at the right edge of weather verbs (Mosel and Hovdhaugen 1992:107), as shown in (35). Is a meteorological expression another distinct syntactic trigger for an H? Or is it an instance of a more general syntactic configuration, such as the absolutive? Or perhaps of an even more general syntactic configuration encoding something about information structure? Is the locative/temporal expression an adjunct, or can it be analyzed as an absolutive?

  1. (35)
    figure as

The theoretical contribution of this paper is the explication of a proposal about the syntax-phonology interface in Samoan that fits the current data: (i) there are high edge tones in Samoan that are syntactically determined and inserted in the spellout of distinct syntactic configurations (absolutive case, non-verb-initial fronted expressions, coordination), and (ii) there are also variably appearing high and low edge tones in Samoan that are typically followed by a pause and that mark prosodic domains, perhaps at the intonational phrase level. My defense of this proposal raises a number of foundational issues in prosody and the syntax-phonology interface that are sometimes overlooked.

The first is that the phonetic alignment of tones with morphosyntactic word edges is often equated with the association of these tones to prosodic constituents in descriptions of the intonational phonology of languages. Slipping from the description of the edge alignment of a tone directly into the transcription of that tone as a prosodic boundary tone is not an inconsequential step. It assumes that the alignment of a tone to an edge is, by itself, enough to diagnose some prosodic domain (as well as the existence of some prosodic hierarchy). While this may not be an unreasonable starting hypothesis, it remains a hypothesis, potentially to be revised. Adhering to the assumption can prevent us from considering other reasonable hypotheses that might fit the data equally well or better. Moreover, the assumption that discrete tonal events are enough to diagnose a prosodic domain draws attention away from the search for additional phonological processes whose domains might coincide with the tonally marked one, or, more generally, the search for any evidence that could bolster support for positing the prosodic domain. This is unfortunate: “Phonologists are often explicit about whether they subscribe to level ordering or output-output correspondence (rarely both). But we tend to help ourselves to prosodic domains without further comment” (Zuraw 2009:1).

The equation of edge tones with prosodic boundary tones is symptomatic of a larger issue: work on the syntax-phonology interface tends to focus on the relation between syntactic domains and prosodic domains, e.g., evaluating whether prosodic domains are needed for a proper description of phonological processes or whether syntactic domains suffice, or what kind of syntactic domain is related to prosodic domains (e.g., phases or constituents). I have shown how approaching the high edge tones of Samoan with this focus traps us into forcing the contexts that trigger Hs to be unified syntactically and prosodically. Forcing this unified analysis might lead us to entertain a number of alternative syntactic analyses and proposals about syntax-prosody mapping, or about prosodic interfaces in general, just to make the unification go through. But for Samoan high edge tones, I have shown that a proposal situated within the “two further core aspects” besides the relation between syntactic and prosodic domains mentioned in Selkirk’s (2011:435) quote in Sect. 1 (the spellout of morphemes and lexical items, and the linearization of syntactic structure into pronounced surface strings) can fit the data. Work on these other aspects is still relatively limited; see, for instance, Heath and McPherson (2013) and McPherson and Heath (2016) on ‘tonosyntax’ in Dogon languages, “whereby words or phrases of particular syntactic categories (e.g., adjective, possessor NP) systematically impose tone overlays on other words or word strings” (Heath and McPherson 2013:265), and Elfner (2012), Bennett et al. (2016), Richards (2016), and Kusmer (2019) on prosodic linearization.

I hope that my analysis of Samoan high edge tones as the spellout of particular syntactic configurations encourages further work attending to aspects of the syntax-phonology interface in addition to the relation between syntactic domains and prosodic domains. The “in addition to” in the previous sentence is quite deliberate: in this paper, I have proposed that high edge tones in Samoan are inserted in spellout as well as in association with prosodic domains. One does not preclude the other, as is evident from the prosodic systems of many tonal languages (e.g., Thai final particles, discussed in Sect. 6.3.2; see also Michaud 2008; Downing and Rialland 2017, among others). The consideration of syntactic configurations in addition to prosodic domains as triggers for tonal events also widens the ways in which prosody might be informative about syntax. Beyond telling us about syntactic constituency, prosody might also diagnose particular syntactic configurations in languages, even in sparsely tonal ones. We should be on the lookout for more cases like the one here.