Introduction

Background

The morphological properties of a wordform interact with the phonetic articulation of that form (Plag, 2014). For instance, in German, vowels are longer before voiced obstruents than voiceless ones, though in most words this is only true word-medially. However, morphological relatives tend to pattern together: both the wordform Räder “wheels” and its relative Rad “wheel” have longer vowels than the unrelated Rat “council,” even though the voiced obstruent is medial only in Räder (Nicenboim et al., 2018). Similar morphologically conditioned phonetic patterns are attested in English and several other languages (especially Germanic; Ben Hedia & Plag, 2017; Davis, 2005; Hall & Scott, 2007; Kaplan, 2017; Lee-Kim & Davidson, 2013; Mackenzie et al., 2018; Smith, Baker, & Hawkins, 2012; Strycharczuk & Scobbie, 2016; see Garrett, 2015 §4.2 and Plag, 2014 for other English examples).

Interactive models of speech production offer a mechanistic account for these kinds of effects. In such models, related wordforms – including morphological relatives – are co-activated along with the target wordform during lexical planning (e.g., Goldrick, 2014; Rapp & Goldrick, 2000). Co-activated wordforms compete with the target wordform, and the form that is ultimately produced shows articulatory traces of co-activated forms (Goldrick et al., 2011; Yuen et al., 2010). Thus, it has been proposed that co-activation of morphological relatives during planning leads to otherwise unexpected articulatory similarities (Ernestus & Baayen, 2007; Roettger et al., 2014; Seyfarth et al., 2018; Winter & Roettger, 2011).

This interactive account, however, makes broad predictions about speech production. If the morphological relatives of an intended wordform are invariably co-activated during speech production, then all words should be partially blended with their morphological relatives (Ernestus & Baayen, 2007; Seyfarth et al., 2018; though see Kaplan, 2017). While this correctly predicts the German voicing pattern, many of the morphologically conditioned phonetic patterns can also be explained by less-general proposals. For instance, Rad is spelled with a voiced <d>, and the morphological structure of a word is often apparent in its spelling (Aronoff, Berg, & Heyer, 2016), so such patterns may also reflect speakers’ orthographic awareness (Brewer, 2008; Ernestus & Baayen, 2006; Winter & Roettger, 2011). The German voicing effect is also argued to be the result of partial devoicing (van Oostendorp, 2008), while other patterns have been claimed to derive from a phonological concatenation mechanism (Cho, 2001; Mackenzie et al., 2018; Smith et al., 2012).

The current study

In this paper, we test the broad predictions of the interactive account on a Javanese verb alternation. In Javanese (Austronesian; Indonesia), active transitive verb forms are marked by means of a nasalization pattern. For words that otherwise begin with voiceless obstruents, such as padal “to press against”, the active transitive form is marked by nasalizing the onset, becoming madal. However, if the word begins with a nasal, both forms of the verb are identical: masak “to cook” is also masak in its active transitive form. There is also some lexical and phonological variation. For instance, if the word is monosyllabic, such as pèl “to mop”, then the active transitive form instead has a nge- prefix, becoming ngepèl (and not “mèl”) (Robson 2002: p. 46).
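The alternation just described can be sketched as a small rule system. The following is only a toy illustration: the onset map is limited to the two substitutions exemplified in this paper (p to m, k to ng), and the function names and the vowel-letter syllable count are hypothetical simplifications rather than a description of Javanese morphophonology as a whole.

```python
import unicodedata

# Illustrative assumptions: only the onsets exemplified in the text are mapped,
# and syllables are approximated by counting vowel letters.
NASAL_FOR_STOP = {"p": "m", "k": "ng"}
NASAL_ONSETS = ("m", "n")  # "n" also covers the digraph "ng"
VOWELS = set("aeiou\u00e8\u00e9")


def count_syllables(word: str) -> int:
    """Approximate the syllable count as the number of vowel letters."""
    return sum(ch in VOWELS for ch in word)


def active_transitive(base: str) -> str:
    """Return a hypothetical active transitive form for a non-active base."""
    base = unicodedata.normalize("NFC", base)
    if count_syllables(base) == 1:
        return "nge" + base  # monosyllables take the nge- prefix
    if base.startswith(NASAL_ONSETS):
        return base  # nasal-initial bases do not alternate
    onset = base[0]
    if onset in NASAL_FOR_STOP:
        return NASAL_FOR_STOP[onset] + base[1:]  # substitute the onset
    raise ValueError(f"onset {onset!r} is not covered by this sketch")
```

Under these assumptions, the function maps padal to madal, leaves masak unchanged, and prefixes pèl to give ngepèl, matching the three cases in the text.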

If spoken word production invariably involves the co-activation of morphologically related wordforms that partially blend with the intended wordform, we hypothesize that verb forms like madal, which alternates with padal, should be produced with less onset nasality than verb forms like masak, which does not alternate. Additionally, stop closures are much longer in Javanese voiceless stops than in nasal ones, and so the word-initial lip closure should have a longer duration in alternating madal compared to non-alternating masak.

The verbal alternation in Javanese serves as a good test case for the interactive account because many alternative proposals for morphologically conditioned effects can be pre-emptively ruled out. Both alternating madal and non-alternating masak are written with the same grapheme <m>, so there is no orthographic evidence for different pronunciations (unlike in Rad / Rat), nor are there any orthographic cues to a morphological alternation. The nasalization pattern involves substitution, so concatenation processes could not explain any observed effects. Finally, any such effects could not be due to incomplete application of phonotactic constraints (e.g., final devoicing in German): both initial nasals and initial voiceless stops are common in Javanese.

Methods

Data collection

Speakers

We recruited 27 native speakers of Javanese from Semarang, Java, Indonesia, including 21 women and six men, with a median age of 22 years (range 19–55). One additional speaker was excluded because of frequent disfluencies. All speakers reported speaking Semarang Javanese (a Central Javanese dialect) as their first or second native language, and all were also native speakers of Indonesian. Speakers gave informed consent and were compensated for their assistance, using a protocol approved by the University of British Columbia Office of Research Services and the UC San Diego Institutional Review Board.

Stimuli

Critical stimuli

To test the hypothesis that Javanese active transitive forms should show articulatory traces of their morphological relatives, we selected 24 matched pairs of active transitive forms. Pairs were matched in having the same initial syllable and number of syllables, and for 16 of the 24 pairs, the onset of the following syllable was also the same. An example of one matched pair is given in bold in the first column of (1-2), below.

 

                       Active transitive    Non-active
(1) Substituted:       madal [mad̪aɫ]        padal [pad̪aɫ]        “to press against”
(2) Non-alternating:   masak [masaʔ]        masak [masaʔ]        “to cook”

In this pair, one active wordform (1) has a morphological relative that begins with a voiceless stop, while the matched active wordform (2) has only nasal-initial relatives. The second column shows the relevant morphological relative for each member of the pair.

For seven of the 24 pairs, the substituted initial nasal (the [m] in madal) is matched to a non-substituted wordform (e.g., the [m] in masak). We hypothesize that, if both forms of a word are activated during speech production, the initial nasal stop should be less nasal and longer in duration for the substituted wordforms than the non-alternating ones.

Because we found few such pairs, 17 of the 24 substituted nasal forms were instead matched to an active transitive form with a nge- prefix, as shown in the first column of (3-4) below.

 

                    Active transitive    Non-active
(3) Substituted:    ngepèh [ŋəpɛh]       kepèh [kəpɛh]        “to chew on”
(4) Prefixed:       ngepèl [ŋəpɛɫ]       pèl [pɛɫ]            “to mop”

In these pairs, the substituted initial nasal (the [ŋ] in ngepèh) was compared to a prefixed initial nasal (the [ŋ] in ngepèl). Because the prefixed [ŋ] in ngepèl is never a substitute for a voiceless stop, the interactive account does not predict phonetic blending. On the other hand, the [ŋ] in ngepèh is a morphological substitute for voiceless [k], and so if both ngepèh and its relative kepèh are activated during speech production, we hypothesize that the [ŋ] should be less nasal and longer than other [ŋ] segments.

Acoustic benchmarks

While we generally predict that the substituted nasals should be less nasalized than other nasal stops, the acoustic realization of nasality can vary between languages (Garellek, Ritchart, & Kuang, 2016; Styler, 2017) and the relevant acoustic measures have not been previously described in Javanese. Thus, in addition to the critical stimuli, we selected 25 additional unmatched words in which the active transitive form begins with a nasal, and the non-active form begins with a voiceless stop. Example (5) below shows one example lexeme.

       Active transitive    Non-active
(5)    mancal [mantʃaɫ]     pancal [pantʃaɫ]     “to kick accidentally”

In (5), the initial nasal in the active form was acoustically compared to the initial voiceless stop in the non-active in order to measure the acoustic correlates of the nasal-voiceless contrast in Javanese.

Elicitation and recording procedure

Each speaker was recorded in a quiet room provided by Universitas Diponegoro or Universitas Dian Nuswantoro using a Blue Yeti USB microphone. Each speaker read the 98 target wordforms from a randomized list, which also included 73 filler words with diverse phonological properties (stimuli for a separate experiment). Speakers read the target words in the carrier phrase Aku nulis ___ sepisan “I write ___ once”. All stimuli and elicitation materials were in the informal ngoko speech level of Javanese.

For the critical stimuli, we elicited only the matched active transitive forms, which are predicted to be distinguished from each other due to interactivity with their morphological relatives. However, we confirmed in a pre-test that the Javanese speakers were familiar with and could use both forms of each word (Footnote 1). For the acoustic benchmark stimuli, we elicited both forms of each word.

Annotation and measurement procedure

For each recorded word, we hand-annotated the initial closure and the following vowel using Praat (Boersma & Weenink, 2019). We used VoiceSauce (Shue, Keating, Vicenik, & Yu, 2011) with the STRAIGHT algorithm for pitch tracking (Kawahara, de Cheveigne, & Patterson, 1998) to estimate the harmonic amplitudes of the vocal spectrum during the vowel at 1-ms intervals.

As a likely acoustic correlate of nasalization, we extracted the amplitude of the harmonic closest to 250 Hz from each vowel, averaging over the first third of the vowel (i.e., the portion immediately adjacent to the stop). This amplitude (P0) is greater with increasing nasalization (Chen, 1997). In addition to P0, we also extracted the duration of the stop closure, which is longer for Javanese voiceless stops than for nasal ones.
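As a rough illustration of this measurement, the sketch below picks, frame by frame, the harmonic whose frequency is nearest 250 Hz and averages its amplitude over the first third of the vowel. The input layout (one f0 value and one list of harmonic amplitudes per 1-ms frame), the frame-by-frame harmonic selection, and the function name are assumptions made for illustration; they are not the VoiceSauce output format or necessarily the exact procedure used here.

```python
from statistics import mean


def p0_first_third(f0_hz, harmonic_amps_db, target_hz=250.0):
    """Mean amplitude (dB) of the harmonic nearest target_hz over the first third.

    f0_hz: one f0 estimate (Hz) per frame of the vowel.
    harmonic_amps_db: one list per frame; element k-1 is the k-th harmonic's amplitude.
    """
    n_frames = max(1, len(f0_hz) // 3)  # frames adjacent to the stop
    values = []
    for f0, amps in zip(f0_hz[:n_frames], harmonic_amps_db[:n_frames]):
        if f0 <= 0 or not amps:
            continue  # skip frames without a usable pitch estimate
        # harmonic number k whose frequency k * f0 is closest to the target
        k = min(range(1, len(amps) + 1), key=lambda h: abs(h * f0 - target_hz))
        values.append(amps[k - 1])
    if not values:
        raise ValueError("no voiced frames in the first third of the vowel")
    return mean(values)
```

With an f0 of 120 Hz, for example, the second harmonic (240 Hz) would be selected, whereas with an f0 of 200 Hz the first harmonic would be.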

Analysis

Acoustic benchmarks

Before proceeding with the main analysis, we first used the acoustic benchmark stimuli to estimate the differences in P0 and closure duration between the voiceless and nasal stops in Javanese. Each speaker produced both the active and non-active forms of each word in the benchmark stimuli list. For each word, we calculated the median difference in P0 and closure duration between the two forms, then calculated the median of median differences across words. The differences were 8.3 dB for P0 (greater for nasals than voiceless stops) and 41 ms for closure duration (greater for voiceless stops than nasals). We hypothesize that traces of these differences should be apparent in the contrast between the substituted and non-substituted (i.e., non-alternating or prefixed) members of each critical matched pair.
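One reading of this aggregation is sketched below: take the difference between the two forms for each speaker's production of a word, take the median of those differences within each word, and then take the median across words. The row layout and function name are hypothetical, and the values in the usage note are invented purely to show the arithmetic.

```python
from collections import defaultdict
from statistics import median


def benchmark_difference(rows):
    """Median across words of the per-word median (nasal - stop) difference.

    rows: iterable of (word, speaker, nasal_value, stop_value) tuples, where the
    two values are the same measure (e.g. P0 in dB) for the two forms of a word.
    """
    per_word = defaultdict(list)
    for word, _speaker, nasal_value, stop_value in rows:
        per_word[word].append(nasal_value - stop_value)
    word_medians = [median(diffs) for diffs in per_word.values()]
    return median(word_medians)
```

For instance, if one word had differences of 8, 10, and 7 across three speakers (median 8) and another had 5 and 7 (median 6), the function would return 7.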

Model procedure

Nasality

The P0 values for the critical stimuli were modeled with a Bayesian multilevel linear regression using the brms package for R (Bürkner, 2018; Carpenter et al., 2017; R Core Team, 2018; Stan Development Team, 2018). The model included a population-level parameter for morphological alternation (coded as -0.5 for substituted and 0.5 for non-substituted), as well as group-level intercepts by word pair and by speaker, and group-level morphological alternation slopes by word pair and by speaker. The parameters were estimated via Markov chain Monte Carlo with four chains and 10,000 samples per chain (discarding the first 2,000 from each chain), and convergence was assessed via the potential scale reduction statistic \( \hat{\mathrm{R}} \) (all <1.001) and visual comparison of the observed and posterior predictive distributions (Gabry et al., 2019).
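For readers who prefer an explicit formula, the model structure described above can be written out as follows. The symbols are our own shorthand rather than notation from the fitted model: \(x_i\) is the alternation code for token \(i\), \(s[i]\) and \(w[i]\) index the token's speaker and word pair, and a Gaussian residual is assumed. Correlations among the group-level terms are not shown.

```latex
\begin{aligned}
\mathrm{P0}_i &= \alpha + \beta\, x_i
  + a_{s[i]} + b_{s[i]}\, x_i
  + c_{w[i]} + d_{w[i]}\, x_i + \varepsilon_i, \\
x_i &=
  \begin{cases}
    -0.5 & \text{substituted} \\
    \phantom{-}0.5 & \text{non-substituted}
  \end{cases} \\
\varepsilon_i &\sim \mathrm{Normal}(0, \sigma^2)
\end{aligned}
```

Here \(\beta\) is the population-level morphological alternation parameter, \(a\) and \(c\) are the group-level intercepts by speaker and by word pair, and \(b\) and \(d\) are the corresponding group-level slopes.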

To evaluate the evidence for the predicted phonetic effect of morphological alternation, we fit a second model in which the population-level parameter for morphological alternation was omitted. The Bayes factor between the two models was estimated with ten replicates using the bridgesampling R package (Gronau & Singmann, 2018). The Bayes factor indicates the relative odds (given the data, model, and prior) for the null model over a model that includes the hypothesized effect of morphological alternation on P0 (following guidelines in Vasishth et al., 2018).
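To make the direction of the comparison concrete, the sketch below converts a pair of log marginal likelihoods (the quantity that bridge sampling estimates) into a Bayes factor for the null model over the full model, and summarizes the range across replicates. The function names are hypothetical and any numbers passed in are placeholders, not values from this study.

```python
import math


def bayes_factor_null_over_full(log_ml_null, log_ml_full):
    """Odds for the null model over the full model, from log marginal likelihoods."""
    return math.exp(log_ml_null - log_ml_full)


def bayes_factor_range(log_ml_null_reps, log_ml_full_reps):
    """Smallest and largest Bayes factor across paired replicate estimates."""
    factors = [
        bayes_factor_null_over_full(null, full)
        for null, full in zip(log_ml_null_reps, log_ml_full_reps)
    ]
    return min(factors), max(factors)
```

A value above 1 favors the null model; a value of 17, for example, would correspond to 17-to-1 odds for the null model over the model with the alternation effect.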

Both models were fit with a weakly informative Gaussian prior on each population-level parameter (μ = 0 dB, σ = 25). Priors on other parameters used the brms defaults. To assess how the prior affected inference, we re-fit the full model with other priors on the population-level morphological alternation parameter. Two other Gaussian priors had μ = 0 but a smaller scale (σ = 10 and σ = 3), which imply stronger beliefs that the interactive effect is small. The fourth prior had μ = 3 dB, σ = 1. This prior is based on the effect size estimated in a previous meta-analysis of the German partial voicing pattern (Nicenboim et al., 2018), which was proposed to be the result of the same interactive mechanism as the effect being explored here (Footnote 2).
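One way to see what "stronger beliefs that the effect is small" means is to compute how much prior probability each Gaussian places on effects within some small band around zero. The sketch below does this for the four P0 priors; the 1 dB band and the prior labels are arbitrary choices made for illustration, not quantities used in the analysis.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian with mean mu and scale sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def prior_mass_within(limit, mu, sigma):
    """Prior probability that the effect lies between -limit and +limit."""
    return normal_cdf(limit, mu, sigma) - normal_cdf(-limit, mu, sigma)


# (mu, sigma) in dB for the four priors on the alternation parameter;
# the dictionary keys are hypothetical labels.
P0_PRIORS = {
    "sigma_25": (0.0, 25.0),
    "sigma_10": (0.0, 10.0),
    "sigma_3": (0.0, 3.0),
    "meta_analysis": (3.0, 1.0),
}

for label, (mu, sigma) in P0_PRIORS.items():
    print(label, round(prior_mass_within(1.0, mu, sigma), 3))
```

Among the zero-centered priors, the mass inside the band grows as σ shrinks, whereas the meta-analysis prior places most of its mass on positive effects well away from zero.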

Closure duration

The nasal closure durations were modeled using the same procedure as P0, except as follows. Duration was modeled with an exponentially modified Gaussian residual distribution, which produced a better match between the empirical and posterior predictive distributions. The models for closure duration had additional sum-coded population-level and speaker-level parameters for the nasal place-of-articulation. The closure duration models were fit with these priors: a weak Gaussian prior (μ = 0 ms, σ = 50), a medium Gaussian prior (μ = 0 ms, σ = 20), a strong Gaussian prior (μ = 0 ms, σ = 10), and a Gaussian prior derived from the meta-analysis (μ =  − 13 ms, σ = 3; see footnote 2).

Results

Data summary

There were 1,296 tokens of the critical stimuli (27 speakers × 24 pairs × 2 words per pair). All measurements from 57 tokens (4.4%) were excluded due to disfluencies (e.g., prolongations, pauses, restarts). Additionally, 14 tokens (1.1%) were excluded from the analysis of P0 (only) due to visibly unreliable pitch tracking. For these tokens, either the entire pitch track lay outside the speaker’s usual range (based on visualization of per-speaker f0 distributions), or the estimated pitch changed by more than five semitones in 1 ms.
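The second pitch-tracking criterion can be expressed compactly, since a change between two frequencies in semitones is 12 times the base-2 logarithm of their ratio. The sketch below assumes a list of positive f0 estimates at consecutive 1-ms frames; the function names are hypothetical.

```python
import math


def semitone_change(f_prev_hz, f_next_hz):
    """Signed pitch change in semitones between two f0 estimates."""
    return 12.0 * math.log2(f_next_hz / f_prev_hz)


def has_pitch_jump(f0_track_hz, max_semitones=5.0):
    """True if any pair of consecutive frames differs by more than max_semitones."""
    return any(
        abs(semitone_change(prev, nxt)) > max_semitones
        for prev, nxt in zip(f0_track_hz, f0_track_hz[1:])
    )
```

For example, a jump from 200 Hz to 300 Hz between adjacent frames is about seven semitones and would be flagged, whereas a smooth track would not.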

After performing these exclusions, 47 tokens (3.6%) were excluded from the analysis of P0 (only) because their P0 value was ≥ 2.5 median absolute deviations from the speaker’s median, and 64 tokens (4.9%) were excluded from the analysis of closure duration (only) because their closure duration was ≥2.5 MAD from the speaker’s median. In total, there were 1,178 tokens included in the analysis of P0, and 1,175 in the analysis of closure duration.
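The outlier criterion can be sketched as follows, applied separately to each speaker's values. This version uses the raw (unscaled) median absolute deviation, which is an assumption: the text does not say whether a scaling constant was applied. The function name is hypothetical.

```python
from statistics import median


def mad_outlier_flags(values, threshold=2.5):
    """Flag values at least `threshold` MADs away from the median.

    values: one speaker's measurements (e.g. P0 in dB or closure duration in ms).
    """
    center = median(values)
    deviations = [abs(v - center) for v in values]
    mad = median(deviations)
    if mad == 0:
        return [False] * len(values)  # no spread, so nothing can be flagged
    return [dev >= threshold * mad for dev in deviations]
```

For the values 10, 11, 12, 13, 14, and 40, the median is 12.5 and the MAD is 1.5, so only the last value (27.5 away from the median) would be flagged.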

Nasality

For the analysis of P0, Table 1 provides a summary of the estimates for the crucial morphological alternation parameter, given each of the four priors. In these models, a positive effect is consistent with the hypothesis. The right column provides Bayes factors (the range for 10 replicates per model) that indicate the odds in favor of the null model (i.e., no effect of morphological alternation) relative to a model that includes an effect of morphological alternation.

Table 1 Summary of estimates and Bayes factors for the population-level morphological alternation parameter in the model of P0

Regardless of the prior, the median estimate of the morphological alternation effect is small (< 0.2 dB). All models suggest that no more than a minimal effect would be consistent with the data (< 0.5 dB, about 6% of the 8.3 dB difference between nasal and voiceless stops). Under the symmetrical priors, a minimal negative effect is nearly as likely as an effect in the hypothesized direction. Under all priors, the Bayes factors indicate a strong preference for the null model (17-to-1 or greater odds).

Closure duration

For the analysis of nasal closure duration, Table 2 provides a summary of the estimates for the crucial morphological alternation parameter, given each of the four priors, as well as the Bayes factors for comparisons with a null model. In these models, the hypothesized effect is negative. The likely estimates of the effect size for morphological alternation are small, and are equally distributed around zero, except under the meta-analysis prior. Under all priors, the Bayes factors favor the null model by 3-to-1 odds or higher.

Table 2 Summary of estimates and Bayes factors for the population-level morphological alternation parameter in the model of closure duration

Discussion

Based on an interactive model of speech production (Goldrick, 2014; Rapp & Goldrick, 2000), we tested the hypothesis that the articulation of a wordform should be influenced by the form of its morphological relatives (Ernestus & Baayen, 2007; Roettger et al., 2014; Seyfarth et al., 2018; Winter & Roettger, 2011). In Javanese, we predicted that initial nasals (e.g., in madal and ngepèh) that alternate with initial voiceless stops in morphologically related wordforms (padal and kepèh, respectively) should be phonetically less nasal than initial nasals that do not alternate (e.g., masak). If so, this would support an interactive theory of speech production, and help develop a mechanistic account of morphologically conditioned phonetic effects.

From an acoustic analysis of wordforms produced in a carrier phrase, we found good evidence against the hypothesis: A Bayesian analysis favored a model in which the nasal resonance P0 and closure duration were not substantially different between nasal-initial words with different kinds of morphological relatives. Thus, a strong version of the interactive theory of speech production does not make correct predictions about the phonetics of the Javanese alternation. To better fit the data, the interactive production theory might be modified so that either co-activated wordforms do not lead to phonetic blending with the intended wordform, or morphological relatives are not obligatorily co-activated during planning. The theory could also be modified to exclude the specific kind of phonetic effects that we predicted here. For example, one possibility might be that the influence of co-activated wordforms on their morphological relatives is limited to prosodic differences, such as those involving syllable or word position.

If, however, this theory does not predict morphologically conditioned phonetic effects, how else can they be accounted for? (Footnote 3) Several theories have proposed variants of the basic idea that morphologically conditioned effects often involve competition between language-wide phonological patterns and word-specific structures (for three different perspectives on this idea, see Gafos, 2006; van Oostendorp, 2008; and Winter & Roettger, 2011, p. 64). The competition arises when a morphological alternation requires a form that is inconsistent with the language-wide patterns, and the result is phonetically intermediate in some way.

In computational approaches (e.g., Gafos, 2006), a speaker optimizes their phonological output given the goals and constraints in the system. These might include high effort associated with producing an atypical phonological structure (e.g., final voiced obstruents in German), pressure to communicate a particular word (Port & Crawford, 1989), or a preference for fluent or native-like production. Because these vary from moment to moment, both phonetic variability and phonetically intermediate forms are expected.

This account, while not mechanistic, might predict both previous positive findings and the current negative one. For example, final voiced obstruents do not occur in most of the German lexicon, but exceptions arise through a morphological alternation. Such an exception will be phonetically variable, depending on the relevant constraints and goals during speech production. A similar explanation might account for variability in English l-velarization (Lee-Kim & Davidson, 2013; Strycharczuk & Scobbie, 2016): velarized [ɫ] usually occurs only in syllable codas as in tall, but when the form taller is produced, competition between the language-wide dispreference for syllable-initial [ɫ] and the word-specific expectation for [ɫ] leads to a variable and intermediate form.

On the other hand, the Javanese alternation studied here does not involve competition between language-wide and word-specific phonological patterns. Javanese has no phonological dispreference for initial nasals, and therefore – in this framework – there would be no reason to expect that alternating nasals should be blended with voiceless relatives. Future work might experimentally manipulate possible constraints and goals on the system to evaluate how they interact with morphologically conditioned phonetic variation for other phenomena.

Author Note

This work was partially supported by the Social Sciences and Humanities Research Council of Canada (SSHRC) #430-2016-00220 to Jozina Vander Klok. We thank Universitas Diponegoro and Universitas Dian Nuswantoro in Semarang, Indonesia, for providing recording space. For assistance with annotation, we thank Neeloo Rahbari, Kristen Wong, Yalin Deng, Katherine Lott, Adriana Barrios, Stuart Madison, Alexandra Wei, Karla Quezada, and Monique Gentz.