Cross-clause planning in Nungon (Papua New Guinea): Eye-tracking evidence

Sarvasy, Hannah S.; Morgan, Adam Milton; Yu, Jenny; Ferreira, Victor S.; Momma, Shota

doi:10.3758/s13421-021-01253-3

Published: 01 March 2022

Volume 51, pages 666–680, (2023)

Memory & Cognition

Hannah S. Sarvasy ORCID: orcid.org/0000-0002-9551-480X

Abstract

Hundreds of languages worldwide use a sentence structure known as the “clause chain,” in which 20 or more clauses can be stacked to form a sentence. The Papuan language Nungon is among a subset of clause chaining languages that require “switch-reference” suffixes on nonfinal verbs in chains. These suffixes announce whether the subject of each upcoming clause will differ from the subject of the previous clause. We examine two major issues in psycholinguistics: predictive processing in comprehension, and advance planning in production. Whereas previous work on other languages has demonstrated that sentence planning can be incremental, switch-reference marking would seem to prohibit strictly incremental planning, as it requires speakers to plan the next clause before they can finish producing the current one. This suggests an intriguing possibility: planning strategies may be fundamentally different in Nungon. We used a mobile eye-tracker and solar-powered laptops in a remote village in Papua New Guinea to track Nungon speakers’ gaze in two experiments: comprehension and production. Curiously, during comprehension, fixation data yielded no evidence that switch-reference marking is used for predictive processing. However, during production, we found evidence for advance planning of switch-reference markers, and, by extension, the subjects they presage. We propose that this degree of advance syntactic planning pushes the boundaries of what is known about sentence planning, drawing on data from a novel morpheme type in an understudied language.

Introduction

Imagine a speaker telling a story, and upon describing the current action, they must announce in advance whether the next action will be done by the same actor (“NOSWITCH”) or instead will be done by a different actor (“SWITCH”). A simple story—I walked to the store. My friends were standing outside. They waved to me. I waved back. I did my shopping, then came home—would sound something like: I walked-SWITCH to the store. My friends were standing-NOSWITCH outside. They waved-SWITCH to me. I waved-NOSWITCH back. I did-NOSWITCH my shopping, then came home. Is this difficult to do? Apparently not, judging by the ease with which it is done by native speakers of numerous indigenous languages of the Amazon, North America, and New Guinea.

This “switch-reference marking” (Haiman & Munro, 1983; van Gijn & Hammond, 2016) is intriguing from a language processing perspective. There is extensive literature on how speakers track relationships between words within a clause (e.g., agreement; Wagers et al., 2009), and where two elements in different clauses share a referent (e.g., long-distance dependencies; Clifton & Frazier, 1989). To our knowledge, however, there is no previous research on processing of a feature like switch-reference marking, where speakers must compute relations between distinct referents across different clauses.

To understand switch-reference marking, one must first understand the sentence type in which it occurs. In English (and other languages of Europe), clauses can be combined in one of two ways: coordination (i.e., use of conjunctions like and, and or, as in The dog barked and the cat ran away) or subordination (e.g., relative clauses, as in The dog barked at the cat that ran away). However, in a number of languages, including Japanese, Korean, Turkish, Tibetan, Chechen, and Burmese, there is a third way to combine clauses. In “clause chains” (Dooley, 2010; Longacre, 1985, 2007; Sarvasy, 2021), multiple clauses describing sequences of actions or events can be uttered one after another, forming a long sentence, as in (1), where brackets indicate clauses:

1.
[The cat biting the dog], [running under the table], [finding its bowl empty], [the dog still barking at it], [the cat fled outside].

‘The cat bit the dog. It then ran under the table, where it found its bowl empty. The dog was still barking at it. The cat then fled outside.’

Clause chains may contain 20 or more clauses, yet only the very last verb conveys tense, while the rest appear in an un-tensed form. If the sentence lacks temporal adverbs such as yesterday, a listener must wait for the last verb to find out whether the sequence of events is construed as past, present, or future (Sarvasy, 2020), apparently presenting a processing challenge.

Among clause chaining languages, a subset (largely in Amazonia, North America, and New Guinea) requires speakers to announce in advance whether the subject of the following clause will be the same or different from the current subject, by way of a particular suffix (or other type of marker) on the verb. If English were a language with switch-reference marking, the example clause chain in (1) might look something like:

2.
[The cat biting-NOSWITCH the dog], [running-NOSWITCH under the table], [finding-SWITCH its bowl empty], [the dog still barking-SWITCH at it], [the cat fled outside].
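The marking rule illustrated in (2) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each clause is reduced to a (subject, verb) pair, the function name `mark_switch_reference` is hypothetical, and the labels follow the NOSWITCH/SWITCH convention of the pseudo-English example rather than actual Nungon morphology.

```python
from typing import List, Tuple


def mark_switch_reference(clauses: List[Tuple[str, str]]) -> List[str]:
    """Label each nonfinal verb by comparing its subject with the next clause's subject.

    `clauses` holds (subject, verb) pairs in order of utterance (a simplification).
    The final verb receives no switch-reference label.
    """
    marked = []
    for i, (subject, verb) in enumerate(clauses):
        if i == len(clauses) - 1:
            marked.append(verb)  # final clause: no marker
            continue
        next_subject = clauses[i + 1][0]
        label = "NOSWITCH" if next_subject == subject else "SWITCH"
        marked.append(f"{verb}-{label}")
    return marked


# Subjects and verbs of the pseudo-English chain in (2)
chain = [
    ("cat", "biting"),
    ("cat", "running"),
    ("cat", "finding"),
    ("dog", "barking"),
    ("cat", "fled"),
]
print(mark_switch_reference(chain))
```

Note that the label for clause i cannot be computed without looking at clause i + 1, which is the property that makes switch-reference marking relevant to questions of advance planning.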

The present paper presents the first psycholinguistic investigations of switch-reference marking of which we are aware.

When listening to speech, it is generally agreed that sentences are processed incrementally (Altmann & Mirković, 2009) and predictively (DeLong et al., 2005). Comprehenders use various sources of information for prediction. For instance, Mitsugi (2017) showed that Japanese speakers use case morphology (markers of grammatical role: subject, object, indirect object, etc.) as cues for predictive processing. Similarly, Altmann and Kamide (1999) found that comprehenders’ gaze travels more to a cake after hearing the verb eat but more to a ball after hearing the verb move. However, relatively little is known about how or whether morphological features on verbs (e.g., suffixes for agreement, tense, or switch-reference) are used to predictively guide comprehension (but see Pizarro-Guevara & Wagers, 2020). This is in part because in most well-studied languages, verbal morphology often does not contain clues to upcoming information. For instance, in English, verbs agree with subjects, so verbal morphology could in principle be used to predict the subject. But English verbs almost always come after the subject (although Lukyanenko & Fisher, 2016, show that in questions, where the English verb precedes the subject, number agreement inflection on verbs does aid prediction). To study prediction on the basis of verbal morphology, one needs morphological cues to upcoming information. Switch-reference markers therefore present a prime case for studying prediction on the basis of verbal morphology.

It is also important to validate the finding of predictive processing during comprehension in nonindustrialized populations who speak lesser-studied languages. Because most studies to date have relied on a certain type of participant (university students in industrialized nations), a finding of prediction based on switch-reference marking in a language of rural New Guinea would complement existing evidence from well-studied languages like English (DeLong et al., 2005), German (Kamide, Scheepers, et al., 2003b), and Japanese (Kamide, Altmann, et al., 2003a; Yoshida, 2004). Expanding the list of languages is important in establishing the generality of the claim that language processing operates predictively.

Switch-reference marking also has implications for research into language production, since speakers must know the subject of the next clause in order to produce switch-reference marking correctly. It is generally accepted that speakers plan speech in advance to some degree, although the mechanisms for planning various components of a sentence remain unclear. Eye-tracking studies targeting simple English transitive sentences (subject-verb-object) consistently find an “eye-voice span” of roughly 1 second—that is, a speaker’s gaze shifts to the picture of an object about 1 second before uttering its name (Griffin & Bock, 2000), suggesting a relatively narrow scope of planning. However, a recent series of studies suggests that advance planning is grammatically conditioned. For example, Momma et al. (Momma & Ferreira, 2019; Momma et al., 2016, 2018) showed that speakers plan verbs before the articulation of their grammatical object, but not before the articulation of their subject, suggesting that specific types of grammatical relationships among words determine aspects of their advance planning. Planning has been shown to be incremental in at least some cases—that is, the speaker may plan the last parts of a sentence while uttering earlier parts, although such incrementality can be strongly influenced by strategic factors (e.g., Ferreira & Swets, 2002).

In general, it is accepted that a clause can be a unit of planning at some level of representation (Smith & Wheeldon, 1999). For instance, Ford and Holmes (1978) found that when English speakers were forced to respond to tones played in the midst of their five-minute extemporaneous monologues on a theme, their longest reaction times to the tones occurred near the end of a clause. Ford and Holmes interpreted these results to indicate that speakers conceive of their speech in one-clause units, and that planning for the upcoming clause occurs near the end of the current clause. Pawley and Syder (2000) also concluded from an English corpus study that speakers plan one clause at a time, and a number of other studies have yielded results that imply clausal scope for planning (Beattie, 1980; Ford, 1982; Garrett, 1975; Meyer, 1996; Wijnen, 1990).

English and related languages that lack switch-reference marking have played a dominant role in the development of psycholinguistics (Mulak et al., 2021), so it is unsurprising that there is little in the literature to serve as a guide to how switch-reference in clause chains may be planned and produced. Smith and Wheeldon (1999) found that speakers took longer to begin coordinated two-clause sentences, such as [The dog and the foot move up] and [the kite moves down] than single-clause sentences, such as The dog and the foot move up. This was taken as an indication that some planning of the second clause already occurs before the speaker begins to utter the first clause. On the other hand, they also found that speakers were slower to start producing two-clause sentences that had complex first subjects (the dog and the foot), but simple second subjects (the kite) than two-clause sentences in which the first subject was simple and the second subject was complex. This was taken to show that speakers conceived of the second clause in a less detailed manner than the first clause during initial planning. Ferreira and Swets (2005) used pictures to elicit English sentences comprising three clauses: a main clause, an embedded subordinate clause, and another subordinate clause embedded within the first subordinate clause, such as [This is the donkey that [doesn’t know [where it is from]]]. They showed that the amount of time that speakers took to begin the first clause varied depending on the grammaticality of the third clause, indicating that speakers were in some cases planning the entire structure in advance. These studies seem to support Garrett’s (1982) proposition that sentence planning could sometimes span two clauses.

Clause chains are multiclause sentences that differ from those tested by Smith and Wheeldon (1999) or Ferreira and Swets (2005). The simple coordinate structures tested by Smith and Wheeldon (1999) were conceptually repetitive, involving separate entities doing the same action. In clause chains, consecutive clauses most often describe different actions. The structure targeted by Ferreira and Swets (2005) involved subordination, in which one or more clauses are embedded in a main clause; the clauses in clause chains are not embedded. Further, the first embedded clause in the Ferreira and Swets schema was a relative clause, and relative clauses (unlike clauses in clause chains) function to provide information about an entity that acts in the main clause. Given Ferreira’s (1991) finding that relative clauses can be planned alongside the main-clause nouns they accompany, it could be argued that a relative clause functions as a part of the main clause rather than as a full additional clause.

Here we use a visual world paradigm to investigate comprehension and production of switch-reference marking in the Papuan language Nungon, spoken by about 1,000 people in remote villages of Papua New Guinea. To our knowledge, planning during sentence production has never been studied in a language with switch-reference marking.

In visual world eye-tracking, participants’ gaze (on average) is assumed to reflect the focus of attention (Altmann, 2004; Altmann & Kamide, 2004, 2009; Huettig et al., 2011). Based on this working assumption, we can infer when participants begin processing a particular word or phrase by determining when their gaze shifts to the corresponding image. In Experiment 1, to understand whether listeners use Nungon switch-reference marking to predict during comprehension, we tracked participants’ eyes as they were presented with audio recordings of brief narratives and images of characters in those narratives. In Experiment 2, to examine the time-course of Nungon speakers’ planning of the subject of the upcoming clause, we tracked participants’ looks to the current versus next subject while they recounted the same narratives.

Expanding research on cognition to communities outside industrialized societies brings challenges and compromises. For instance, while pressing keys on a laptop keyboard and answering multiple-choice questions are second nature to many reading this article, these are hardly natural in more remote communities around the world. Thus, among the challenges in field psychology is designing a task that is not so artificial that participants struggle to complete it, but not so open-ended that meaningful comparisons cannot be made. We therefore presented participants with naturalistic stimuli in the comprehension experiment and open-ended prompts in the production experiment. A feature of this design is that we were able to characterize processing that is more ecologically valid, although we lost some of the analytic power of comparing across controlled conditions.

These challenges are even more acute when experiments use advanced equipment—here, an eye-tracker. Running a portable eye-tracker that uses two laptop hosts at one time in a region without electricity was accomplished through a long-term solar system setup in the Nungon-speaking area, with enough power to run both display and control laptops.

Strong community relations are crucial to the success of field-based experiments, and to laying the foundations for further work with the same community. If community members are uncertain about the intentions of a researcher, or the purpose of the research, they may abstain from participation and decide not to support similar research in the future. The first author has maintained a close relationship with the Nungon-speaking community of Towet village since 2011, beginning with immersion linguistic fieldwork there. She is adopted into a local clan.

Months before the research team traveled to Towet to run the suite of experiments that included these eye-tracking experiments, Towet community members Stanly Girip, James Jio, and Lyn Ögate began planning for the “experiment fair” of which the current experiments were a part (see Method). They recruited four research assistants from among Towet adults who had obtained at least a 10th-grade diploma (a rare accomplishment, requiring boarding at distant schools), and convinced all 30 households in Towet village to take two weeks off from all regular duties in order to be available as participants for the planned experiments. This 2-week break from farming was possible because the community stockpiled crops and firewood for months to ensure that no one would go hungry. Overall, the Towet village community went to extraordinary lengths to ensure the success of these experiments. Their major effort is testament to the specialness of this community, and to the first author’s long-standing collaborations with them (see Dobrin, 2008, on the importance of long-term research collaborations in Melanesia).

Switch-reference marking in Nungon

Nungon is a Papuan language of the Finisterre-Huon family, spoken in six villages in the Uruwa River valley in the Saruwaged Mountains of Morobe Province, Papua New Guinea (Sarvasy, 2017). There are about 1,000 speakers, but—typifying the staggering diversification of languages in Papua New Guinea—they are spread across six distinct dialects, with no more than about 350 speakers of any one dialect. All local people grow up with Nungon as their first language; most have some familiarity with the English-based creole Tok Pisin, but this is not used outside the local schools and church services. Basic literacy levels in Nungon and Tok Pisin are high, but most adults do not read or write on a daily basis. The Uruwa River valley is remote and accessible only by small plane or foot (a difficult multi-day hike through alpine forests to the port city of Lae). The region lacks electricity and only recently gained a cell phone tower. Most adults work as self-sufficient small-holder farmers. The community is special, even in an overwhelmingly rural nation like Papua New Guinea, in that they rejected the notion of establishing an internal market economy, in favor of maintaining age-old traditions of sharing crop surpluses.

The Nungon language has complex verbal morphology. For instance, verbs can be marked for one of five tenses. Subject and object noun phrases are often omitted in Nungon discourse (“argument dropping”). Like English, Nungon has clausal coordination and subordination. However, clause chains are extremely common; for instance, text messages in Nungon often comprise one or more clause chains with four or more clauses apiece (Sarvasy, 2021). Clause chains are highly predictably distributed in narratives, but other sentence types, which lack switch-reference marking, can predominate in other genres. In a sample of 49 Nungon narrative monologues (including 1,742 clause chains), the longest clause chain had 22 clauses, while the average length was 3.4 clauses (Sarvasy, 2021).

The verbs in nonfinal clauses in a Nungon clause chain are obligatorily marked with a switch-reference suffix. These suffixes encode two different possibilities: same-subject (SS), after which the subject of Clause A is maintained in the following Clause, B, and different-subject (DS), after which the subject of Clause B differs from that of Clause A.

3.
[Kurawiöng o-unya], [urop y-aa-gu-ng], amna nangnang.
Kurawiöng descend-ds.2/3du enough 3nsg-see-remote.past-2/3pl^{Footnote 1} man eater
“The two of them descending at Kurawiöng-SWITCH, that’s it, they saw them, man-eaters.”

Example (3), from a recorded narrative (and one of the audio stimuli for Experiment 1 here), illustrates the Nungon penchant for omission of subject and object arguments. The first clause has just a single proper noun (a place name), followed by a DS-marked verb. In Nungon, the DS suffixes encode both DS marking and the person/number of the current clause’s subject, while the SS suffix encodes only SS marking, and involves no subject agreement. In the first clause here, the DS suffix is the only grammatical indication that the subject is second or third person and dual number (that is, two): there is no subject noun phrase in the clause. The second clause has an adverb and a verb that is inflected for remote past tense, and both subject and object person/number; again, subject and object are referenced solely through affixes on the verb, which is always the final element in the clause. As it happens, the object of the second clause refers to the pair of man-eaters who are the implied subject of the first clause. An explanatory noun phrase, “man-eaters,” follows the second clause.

Since this sentence occurs in the middle of a narrative, characters and situation are understood from the established discourse context. In such a small, close-knit community, omission of subject and object arguments in quotidian conversation more generally is enabled by the fact that people often share much background information about events and people in their communities (cf. Wray & Grace, 2007).

Nungon switch-reference strictly tracks grammatical subjects, even when someone other than the subject is the real actor. For instance, in expressions like “I feel angry,” Nungon speakers actually put “anger” as the subject of the verb, and “me” as the object: iik na-mo-ha-k “anger 1sg-give-present-3sg,” or, roughly: “anger affects me.” Several negative emotions and sensations are described in this way, such as “feeling tired,” “feeling heavy,” and “feeling bored.” Crucially, because Nungon switch-reference marking strictly tracks the syntactic subject, even when the “notional” subject does not change from clause to clause, speakers use DS markers prior to expressions like these. For instance, in (4), even though the notional subject remains the same throughout, the syntactic subject changes from “I” to “anger” to “I” again, so a speaker must use the DS marker at the end of each nonfinal clause:

4.
[E-waya], [iik na-m-una], [bög-in ongo-go-t].
come-ds.1sg anger 1sg-give-ds.3sg house-locative go-remote.past-1sg

“I coming-SWITCH, anger affecting-SWITCH me, I went home.”

This implies that there must be a detailed grammatical element to switch-reference planning, such that it does not just occur at a broader conceptual level.

Children learning Nungon produce two-clause chains by age 2.5, and three-to-five-clause chains beginning around age 3 (Sarvasy, 2019, 2020). Both SS and DS markers are evident in their early clause chains, and 60%–80% of switch-reference morphemes in parental speech are SS.

Experiment 1: Comprehension

An intriguing possibility is that switch-reference marking could exist in part to provide comprehenders with a cue that might facilitate processing of the subsequent clause. This may be especially helpful in an argument-dropping language like Nungon, where subjects are sometimes not overtly expressed. To understand how switch-reference marking affects online processing during comprehension, we tracked participants’ gaze while they listened to 15 short speech samples that included clause chains (Fig. 1). We expected that comprehenders’ fixations would differ depending on whether they heard an SS or DS switch-reference marker. The precise timing of this difference would enable us to assess whether listeners use the morphemes as cues for predictive processing. We expected that, in a DS condition, comprehenders’ gaze would begin to shift away from the “same subject,” or the subject of the clause the switch-reference marker appears in, before the identity of the next subject was clarified in the next clause.

Method

Participants

Sixty-six adult participants were recruited from Towet village, Uruwa Ward 1, Kabwum District, Morobe Province, Papua New Guinea. Participants were each paid 50 Papua New Guinean kina, approximately 15 U.S. dollars.^{Footnote 2} Participation occurred as part of a four-experiment “science fair” (see Mulak et al., 2021). Local project managers oversaw recruitment. Participants were read an information sheet in Nungon before starting the experiment, and signed a consent form. Two participants were later excluded because they were nonnative Nungon speakers who had married into the region from elsewhere; all other participants were native speakers of Nungon. Three other participants’ data were not recorded by the experimental software, such that 61 participants’ data were included in analyses.

Materials

From a corpus of more than 200 Nungon personal experience narratives compiled during fieldwork on the Nungon language (Sarvasy, 2017), 15 short audio stimuli were selected. These stimuli ranged in duration from 2 to 29 seconds (mean duration: 9.7 seconds, standard deviation: 7.1 seconds), and had been recorded by nine different adult speakers (five males). Stimuli were selected if they comprised at least one clause chain, including at least one switch-reference marker; were easy to visually represent; and were produced clearly. Eight stimuli involved two different, nonoverlapping grammatical subjects, four stimuli involved three different grammatical subjects, one stimulus involved five different grammatical subjects, and two stimuli involved only a single grammatical subject. These two stimuli were also the only stimuli to lack DS switch-reference markers altogether. In the other 13 stimuli, either the sole switch-reference marker was DS, or of multiple switch-reference markers, one or more were DS.

Each audio stimulus was paired with a display comprising one interest area for each subject argument in the audio stimulus. This meant that: the displays for the eight audio stimuli with two grammatical subjects had two interest areas, placed apart on the screen, either in different corners or far apart along a horizontal axis; the displays for the four stimuli with three grammatical subjects had three interest areas, again, spread apart on the screen, and the display for the stimulus with five grammatical subjects had five dispersed interest areas. In the displays for the two stimuli with just a single subject maintained throughout the clause chain (and only SS switch-reference markers), there were two interest areas: one containing a representation of the actual subject, and another containing a “distractor” image. Displays were hand-drawn by the first author; characters depicted wore culturally appropriate clothing and used appropriate tools (such as bows and arrows and string bags, as mentioned in the stimuli). An example of a display is in Fig. 2; here, interest areas as programmed into the experiment are shown with boxes; the pink circle shows gaze at one time-point within the lower interest area. Note that the two subjects in the stimulus accompanying the display in Fig. 2 are ‘they’ and ‘he.’ Looks to the individual men within the upper-left-hand interest area were not differentiated for the purposes of the experiment, and this is the case for all dual and plural subjects.

Procedure

The experiments were run in one room on the second floor of a purpose-built building with woven bamboo walls and floors in Towet village, in the Nungon-speaking area. The building is equipped with three 100-W solar panels and accompanying 12-V batteries, charge controllers and AC/DC inverters. The eye-tracking experiments were part of an “experiment fair,” in which four foreign researchers, four local research assistants, and three local organizers ran four psychological and psycholinguistic experiments over 2 weeks in mid-2019. Each experiment took place in one room of the building or in a temporary enclosure outside. Local organizers tracked community members’ participation in the four experiments, such that participants moved seamlessly between experiments, and all those who wished to participate in all four experiments could do so (see also Mulak et al., 2021). The eye-tracking experiments were run jointly by the first author and organizer Lyn Ögate, who took turns running participants.^{Footnote 3}

The eye-tracking comprehension and production experiments were created as a single experiment using Experiment Builder software (SR Research) and administered using an EyeLink Portable Duo eye-tracker, which recorded participants’ eye movements while they listened to and produced sentences. Participants were seated a comfortable distance from the presentation laptop and a target sticker was placed on each participant’s forehead. This allowed accurate eye-tracking without impairing movement (e.g., during production). Viewing was binocular, but fixation location was monitored from their right eye following a 9-point calibration.

Participants were tested in one session lasting approximately 30 minutes, with the experiment divided into two blocks—comprehension and production. All participants first completed the comprehension block before the production block, though items were randomized within blocks for each participant.

Before the comprehension block, participants were told that they would need to keep their eyes on the screen while listening to speech in Nungon. Each trial began with the presentation of a fixation cross in the center of the screen. To control looking bias, the experiment was programmed so that the visual scene appeared only after participants had fixated on the cross for 500 ms. Then, 1,000 ms after the scene was presented, an auditory stimulus sentence was played over headphones. The experimenter pressed the space bar to move onto the next item once the recording was finished. Each recording was presented once.

Unfortunately, during testing, the experiment software repeatedly crashed, which could only be worked around by using a “demo” version of the experiment. This version was identical to the licensed version, except that it had the words “DEMO VERSION” in approximately 12-point red type in the center of each display screen. We saw no evidence during experimentation that participants’ eyes were drawn to these words. In the end, 25 participants of the original 66 completed the experiment using the demo version of the display.

Analysis

Using Praat software (Boersma & Weenink, 2019), switch-reference markers in the 15 audio stimuli were coded, and their onset and offset times extracted. Where a switch-reference marker occurred on a verb that also bore an object prefix referring to a character in another interest area, this was excluded from consideration here. Switch-reference markers in clauses with unclear or ambiguous reference were also excluded from the analysis. Where preceding material undergoes phonological change with the addition of a switch-reference suffix, the onset of the syllable before the suffix was extracted; otherwise, the onset of the morpheme itself was extracted. These were then coded for the subject of the clause in which they occurred. Finally, the onset of the morpheme’s own clause and the onset of the following clause were extracted.

Prior to analysis, eye-tracking data were epoched into 1,500 ms trials, each time-locked to the onset of the switch-reference morpheme. The 15 stimuli combined included a total of 49 switch-reference markers (23 DS and 26 SS). Each of these morphemes was treated as an independent stimulus. The model was thus fed data from 49 items per participant.
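As a rough illustration of the epoching step, the sketch below cuts one stimulus-long gaze record into 1,500 ms windows, each time-locked to a switch-reference morpheme onset. The data layout (a list of `(time_ms, interest_area)` samples), the function name `epoch_samples`, and the toy onset times are assumptions made for the example; they are not taken from the authors' pipeline.

```python
from typing import Dict, List, Optional, Tuple

EPOCH_MS = 1500  # epoch length reported in the text

# One gaze sample: (time in ms from stimulus start, interest-area label or None)
Sample = Tuple[int, Optional[str]]


def epoch_samples(
    samples: List[Sample],
    morpheme_onsets_ms: List[int],
    epoch_ms: int = EPOCH_MS,
) -> Dict[int, List[Sample]]:
    """Return one epoch per morpheme, with times re-expressed relative to its onset."""
    epochs: Dict[int, List[Sample]] = {}
    for idx, onset in enumerate(morpheme_onsets_ms):
        epochs[idx] = [
            (t - onset, area)
            for t, area in samples
            if onset <= t <= onset + epoch_ms
        ]
    return epochs


# Toy record: gaze on area "A" for the first 2 s, then on area "B" (hypothetical data)
samples = [(t, "A" if t < 2000 else "B") for t in range(0, 4001, 150)]
epochs = epoch_samples(samples, morpheme_onsets_ms=[900, 2100])
print(len(epochs[0]), epochs[0][0], epochs[0][-1])
```

In this sketch, two morphemes that occur less than 1,500 ms apart yield overlapping epochs, so a single gaze sample can contribute to more than one epoch.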

Since each interest area was the visual representation of a grammatical subject, the eye-tracking data could be analyzed in terms of whether, for each trial, a participant was looking at the subject of the clause bearing the switch-reference morpheme, or not. In other words, we investigated gaze patterns after the onset of the switch-reference marker in terms of whether, at each time point, the participant looked to the interest area depicting the subject of the first clause (“looks to same subject”). Note that for stimuli including more than one switch-reference marker and at least one DS marker, the interest area associated with “same subject” can change for each trial (each switch-reference marker within the stimulus). In the English pseudo-clause chain in (1), for instance, the “same subject” for “finding its bowl empty” would be “cat,” while the “same subject” for “the dog still barking at it” would be “dog.” This means that all data pertaining to one stimulus could not simply be coded according to “looks to interest area A.”

Eye-tracking data were thus coded in a binary fashion as “looking at same subject”—the subject of the clause with the switch-reference morpheme—or “looking at another interest area.” Time points were excluded if the participant was not looking in one of the pre-defined interest areas on the screen (or when the participant’s eye could not be detected by the eye-tracker).
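A minimal sketch of this binary coding is given below. The function name `code_looks` and the interest-area labels are hypothetical; `None` stands for time points where gaze fell outside all interest areas or the eye could not be detected.

```python
from typing import List, Optional


def code_looks(areas: List[Optional[str]], same_subject_area: str) -> List[int]:
    """Code each time point as 1 (looking at same subject) or 0 (another area).

    Time points with no interest area (None) are excluded, not coded as 0.
    """
    return [1 if area == same_subject_area else 0
            for area in areas if area is not None]


gaze = ["cat", None, "dog", "cat", "bowl"]

# The "same subject" area depends on the trial, not on the stimulus as a whole.
looks_trial_1 = code_looks(gaze, same_subject_area="cat")
looks_trial_2 = code_looks(gaze, same_subject_area="dog")
print(looks_trial_1, looks_trial_2)
```

Passing the same-subject area per trial reflects the point made above: the same gaze record is coded differently depending on which clause the marker belongs to.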

The data were analyzed using a logistic mixed effects regression (Baayen et al., 2008). We analyzed two factors: morpheme type had two levels, SS (same subject) and DS (different subject), and was treatment-coded. Because we expected any effect of morpheme type to emerge over time, we included time as a continuous factor. Gaze data were sampled in 150 ms intervals starting at morpheme onset (time zero) and ending 1,500 ms later. Prior to analysis, the time variable was centered and scaled such that it ranged from −1 to 1.
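One way to center and scale the time variable so that it spans −1 to 1 is a simple linear rescaling, sketched below. The text does not specify the exact transformation, so the formula and the name `scale_time` are assumptions; the bin edges follow the interval and window lengths given above.

```python
def scale_time(t_ms: float, t_min: float = 0.0, t_max: float = 1500.0) -> float:
    """Linearly map [t_min, t_max] onto [-1, 1], with the midpoint at 0."""
    midpoint = (t_min + t_max) / 2.0
    half_range = (t_max - t_min) / 2.0
    return (t_ms - midpoint) / half_range


# Bin onsets every 150 ms from morpheme onset to 1,500 ms.
bins_ms = [150 * i for i in range(11)]
scaled = [scale_time(t) for t in bins_ms]
print(scaled)
```

Centering in this way means that effects of the other predictor are evaluated at the midpoint of the analysis window rather than at morpheme onset.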

Following Barr et al. (2013), we report the results of the model with the maximal random effects structure that converged, having removed random effects in order of least variance accounted for to most. In addition to fixed effects terms for morpheme type, time, and their interaction, the final model had random intercepts for participants and items, and random slopes for morpheme type within participants.
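For reference, the final model structure described here can be written out as below. The notation is ours, not the authors': $y_{pit}$ is the binary look for participant $p$, item $i$, and scaled time $t$, and $m_i$ is the treatment-coded morpheme type. Which level serves as the reference is an assumption here (DS = 0, SS = 1).

```latex
\operatorname{logit} \Pr(y_{pit} = 1)
  = \beta_0 + \beta_1 m_i + \beta_2 t + \beta_3 m_i t
  + u_{0p} + u_{1p} m_i + w_{0i},
\qquad
(u_{0p}, u_{1p}) \sim \mathcal{N}(\mathbf{0}, \Sigma_u),
\quad
w_{0i} \sim \mathcal{N}(0, \sigma_w^2)
```

Here $u_{0p}$ and $w_{0i}$ are the random intercepts for participants and items, and $u_{1p}$ is the by-participant random slope for morpheme type.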

Results

Results are shown in Fig. 3. At morpheme onset (t = 0), the proportion of looks to the same subject was roughly equal in the two conditions. In the DS condition, the frequency of looking to the same subject was relatively constant over time. But in the SS condition, looks to the same subject increased with time, leading to a significant difference starting 1,164 ms after morpheme onset (grey bar). This was after the mean onset time of the next clause (672 ms after morpheme onset; arrow).

The model detected no differences in looks to the same subject between DS and SS conditions when collapsing across time (the main effect of morpheme type was not significant, β = −0.504, z = −0.690, p = .490). The model also failed to detect a significant change in the proportion of looks to the same subject over time when collapsing across DS and SS conditions (the main effect of time was not significant, β = 0.011, z = 0.333, p = .739). Crucially, however, the model did detect an increasing tendency over time to look at the first subject in the SS condition relative to the DS condition (the interaction of morpheme type and time was significant, β = 0.256, z = 5.608, p < .001).
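As a consistency check on the statistics above, a z statistic can be converted to a two-tailed p value under the standard normal distribution. The sketch below assumes the reported z values are Wald z statistics, as is typical for logistic mixed models; the function name `two_tailed_p` is ours.

```python
import math


def two_tailed_p(z: float) -> float:
    """Two-tailed p value for a standard-normal z statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


reported = [("morpheme type", -0.690), ("time", 0.333), ("interaction", 5.608)]
for label, z in reported:
    print(f"{label}: z = {z:+.3f}, p = {two_tailed_p(z):.3g}")
```

Under that assumption, the computed values agree with the reported p values to within rounding.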

To determine the earliest point at which there was evidence for a difference between the SS and DS conditions, we fit a series of 1,000 fixed-effects-only logistic regressions analyzing looks as a function of morpheme type, one for each sample between 0 and 2,000 ms (at a sampling rate of 500 Hz). The resulting 1,000 p values for the morpheme type term were FDR-corrected for multiple comparisons. Time points for which these adjusted p values were below .05 are indicated with the grey bar in Fig. 3. The earliest time that showed a significant difference between the SS and DS conditions was 1,164 ms after morpheme onset.
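The correction and onset-detection steps can be sketched as follows. The text does not name the FDR procedure, so the Benjamini-Hochberg method is assumed here, and the function names and toy p values are purely illustrative.

```python
from typing import List, Optional


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def first_significant_time(times_ms: List[float], p_values: List[float],
                           alpha: float = 0.05) -> Optional[float]:
    """Earliest time point whose adjusted p value falls below alpha, if any."""
    for t, p_adj in zip(times_ms, benjamini_hochberg(p_values)):
        if p_adj < alpha:
            return t
    return None


# Toy data: one p value per 2-ms sample (500 Hz), illustrative values only.
times = [0.0, 2.0, 4.0, 6.0, 8.0]
raw_p = [0.6, 0.2, 0.04, 0.01, 0.001]
adj_p = benjamini_hochberg(raw_p)
onset = first_significant_time(times, raw_p)
print(adj_p, onset)
```

In the toy data, the sample at 4 ms has a raw p value below .05 but does not survive correction, which is why the corrected onset falls later than an uncorrected one would.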

Discussion

The results of the comprehension experiment failed to support the notion that listeners use switch-reference markers as cues for prediction. As expected, participants looked more to the same subject after hearing an SS morpheme than after a DS morpheme. However, timing indicates that this difference does not stem from information in the morphemes themselves. While the divergence in looks between the two conditions appears to begin around 500 ms after morpheme onset, the difference does not achieve significance until 1,164 ms after morpheme onset. This is well after the mean onset time for the next clause, which was 672 ms after morpheme onset, indicating that the difference likely does not reflect predictive processing. Indeed, by the time the difference achieves significance, participants have in most cases had several hundred milliseconds to discern the identity of the next subject from information contained in the next clause.

Why do listeners not seem to look away from the current subject immediately on hearing a DS morpheme? If valid, our results here could indicate that the information carried by the switch-reference morpheme is not used during comprehension. However, previous findings about the use of morphological cues to guide comprehension in European languages (Hanne et al., 2015; Meir et al., 2020) and non-European languages (Mitsugi, 2017) imply that this is improbable and that the switch-reference morpheme should help guide comprehension at some level.

Alternatively, the key could lie in the amount of information encoded in the morphemes themselves, together with the nature of the visual world task. DS marking in Nungon, as in most languages with switch-reference,⁴ simply indicates that the upcoming subject will differ. It does not necessarily help the listener determine who or what the new subject will be. In an artificial task with just two visual interest areas to choose from (say, A or B), each representing an actor, a listener might be able to use switch-reference morphemes to guide prediction (since not-A could imply B). But in a task with more interest areas, and, for that matter, in natural discourse, where the choice of upcoming subjects is unconstrained, the information encoded through switch-reference morphemes is insufficient to choose an alternative interest area/upcoming subject (since not-A could imply B or C). In such situations, the listener may well wait until more information is available to actually shift gaze from the current subject.

To test this possibility, we re-ran analyses using only the set of data from the 10 stimuli with two visual interest areas, but no clearer picture of use of switch-reference morphemes emerged from this modeling. This does not necessarily mean that the account outlined above is incorrect: Participants could remain open to the possibility that an upcoming clause could have a subject not depicted on the screen, in which case the number of images should not necessarily be expected to constrain predictions. Further, although these 10 stimuli have only two interest areas each, listeners could still attend to switch-reference morphemes in their usual way, which could be, in discourse, to wait for clarification in the upcoming clause itself before predicting its subject. Finally, it is possible that participants do in fact use switch-reference morphemes for prediction, but that this is not reflected in patterns of looking.

Of course, these results could also be clouded by the inherent problem of using naturalistic stimuli: While they afford ecological validity, because conditions are not controlled manipulations, there is no guarantee that other cues and processes were the same in the two conditions. Had all else been kept equal, it is possible that a difference in gaze would have emerged much earlier.

An ideal follow-up to the present work, then, would be to attempt to replicate the findings with controlled stimuli. Specifically, recordings could be spliced such that narratives come in pairs of stimuli which are identical up until the verb, at which point a verb with an SS morpheme is spliced into one of the recordings and a verb with a DS morpheme spliced into the other. A number of other considerations would likely be important to control, such as the a priori likelihood of an SS versus DS morpheme at that point in the stimulus, as well as the number of candidates for the next subject on the display. Such a design would allow us to disregard the possibility that any differences observed (or not observed, as in the present study) are due to differences in the preceding context, and to directly interpret any differences as reflecting switch-reference morpheme-specific processing.

Although the DS morpheme can be analyzed as providing insufficient information for a listener to fully predict the identity of the upcoming subject, the situation is different for a speaker. The speaker is obligated to produce a switch-reference morpheme, and it seems that they must process the upcoming clause’s subject in order to produce the correct morpheme. We investigate this possibility in Experiment 2.

Experiment 2: Production

In production, we aimed to estimate how far in advance (in seconds, and in clauses) speakers plan when they utter clause chains in Nungon, in the hope of comparing this to estimates based on processing of more heavily studied languages like English, German, and Japanese. We did so by presenting the participants from the comprehension experiment with the same images they had viewed in that experiment, but asking them to narrate the story for each set of images themselves. We then determined when looks to the same subject diverge in the seconds leading up to production of either an SS or a DS switch-reference marker. An estimate of about 1 second, or one to two clauses, of advance planning would be consistent with previous experimental literature, and would validate this finding with data from a vastly different language and population. Another possibility would be that, because the syntax of Nungon requires advance planning of the next clause in a way that more heavily studied languages’ grammars do not, Nungon speakers would plan even farther in advance. This would call into question the generality of previous estimates of the scope of advance planning, and would highlight the need for psycholinguistic research on a more diverse set of languages and participant populations.