1 Introduction

In many languages with subject-before-object as a syntactically basic word order, transitive sentences in which the subject precedes the object (SO) have been reported to have a processing advantage during sentence comprehension compared with those in which the subject follows the object (OS) (Bader and Meng 1999 for German; Kaiser and Trueswell 2004 for Finnish; Kim 2012 for Korean; Koizumi and Imamura 2017; Mazuka et al. 2002; Tamaoka et al. 2005 for Japanese; Sekerina 1997 for Russian; Tamaoka et al. 2011 for Sinhalese).Footnote 1 For example, previous event-related potential (ERP) experiments showed that OS sentences elicit a late positivity effect, called a P600 effect, and/or a (sustained) left anterior negativity (SLAN) in comparison with SO sentences (Erdocia et al. 2009 for Basque; Rösler et al. 1998 for German; Hagiwara et al. 2007; Ueno and Kluender 2003 for Japanese). ERPs are electrical brain responses recorded on the scalp by electroencephalography (EEG), which are time-locked to an event (e.g., the presentation of a word) and then averaged across trials and participants (Kutas and Van Petten 1994; Kutas et al. 2006). A P600 is a positive component with a peak latency of approximately 600 ms post-stimulus. A SLAN is a long-lasting negativity that appears around the left anterior region of the scalp. Both ERP components have been interpreted as reflecting sentence processing costs. Functional magnetic resonance imaging (fMRI) studies have found greater activation in the left inferior frontal gyrus (LIFG) during the processing of OS word order than of SO word order (Grewe et al. 2007 for German; Kim et al. 2009; Kinno et al. 2008 for Japanese).

One possible source of this word order preference is conceptual accessibility, which is defined as “the ease with which the mental representation of some potential referent can be activated or retrieved from memory” (Bock and Warren 1985, 50; Bornkessel-Schlesewsky and Schlesewsky 2009a, b; Kemmerer 2012; Tanaka et al. 2011). In SO languages, a conceptually more accessible agent precedes a conceptually less accessible patient in basic SO orders, whereas the opposite order occurs in non-basic OS orders. Several studies have reported that prominent entities, such as agents and animate, concrete, or prototypical referents, tend to appear as sentence-initial subjects (Branigan et al. 2008; cf. Bock and Warren 1985; Bornkessel-Schlesewsky and Schlesewsky 2009a; Hirsh-Pasek and Golinkoff 1996; Primus 1999; Slobin and Bever 1982). Accordingly, the SO advantage may derive from the preference for agent-patient order.

Another possible source of the SO order preference is the syntactic complexity of non-basic sentences. Assuming that the dislocated constituent (filler) is associated with its original position (gap) (Frazier and Clifton 1989), the storage and integration cost should increase in OS sentences of SO languages (Gibson 1998, 2000). This hypothesis has been supported by ERP experiments. As mentioned above, OS sentences elicit a sustained left anterior negativity (SLAN) from a filler to its gap, followed by a P600 effect at the gap position. SLAN and P600 have been proposed to reflect the processes of actively maintaining a filler in working memory and syntactically integrating it with its original position, respectively (e.g., Erdocia et al. 2009; Kaan et al. 2000; Ueno and Kluender 2003).

These two hypotheses focus on sentence-internal features of non-basic sentences to account for the SO preference. However, the word order preference may also pertain to discourse factors because the felicitous use of non-basic word orders correlates with discourse factors, such as givenness, as well as sentence-internal, non-syntactic factors, such as heaviness of displaced constituents (e.g., Aissen 1992; Birner and Ward 2009; Kuno 1987). In other words, basic word order is a default option to describe an event and occurs in a wide range of contexts, whereas non-basic word order is a marked choice, and its use must be well-motivated. This issue has been discussed in Kaiser and Trueswell (2004) (see also Clifton and Frazier 2004; Grodner et al. 2005; Meng et al. 1999; Sekerina 2003). Kaiser and Trueswell conducted a self-paced reading experiment to examine the processing of the non-basic OVS order in Finnish (an SVO language) with two types of context, as shown in (1) below. The supportive context in (1a) referred to the O of the target sentences in (2) to license a felicitous use of OVS, in which the O must be discourse-old information in Finnish, whereas the non-supportive context in (1b) did not. The results showed a significant interaction at DP2 (“hare-part” in SVO and “mouse-nom” in OVS), due to longer reading times in OVS than in SVO only in the non-supportive context.

(1) Context
    Lotta  etsi        eilen      sieniä     metsässä.
    Lotta  looked-for  yesterday  mushrooms  forest-in
    Hän      huomasi  heinikossa  (a)jäniksen/(b)hiiren  joka  liikkui     varovasti  eteenpäin.
    She-nom  noticed  grass-in    hare-acc/mouse-acc     that  was.moving  carefully  forward.
    “Lotta looked for mushrooms in the forest yesterday. She noticed {(a) a hare/(b) a mouse} moving forward carefully in the grass.”

(2) a. SVO
       Hiiri      seurasi   jänistä    ja   linnut  lauloivat.
       mouse-nom  followed  hare-part  and  birds   were.singing.
    b. OVS
       Jänistä    seurasi   hiiri      ja   linnut  lauloivat.
       hare-part  followed  mouse-nom  and  birds   were.singing.
       “The mouse followed the hare and birds were singing.”

Yano and Koizumi (2018) also examined the effect of context on the processing of the non-basic OSV order in Japanese (an SOV language). Their ERP experiment showed a larger SLAN and P600 effect for OSV than for the basic SOV in the non-supportive context, but no ERP effect in the supportive context. In other words, there was no measurable processing cost for OSV relative to SOV when the supportive context was provided. These results suggest that the unsatisfied discourse requirement of non-basic sentences may induce a processing difficulty indexed by SLAN and P600.

In sum, three hypotheses have been proposed to account for the word order preference: conceptual accessibility (the order of thematic roles), syntactic complexity (filler-gap dependency), and pragmatic requirement. All three hypotheses correctly predict an SO preference in SO languages.

The present ERP study examined the effect of these factors on the processing of SO and OS sentences in an OS language, Seediq. Before turning to the details of our experiments, we give a brief overview of Seediq syntax to explain why Seediq provides a good testing ground for these hypotheses. Although the conceptual accessibility hypothesis and the syntactic complexity hypothesis both predict the SO preference in SO languages, as mentioned above, the investigation of Seediq enables us to tease them apart because their predictions diverge in this language. It is this goal that motivates the present study.

1.1 Seediq

Seediq is an Austronesian language that belongs to the Atayalic branch of the Formosan languages. It has a symmetric voice system, which is also referred to as a focus system. In transitive sentences in the Actor Voice (AV), the subject is an agent or experiencer, whereas in the Goal Voice (GV), the patient or location is projected as the subject, as exemplified in (3a) and (3b), respectively. In the Convey Voice (CV), the subject refers to an instrument or beneficiary, as in (3c) (CV is irrelevant for the present study). These voice alternations are distinguished by an affix attached to the verb.

(3) a. m-egay   buNa              leqi-‘an   ka   bubu.
       AV-give  sweet.potato.DIR  child-OBL  NOM  mother Footnote 2
       “The mother gave sweet potato to a/the child.”
    b. biq-an   buNa              bubu        ka   laqi.
       give-GV  sweet.potato.DIR  mother.GEN  NOM  child
    c. se-begay  bubu        leqi-‘an   ka   buNa.
       CV-give   mother.GEN  child-OBL  NOM  sweet.potato
       (Tsukida 2009: 158)

Seediq has a syntactically basic word order of VOS, as shown in (4a). The S of VOS is marked with ‘ka’ (Aldridge 2004, 2014; Tsukida 2007, 2009).Footnote 3 In addition to the basic VOS order, SVO is also available in Seediq by preposing S over VO, as in (4b). Evidence that SVO is derived from VOS, and not vice versa, comes from syntactic diagnostics such as the availability of quantifier floating (Sportiche 1988; Tsukida 2009, 314). The examples in (5a) and (5b) show that VOS and SVO are both acceptable when the quantifier “kana” (all) is adjacent to the noun “kiyi-kuyuh” (women). The SVO sentence in (5c) is also acceptable, in which S is at the sentence-initial position while the quantifier is stranded at the sentence-final position. Assuming that a quantifier and its associate must be in a local relation at the base-generated position (Sportiche 1988), the acceptable example in (5c) illustrates that S is base-generated within VP and moves to the sentence-initial position. On the other hand, the unacceptability of the VOS sentence in (5d) shows that the sentence-initial position is not the position where S originates: S cannot move to the right with the quantifier staying at the sentence-initial position. Therefore, VOS is not derived from SVO. This asymmetry also applies to GV, as shown in (6).Footnote 4

(4) a. b-en-arig   kumu      laqi = na          ka   patas  niyi.
       CV.PRF-buy  Kumu.GEN  child.OBJ = 3.GEN  NOM  book   this Footnote 5
       “Kumu bought this book for her child.”
    b. patas  niyi  ‘u,  b-en-arig   kumu      laqi = na.
       book   this  CNJ  CV.PRF-buy  kumu.GEN  child.OBL = 3s.GEN
    c. *laqi = na      ‘u   b-en-arig   kumu      ka   patas niyi.
       child = 3s.GEN  CNJ  CV.PRF-buy  kumu.GEN  NOM  book this
       (Tsukida 2009: 318)

(5) a. ga   h-em-aNut  siyaN     ka   kana  kiyi-kuyuh.   (Tsukida 2009, 314)
       PRG  AV-cook    pork-OBL  NOM  all   PL-woman
       “All the women are cooking pork.”
    b. kana kiyi-kuyh  ‘u,  ga   h-em-aNut  siyaN.
       all PL-woman    CNJ  PRG  AV-cook    pork-OBL
    c. kiyi-kuyh  ‘u,  ga   h-em-aNut  siyaN.    kana.
       PL-woman   CNJ  PRG  AV-cook    pork-OBL  all
    d. *kana  ga   h-em-aNut  siyaN     ka   kiyi-kuyh.
       all    PRG  AV-cook    pork-OBL  NOM  PL-woman

 

(6) a. heNed-un = deha   ka   semka  siyaN.   (Tsukida 2009, 314)
       cook-GV = 3p.GEN  NOM  half   pork
       ‘They will cook half of the pork’
    b. siyaN  ‘u,  heNed-un = deha    semka.
       pork   CNJ  cook-GVl = 3p.GEN  half
    c. semka  siyaN  ‘u,  heNed-un = deha.
       half   pork   CNJ  cook-GVl = 3p.GEN
    d. * semka  heNed-un = deha   ka   siyaN.
       half     cook-GV = 3p.GEN  NOM  pork

 

In contrast to S, which can be fronted, non-S arguments are basically not accessible for extraction, as shown by the ungrammaticality of (4c) above.Footnote 6 This means that, although the sentence-initial DP is not case-marked, it can be unambiguously analysed as an S in the processing of SVO. Importantly, SVO has a pragmatic function: it topicalizes an S or contrasts it with other relevant objects in question (Tsukida 2009, p. 336).

1.2 The purpose and prediction of the present study

The present study tested the three hypotheses regarding the processing cost of non-basic sentences with and without supportive context using ERPs. They offer different predictions for the processing of SO and OS sentences in Seediq. To assess the processing cost of sentences, a late positive ERP component called P600 was used. Previous studies have consistently reported a P600 effect at the gap position of non-basic sentences compared to basic sentences (Kaan et al. 2000; Phillips et al. 2005; Ueno and Kluender 2003). In the present case, the gap position of the fronted S is the third region (R3) of SVO (i.e., O of SVO). Thus, a P600 should appear at R3 (i.e., O of S_i V O t_i vs. S of VOS). Therefore, R3 is a region of interest for the syntactic complexity hypothesis. At this region, a second DP is encountered in both VOS and SVO sentences, which enables the parser to recognise whether a conceptually more accessible agent precedes a less accessible patient. Accordingly, R3 is also of interest under the conceptual accessibility hypothesis. P600 is also known to be sensitive to semantic and pragmatic manipulations, including semantic violations, presupposition accommodation, and a conflict between a syntactically supported interpretation and world knowledge (e.g., Burkhardt 2006; Domaneschi et al. 2018; Kim and Osterhout 2005; Kutas and Hillyard 1980; Van Petten and Luka 2012). Since fronted constituents, such as the S of SVO, should be a topic (discourse-old information) in Seediq, P600 could reflect a non-syntactic processing cost when the supportive context is not provided, as will be investigated in Experiment 1. R1 and R2 were not compared because the comparison between SVO and VOS at these regions involves a number of differences, including grammatical category, frequency, the number of phonemes, and the number of morphemes.

If the processing load reflects the cost of building a syntactically more complex representation due to a filler-gap dependency, we expect that SVO is more difficult to process than VOS because the Seediq parser has to associate the displaced S with its gap. Concretely, SVO is predicted to elicit a larger P600 than VOS in both AV and GV, unlike SO languages, which show a P600 for OS sentences.

On the other hand, if the order of thematic roles affects the processing load, we predict that the agent-patient order would be preferred to the patient-agent order. Thus, VOS in AV and SVO in GV would be more difficult to process than SVO in AV and VOS in GV. Statistically speaking, this hypothesis predicts an interaction between VOICE and WO. To summarize, in contrast to SO languages, the conceptual accessibility hypothesis and syntactic complexity hypothesis predict a different result in AV in Seediq. Hence, the result of AV plays an important role in determining a crucial factor of word order preference in sentence comprehension.

As a third hypothesis, the processing cost of non-basic sentences is likely due to the lack of supportive context, as demonstrated by Kaiser and Trueswell (2004) and Yano and Koizumi (2018). If this hypothesis is correct, SVO would be more difficult to process than VOS in the non-supportive context (Experiment 1). However, providing supportive context for SVO should ameliorate its processing cost (Experiment 2).
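The three sets of predictions for R3 can be summarised in a short sketch. The function name and the hypothesis labels below are illustrative assumptions and not part of the experimental materials; the code only encodes which word order each hypothesis, as stated above, expects to be costlier in each voice.

```python
# Illustrative sketch: which word order each hypothesis predicts to be
# costlier (larger P600 at R3). All names here are hypothetical labels.

# Linear order of thematic roles in each condition, following Sect. 1.1:
# the subject is the agent in AV and the patient in GV.
ROLE_ORDER = {
    ("AV", "VOS"): ("patient", "agent"),
    ("AV", "SVO"): ("agent", "patient"),
    ("GV", "VOS"): ("agent", "patient"),
    ("GV", "SVO"): ("patient", "agent"),
}


def costlier_order(hypothesis, voice, supportive_context=False):
    """Return the word order predicted to be harder, or None if no difference."""
    if hypothesis == "syntactic_complexity":
        # SVO contains a filler-gap dependency in both voices.
        return "SVO"
    if hypothesis == "conceptual_accessibility":
        # The patient-before-agent order is predicted to be harder.
        return next(order for order in ("VOS", "SVO")
                    if ROLE_ORDER[(voice, order)] == ("patient", "agent"))
    if hypothesis == "pragmatic_requirement":
        # SVO is costly only when its discourse requirement is unmet.
        return None if supportive_context else "SVO"
    raise ValueError(f"unknown hypothesis: {hypothesis}")


for hyp in ("syntactic_complexity", "conceptual_accessibility",
            "pragmatic_requirement"):
    print(hyp, {voice: costlier_order(hyp, voice) for voice in ("AV", "GV")})
```

Only the conceptual accessibility hypothesis returns different orders for AV and GV, which is why the AV comparison is the critical one for separating it from the syntactic complexity hypothesis.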

2 The present study

2.1 Stimuli

The present study examined the preference of word order in Seediq sentence comprehension using ERPs. To this end, we created four types of sentences by manipulating Voice (Actor Voice/Goal Voice) and Word Order (VOS/SVO) as shown in (7) below (192 sentences in total).

(7) a. AV-VOS
       qmqah    emqliyang niyi  ka   embanah niyi.
       kick.AV  blue DET        NOM  red DET
       ‘The red kicks the blue.’
    b. AV-SVO
       embanah niyi  o    qmqah    emqliyang niyi.
       red DET       CNJ  kick.AV  blue DET
    c. GV-VOS
       qqahan   embanah niyi  ka   emqliyang niyi.
       kick.GV  red DET       NOM  blue DET
    d. GV-SVO
       emqliyang niyi  o    qqahan   embanah niyi.
       blue DET        CNJ  kick.GV  red DET

All sentences were transitive. Eight transitive verbs were selected that are commonly used in Truku Seediq and easy to distinguish in pictures (see Fig. 1): kick (AV: qmqah, GV: qqahan), hit (AV: smipaq, GV: epaqan), push (AV: smikul, GV: skulan), chase (AV: mhraw, GV: bhragan), throw (AV: qmada, GV: qada), pull (AV: brbil, GV: bbilan), call (AV: mlawa, GV: plwaan), and scold (AV: msang, GV: ksengan). The DPs consist of four familiar color terms (embanah ‘red’, emqliyang ‘blue’, mqalux ‘black’, and bhgay ‘white’) plus a definite article (niyi ‘the’). Abstract noun phrases such as “the red” and “the blue” were employed as S and O to avoid a thematic bias for agents or patients (i.e., they are thematically reversible). If S and O were thematically biased (e.g., The police chased the thief), participants could guess the event described by a sentence without parsing its syntactic structure, which would undermine the purpose of the present experiments. Furthermore, since VOS and SVO sentences were compared at the third region (i.e., S of VOS vs. O of SVO), the lexical properties of S and O (e.g., frequency, length) had to be matched. Common nouns that are thematically reversible and lexically matched were hard to find, due to the lack of a comprehensive dictionary of Truku Seediq.
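The four sentence types in (7) follow fixed templates, which the sketch below spells out. The word forms (verbs, color terms, ka, o, niyi) are taken from the text; the helper name `build_sentence` is hypothetical, and the sketch does not attempt to reproduce how the 192 experimental items were actually selected.

```python
# Sketch of the four sentence templates in (7). The helper name is
# hypothetical; only the word forms come from the text.

VERBS = {"kick": {"AV": "qmqah", "GV": "qqahan"},
         "push": {"AV": "smikul", "GV": "skulan"}}  # two of the eight verbs
COLORS = {"red": "embanah", "blue": "emqliyang",
          "black": "mqalux", "white": "bhgay"}


def build_sentence(verb, agent, patient, voice, order):
    """Build one stimulus string for a voice (AV/GV) and order (VOS/SVO)."""
    v = VERBS[verb][voice]
    # The subject is the agent in AV and the patient in GV.
    subj, obj = (agent, patient) if voice == "AV" else (patient, agent)
    s = f"{COLORS[subj]} niyi"
    o = f"{COLORS[obj]} niyi"
    if order == "VOS":
        return f"{v} {o} ka {s}."
    if order == "SVO":
        return f"{s} o {v} {o}."
    raise ValueError(f"unknown word order: {order}")


for voice in ("AV", "GV"):
    for order in ("VOS", "SVO"):
        print(voice, order, build_sentence("kick", "red", "blue", voice, order))
```

With the red as agent and the blue as patient, the four calls reproduce (7a) to (7d), so the DP at R3 differs across conditions only in color term and grammatical role.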

Fig. 1 An example of the pictures used in Experiments 1 and 2

The sentences were recorded by a male native speaker of Truku Seediq. They were slightly edited by removing short pauses between phrases to match, across the four conditions, the duration of the critical region (the third region: R3), the duration from the onset of R1 to that of R3, and the total duration (all ps > 0.10) (see Table 1 for the duration of each region).Footnote 7 The duration of R1 (V in VOS and S + o in SVO) was significantly longer for SVO than VOS [F(1, 47) = 1380.33, p < 0.01] and the duration of R2 (O + ka in VOS and V in SVO) was significantly longer for VOS than SVO [F(1, 47) = 1284.05, p < 0.01]. The main effect of VOICE and the two-way interaction were not significant in any analysis of the duration. After editing, the stimuli were checked for naturalness by native Seediq speakers.

Table 1 Mean duration (ms) of each phrase (n = 48)

Because the non-basic SVO in Seediq needs to satisfy discourse requirements for its use, we conducted two experiments with these materials, manipulating the presence/absence of context (picture depicting an event) to assess the effect of contextual support for it. In Experiment 1, the experimental sentences were presented without a picture to participants, and thus there was no contextual support for SVO. In Experiment 2, the experimental sentences were preceded by a picture that rendered DPs discourse-given information to a listener.

2.2 Procedure

In Experiment 1, a sentence was first aurally presented through earphones. During the sound presentation, participants were instructed to gaze at the fixation point presented in the centre of the screen and not to blink or move. The screen was placed approximately 100 cm in front of the participants. After a blank screen for 500 ms, a picture was presented in the centre of the screen, which either matched or mismatched the event described by the preceding sentence. To check whether the participants understood the sentences (e.g., The red kicks the blue), they were asked to judge whether the picture was congruent with the sentence and then to press a ‘YES’ or ‘NO’ button. Half of the sentences were followed by congruent pictures and half were followed by incongruent pictures. Incongruent pictures depicted an event in which a different agent was involved (e.g., The white kicks the blue), a different patient was involved (e.g., The red kicks the white), the agent and patient were reversed (e.g., The blue kicks the red), or the action was not correct (e.g., The red pushes the blue). Each picture was presented until the participant pressed either button. The responses were collected using a response pad (Cedrus RB-740).
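The four kinds of incongruent picture can be made explicit by treating each event as an (agent, action, patient) triple. The sketch below is illustrative only: the function names and the particular replacement color and action are assumptions, and it says nothing about how mismatches were distributed over trials.

```python
# Sketch of the four mismatch types for incongruent pictures. Events are
# (agent, action, patient) triples; the replacement values are illustrative.

def incongruent_events(agent, action, patient, other_color, other_action):
    """Return the four kinds of mismatching events described in the text."""
    return {
        "different_agent": (other_color, action, patient),
        "different_patient": (agent, action, other_color),
        "role_reversal": (patient, action, agent),
        "different_action": (agent, other_action, patient),
    }


def is_congruent(sentence_event, picture_event):
    """A picture is congruent only if agent, action, and patient all match."""
    return sentence_event == picture_event


target = ("red", "kick", "blue")
for kind, event in incongruent_events(*target, "white", "push").items():
    print(kind, event, is_congruent(target, event))
```

The role-reversal type is the one that cannot be rejected from the lexical content alone, since the same two colors and the same action appear in the picture.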

The picture-sentence matching task was employed because the participants could not read, or rarely read, the Seediq language, and it was therefore difficult to present a comprehension question visually, as in standard ERP experiments. Although a comprehension question could have been given aurally, this would have taken more time and been burdensome for senior participants; therefore, we did not employ comprehension questions.

In Experiment 2, the trial started with a picture presented for 2000 ms in the centre of the screen. After a blank screen for 200 ms, a sentence was aurally presented through a speaker.Footnote 8 The participants were asked to respond to the task upon seeing a response cue, which appeared 500 ms after the offset of the sentence. In Experiment 2, the experimental sentences were always preceded by congruent pictures. In addition, 48 filler sentences paired with incongruent pictures were intermixed in the list for NO responses. Although the number of YES/NO responses was not balanced in Experiment 2, this decision was made to not impose an extra load on senior participants.

All sentences were presented in a randomized order for each participant, using Presentation version 16.3 (Neurobehavioral Systems). Prior to the main experiment, 24 practice trials were completed to familiarize participants with the experimental procedure.

2.3 Participants

In Experiments 1 and 2, 25 and 28 native speakers of Truku Seediq were recruited in Hualien, Taiwan, respectively (Experiment 1: 18 females and seven males, M = 61.6, SD = 12.6; Experiment 2: 20 females and eight males, M = 59.6, SD = 10.6). Although 14 of the participants participated in both Experiments 1 and 2, this likely had no significant impact on results because Experiment 2 was conducted a year after Experiment 1. All participants were classified as right-handed based on the Edinburgh handedness inventory (Oldfield 1971), and all had normal or corrected-to-normal vision. None of them were color-blind, and all could thus distinguish the colors in the pictures to perform the task. Written informed consent was obtained from all participants prior to each experiment. This study was approved by the Ethics Committee of the Graduate School of Arts and Letters, Tohoku University.

2.4 Electrophysiological recording

The experiments were conducted in a small non-sound-proofed classroom in Hualien, Taiwan. Because the room was not shielded, recorded data included power supply noise, which was removed during pre-processing (see the next section). The room was air-conditioned throughout the experiments to avoid perspiration artifacts.

EEGs were recorded from 17 Ag electrodes (QuickAmp, Brain Products) located at F3/4, C3/4, P3/4, O1/2, F7/8, T7/8, P7/8, Fz, Cz, and Pz according to the international 10–20 system (Jasper 1958). Additional electrodes were placed below and to the left of the left eye to monitor vertical and horizontal eye movements, respectively. The online reference was set to the average of all electrodes, and EEGs were re-referenced offline to the average of the earlobes. The impedances of all electrodes were maintained at less than 10 kΩ throughout the experiment. The EEGs were amplified with a bandpass of DC to 200 Hz and digitized at 1000 Hz.

2.5 Electrophysiological data analysis

In Experiment 1, all congruent and incongruent trials were grouped together because the participants were not able to predict a (mis)match between a sentence and its picture while listening to the sentence. In Experiment 2, only the congruent trials were analyzed (i.e., sentences for YES responses) because the participants could detect anomalies while listening to sentences in this experiment.

Independent component analysis (ICA) was applied using EEGLAB (Delorme and Makeig 2004) to reduce artifacts induced by eye and body movements. ICs to be rejected were selected in an objective way with the toolbox SASICA (Semiautomatic Selection of Independent Components for Artifact Correction, Chaumon et al. 2015) (rejection rate: 29.0% in Experiment 1 and 26.1% in Experiment 2). EEGs were time-locked to the onset of R3, and the 100 ms preceding it served as the baseline. Trials with large artifacts (exceeding ± 100 µV) were removed from the analysis (rejection rate: 1.4% in Experiment 1 and 4.0% in Experiment 2). A 5 Hz low-pass filter was applied offline to the EEGs for presentation purposes only.Footnote 9 EEGs that were band-pass filtered at 0.1–30 Hz were used for statistical analyses.
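The baseline correction and amplitude-based trial rejection described above can be sketched for a single channel as follows. This is a minimal illustration in plain Python, not the EEGLAB pipeline used in the study: the function names and toy data are hypothetical, and applying the ± 100 µV criterion after baseline correction is an assumption, since the text does not specify the order.

```python
# Sketch of baseline correction and amplitude-based trial rejection for a
# single channel. Sampling rate, baseline length, and threshold follow the
# text; the function names and toy data are hypothetical.

SAMPLING_RATE_HZ = 1000      # digitization rate reported in Sect. 2.4
BASELINE_MS = 100            # pre-onset baseline interval
THRESHOLD_UV = 100.0         # trials exceeding +/-100 microvolts are dropped


def baseline_correct(epoch):
    """Subtract the mean of the pre-onset samples from every sample.

    `epoch` is a list of voltages (microvolts) whose first BASELINE_MS
    milliseconds precede the onset of R3.
    """
    n_base = BASELINE_MS * SAMPLING_RATE_HZ // 1000
    base = sum(epoch[:n_base]) / n_base
    return [v - base for v in epoch]


def keep_trial(epoch):
    """Return True if no sample exceeds the rejection threshold."""
    return all(abs(v) <= THRESHOLD_UV for v in epoch)


def preprocess(epochs):
    """Baseline-correct each epoch, then drop trials with large artifacts.

    Assumption: rejection is applied to baseline-corrected data.
    """
    corrected = [baseline_correct(e) for e in epochs]
    return [e for e in corrected if keep_trial(e)]


# Toy data: one clean epoch and one with a 150 microvolt deflection.
clean = [2.0] * 100 + [5.0] * 900
noisy = [0.0] * 100 + [150.0] * 900
print(len(preprocess([clean, noisy])))
```

In the toy data only the clean epoch survives, and its post-onset samples are expressed relative to the pre-onset mean.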

The ERPs were quantified by calculating the mean amplitude for each participant relative to the baseline using four time-windows: 100–300 ms, 300–500 ms, 500–700 ms, and 700–900 ms. The analyses were conducted separately at the midline (Fz, Cz, and Pz), lateral (F3/4, C3/4, and P3/4), and temporal (F7/8, T7/8, P7/8, and O1/2) arrays. The midline analysis consisted of repeated measures ANOVAs with three within-group factors: VOICE (AV/GV) × WORD ORDER (WO) (VOS/SVO) × ANTERIORITY. The lateral and temporal analyses involved four within-group factors: VOICE (AV/GV) × WO (VOS/SVO) × ANTERIORITY × HEMISPHERE. The factors of primary interest were the main effect of WO and its interaction with VOICE. Because the main effects of ANTERIORITY and HEMISPHERE, which do not involve the experimental conditions, were of no interest, we do not report them below. The Greenhouse–Geisser correction was applied for all effects involving more than one degree of freedom (Greenhouse and Geisser 1959). In these cases, the original degrees of freedom and the corrected p value are reported.
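The mean-amplitude measure in the four time-windows can be sketched as follows. The window boundaries, sampling rate, and baseline length follow the text; the function names, the half-open treatment of window boundaries, and the toy waveforms are assumptions made for illustration.

```python
# Sketch of mean-amplitude quantification in the four analysis windows.
# Window boundaries follow the text; names and toy data are hypothetical.

SAMPLING_RATE_HZ = 1000
BASELINE_MS = 100  # the epoch starts 100 ms before the onset of R3
WINDOWS_MS = [(100, 300), (300, 500), (500, 700), (700, 900)]


def window_means(erp):
    """Return {(start_ms, end_ms): mean amplitude} for a baseline-corrected ERP.

    `erp` is a list of voltages sampled at SAMPLING_RATE_HZ, beginning
    BASELINE_MS before stimulus onset. Windows are treated as half-open,
    [start, end), which is an assumption of this sketch.
    """
    means = {}
    for start, end in WINDOWS_MS:
        i0 = (BASELINE_MS + start) * SAMPLING_RATE_HZ // 1000
        i1 = (BASELINE_MS + end) * SAMPLING_RATE_HZ // 1000
        segment = erp[i0:i1]
        means[(start, end)] = sum(segment) / len(segment)
    return means


def wo_effect(svo_erp, vos_erp):
    """SVO minus VOS per window; positive values mean SVO is more positive."""
    svo, vos = window_means(svo_erp), window_means(vos_erp)
    return {w: svo[w] - vos[w] for w in WINDOWS_MS}


# Toy ERPs of 1000 samples (100 ms baseline + 900 ms post-onset).
vos = [0.0] * 1000
svo = [0.0] * 400 + [1.5] * 600   # positivity starting 300 ms after onset
print(wo_effect(svo, vos))
```

Per-participant values of this kind, computed at each electrode, are what would enter the repeated-measures ANOVAs with WO as a within-group factor.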

2.6 Results

2.6.1 Behavioral data

The accuracy of the behavioral task was examined in each experiment with a three-way ANOVA: RESPONSE TYPE (RT) (YES/NO) × VOICE (AV/GV) × WO (VOS/SVO). In Experiment 1, the effects of RT and WO were significant [RT: YES 91.2% vs. NO 70.6%, F (1, 24) = 26.7, p < 0.01; WO: VOS 79.8% vs. SVO 82.05%, F (1, 24) = 7.13, p < 0.05] (Fig. 2). The effect of VOICE was marginally significant [AV 82.1% vs. GV 79.7%, F (1, 24) = 3.63, p = 0.068]. Because the three-way interaction was also significant [F (1, 24) = 4.80, p < 0.05], post hoc analyses were conducted at each level of RT. For YES responses, the effect of VOICE was significant only at SVO, due to a higher accuracy rate for AV-SVO than GV-SVO. The effect of WO was significant only at AV, because of a higher accuracy for AV-SVO than AV-VOS. For NO responses, none of the effects reached significance.Footnote 10

Fig. 2 Mean accuracy (%) in the behavioral task. Error bars indicate standard errors

In Experiment 2, the effect of RT was significant [YES: 95.9% vs. NO: 85.7%, F (1, 27) = 7.20, p < 0.05]. RT interacted with VOICE [F (1, 27) = 13.5, p < 0.05], indicating that the accuracy of AV was significantly higher than that of GV only in the YES response [F (1, 27) = 11.30, p < 0.05] and that the accuracy of the YES response was significantly higher than that of the NO response in AV and marginally higher in GV [AV: F (1, 27) = 11.30, p < 0.05; GV: F (1, 27) = 3.22, p = 0.08]. Although the number of YES/NO responses was not balanced in Experiment 2, as mentioned above, the high accuracy of both the YES and NO responses suggests that our participants paid enough attention to the content of the sentences.

The response time was not analyzed because the participants’ responses were deliberately delayed to keep task-related activity from contaminating the ERPs at R3.

2.6.2 Electrophysiological data

Experiment 1

Figure 3 shows the grand average ERPs of R3 in Experiment 1. A visual inspection suggested that the SVO showed a larger positivity than VOS in both AV and GV.

Fig. 3 Grand average ERPs at R3 in Experiment 1

The overall flatness of ERPs in Experiment 1 (compared to Experiment 2, see Fig. 4) is probably because a greater number of ICs were rejected due to blink and movement-related artifacts in Experiment 1 (29%) than in Experiment 2 (26%). Furthermore, this may also be related to the fact that the present participants are senior because previous studies observed a flatter ERP morphology for elderly adults compared to young adults (Federmeier et al. 2002; Kemmer et al. 2004).

Fig. 4 Grand average ERPs at R3 in Experiment 2

The X-axis represents the time duration, and each hash mark represents 100 ms. The Y-axis represents the voltage, ranging from −3 to 3 μV. Negativity is plotted upward.

The result of repeated-measures ANOVA revealed a significant effect of WO at the midline and lateral arrays in the time-window of 100–300 ms, due to a positivity for SVO compared to VOS (Table 2). At all arrays, the interaction of WO and ANTERIORITY was significant, indicating a centro-parietal distribution of the positivity [Cz: F (1, 24) = 8.29, p < 0.01, Pz: F (1, 24) = 10.78, p < 0.01, C3/4: F (1, 24) = 7.35, p < 0.05, P3/4: F (1, 24) = 10.06, p < 0.01, P7/8: F (1, 24) = 7.21, p < 0.05, O1/2: F (1, 24) = 10.96, p < 0.01].

Table 2 Statistical results (F values with degrees of freedom in parentheses) for the R3 in Experiment 1

At 300–500 ms, a similar positivity was observed in SVO. The effects of WO and the WO × ANTERIORITY interaction were significant. Furthermore, the three-way interaction of VOICE × WO × ANTERIORITY reached a significant level at the lateral and temporal arrays. The post hoc analyses at each level of VOICE revealed a greater WO effect at GV than at AV.

At 500–700 ms, the WO effect was only significant at the temporal array, which also indicates a positivity for SVO [T7/8: F (1, 24) = 5.41, p < 0.05, P7/8: F (1, 24) = 4.11, p = 0.05; O1/2: F (1, 24) = 4.58, p < 0.05]. At the lateral array, the interaction of VOICE × WO × ANTERIORITY was significant. The post hoc analyses showed a significant WO effect at the right hemisphere only at AV [F (1, 24) = 7.46, p < 0.05].

At 700–900 ms, the WO × HEMISPHERE interaction was significant at the lateral array. The post hoc analyses showed a positivity for SVO at the right hemisphere [F (1, 24) = 5.05, p < 0.05]. In addition, the interaction of VOICE × WO × ANTERIORITY was significant at the lateral array, showing a WO effect at the right hemisphere only at AV [F (1, 24) = 12.43, p < 0.01].

In sum, SVO elicited a significant positivity in comparison with VOS, irrespective of voice alternation. The peak latency of the positivity was early (M = 472 ms, SD = 153 ms).Footnote 11 This is probably because the repeated presentation of DPs (i.e., four color terms + definite article) facilitated lexico-semantic processing, so that the subsequent process started earlier. The positivity was prolonged compared with typical P600 effects in reading experiments, because of the larger variation in available information over the time course of auditory stimuli. However, the positivity was distributed over the centro-parietal regions, where the typical P600 has been observed. We took this positivity as a type of P600 that has been observed in non-basic sentences. As an anonymous reviewer pointed out, it is possible that the positivity reflects a summation of different types of ERPs because its topography changed during the time-windows of interest. This issue awaits further investigation because topographical differences in ERP effects are not very informative about the function of the underlying cognitive processes.Footnote 12

Experiment 2

Figure 4 shows the grand average ERPs of R3 in Experiment 2. A visual inspection suggested no comparable positivity for SVO relative to VOS.


In the time-window of 100–300 ms, the repeated-measures ANOVA showed a significant four-way interaction at the lateral array, due to a greater positivity for AV sentences than GV sentences at P4 [P4: F (1, 27) = 5.05, p < 0.05] (Table 3). In the 300–500 ms time-window, none of the effects of interest was observed.

Table 3 Statistical results (F values with degrees of freedom in parentheses) for the R3 in Experiment 2

At 500–700 ms, although the interaction of VOICE × WO × ANTERIORITY was significant at the midline array and that of WO × ANTERIORITY × HEMISPHERE was significant at the lateral array, none of the simple effects reached a significant level.

At 700–900 ms, the VOICE effect was significant at the midline array, indicating a larger positivity for the AV sentences than the GV sentences. The post hoc analyses of the significant interaction of WO × ANTERIORITY × HEMISPHERE at the lateral array showed a larger negativity for SVO at F3 and P3/4 [F3: F (1, 27) = 5.64, p < 0.05; P3/4: F (1, 27) = 4.35, p < 0.05]. This effect can also be interpreted as a larger positivity for VOS than SVO. Further investigation is required to decide how to interpret this effect.

3 Discussion

The present study conducted two ERP experiments to examine the effects of word order, voice, and discourse factors on Seediq sentence comprehension. More concretely, we tested three hypotheses that have been proposed to explain word order preference in SO languages. The result of Experiment 1 showed that, unlike speakers of SO languages, native Seediq speakers preferred VOS (OS word order) to SVO (SO word order) when there was no supportive context for the non-basic SVO. Experiment 2 demonstrated that a supportive context significantly alleviated the processing difficulty indexed by a P600 effect.

3.1 Word order preference in Seediq sentence comprehension

The result of Experiment 1 is not consistent with the pattern predicted by the conceptual accessibility hierarchy (agent-patient order). This hypothesis correctly predicts a VOS preference in GV because the agent precedes the patient in GV-VOS (V O_AGENT S_PATIENT). However, it fails to explain the VOS preference in AV (V O_PATIENT S_AGENT) because it predicts the opposite pattern. Contrary to the prediction that SVO would be favored over VOS in AV, the ERP evidence suggests that VOS was easier to process than SVO.

One might think that the greater WO effect for GV than for AV at 300–500 ms in Experiment 1 is consistent with the hypothesis. If one speculates that the preference for the agent-patient order interacted with another factor, syntactic complexity, these two factors are expected to affect processing in opposing directions in AV (agent-patient preference for S_AGENT V O_PATIENT and syntactic preference for VOS) but in the same direction in GV (agent-patient and syntactic preference for V O_AGENT S_PATIENT). Hence, this hybrid hypothesis predicts a greater WO effect for GV than for AV. However, the WO effect for GV was not as robust, and the analyses of the subsequent time-windows showed a greater WO effect for AV. Furthermore, the hypothesis cannot explain why no word order preference existed in Experiment 2, in which contextual information was provided for the felicitous use of SVO.
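The prediction of the hybrid hypothesis discussed above can be made concrete with a toy additive sketch. This is not a model proposed in this study: the two cost values, the assumption that the penalties simply add up, and the names `processing_cost` and `word_order_effect` are all hypothetical and chosen only for illustration. The role orders per condition follow the description in the text.

```python
# Hypothetical, unitless processing costs; the values are illustrative only.
AGENT_PATIENT_COST = 1.0   # penalty when the patient precedes the agent
SYNTACTIC_COST = 2.0       # penalty for the non-basic SVO order

# Linear order of thematic roles in each voice-by-word-order condition.
ROLE_ORDER = {
    ("AV", "SVO"): ("agent", "patient"),
    ("AV", "VOS"): ("patient", "agent"),
    ("GV", "SVO"): ("patient", "agent"),
    ("GV", "VOS"): ("agent", "patient"),
}


def processing_cost(voice, word_order):
    """Sum the two penalties under a simple additive assumption."""
    cost = 0.0
    if ROLE_ORDER[(voice, word_order)][0] == "patient":
        cost += AGENT_PATIENT_COST
    if word_order == "SVO":
        cost += SYNTACTIC_COST
    return cost


def word_order_effect(voice):
    """Cost of SVO minus cost of VOS within one voice."""
    return processing_cost(voice, "SVO") - processing_cost(voice, "VOS")


av_effect = word_order_effect("AV")  # syntactic cost minus agent-patient cost
gv_effect = word_order_effect("GV")  # syntactic cost plus agent-patient cost
```

Under these assumptions the two penalties partly cancel in AV and accumulate in GV, so the sketch yields a larger WO effect for GV than for AV whenever the agent-patient penalty is positive. This is the pattern that, as noted above, the later time-windows and Experiment 2 did not support.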

The syntactic complexity hypothesis aligns well with the result of Experiment 1. In SVO, the Seediq parser was expected to associate a fronted S with its gap following an O at R3. Thus, this hypothesis was borne out by a positivity in response to SVO in AV and GV. However, this hypothesis also likely fails to account for the result of Experiment 2, in which we did not observe a positivity for SVO, unlike in Experiment 1.

One might think that because the participants could predict a sentence upon seeing a picture in Experiment 2, they could predictively associate an S and its gap, resulting in the lack of the positivity at R3. This possibility leads one to expect a positivity at R1 or R2 of SVO, but since SVO and VOS involve a categorical difference at these regions, they are difficult to compare. However, we believe that this possibility is unlikely because previous studies have consistently shown a positivity at the gap position even when the parser is able to predict the gap position in advance. For example, in the processing of English object relative clauses and wh-questions (Kaan et al. 2000; Phillips et al. 2005), the parser can posit an O gap upon encountering an overt S. This idea is supported by the observation that the parser predicts a transitive verb that hosts a filler as its O but does not predict an intransitive verb (despite the fact that an intransitive verb can follow the S, as in ‘The book that the author chatted regularly about …’; Omaki et al. 2015). Additionally, in non-basic OSV sentences in Japanese, the parser can posit an O gap when reading the initial word of the S (e.g., ‘sono’ in ‘sore-o_i sono inochishirazuno bokenka-ga __i mitsuketa-ndesu-ka’, that-ACC_i the reckless adventurer-NOM finally __i discovered-POL-Q, ‘Did the reckless adventurer finally discover that?’). To our knowledge, however, there is no evidence for a P600 at the (initial word of the) S, suggesting that the filler-gap integration consistently occurs at the gap. Thus, although the context might trigger an expectation for the gap location when processing SVO in Experiment 2 of the present study, it is unlikely that a P600 appears at the S or V of SVO.

As an alternative to the syntactic complexity hypothesis, one can imagine that, unlike the non-D-linked S, the D-linked S does not have a filler-gap dependency; rather, it originates as a topic where it appears. This alternative hypothesis predicts a P600 effect for SVO in Experiment 1 but no P600 in Experiment 2, which is consistent with our observation. This possibility calls for future investigation into the Seediq syntax-pragmatics interface.

The results of the present study can be most consistently explained by the discourse hypothesis (cf. Kaiser and Trueswell 2004). In SVO, the S functions as the topic of the sentence, which means that SVO presupposes that the discourse contains a shared referent to which the S directly or implicitly refers (i.e., discourse-given information). Because the visual context presented prior to the sentence satisfied the presupposition for this structure in Experiment 2, SVO was not expected to induce an extra processing load compared to VOS. In Experiment 1, on the other hand, the presupposition was not satisfied. Hence, the syntactic information of SVO signalled that the parser had to associate the S in its derived position with its original position, whereas the infelicitous use of SVO did not license placing the S in the topic position because the movement was not well motivated. Consequently, the participants had to accommodate the unsatisfied presupposition encoded by SVO to build a coherent discourse representation, which induced an additional processing difficulty. An increasing number of recent ERP studies have argued that the P600 is not a manifestation of pure syntactic processing difficulty (Bornkessel-Schlesewsky and Schlesewsky 2008; Brouwer et al. 2012; Brouwer et al. 2017; Kuperberg 2007; Vissers et al. 2006). Instead, it indexes a process of integrating several types of information, such as syntax and semantics. Thus, under the discourse hypothesis, the P600 likely reflects the resolution of a conflict between the syntactic structure and the information structure of SVO. This interpretation is also consistent with the result of Yano and Koizumi (2018), who observed a P600 for the non-basic OSV in Japanese when it was used in an infelicitous context but not in a felicitous context. If discourse factors affect the P600 in the processing of filler-gap dependencies, the traditional functional interpretation of the P600 as an index of syntactic integration difficulty (Kaan et al. 2000) needs to be clarified in future work.

3.2 The interaction of sentence processing and event apprehension

At the end of each trial, the participants judged whether the content of a sentence matched a picture. This task required them to apprehend an event depicted in the picture and then compare the events of the picture and sentence. Interestingly, the accuracy of AV-SVO was higher than that of GV-SVO and AV-VOS in Experiment 1 despite the observation that AV-SVO was more difficult to process than AV-VOS. This pattern was not observed in Experiment 2.

The behavioral result of Experiment 1 is similar to the result of a previous experiment using the same task in Kaqchikel, a Mayan language spoken in Guatemala (Yano et al. 2017; see also Yasunaga et al. 2015). In Kaqchikel, VOS is a syntactically basic order and other orders, including SVO, VSO, and OVS, are derived through the movement of DPs. Yano et al. (2017) observed that VOS was easier to process than the other three possible orders (in the active voice). The behavioral result, however, revealed a higher accuracy for SVO, VSO, and VOS than for OVS, despite the fact that orders other than VOS, such as SVO and VSO, are syntactically non-basic (SVO: 94.8%, VSO: 93.2%, VOS: 90.1%, OVS: 74.5%). Assuming that the agent-patient order is favored in the event apprehension of the picture (cf. Sauppe et al. 2013), they hypothesized that, because the S and O correspond to the agent and patient, respectively, in the active voice, the SO order had an advantage in checking potential mismatches between the picture and the sentence. Their interpretation can apply to the result of Experiment 1. In AV-SVO, the agent comes before the patient, whereas the agent comes after the patient in AV-VOS and GV-SVO, which showed a lower accuracy rate than AV-SVO. Thus, this result suggests an agent-patient order preference.

However, it remains unclear why no agent-patient order preference emerged between AV-VOS and GV-VOS in Experiment 1 or in Experiment 2. This issue requires further investigation into the relationship between sentence processing and event apprehension.

4 Conclusion

The present study conducted two ERP experiments to explore word order preference in Truku Seediq. The results demonstrated that SVO was more difficult to process than VOS when there was no supportive context for it but not when discourse requirements were satisfied. This result was not predicted by the hypotheses that word order preference derives from conceptual accessibility (agent-patient order) or from syntactic complexity (filler-gap dependency formation). Instead, we took the present result as evidence that the processing cost of non-basic word orders is associated with a discourse-level processing difficulty.