Introduction

Linguists working on Chinese have placed much emphasis on syntactic differences between Chinese and Indo-European languages (Chappell et al., 2007; Wu & He, 2015), with many studies discussing the unique characteristics of certain Chinese syntactic constructions (e.g., the ba construction and the serial verb construction) (Paul, 2008; Shi, 2000; Sun, 2018). Somewhat surprisingly, however, a more fundamental issue pertaining to the nature of basic syntactic units, i.e., the notion of sentencehood or sentence boundaries in Chinese, has not been systematically examined in comparison to the notion in other languages. In written texts in English and many other languages, sentence-final punctuation marks such as the period are used to indicate the completeness of a sentential structure, constrained largely by well-established syntactic rules (Huddleston & Pullum, 2002, pp. 1723–1732; Partridege, 1998, pp. 9–13). For example, a simple declarative sentence in English is “a complete unit of meaning which contains a subject and a verb, followed, if necessary, by other words which make up the meaning” (Alexander, 2019, p. 4), marked by a period at the end. However, the concepts of sentence and sentence boundary in Mandarin Chinese are both quite distinct from those in English. A complete sentence in written Chinese as punctuated by sentence-final punctuation represents the writer and reader’s judgment of the completeness of the meaning or idea being expressed rather than of the completeness of a sentential structure. Indeed, a sentence is often described as “the completeness of an idea or meaning” (Lu, 2013, p. 21) by Chinese grammarians and linguists (e.g., Huang & Shi, 2016; Li & Thompson, 1989).

As illustrated in Example (1), multiple clauses can be joined using commas without conjunctions in Chinese texts, with the period (i.e., “。” in Chinese) occurring at the end of the block of clauses to indicate the completeness of the meaning or idea therein rather than the completeness of a sentential structure (Huang & Liao, 2007; Lu & Zhu, 2013, p. 322; Xue & Yang, 2011).

[Ex1:]

[i]

(a)

 

天,

        
  

dàn

dào

 

le

èr

tiān

        
  

but

arrive

 

PFV

No

two

day

        
 

(b)

 

床,

         
  

rén

suī

 

le

chuánɡ

         
  

he

although

 

rise

PFV

bed

         
 

(c)

  

沉沉

         
  

ø1

 

tóu

hái

chénchén

de

         
  

(grandpa)

 

head

still

dazed

PTCP

         

[ii]

(d)

祖父

当真

          
  

zǔfù

dànɡzhēn

bìnɡ

le

          
  

Grandpa

really

already

sick

PFV

          

[iii]

(e)

翠翠

显得

懂事

些,

          
  

Cuìcuì

xiǎnde

dǒnɡshì

le

xiē

          
  

Cuicui

seem

sensible

PTCP

more

          
 

(f)

 

祖父

一罐

大发药,

        
  

ø2

wèi

zǔfù

jiān

le

yīguàn

dàfāyào

        
  

(Cuicui)

for

grandpa

concoct

PFV

a M

medicinal herbs

        
 

(g)

 

祖父

喝,

          
  

ø2

zhe

zǔ fù

hē,

          
   

force

PTCP

grandpa

drink

          
 

(h)

 

屋后

菜园

摘取

蒜苗

泡在

米汤

蒜苗,

  

ø2

yòu

zài

wūhòu

càiyuán

zhāiqǔ

suànmiáo

pàozài

mǐtānɡ

zuò

suān

suànmiáo

   

also

at

houseback

vegetable

plot

in

pluck

garlic-shoot

soak

rice-soup

in

make

sour

garlic-shoot

 

(i)

 

 一面

 照料

 船只,

           
  

ø2

yīmiàn

zhàoliào

chuánzhī

           
   

while

take-care-of

boat

           
 

(j)

 

一面

时时刻刻

抽空

回家

祖父,

    
  

\(\phi\) 2

yīmiàn

hái

shíshíkèkè

chōukònɡ

ɡǎn

huíjiā

lái

kàn

zǔfù

    
   

while

also

constantly

find-time

rush

home

in

come

see

grandpa

    
 

(k)

 

这样

那样

           
  

\(\phi\) 2

wèn

zhèyànɡ

nàyànɡ

           
   

ask

this

that

           

[Translation] But on the next day, although he (grandpa) was out of bed, his head was still heavy. Grandpa was really sick. Cuicui, rising to the occasion, prepared a cooling concoction and made him take it; she then picked some garlic shoots from the vegetable garden behind the house and soaked them in rice water to make sour garlic shoots; she took care of the boats and found time to rush back home to check on grandpa whenever possible, asking how he was doing.Footnote 1

Example (1) contains three sentences punctuated with the period. The fact that the third sentence [iii] contains seven clauses joined with commas without conjunctions well exemplifies the point that the period in Chinese marks the completeness of a meaning or idea (as judged by the writer) rather than the end of a complete sentential structure. This point can be further illustrated by two additional facts: (1) a comma could have been used to join the first two sentences without adding any conjunction, and (2) a period could have been inserted at the end of several clauses in the third sentence, such as (e), (g) or (h) simply by replacing the empty category ø2 (refer to “Cuicui”) with a pronoun. The alternative ways to punctuate the sentences in Example (1) also indicate that the judgment of meaning completeness is a more subjective task than that of the completeness of a sentential structure, given the absence of well-defined rules. All in all, the same rules that govern the use of the period as a sentence-final punctuation mark in English written texts do not fully apply in Chinese written texts.

Previous psycholinguistic research on sentence boundary perception in English has focused on spoken language, as the task in written language is relatively established with the existence of clear syntactic constraints. Such research has reported important effects of prosodic cues (e.g., pauses) and syntactic/semantic contextual variables on native English listeners’ sentence segmentation and utterance understanding (e.g., Marslen-Wilson, 1975; Steinhauer & Friederici, 2001). Psycholinguistic research on Chinese sentence boundary perception has so far focused on prosodic and phonological boundaries in spoken Chinese as well (Lai et al., 2016). More broadly, a body of theoretical and experimental studies have explored the prosodic, syntactic, and semantic functions of punctuation in spoken or written texts and the ways in which punctuation may affect sentence processing and comprehension (e.g., Baron, 2001; Heggie & Wade-Woolley, 2018; Hirotani et al., 2006; Liu et al., 2010; Niikuni & Muramoto, 2014; Pynte & Kennedy, 2007; Scholes & Willis, 1990; Schou, 2007). Some researchers have also profiled the frequency distribution of punctuation marks (e.g., Kulig et al., 2017; Sun & Wang, 2019) and developed algorithms for automatic text punctuation (e.g., Christensen et al., 2001; Liu et al., 2006). Despite the issues surrounding Chinese sentence boundaries discussed above, however, the potential factors that affect native Chinese speakers’ meaning completeness judgments and sentence boundary perception have not been systematically explored. The importance of this issue is similar to that of the issues investigated in studies of sentence boundary perception in spoken English language (Marslen-Wilson, 1975; Steinhauer & Friederici, 2001) or word boundary segmentation in Chinese (given the lack of word boundary markers in Chinese) (Li et al., 2009; Ma et al., 2014; Yen et al., 2012).

In light of the research gap, the current study sets out to determine the potential role of various syntactic and semantic factors in native Chinese speakers’ sentence boundary perception. To this end, we administered a re-punctuation task to two groups of native Chinese speakers (a training group and a testing group) using two different stimuli texts. We annotated the stimuli texts for a number of syntactic, semantic and textual factors, and employed logistic regression and the Bayesian statistical methods to examine the potential effects of such factors on the participants’ responses. The logistic regression model trained on the data from the training group was then used to predict the responses by the testing group, and the performance of the model is compared against that of a machine learning model.

Research questions and hypothesis

Specifically, the current study aims to address the following two research questions:

  1. (1)

    What syntactic and semantic factors may affect native Chinese readers’ meaning completeness judgments and sentence boundary perception?

  2. (2)

    How well can a model of such factors predict native Chinese readers’ sentence boundary perception?

Based on our own observations and informed by the findings from the specific body of studies on sentence boundary perception in spoken language and the broader body of studies of the functions of punctuation marks in spoken and written texts, we hypothesize that native Chinese speakers’ period use is affected by a combination of syntactic, semantic, and textual features.

Our first hypothesis is that native Chinese speakers’ sentence boundary perception may be influenced by the syntactic structure and length of a clause, particularly with respect to whether a single clause may be a standalone sentence. A single clause with either a full subject-predicate structure or a phrasal structure (e.g., a verb phrase or noun phrase) could be a standalone sentence on its own, and a single-clause sentence may be either long or short. However, it remains to be seen whether longer clauses or clauses with a full subject-predicate structure are more likely to be perceived as shorter clauses or clauses with a phrasal structure, given the difference in the amount of information encoded in such clauses.

Our second hypothesis is that the semantic relations between clauses may influence native Chinese speakers’ judgments of whether the clauses are parts of the same complete meaning or different meanings. In particular, we hypothesize that two clauses with the following five semantic relations, adopted (along with their abbreviations) from The Penn Discourse Treebank (PDTB, Webber et al., 2019) and the Chinese Discourse Treebank (Zhou & Xue, 2015), will likely be judged to be parts of the same complete meaning: (1) temporal te, including succession, precedence, and simultaneity; (2) contingency (ce), including cause-effect, conditional, and purpose; (3) comparison (cm), including contrast and concession; (4) expansion (ex), including conjunction, succession, coordination, progression; and (5) elaboration (el), including further explanations or provisions of additional details in different categories. We also hypothesize that the use of explicit markers to indicate these semantic relations may make it more likely for the two clauses to be judged as parts of the same complete meaning.

Our third hypothesis is that certain types of semantic shifts at the textual level may affect native Chinese speakers’ judgments of meaning completeness. The first type of semantic shift hypothesized to affect meaning completeness judgment is “topic shift”. In Chinese, a block of clauses may form a “topic chain” when they share the same topic, which is explicitly mentioned in a topic clause but implicitly referred to with an empty category in several comment clauses (Li, 2004; Sun, 2019), as illustrated in the third sentence in Example (1), in which the topic “Cuicui” is explicitly mentioned in clause (e) and stays the same through the end of the topic chain. A “topic shift” occurs if a different topic arises in the next block of clauses. A block of clauses may also be put in the same sentence if their topics are different but related semantically and thematically, as illustrated in the first sentence in Example (2), in which the topics of the clauses all pertain to the natural environment. In this case, a topic shift occurs when the topic of the next block of clauses changes thematically, as illustrated in the second sentence in Example (2), whose topic is “Wukui,” a person. The point at which a topic shift occurs may be taken as a point of meaning completeness, indicated by the use of a period, as Example (2) exemplifies.

[Ex 2:]

[i]

(a)

一日,

太阳

没有

出来,

村口

河岸

   
  

zhōnɡ

yīrì,

tàiyánɡ

hái

méiyǒu

chū lái,

cūnkǒu

héàn

   
  

finally

one-day,

sun

still

NOT

come-out,

village-entrance

riverbank

   
  

一层

薄雾

闪动

蓝光。

      
  

yīcéng

báowù

shǎndònɡ

zhe

lánɡuānɡ 。

      
  

a-layer

thin-mist

shine

PTCP

blue-light

      

[ii]

(b)

五魁

瞧见

女人

篮子

河边

衣服

了。

  

Wǔkuí

qiáojiàn

nǚrén

zhe

lánzi

dào

hébiān

yīfu

le

  

Wukui

see

woman

carry

PTCP

basket

arrive

riverbank

wash

clothes

PFV

Second, a “character shift” occurs when the character or person of concern shifts from that in one block of clause to a different one in a subsequent block of clause. When the characters are also the topics of the two blocks of clauses, a character shift becomes a subtype of topic shift. In Example (1), the character of concern is “grandpa” in clause (d) and changes to “Cuicui” in next block of clauses (e-k). A character shift may indicate meaning completeness, as the period at the end of clause (d) in Example (1) illustrates.

Third, a “category shift” occurs when the category of activities or behaviors described changes from one block of clauses to another block (e.g., from physical activities to psychological activities), even if these behaviors or activities are performed by the same person. This is illustrated in Example (3), in which clauses (d-f) talk about physical activities of the “Third Master,” while clause (f) shifts to describing his psychological activities. Such a category shift may indicate meaning completeness.

[Ex 3:]

[i]

(a)

 

盐坨

一天一夜,

     
   

zài

yántuó

cáng

le

yī tiān yī yè

     
  

ø1(he)

at

salt-pile

in

hide

PFV

one-day-and-one-night

     
 

(b)

 

饿

末子 往

 

抹抹。

   

è

le

jiù

zhuā

diǎn

yán

mòzi

wǎng

zuǐ

shàng

mǒmǒ

  

ø1

hungry

PFV

then

grab

some

Salt

bits

toward

mouth

up

rub

[ii]

(c)

 

第二

清早

出来,

     
   

dìèr

tiān

qīngzǎo

cái

chūlái

     
  

ø1

Next

day

morning

just

clime

out

     
 

(d)

 

走到

宫北,

        
   

gāng

zǒudào

gōngběi

        
  

ø1

just

walk-arrive

Gōngběi

        
 

(e)

 

有人

“三爷”。

      
   

tīng

yǒurén

jiào

sānyé

      
  

ø1

suddenly

hear

someone

call

“Third Master”

      

[iii]

(f)

 

心里

一惊。

        
   

xīnlǐ

yījīng

        
   

he

heart

shocked

        
 

(g)

 

因为

几个

“三爷”

了。

   

yīnwèi

zhè

jǐgè

yuè

méi

tīng

rén

jiào

sānyé

le

   

because

these

few

months

NOT

hear

people

call

him

“Third Master”

PFV

Finally, a “time shift” or “space shift” occurs when there is an obvious change in time or space from one block of clauses to another, even if the topic, character of concern, and category of activities remain the same. For example, a block of clauses may describe the physical activities of a person at one time and/or in one place, while the next block may continue talking about the same person’s physical activities at a different time and/or in a different place. A time or space shift, generally marked explicitly by a time or place expression, may indicate completeness. For example, in Example (3), there is a time shift between clauses (a-b) and (c-e), as indicated by “the next morning” at the beginning of clause (c), and the preceding clause is punctuated with a period.

Notably, the factors of character, time, and space have been extensively explored in text processing studies. In particular, the Event Indexing Model (EIM) proposes that people use the general perceptual apparatus to build situation models from narrative texts (Zwaan et al., 1995). In the EIM, events are conceptualized as activated memory nodes, and a story is represented as a set of memory nodes and the connections between them. Each memory node is coded for time, space, characters, objects, and goals (or causes), and a change in these elements activates a new memory node. Our hypothesis that a shift in character, category of activities, time, or space may indicate meaning completeness and therefore prompt the start of a new meaning aligns with the ideas of the EIM.

Methods

Materials

We selected 15 short passages from a number of well-known modern Chinese novels and removed all punctuation marks from those passages. The original passages contained commas and periods only. Eight passages were assigned to the training group and the other seven to the testing group (see the Participants section below).Footnote 2 The actual passages assigned to the training and testing groups (henceforth stimuli texts) are presented in Online Supplementary Material A and B and detailed information about the original, modified, and annotated passages is summarized in Table 1.

Table 1 Information about the passages used in the training and testing groups

Participants

Altogether, 130 native Mandarin Chinese speakers (95 female, 35 male) volunteered to participate in the study, and all participants received a small remuneration for their time. Participant age ranged from 21 to 29 years (M = 24.5, SD = 0.75). Among the participants, 56 were undergraduate students majoring in English-Chinese translation, 20 were undergraduate students majoring in computer science, 52 were postgraduate students majoring in Chinese linguistics or English-Chinese translation, and two had a PhD in linguistics. Given their native speaker status and educational background, all participants were highly proficient in reading Chinese and familiar with the use of punctuation marks in written Chinese. The 130 participants were divided into two groups, with 80 in the training group and 50 in the testing group.

The re-punctuation task

The participants were asked to re-punctuate the stimuli texts assigned to their corresponding group. Specifically, each participant received a sheet of paper containing the stimuli texts with all punctuation marks replaced with blanks and was required to fill in those blanks using commas and periods only with a pen. There was no time requirement for the re-punctuation task, and all participants completed the task in 10 to 20 min. The participants also provided information about their age, gender, and educational background at this time. After collecting the sheets from the participants, we recorded their responses and background information in the computer.

Stimuli text annotation

To test the three hypotheses discussed earlier, we analyzed and annotated each stimuli text in a number of ways, including the length and syntactic status of each clause prior to each blank, the semantic relation between each adjacent pair of clauses (i.e., temporal, contingency, comparison, expansion, elaboration, or other/no relation), the use of any explicit semantic markers indicating one of the five semantic relations of interest, and all instances of the types of semantic shift discussed above (i.e., topic, character, category, time, or space). The annotation was done by two L1 Chinese postgraduate students majoring in Chinese linguistics. The annotators received training from the first author and achieved over 80% agreement on a pilot passage. They then independently annotated all passages and met to resolve all discrepancies. This annotation work was carried out before the re-punctuation test was taken. The participants in the experiments were not provided with the annotation information. The annotators did not know how the participants filled in punctuation marks.

Statistical and machine-learning methods

For both the training group and the testing group, the data gathered from the re-punctuation experiment and the annotated stimuli texts were merged.Footnote 3 We refer to the merged data for the two groups the Training dataset and the Testing dataset. The former was used to train statistical and machine learning models, and the latter was used to test those models. As the two datasets used different stimuli texts, this would give us a good sense of the reliability of the trained models on new data. Furthermore, we also randomly divided the Training dataset into two smaller parts, referred to as the smaller training dataset (75% of the Training dataset) and the smaller testing dataset (25% of the Training dataset), and used them to train and test statistical and machine learning models as well. Given that the smaller training and testing datasets used the same stimuli texts, we could expect the trained model to perform somewhat better in this case, but a comparison of the performance of the models trained and tested in these two configurations would shed useful light on the stability of the models.

Across the Training and Testing datasets and the smaller training and testing datasets, the same 11 independent variables were hypothesized to affect the same response variable. The response variable, named ‘Punctuation,’ was a binary variable, coded as either “comma” or “period” depending on what a participant provided in each blank in the stimuli texts. Among the 11 independent variables, six were binary variables, namely, topic shift, character shift, category shift, time shift, space shift, and explicit markers, all of which were coded as either “1” (if present) or “0” (if absent). Four were categorical variables, namely, gender (male or female), education (undergraduate, postgraduate, or Ph.D.), syntactic status (subject-predicate, verb phrase, noun phrase, or adjectival phrase), and semantic relation (temporal, contingency, comparison, expansion, or elaboration). The last variable, clause length (i.e., number of characters in the clause) was numeric. The distribution of the semantic shift variables, explicit markers, and different categories of semantic relations in the Training and Testing datasets can be found in Table 1.

Given the binary nature of the response variable and the diversity of the types of independent variables in our dataset, categorical logistic regression modeling appears to be especially appropriate for assessing the effects of the independent variables on the response variable (Palei & Das, 2009; Sperandei, 2014). As such, we used the “glm” function in R to train and test two logistic regression models using the data partitions described above. Several additional analyses were then performed to verify the effects observed in the logistic regression models and evaluate the performance of their predictions on the testing datasets. First, we confirmed the significant fixed effects observed in the logistic regression models using Bayesian methods (Gelman et al., 2013), implemented with the R package “brms” (Bürkner, 2017). Second, we used the R package “lme4” (Bates et al., 2014) to train and test two linear mixed-effect models using the same data partitions and compared the performance of these models against that of the logistic regression models. Third, we used receiver operating characteristic (ROC), k-fold cross-validation, and bootstrap cross-validation to test the validity of the statistical models. Finally, we also trained two decision tree-based machine learning models (Hothorn et al., 2006; Song & Ying, 2015) using the same data partitions and compared their performance with that of the logistic regression models. The road map of the current study is summarized in Fig. 1.

Fig. 1
figure 1

Road map for the current study

Results

Predictors of native Chinese readers’ sentence boundary perception

To determine whether significant differences in period use existed among the participants, we ran two ANOVAs separately on the Training set and the Testing set using the R function aov(). The tests revealed significant differences in period use among the 80 participants in the Training set (F(2, 87) = 72.48, p < 2e−16) as well as among the 50 participants in the Testing set (F(2, 86) = 85.55, p < 2e−16). These results confirmed that the native Chinese readers varied substantially in terms of their period use.

We trained a fitting regression model on the smaller training dataset (Model 1) and a second fitting regression model on the entire Training dataset (Model 2) with the same variables discussed above. The following variables: educational level, gender, and syntactic status were not significant predictors in the logistic regression tests. All five types of semantic shifts were significant predictors in both models. Three categories of semantic relations were also found to be significantly predictors in both models, while the contingency and temporal relations were not significant predictors. Explicit semantic relation markers and clause length were also significant predictors in both models. The significant predictors in the models are listed in Table 2 along with their coefficients and the corresponding 95% confidence intervals.

Table 2 Significant predictors in the logistic regression models

To better compare the strength and directionality of the significant predictors, we plotted their coefficients in Fig. 2. It is clear that the semantic shift variables have stronger effects than all other significant predictors. While the five semantic shift variables and clause length all had positive effects, explicit markers and the three semantic relation categories all had negative effects.

Fig. 2
figure 2

Coefficient uncertainty as normal distributions. The left panel shows the raw coefficients, and the right panel shows the exponentiation coefficients. SemRelcm = the comparison relation; SemRelel = the elaboration relation; SemRelex = the expansion relation

We further used Bayesian methods to confirm the significant fixed effects observed in the logistic regression model with the “glm” function in R. The same significant effects were replicated in the best Bayesian fitting model, as indicated by the “rhat” values of 1.0 of the same significant predictors in the “brms” package (Bürkner, 2017). Figure 3 presents the posterior distribution of the significant predictors in the best Bayesian fitting model. The effects of the significant predictors in this model largely converged with those observed in the logistic regression model. Again, five semantic shift variables and clause length positively affected native Chinese readers’ sentence boundary judgments, while three semantic relation categories and the use of explicit markers negatively affected such judgments.

Fig. 3
figure 3

The posterior distribution of significant predictors in the Bayesian fitting model

To evaluate the reliability of the two logistic regression models trained, we used Model 1 and Model 2 to predict native Chinese readers’ period use on the smaller testing data and the Testing data, respectively. Model 1 correctly predicted 76.36% of the periods used in the smaller testing dataset, while Model 2 correctly predicted 75.47% of the periods used in the Testing dataset, indicating good stability of the models trained. The performance of these models was comparable to the performance of linear mixed effect models (implemented with the “lme4” function in R) trained and tested on the same datasets, which predicted 77.01% of the periods used in the smaller testing dataset and 75.01% of the periods used in the Testing dataset. The R scripts for training and testing the logistic regression models and linear mixed-effect models can be found in Online Supplementary Material C. The validity of the logistic regression model was further confirmed through ROC, k-fold cross-validation, and bootstrap cross-validation, as detailed in Online Supplementary Material D.

Results of a decision tree-based machine learning model

The results of the logistic regression model (Model 2) were also compared against those of a decision tree-based machine learning model that considered the same variables. Binary classification with the decision tree-based model was executed using an R package (party) (Hothorn et al., 2006) (see Online Supplementary Material C for the R scripts). The decision tree model trained on the smaller training dataset achieved 84.75% precision on the smaller testing dataset, higher than the 76.36% achieved by the logistic regression model. The decision tree model trained on the Training dataset (shown in Fig. 4) achieved 81.63% precision on the Testing dataset, also higher than the 75.47% achieved by the logistic regression model. While the logistic regression model achieved somewhat lower precision than the decision tree-based machine learning model, it is nevertheless more cognitively interpretable than the latter. Meanwhile, the results of the decision tree-based machine learning model further attested to the effectiveness of the variables considered in the current study for predicting native Chinese readers’ sentence boundary perception.

Fig. 4
figure 4

A decision tree-based machine learning model for sentence boundary detection

Discussion

The logistic regression analysis yielded a number of factors that significantly affected native Chinese readers’ sentence boundary perception. Several predictors negatively affected the participants’ sentence boundary judgments, indicating native Chinese readers were more likely to use a comma instead of a period in a blank when these predictors were present. Specifically, when an explicit marker of semantic relation was present or when the semantic relation between two clauses was that of comparison, expansion, or elaboration, the participants were more likely to use a comma than a period between those two clauses. These results appear to align with the positive effects observed for semantic shifts in topic, character, category, time, and space. In other words, when a clause appears to be a semantic continuation of the previous clause, particularly when that continuation is marked by an explicit semantic marker, native Chinese readers are more likely to see the two clauses as parts of the same complete meaning and thus less likely to insert a period as a sentence boundary marker between them. On the other hand, when a clause exhibits a clear semantic shift from the previous clause, native Chinese readers are more likely to see the current clause as initiating a new meaning or idea and thus more likely to insert a period at the end of the preceding clause to indicate the completeness of a meaning at that point. It can be seen that explicit markers, semantic relations, and semantic shifts function not at the level of syntactic structures but at the level of meaning and discourse flow (e.g., Moder & Martinovic-Zic, 2004). In addition, native Chinese readers were also more likely to use periods after longer clauses than after shorter ones. This may not be surprising, as on average longer clauses provide more room for meaning expression and also impose higher cognitive loads to language users than shorter clauses (Mikk, 2008). In terms of the magnitude of the effects observed for the significant predictors, the semantic shift variables exhibited the largest effect, followed by the three semantic relation categories, while explicit markers and clause length demonstrated smaller effects. These differences shed light on the greater importance of semantic shifts and relations than explicit markers and clause length in native Chinese readers’ sentence boundary perception.

Gender, educational level, and the “syntactic status” of the clause (i.e., subject-predicate, verb phrase, noun phrase, or adjectival phrase) did not significantly affect the participants’ sentence boundary perception. The insignificant effect of “syntactic status” is particularly worth noting. As mentioned earlier, in many other languages such as English, sentence boundaries in written texts tend to be constrained by well-established syntactic rules. That is, a complete sentence is generally expected to have a complete sentential structure, with some stylistic and/or contextual exceptions. Our results indicate that native Chinese readers’ sentence boundary perception in written Chinese texts is affected primarily by semantic factors pertaining to meaning completeness at the discourse level rather than syntactic factors pertaining to structural completeness. In this sense, the concept of sentencehood in Chinese as tacitly understood by native Chinese speakers appears to be different from that defined syntactically in other languages.

The model trained and tested on data from the same stimuli texts and one trained and tested on data from different stimuli texts demonstrated comparably satisfactory performance (76.36% vs. 75.47%) in predicting the participants’ period use. The effects observed for the significant predictors in the logistic regression model were also confirmed using Bayesian methods and validated using ROC, k-fold cross-validation, and bootstrap cross-validation. A decision tree-based machine learning model, which achieved somewhat better performance in predicting the participants sentence boundary perception than the logistic regression model, further confirmed the usefulness of the independent variables considered in the current study. Overall, these results suggest that the logistic regression was stable and reliable and may serve as an efficient and generalizable model for sentence boundary detection in written Chinese texts. It can also be treated as a simple cognitive model for understanding how native Chinese speakers judge meaning completeness and determine sentence boundaries. Meanwhile, the model’s 24.53% error rate on the Testing dataset may be explained by the subjectivity in meaning completeness judgment discussed in the Introduction and evidenced in the significant differences among the participants revealed by the ANOVA results. It could also be the case that additional factors beyond those explored in the current study may be at play.

A theoretical implication of our findings is that sentences in written Chinese texts may resemble “text-like sentences” often found in spoken or digital English (e.g., Lotherington & Xu, 2004), characterized not by complete sentential structures but by blocks of clauses judged to contain a complete meaning. This is the case as native Chinese speakers primarily draw on semantic and discourse information rather than syntactic structural information to determine meaning completeness and sentence boundaries in a block of clauses. Theoretical explanations of the significant positive effects observed for semantic shifts on native Chinese readers’ sentence boundary perception are possible from the lens of the Event Indexing Model (EIM; Zwaan et al., 1995). As mentioned earlier, this model conceptualizes events as activated memory nodes, each coded for time, space, characters, objects, and goals (or causes); a change in these elements activates a new memory node, and the memory nodes and their connections form the representation of the story in a narrative text. Several semantic shifts (i.e., time, space, and character shifts) defined in the current study overlap with the elements in the EIM, and it may be the case that the activation of a new memory node is associated with the initiation of a new meaning or idea in Chinese. Finally, the significant negative effects observed for semantic relations concur with findings from previous efforts in annotating semantic relations between clauses in discourse corpora in Chinese that such relations often reside within sentence boundaries (Webber et al., 2019; Zhou & Xue, 2015).

Conclusion

Using data from a re-punctuation experiment and annotated stimuli texts, this study examined the effects of a set of syntactic and semantic factors on native Chinese readers’ sentence boundary perception. Our findings revealed significant positive effects of five types of semantic shifts and clause length on native Chinese readers’ sentence boundary perception and significant negative effects of explicit markers and several categories of semantic relations on such perception. No significant effects were observed for the syntactic status of the clauses or the two basic sociolinguistic factors of gender and educational level. These results showed that native Chinese readers relied primarily on semantic factors relevant to meaning completeness instead of syntactic factors pertaining to structural completeness in judging sentence boundaries in written Chinese texts. The findings also suggest that sentences in written Chinese texts may to a large extent resemble “text-like sentences” in spoken or digital English, characterized not by complete sentential structures but by blocks of clauses judged to contain a complete meaning. Building upon the initial findings of the current study, future research can fruitfully employ additional experimental methods to investigate the effects of additional factors and their interactions (e.g., working memory) to better understand the mechanisms underlying native Chinese speakers’ meaning completeness judgment and sentence boundary perception, including in particular the subjectivity and individual variation that exists in such judgment and perception. The ways in which native Chinese speakers’ linguistic knowledge of the materials being processed may affect their sentence boundary perception is also worth investigating (e.g., by having them punctuate translations of foreign novels, as one reviewer suggested). Finally, future research can also investigate differences in the mechanisms underlying native Chinese speakers’ sentence boundary perception in spoken and written Chinese as well as differences in the mechanism underlying Chinese-English bilingual speakers’ sentence boundary perception in spoken and/or written Chinese and English.