Introduction

Language facilitates communication between individuals and the outside community. Between 9 and 20 months after birth, children with typical developmental patterns gradually begin to communicate with the outside community through language to meet their own developmental needs. Children with autism spectrum disorder (ASD), however, are less likely to interact with others in the same manner as typically developing (TD) children (Cummings, 2014; Gernsbacher & Pripas-Kapit, 2012; Whyte, Nelson, & Khan, 2013). The core manifestation of their language difficulty is the loss of social functions in language communication (Finnegan, Asaro-Saddler, & Zajic, 2020; Tager-Flusberg, 2004). In using language, the first step in social interaction is to distinguish the relationships among the people in a discourse. The proper use and processing of pronoun reference is the basis for this social function of language (Zimmerman, Wolf, Bock, Peham, & Benecke, 2013). Therefore, clarifying the mechanism by which children with ASD process pronoun reference, and thereby providing a scientific basis for improving their language communication skills and helping them integrate into society, is a major focus in autism research.

Reference is a common phenomenon in daily language communication. It plays an important role in simplifying expression, connecting context, and maintaining coherence of meaning. In daily communication, listeners must quickly and accurately identify the referent of a pronoun in the current context to ensure smooth communication (Zimmerman, Wolf, Bock, Peham, & Benecke, 2013). In most cases, listeners use a variety of information (such as syntactic restrictions and semantic information) to assign a unique referent to a pronoun, that is, to determine its antecedent. However, pronoun reference can be ambiguous when there are multiple possible referents in the context. In that case, the listener must use prominent clues to work out the relationship between pronouns and antecedents and thus identify the correct (or intended) referent (Swaab et al., 2004). In the processing of pronoun reference, the prominence of clue information is mainly affected by syntax and semantics. Syntactic prominence is reflected in the surface syntactic structure in which the clue information is located. For example, an antecedent in the subject position or an antecedent mentioned first has higher syntactic prominence (Foraker & McElree, 2007; Gordon et al., 1993). Semantic prominence, by contrast, must be reflected through the establishment of deep semantic associations, whereby individuals activate related antecedents in the cognitive process and establish psychological connections with them (Speas, 1990). This kind of deep semantic association is particularly important for resolving reference ambiguity. If an antecedent can establish a close psychological connection with the listener, it can help the listener quickly activate the relevant concept, which can then be used to determine the referent.

The comprehension of pronouns and reflexives in children with autism

From a developmental perspective, children with typical developmental patterns can complete pronoun reference processing and use syntactic and semantic clues at approximately four years of age (Lukyanenko, Conroy, & Lidz, 2014; Pyykkönen-Klauck, Matthews, & Järvikivi, 2010). In language research on autism, existing studies have not yet reached a consensus on the reference-processing ability of children with ASD. Some studies have found that children with ASD have obvious difficulties in reference processing and cannot correctly use semantic clues to determine the referent of pronouns (Edelson, 2011; Terzi, Marinis, & Francis, 2014, 2016). Other studies, however, have reported that children with ASD have normal pronoun processing ability because they can correctly use syntactic clues to complete pronoun anaphora processing (Janke & Perovic, 2015; Perovic et al., 2013a, 2013b). Taken together, these findings suggest that children with ASD can correctly use syntactic information in pronoun reference and that the main challenge lies in their grasp and use of semantic and pragmatic information. The above research also shows that researchers predominantly adjust the strength of syntactic prominence by changing morphosyntactic information or the position of the antecedent in the sentence structure, whereas the strength of semantic prominence is more often adjusted by changing the semantic properties of verbs (such as transitive versus intransitive verbs) or prosodic information. As mentioned above, syntactic prominence can be manipulated by changing the surface syntactic structure, whereas adjusting semantic prominence requires establishing deep semantic associations of different strengths. Previous studies have therefore not fully captured semantic prominence, as they merely changed the semantic properties of verbs or prosodic information. In this study, the second-person pronoun “you” and the participant’s own name, which can establish a deep semantic relationship with the participant, are selected as semantic information with strong prominence in order to investigate the mechanism of pronoun reference processing in children with ASD.

The processing of Chinese reflexive “ziji”

This study focuses on the reference processing of the most characteristic reflexive pronoun in Chinese, “ziji (oneself),” to fully investigate how children with ASD process semantic prominence clues during reference processing. “Ziji (oneself)” has unique anaphoric characteristics. In general, the reference of a reflexive pronoun should comply with the Theory of Government and Binding proposed by Chomsky: the anaphor (here, “ziji (oneself)”) must be bound within its governing category (also called the “local constraint”) (Chomsky, 1981). However, “ziji (oneself)” does not fully comply with this principle. It can be bound within its governing category as well as outside it (also known as the “long-distance constraint”).

For instance, in example sentence (1), the English reflexive pronoun “himself” can only refer to “Jack” in the local governing category, whereas the Chinese reflexive pronoun “ziji (oneself)” can refer either to the near subject “Jack” in the local governing category or to the distant subject “Ben” over a long distance.

(1) 本_i 知道 杰克_j 相信 自己_i/j
    Ben zhidao Jieke xiangxin ziji
    Ben_i knows Jack_j believes himself_j

Of the two antecedents, the one in the local governing category has higher syntactic prominence because it is closer to the reflexive pronoun “ziji (oneself).” However, the long-distance constraint of “ziji (oneself)” cannot always be achieved. Researchers have found that when the near subject within the governing category of “ziji (oneself)” has strong semantic prominence (i.e., it is a first- or second-person pronoun), this near subject can block the long-distance constraint of “ziji (oneself).” This is exemplified in (2), where the near subject ni ‘you’ blocks the long-distance antecedent Xiaohong in the matrix clause from binding ziji ‘self’.

(2) 小红_i 认为 你_j 恨 自己_*i/j
    Xiaohong renwei ni hen ziji
    Xiaohong_i thinks you_j hate yourself_j

This phenomenon is called the blocking effect (Chien, Wexler, & Chang, 1993; Giorgi, 2006; Huang & Liu, 2001; Pan, 1995). Pan (1995) employs self-ascription to account for the blocking effect. According to self-ascription theory, first- and second-person pronouns are obligatory self-ascribers. Once a first- or second-person pronoun appears as an antecedent, the experimenter will process information from the first-person perspective of "I". Therefore, first- and second-person pronouns have strong semantic prominence, whereas third-person pronouns and nouns carry weak semantic prominence.

The above shows that the anaphora of the Chinese reflexive pronoun “ziji (oneself)” not only relies on syntactic rules but is also largely affected by the semantic information of the antecedent itself. Whether “ziji (oneself)” follows the local constraint or the long-distance constraint depends on the location of strong semantic information (Li & Zhou, 2010).

Aims of the present work

Therefore, this study manipulates the position of strong semantic information to investigate how children with ASD process semantic information during anaphora processing of the Chinese reflexive pronoun “ziji (oneself).” If children with ASD can process strong semantic information normally during the anaphoric processing of “ziji (oneself),” they should give priority to the entity carrying the strong semantic information when determining the antecedent. In other words, when strong semantic clues appear in the long-distance subject position, participants should tend toward long-distance anaphora of “ziji (oneself)”; when strong semantic clues appear in the near subject position, participants should tend toward near anaphora of “ziji (oneself),” in accordance with the principle of local constraint.

In order to more accurately measure the reference processing of children with autism, this study employed a combination of visual-world paradigm and eye movement technology to conduct experiments. Tanenhaus et al. (1995) first used the visual world paradigm in sentence comprehension tasks. Thereafter, the paradigm was used in research on such issues as syntactic disambiguation, lexical semantic information extraction, and reference processing (Novick et al., 2008; Pyykkönen-Klauck, Matthews, & Järvikivi, 2010; Zhou, Ma, Zhan, & Ma, 2018). The visual world paradigm aids the observation of subjects’ real-time representations during the sentence comprehension process. In addition, this method can be applied to special groups of subjects who cannot perform language expression easily. Therefore, in this study, we presented both visual and auditory stimuli to the subjects, that is, when visually presenting pictures of scenes, sentence information was played over earphones, and in the process, we recorded the subjects’ eye movements while viewing the pictures.

Experiment 1

In Experiment 1, “ni (you),” which has a semantically prominent function, was used to examine whether children with ASD are able to go beyond syntactic rules, use such prominent information, and thereby determine the referent of “ziji (oneself).” According to Pan (2001) and Naigles & Chin (2017), first- and second-person pronouns are obligatory self-ascribers, which implies that they have strong semantic prominence. In this experiment, the experimental material was presented auditorily and the participant was the listener, so “ni (you)” was chosen as the strong semantic information with self-attribution. Therefore, being able to identify the distant “ni” as the antecedent of “ziji (oneself)” indicates that the participants are able to use prominent semantic information to determine the referent. The visual world paradigm and eye movement recording were used to explore participants’ anaphoric processing of “ziji (oneself).”

Method

Experimental design

This study adopted a two-factor mixed experimental design of 3 (antecedent category: third-person noun + third-person noun / third-person noun + second-person pronoun / second-person pronoun + third-person noun) × 3 (group: ASD/TD/ID). Group was the between-subjects variable, and antecedent category was the within-subjects variable. The antecedent category variable had three levels:

c1 (third-person noun + third-person noun): both the long-distance entity and the short-distance entity are third-person nouns (e.g., “The little mouse lets the little piggy buy ice cream for himself.”);

c2 (third-person noun + second-person pronoun): the long-distance entity is a third-person noun, and the short-distance entity is the second-person pronoun “you” (e.g., “The little mouse lets you buy ice cream for yourself.”);

c3 (second-person pronoun + third-person noun): the long-distance entity is the second-person pronoun “you”, and the short-distance entity is a third-person noun (e.g., “You let little mouse buy ice cream for himself.”).

Participants

The participants comprised three groups: 21 children with ASD, 20 TD children, and 23 children with intellectual disability (ID). Participants in the TD group were recruited from ordinary kindergartens in China, and participants in the ASD and ID groups were recruited from special education schools. All ASD and ID participants were diagnosed by professional clinicians and met the required diagnostic criteria in the DSM-IV (APA, 1994). The non-verbal and verbal IQs of the three groups were evaluated with the Combined Raven Test (CRT) and the Peabody Picture-Vocabulary Test (PPVT). As shown in Table 1, the verbal and non-verbal IQs of the ASD group were similar to those of the ID group, whereas the IQs of both the ASD and ID groups were lower than those of the TD group. The ASD and ID groups were matched on chronological age, and both groups were older than the TD children (see Table 1 for details). Because the ASD children in the current sample were not high-functioning enough to match the IQ of the TD children, we added a group of children with ID as a control group to examine the possible impact of IQ.

Table 1 Characteristics of the participants in Experiments 1 and 2

Our inclusion criterion was that all participants had to successfully establish a temporary connection between the self and the virtual subject. Initially, each group had 23 participants. However, two participants from the ASD group and three from the TD group were unable to establish the temporary connection required between themselves and the virtual subject (the panda); hence, they were excluded from the experiment.

Both experiments were approved by the Human Subjects Review Committee of Peking University. We obtained informed consent from participants’ parents and oral assent from participants before the experiment.

Materials

Thirteen images of animals and twelve photographic images depicting merchandise available for purchase in the supermarket (e.g., food and daily goods) were taken from the Internet. Twelve mini-stories were created for this experiment (see Table 2 for discourse structure). Each discourse consists of three simple sentences. The first two are content sentences, which involve two protagonists (e.g., a panda and a pig are neighbors) and provide the background of the story. The third is the target sentence, which mentions two protagonists and is accompanied by the corresponding object picture. In the target sentence, we use “let” as the verb of the main clause and “buy, take, and give” as verbs of the subordinate clause (e.g., “The little mouse lets the little piggy buy ice cream for himself.”) (as shown in Fig. 1).

Table 2 The discourse structure in Experiment 1

Fig. 1 The materials sample in Experiment 1. Each paragraph consists of three simple sentences, of which the first two are content sentences that provide discourse context and the third is a target sentence. Content sentences are personification sentences, each of which has two coordinate subjects (e.g., “You and little mouse are neighbors”/ “You and little mouse go to play”). In the target sentence, we use “let” as the verb of the main clause and “buy, take, and give” as verbs of the subordinate clause (e.g., “You let the little mouse buy ice cream for himself/yourself”). While the content sentences are played, the visual material shows the animal pictures of the long-distance and short-distance entities (e.g., lesser panda and mouse). When the target sentence is played, the visual material shows the animal pictures of the long-distance and short-distance entities together with the corresponding object picture.

The experimental material was recorded by two speakers (one male and one female) of standard Mandarin. The audio editing software Cool Edit Pro was used to edit the recordings. The speech intensity of the stimuli was equalized (average RMS power = −23 dB) to ensure that all stimuli were presented at a uniform sound intensity in the experiment.
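The level equalization described above can be sketched as a single gain applied to each recording. This is a minimal illustration, not the procedure Cool Edit Pro runs internally: it assumes samples are floats in [-1.0, 1.0] and that the −23 dB figure is measured relative to full scale, which the text does not specify. The function names and the test tone are hypothetical.

```python
import math

def rms_db(samples):
    """Average RMS power in dB, assuming full scale corresponds to 1.0.

    Raises ValueError on pure silence, because log10(0) is undefined.
    """
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)

def normalize_rms(samples, target_db=-23.0):
    """Scale every sample by one constant gain so the RMS power equals target_db."""
    gain = 10.0 ** ((target_db - rms_db(samples)) / 20.0)
    return [s * gain for s in samples]

# Hypothetical stimulus: one second of a 220 Hz tone sampled at 8 kHz.
tone = [0.5 * math.sin(2 * math.pi * 220 * n / 8000) for n in range(8000)]
leveled = normalize_rms(tone)
print(round(rms_db(leveled), 3))
```

Because the gain is constant within a recording, the relative dynamics of the speech are preserved and only the overall level changes.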

We used a Tobii T120 eye tracker and its software package to record each participant’s gaze behavior in the experimental tasks. The eye tracker was connected to two computers: one ran the Tobii Studio software package to record gaze behavior, and the other ran Matlab to present the experimental procedure. The eye tracker is an entirely non-contact system that achieves real-time eye tracking. The system’s allowable head movement range is 44 cm × 22 cm × 33 cm, its sampling precision is 0.5 degrees, and its sampling frequency is 120 Hz. Five-point calibration was performed on each participant’s eyes. A gaze point lasting at least 80 ms was counted as a fixation. The participant sat in front of the eye tracker in a natural posture, with the eyes level with the stimuli at a distance of 60 cm.
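To make the 80 ms criterion concrete at a 120 Hz sampling rate, the sketch below counts a run of consecutive samples on the same region as a fixation only if it spans at least 80 ms. This is a simplified dwell-based rule written for illustration; the text does not say which fixation filter was configured in Tobii Studio, and the labels and sample stream are hypothetical.

```python
import math
from itertools import groupby

SAMPLE_RATE_HZ = 120   # sampling frequency reported in the text
MIN_FIXATION_MS = 80   # minimum gaze duration reported in the text

def min_samples(rate_hz=SAMPLE_RATE_HZ, min_ms=MIN_FIXATION_MS):
    """Smallest number of consecutive samples that spans at least min_ms."""
    return math.ceil(min_ms * rate_hz / 1000)

def fixations(labels, rate_hz=SAMPLE_RATE_HZ, min_ms=MIN_FIXATION_MS):
    """Return (label, onset_ms, duration_ms) for each sufficiently long run.

    labels holds one region label per gaze sample; None marks track loss
    or an off-screen look and never counts as a fixation.
    """
    needed = min_samples(rate_hz, min_ms)
    found, index = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label is not None and length >= needed:
            found.append((label, index * 1000 / rate_hz, length * 1000 / rate_hz))
        index += length
    return found

# Hypothetical sample stream: 12 samples on one picture, a brief loss,
# a 5-sample glance that is too short, then 24 samples on another picture.
stream = ["far"] * 12 + [None] * 3 + ["near"] * 5 + ["goods"] * 24
print(min_samples())        # 10 samples are needed at 120 Hz
print(fixations(stream))
```

At 120 Hz each sample covers about 8.3 ms, so nine samples (75 ms) fall short of the threshold and ten samples (about 83 ms) are the minimum that satisfies it.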

Procedure

Temporary self-connection phase

Before the formal experiment, each participant needed to establish a temporary self-connection with the "panda". In the mini-stories presented later, the panda represents the participant himself/herself. First, the experimenter introduced the experimental steps and helped the participant establish the temporary connection between the panda and the self. The instructions were given in the following manner: “Hello kids. Today we're going to play a game. First of all, let's look. What is this animal in the middle of the screen? Do you like pandas? In the following game, you need to play the role of a little panda and imagine yourself as the little panda in the game. From now on, you are this little panda, and this little panda is *** (name of participant). Now, would you please tell me who will play the panda?” The next phase began once the participant could readily answer "I am a panda" or "the panda is me". Such an answer indicates that the participant has temporarily established a connection between the self and the panda, so that the little panda in the game represents the participant.

Calibration

The participants sat 60 cm from the eye tracker. Five-point calibration was performed on each participant's eyes. The experimenter sat near the control computer to avoid interrupting the children during the experiment. Given the particular needs of children with autism, teachers assisted the experimenter in giving instructions and managing the children's behavior.

Test phase

The formal experiment was then conducted. As the sentences were played to the participants, the computer screen presented the corresponding discourse entities or scenes, and the participants' gaze behavior was recorded. After each paragraph was presented, the participants rested for 10 s; they did not need to complete any detection tasks.

Data analysis and hypothesis

To investigate how children process prominent semantic information during the processing of reflexive pronoun reference, this study focused on two dependent variables (time to first fixation and fixation duration) measured for each group after the reflexive pronoun "ziji (oneself)" was heard in the target sentence. Because the participants had established a temporary self-connection with the little panda in the first part of the experiment, the target object representing the participants themselves in the following data analysis is the little panda picture.

First, we analyzed the order in which the different groups of participants processed the three target images (long-distance object, short-distance object, and image of food/daily goods) under the different experimental conditions. If the participants processed the reflexive pronoun "ziji" with long-distance anaphora, they would look first at the long-distance object when "ziji (oneself)" appeared; if they processed "ziji" according to the principle of local constraint, they would look first at the short-distance object. The processing sequence was derived from the time to first fixation: the first target image fixated was assigned a value of 1, the second a value of 2, and so on. The processing sequence was analyzed using repeated-measures ANOVA and an ordinal logistic regression model. Ordinal logistic regression is the more appropriate technique for such ordered categorical data (Franklin et al., 2017). The robust variance estimator at the cluster level approximates a comparable Generalized Estimating Equations (GEE) ordinal model.
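The conversion from time to first fixation to an ordinal processing sequence can be sketched as follows. The function name, the AOI labels, and the millisecond values are hypothetical, and the handling of AOIs that were never fixated (left unranked here) is an assumption; the text does not say how such cases were coded.

```python
def processing_sequence(ttff_ms):
    """Rank AOIs by time to first fixation: 1 means looked at first.

    ttff_ms maps an AOI name to its time to first fixation in ms, or to
    None if the AOI was never fixated. Unfixated AOIs receive rank None.
    """
    fixated = sorted((t, aoi) for aoi, t in ttff_ms.items() if t is not None)
    ranks = {aoi: None for aoi in ttff_ms}
    for position, (_, aoi) in enumerate(fixated, start=1):
        ranks[aoi] = position
    return ranks

# Hypothetical trial: the short-distance picture is fixated first.
trial = {"long_distance": 410.0, "short_distance": 265.0, "goods": 930.0}
print(processing_sequence(trial))
# {'long_distance': 2, 'short_distance': 1, 'goods': 3}
```

The resulting ranks take the values 1 to 3 per trial, which is the ordered categorical outcome that an ordinal logistic regression expects.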

Additionally, we compared the fixation time of the different groups of participants on the three target images. If children can process strong semantic information normally during the anaphoric processing of the Chinese reflexive pronoun “ziji (oneself)”, they should look more at the object carrying strong semantic information (the panda) than at the other two objects. Linear regression with generalized estimating equations was performed to test the association between fixation time and the between-group variable.

This study used the elliptical area-of-interest tool in the Tobii software and defined each area of interest (AOI) as a target image plus a margin of 1° of visual angle (1.4 cm) around its edges, giving one AOI for each of the three target images.
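One way to read the AOI definition above is as an ellipse around each target image whose semi-axes are enlarged by the stated margin. The hit test below is a sketch under that assumption; it is not the Tobii tool's own geometry, and the image dimensions and gaze coordinates are hypothetical values in centimeters.

```python
def in_elliptical_aoi(x, y, cx, cy, half_width, half_height, margin=0.0):
    """Return True if gaze point (x, y) lies inside the enlarged ellipse.

    (cx, cy) is the ellipse center; half_width and half_height are its
    semi-axes; margin is added to both semi-axes (an approximation of a
    uniform border around the image). All values share one unit.
    """
    a = half_width + margin
    b = half_height + margin
    return ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 <= 1.0

# Hypothetical image of 8 cm x 6 cm centered at the origin, margin 1.4 cm.
print(in_elliptical_aoi(5.0, 0.0, 0.0, 0.0, 4.0, 3.0))              # False
print(in_elliptical_aoi(5.0, 0.0, 0.0, 0.0, 4.0, 3.0, margin=1.4))  # True
```

A margin of this kind tolerates small calibration errors, so a gaze sample that lands just outside the picture is still attributed to it.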

Results and discussion

Each child was presented with all 12 stories from one of three lists counterbalanced for the antecedent word category in the target sentence and order of presentation of stories.
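A Latin-square rotation is one common way to build such counterbalanced lists, and the sketch below shows how 12 stories and three antecedent categories could be distributed over three lists. The text does not describe how the lists were actually constructed, so the rotation rule, the function names, and the per-child seed are all assumptions.

```python
import random

CONDITIONS = ["c1", "c2", "c3"]  # antecedent categories from the design

def build_lists(n_stories=12, conditions=CONDITIONS):
    """Rotate conditions over stories so every list contains each story once."""
    k = len(conditions)
    return [
        [(story, conditions[(story + shift) % k]) for story in range(n_stories)]
        for shift in range(k)
    ]

def presentation_order(trials, seed):
    """Shuffle one list reproducibly; seed is a hypothetical per-child value."""
    shuffled = list(trials)
    random.Random(seed).shuffle(shuffled)
    return shuffled

lists = build_lists()
child_trials = presentation_order(lists[0], seed=1)
print(lists[0][:3])  # [(0, 'c1'), (1, 'c2'), (2, 'c3')]
```

With this rotation every list holds four stories per condition, and each story appears in all three conditions across the three lists, so story content is not confounded with antecedent category.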

Analysis of the processing sequence

The processing sequence was analyzed using the time to first fixation. A repeated-measures ANOVA showed that the main effect of antecedent category was not significant (F(2,122) = 0.188, p = 0.829, \(\eta_{p}^{2} = 0.003\)). The main effect of group was also not significant (F(2,61) = 0.933, p = 0.399, \(\eta_{p}^{2} = 0.030\)); that is, the groups did not differ in the order in which they fixated the target objects. The main effect of area of interest was significant (F(2,122) = 164.934, p < 0.001, \(\eta_{p}^{2} = 0.73\)), as was the interaction between antecedent category and area of interest (F(4,244) = 112.353, p < 0.001, \(\eta_{p}^{2} = 0.648\)) (see Fig. 2). Simple-effect analyses revealed that the time to first fixation on each AOI differed across antecedent categories. Specifically, when the long-distance entity was a third-person noun, the children took less time to look at the short-distance object (F(2,189) = 147.074, p < 0.001, \(\eta_{p}^{2} = 0.609\)). When the long-distance entity was the second-person pronoun, the children took less time to look at the long-distance object (F(2,189) = 137.843, p < 0.001, \(\eta_{p}^{2} = 0.593\)). To further investigate whether the groups differed in processing sequence, we used an ordinal logistic regression model, holding within-group effects fixed, to assess the relationship between group and order of observation. In the ordinal logistic regression for the long-distance object, there was no significant effect of group (ASD vs ID: β = 0.009, OR = 1.009, 95% CI: 0.785–1.296; TD vs ID: β = 0.086, OR = 1.09, 95% CI: 0.791–1.502). Likewise, there was no significant effect of group for the short-distance object or the image of food/daily goods. The results are summarized in Table 3.
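Partial eta squared can be recovered from an F statistic and its degrees of freedom, which gives a quick consistency check on reported effect sizes. The sketch below applies the standard identity to the area-of-interest main effect and the interaction reported above; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from F: (F * df1) / (F * df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Area-of-interest main effect: F(2,122) = 164.934
print(round(partial_eta_squared(164.934, 2, 122), 3))  # 0.73
# Antecedent category x area of interest: F(4,244) = 112.353
print(round(partial_eta_squared(112.353, 4, 244), 3))  # 0.648
```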

Fig. 2 Indexes before the first fixation for the ASD, TD, and ID groups in Experiment 1 (error bars depict standard errors)

Table 3 Experiment 1: Results of ordinal logistic regression for processing sequence (df = 1)
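In a logistic model the odds ratio is the exponential of the coefficient, so the β and OR values reported for the group contrasts can be cross-checked directly. The sketch below also checks that β lies near the midpoint of the confidence interval on the log scale, which assumes a symmetric Wald-type interval; the text does not state how the intervals were computed, and the function names are hypothetical.

```python
import math

def odds_ratio(beta):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)

def log_ci_midpoint(lower, upper):
    """Midpoint of a confidence interval on the log-odds scale.

    For a symmetric Wald interval this midpoint equals beta up to rounding.
    """
    return (math.log(lower) + math.log(upper)) / 2.0

# Long-distance object in Experiment 1, ASD vs ID: beta = 0.009
print(round(odds_ratio(0.009), 3))                # 1.009
print(round(log_ci_midpoint(0.785, 1.296), 3))    # 0.009
# TD vs ID: beta = 0.086
print(round(odds_ratio(0.086), 2))                # 1.09
```

An odds ratio close to 1 with an interval that spans 1 is what a negligible group difference looks like on this scale.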

Analysis of the distribution pattern of gaze duration

Analysis of the fixation duration on the target objects can help determine the participants’ cognitive processing of, and level of interest in, each object: a longer fixation duration corresponds to greater interest. Off-screen looks were treated as missing data. The percentage of lost trials was 14.68%, 16.38%, and 5.2% for the ASD, TD, and ID groups, respectively. A repeated-measures ANOVA was carried out on fixation duration on the target objects. The main effect of antecedent category was not significant (F(2,122) = 1.433, p = 0.242, \(\eta_{p}^{2} = 0.023\)), nor was the main effect of group (F(2,61) = 0.095, p = 0.91, \(\eta_{p}^{2} = 0.003\)); that is, the groups did not differ in fixation duration on the target objects. The main effect of area of interest was significant (F(2,122) = 43.022, p < 0.001, \(\eta_{p}^{2} = 0.414\)), as was the interaction between antecedent category and area of interest (F(4,244) = 24.956, p < 0.001, \(\eta_{p}^{2} = 0.29\)) (see Fig. 3). Simple-effect analyses revealed that when the long-distance entity was a third-person noun, the children looked longest at the image of food or daily goods, followed by the short-distance object, and shortest at the long-distance object (F(2,189) = 34.614, p < 0.001, \(\eta_{p}^{2} = 0.268\)). When the long-distance entity was the second-person pronoun, the children looked longest at the image of food or daily goods, followed by the long-distance object, and shortest at the short-distance object (F(2,189) = 37.395, p < 0.001, \(\eta_{p}^{2} = 0.284\)). Fixation time was also analyzed using Generalized Estimating Equations (GEE) to avoid type I errors that might be caused by ANOVAs. GEE was used to model fixation time towards the image of food/daily goods, the long-distance object, and the short-distance object, as a function of participant within each condition. A separate cell means model was estimated for each outcome variable. Antecedent category was included as a fixed-effects term in the model, and the within-subject correlation was modelled with a compound symmetry correlation matrix structure. See Appendix A for all model parameter estimates. Consistent with the processing-sequence results, the GEE results suggested that fixation time on the three target objects did not differ across groups.
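A compound symmetry (exchangeable) structure assumes that any two repeated measurements from the same child share one common correlation. The sketch below only builds that working correlation matrix for the three antecedent categories; it does not fit a GEE, and the value of rho is a hypothetical placeholder rather than an estimate from the study.

```python
def compound_symmetry(n_measures, rho):
    """Working correlation matrix: 1 on the diagonal, rho everywhere else.

    Requires n_measures >= 2. The matrix is a valid correlation matrix
    only when -1 / (n_measures - 1) <= rho <= 1.
    """
    if not -1.0 / (n_measures - 1) <= rho <= 1.0:
        raise ValueError("rho is outside the range that keeps the matrix valid")
    return [
        [1.0 if i == j else rho for j in range(n_measures)]
        for i in range(n_measures)
    ]

# Three repeated measures per child (c1, c2, c3) with a hypothetical rho.
for row in compound_symmetry(3, 0.4):
    print(row)
```

This structure needs only one correlation parameter regardless of the number of conditions, which makes it a common default when the order of the repeated measures carries no special meaning.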

Fig. 3 Fixation duration for the ASD, TD, and ID groups in Experiment 1 (error bars depict standard errors)

The results showed that, when the proximate and distant entities of “ziji (oneself)” were both third-person nouns (such as a mouse and a calf), participants from both experimental and control groups were able to follow the principle of local binding and use syntactic strategy to determine that the proximate entity was the antecedent of “ziji(oneself).”

Experiment 2

In Experiment 1, “ni” was used as the semantically prominent information because the second-person pronoun has a strong self-referencing property. However, in addition to the second-person pronoun, the listener’s own name also has a self-referring function. Furthermore, many studies have shown that processing of one's own name is unique: it is more easily identified among a range of varied information and is thus unlikely to be inhibited by other information (Mack, Pappas, Silverman, & Gay, 2002). In fact, even during sleep or under anesthesia, individuals are still able to react to their own name (Perrin et al., 1999; Shelley-Tremblay & Mack, 1999). The question, therefore, is whether children with ASD are able to use their own name as semantic information with a prominent self-reference effect during anaphoric processing. Experiment 2 was designed to answer this question by introducing the participant's own name.

Method

Participants

The recruitment methods and criteria were the same as in Experiment 1. We initially recruited 23 children with ASD, 20 TD children, and 23 children with ID. However, 3 children with ID and 2 children with ASD were excluded because of difficulties establishing the temporary connection required between themselves and the virtual subject. Thus, 20 TD children, 20 children with ID, and 21 children with ASD contributed to the final analysis. All the children with ASD and ID met the required diagnostic criteria of DSM-IV (American Psychiatric Association, 1994). The ASD and ID groups were matched on chronological age and IQ (measured by CRT and PPVT). The TD children were younger and had a higher IQ than the children in the other two groups (see Table 1 for more details). None of the participants in Experiment 2 had participated in Experiment 1.

Materials and procedure

Thirteen images of animals and twelve photographic images depicting merchandise available for purchase in the supermarket (e.g., food and daily goods) were taken from the Internet. In addition to using third-person nouns and second-person pronouns as in Experiment 1, this experiment also uses the subject’s name as the antecedent. Visual material is shown in Fig. 4 (see Table 4 for discourse structure).

Fig. 4 The materials sample in Experiment 2. The design of the experimental materials is the same as in Experiment 1. Each paragraph consists of three simple sentences, of which the first two are content sentences that provide discourse context and the third is a target sentence. The difference from Experiment 1 is that, in addition to the second-person pronoun and nouns, we use the listener’s name as the long-distance entity. Content sentences are personification sentences (e.g., “Doudou and little mouse are neighbors”/ “Doudou and little mouse go to play”). In the target sentence, we again use “let” as the verb of the main clause and “buy, take, and give” as verbs of the subordinate clause (e.g., “Doudou lets the little mouse buy ice cream for himself/yourself”). While the content sentences are played, the visual material shows the animal pictures of the long-distance and short-distance entities (e.g., lesser panda and mouse). When the target sentence is played, the visual material shows the animal pictures of the long-distance and short-distance entities together with the corresponding object picture.

Table 4 The discourse structure in Experiment 2

All procedures and data analysis for this experiment were identical to those used in Experiment 1.

Results and discussion

Analysis of the processing sequence

The time until first fixation was analyzed with a repeated-measures analysis of variance (ANOVA). The main effect of antecedent type was significant (F (2, 466) = 6.38, p < 0.01, \({\upeta }_{\mathrm{p}}^{2}=\)0.52). The main effect of participant type was not significant (F (2, 466) = 0.65, p = 0.79, \({\upeta }_{\mathrm{p}}^{2}=\)0.07), indicating that the time until first fixation did not differ between groups. In addition, the main effect of area of interest was significant (F (2, 466) = 18.72, p < 0.001, \({\upeta }_{\mathrm{p}}^{2}=\)0.39), as was the interaction between antecedent type and area of interest (F (4, 932) = 21.35, p < 0.001, \({\upeta }_{\mathrm{p}}^{2}=\)0.55). The simple effect analysis revealed that, when the long-distance object was a third-person noun, children looked first at the short-distance object picture and last at the long-distance object picture (F (2, 195) = 90.807, p < 0.001, \({\upeta }_{\mathrm{p}}^{2}=\)0.482). However, when the long-distance object was either the participant’s name or a second-person pronoun, children looked first at the long-distance object and last at the short-distance object (F (2, 195) = 110.456, p < 0.001, \({\upeta }_{\mathrm{p}}^{2}=\)0.513; F (2, 195) = 91.01, p < 0.001, \({\upeta }_{\mathrm{p}}^{2}=\)0.483) (Fig. 5). As in Experiment 1, to investigate further whether the processing sequence differed among the groups, we used an ordinal logistic regression model, with the within-group effects fixed, to assess the relationship between group and order of observation. In the ordinal logistic regression for the long-distance object, there was no significant effect of Group (ASD vs ID: β = 0.064, OR = 1.066, 95% CI: 0.686–1.658; TD vs ID: β = 0.157, OR = 1.17, 95% CI: 0.759–1.802).
Likewise, there was no significant effect of Group for the short-distance object or for the image of food/daily goods. The results are summarized in Table 5.
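The relationship between the reported coefficients and odds ratios can be illustrated with a short sketch. It assumes the usual logistic-regression convention that OR = exp(β), and it treats a 95% CI containing 1 as consistent with no group effect. The dictionary layout and function names are illustrative and are not part of the original analysis.

```python
import math

# Values as reported for the long-distance object (reference group: ID).
# The dictionary layout is an illustrative assumption.
reported = {
    "ASD vs ID": {"beta": 0.064, "or": 1.066, "ci": (0.686, 1.658)},
    "TD vs ID": {"beta": 0.157, "or": 1.17, "ci": (0.759, 1.802)},
}


def odds_ratio(beta):
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(beta)


def ci_includes_one(ci):
    """Return True when the interval for the odds ratio contains 1."""
    low, high = ci
    return low <= 1.0 <= high


for contrast, row in reported.items():
    recomputed = odds_ratio(row["beta"])
    print(contrast, round(recomputed, 3), ci_includes_one(row["ci"]))
```

Both intervals contain 1, which is what the absence of a significant Group effect implies under this convention.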

Fig. 5

Time until first fixation for the ASD, TD, and ID groups in Experiment 2 (error bars depict standard errors)

Table 5 Experiment 2: Results of ordinal logistic regression for processing sequence (df = 1)

Analysis of the distribution pattern of gaze duration

Analyzing gaze duration provides a better understanding of participants’ cognitive processing and of their degree of interest in the target: a longer gaze duration indicates greater interest. Off-screen looks were treated as missing data. The percentage of lost trials was 7.6%, 7.5%, and 8.3% for the ASD, TD, and ID groups, respectively. The repeated-measures ANOVA showed that the main effect of antecedent type was significant (F (2, 466) = 8.98, p < 0.01, \({\upeta }_{\mathrm{p}}^{2}=\)0.59). The main effect of participant type was not significant (F (2, 466) = 0.08, p = 0.77, \({\upeta }_{\mathrm{p}}^{2}=\)0.08), indicating that gaze duration did not differ between groups. In addition, the main effect of area of interest was significant (F (2, 466) = 11.54, p < 0.001, \({\upeta }_{\mathrm{p}}^{2}=\)0.65), as was the interaction between antecedent type and area of interest (F (4, 932) = 17.39, p < 0.001, \({\upeta }_{\mathrm{p}}^{2}=\)0.76). The simple effect analysis showed that, when the long-distance object was a third-person noun, gaze duration was longest on the image of the food or daily goods, followed by the image of the short-distance object, and shortest on the image of the long-distance object (F (2, 195) = 53.472, p < 0.001, \({\upeta }_{\mathrm{p}}^{2}=\)0.354). However, when the long-distance object was either the participant’s name or a second-person pronoun, gaze duration was longest on the image of the food or daily goods, followed by the image of the long-distance object, and shortest on the image of the short-distance object (F (2, 195) = 58.522, p < 0.001, \({\upeta }_{\mathrm{p}}^{2}=\)0.375; F (2, 195) = 56.965, p < 0.001, \({\upeta }_{\mathrm{p}}^{2}=\)0.369) (Fig. 6).
As in Experiment 1, generalized estimating equations (GEE) were used to model fixation time on the image of food/daily goods, the long-distance object, and the short-distance object as a function of participant group within each condition. See Appendix B for all model parameter estimates. The GEE results also suggested that fixation time on the three target objects did not differ across groups.
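For readers who want to relate the reported effect sizes to the test statistics, the sketch below recomputes partial eta squared for the three simple-effect tests on gaze duration. It assumes the standard identity \(\eta_p^2 = F \cdot df_{\text{effect}} / (F \cdot df_{\text{effect}} + df_{\text{error}})\), which holds when F and the effect size are based on the same error term. The function name and the condition labels are illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, reported partial eta squared) for the simple effects, all with df = (2, 195).
# Labels are illustrative; values are copied from the text above.
simple_effects = [
    ("third-person noun", 53.472, 0.354),
    ("self-name or second-person pronoun, first value", 58.522, 0.375),
    ("self-name or second-person pronoun, second value", 56.965, 0.369),
]

for label, f_value, reported in simple_effects:
    recomputed = partial_eta_squared(f_value, df_effect=2, df_error=195)
    print(f"{label}: recomputed {recomputed:.3f}, reported {reported:.3f}")
```

Under this assumption, the recomputed values agree with the reported ones to three decimal places.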

Fig. 6

Fixation duration for the ASD, TD, and ID groups in Experiment 2 (error bars depict standard errors)

In Experiment 2, self-name was introduced as prominent semantic information to explore how children with ASD process the referent of “ziji(oneself).” The results showed that, when a self-name or a second-person pronoun served as the distant entity, participants from the experimental group were more inclined to resolve the reflexive pronoun through long-distance anaphora, using a semantic strategy to identify the distant entity as its antecedent.

General discussion

This study used the visual world paradigm and eye tracking to explore participants’ anaphoric processing of “ziji(oneself).” In both experiments, participants with ASD exhibited a level of processing ability similar to that of children with typical development and those with intellectual disabilities. Specifically, they were found to prioritize prominent semantic information. In the absence of prominent semantic information, participants with ASD performed anaphoric processing of the reflexive pronoun in the same way as participants from the control groups. These results indicate that Chinese children with ASD are able to use not only syntactic rules but also prominent semantic information when processing the reference of the reflexive pronoun.

Both experiments showed that, when the distant and proximate entities of “ziji(oneself)” were both third-person nouns, all participants were able to use the entity within the binding domain as the antecedent of the reflexive pronoun, which is in line with the Theory of Government and Binding. This finding echoes previous studies on pronoun anaphoric processing among children with ASD, which have also found that children with ASD are able to use syntactic rules correctly to determine the referents of reflexive pronouns (Perovic et al., 2013a, 2013b; Janke & Perovic, 2015; Terzi, Marinis, and Francis 2014, 2016). Terzi et al. (2014, 2016) investigated pronoun anaphoric processing among children with ASD in Greece and found that their performance was similar to that of children with typical development when only syntactic rules were needed to process reflexive pronouns. However, the children had difficulty understanding clitic pronouns, which require the use of semantic information. Children with ASD thus acquire syntactic rules earlier than semantic and pragmatic information. This finding is also in line with the expected pathway of children’s language acquisition. Regardless of the target language, children tend to master general grammatical rules first, and then the specific phonetic, semantic, and pragmatic information of the language (Chien et al., 1993; Sigurjónsdóttir and Hyams 1992; McKee, 1992; Jakubowicz, 1994; Ruigendijk et al., 2010; Lukyanenko et al., 2014; Tek & Naigles, 2017).

More importantly, consistent with the research hypothesis, the results showed long-distance anaphora in the processing of the Chinese reflexive “ziji(oneself).” Specifically, when the distant entity was “ni” (second-person pronoun) or a self-name, participants with ASD, like the participants from the control groups, correctly identified the distant entity as the antecedent of “ziji(oneself).” This finding suggests that Chinese children with ASD are able to use prominent semantic information in reference processing, overriding the syntactic rules in favor of semantic information. These results differ from those of previous studies on children who speak Indo-European languages. According to those studies, children with ASD are unable to use semantic information accurately in anaphoric processing (Terzi, Marinis, and Francis ). Two reasons may account for this difference:

(1) Processing Chinese differs from processing Indo-European languages. The words of most Indo-European languages (such as English) have various morphological changes and strict syntactic rules, and speakers of these languages tend to rely on syntactic rules to understand a sentence. Conversely, researchers have found that non-syntactic cues, such as semantics and context, are very important in the understanding of Chinese sentences (Chen, 1992; Zhang et al., 2010, 2013). Chinese is therefore referred to as a context-dependent language. In particular, owing to the lack of explicit cues provided by morphological changes, determining the antecedent of Chinese pronouns relies mainly on cues such as semantics and context (Xu et al., 2013). Consistent with this, the present study found that participants in all three groups were able to use semantic information to identify long-distance anaphora.

(2) In addition, unlike the comprehension/binding of clitics task used in previous studies, this study introduced “ni” and the participant’s name as prominent semantic cues. Some researchers have pointed out that first- and second-person pronouns and self-names have compulsory self-referring attributes. In other words, semantic prominence becomes more pronounced when the component is more closely associated with the listener (closer to the self) (Pan, 1995). Similarly, Ariel (1990) pointed out that, based on the principle of economy in language expression, when an entity has high prominence, the listener will automatically resolve the pronoun to the most prominent antecedent. The prominence of the entity in the text therefore becomes the key information that affects the processing of pronouns. From the perspective of language acquisition, children with typical development begin to show sensitivity to text prominence and can use prominent cues to process pronouns from approximately three years of age (Song, 2007). By the age of four years, they are already able to comprehend pronouns as adults do and are more inclined to use semantically prominent and discourse-salient information than syntactic principles (Van Rij, Hollebrands, and Hendriks 2016). In this study, participants with ASD were able to use the prominent semantic information of the antecedent to process pronouns as well as children with typical development or intellectual disabilities did.

It is worth noting that, in this study, participants with ASD were able to perform dominant processing using their own name; however, past studies on self-name processing in individuals with ASD found that they were not able to decipher the referent when their name was used during processing. Cygan et al. (2014) used electroencephalography (EEG) to investigate self-name recognition among individuals with ASD. The results showed that individuals from the typical development group tended to show stronger P300 components when processing their own names, while individuals from the ASD group showed no difference in the P300 components between processing their own names, the names of individuals that they were familiar with, and faces. The researchers therefore concluded that individuals with ASD are not able to develop a specific representation of self-names. The present finding that participants with ASD were able to prioritize prominent semantic information may be more closely associated with the speaker’s perspective (first-person perspective) and the concept of interpretation. At its semantic core, “ziji(oneself)” is reflexive; although the antecedent and the reflexive pronoun refer to the same entity, they serve different semantic roles (that of the performer and of the receiver of the action). Therefore, when the participants recognized their self-name as the subject of the action, they were able to process the self prior to performing the action (Wuyun, Wang, Zhang, Wang, Yi, & Wu, 2020), thereby applying the prominent semantic information to identify the referent of the reflexive pronoun. This study found that, unlike speakers of Indo-European languages, native Chinese-speaking individuals with ASD could prioritize the processing of prominent semantic information.
This finding thus offers important guidance for language therapy interventions that aim to improve the semantic and pragmatic information processing abilities of such children. To strengthen their ability to process semantic and pragmatic information, interventions are recommended to help participants establish a first-person perspective. For example, children with autism can be encouraged to establish a temporary connection with the protagonist of a story, or to act out the content of the story with themselves as the agent of the action.

Adding a group of children with ID to our study allowed us to examine the possible impact of IQ. The similar pattern found in the ID and TD groups implies that the prioritized processing of prominent semantic information among individuals with ASD cannot be attributed to differences in IQ, but rather to autistic traits. However, since the children with ASD in the current sample were not high-functioning enough to match the IQ of typical children, further research on higher-functioning children is needed before our findings can be generalized.

Limitations

This study has several limitations. First, we used only the PPVT for language testing. However, the PPVT focuses on individuals’ understanding of vocabulary, whereas this research focused mainly on language comprehension. Future studies should therefore adopt additional language ability tests. Second, because the ADOS-2 and CARS-2 had not been translated or normed for use in China at the time of the study, no additional diagnostic instruments were used. Future studies could consider using other scales to confirm the diagnosis of ASD. Third, no fillers were used in this study: filler materials were omitted in view of the children’s limited attention and the absence of reading comprehension tasks. However, sufficient fillers are important to an experiment because they can improve the reliability of the measurement (Runner et al., 2006; Su & Su, 2015). Future studies could therefore consider using more fillers for language comprehension.

Conclusions

In summary, this study used eye-tracking techniques and the visual world paradigm to investigate the reference-processing ability of Chinese children with ASD. The results showed that participants were able to prioritize prominent semantic information: in addition to respecting the syntactic rules, they were able to use prominent semantic information to assist language understanding. The findings provide evidence for a better understanding of the language processing mechanisms of individuals with ASD and offer important guidance for effective clinical speech therapy interventions in the future.