Introduction

Formulaic sequences (FSs), which occur frequently in everyday language interaction, are regarded as fixed expressions or unanalyzed multi-word chunks. They are often perceived as fossilized forms because of their high frequency and because they are recognized and used as complete semantic units without word-by-word analysis. The meaning of a formulaic sequence cannot be deduced by combining the meanings of its constituent units. FSs are to language use what water is to fish, and they make up a substantial portion of native speakers' oral and written communication. Erman and Warren (2000) estimated that approximately 52.3% of native written text consists of formulaic sequences, so L2 learners who aim for a native-like writing style would need to use them in comparable proportions. For native-like spoken discourse, the proportion of formulaic sequences is expected to be even higher (Ellis 2008). Formulaic sequences therefore contribute substantially to fluent, accurate, and natural use of the target language in L2 learners' speech and writing. Their use can also serve as a linguistic indicator of L2 learners' progress toward mastery of the target language (Perera 2001). Beyond the meaning and usage of individual words, L2 learners need to know how units combine in collocations to form formulaic sequences in the L2. This knowledge enables them to produce language that closely resembles that of native speakers.

In recent years, the study of formulaic sequences has received growing attention across fields including linguistics and psychology, as reflected in the increasing number of researchers working in this area. The natural features of FSs, such as the length and frequency of English FSs in daily use (Dai et al. 2018), the holistic quality of FSs (Siyanova-Chanturia 2015), and what FSs are for and what determines how many there are (Wray 2017), have attracted considerable attention. From a psycholinguistic perspective, how FSs are stored and processed in the mental lexicon is a central question in cognitive studies of FSs (Carrol and Conklin 2020). The indispensable role of formulaic sequences in language use and learning is becoming increasingly apparent. Several empirical studies support the holistic processing hypothesis, showing that formulaic sequences are processed more rapidly than non-formulaic sequences and exhibit distinct psychological characteristics. This article examines how L2 learners of Chinese at different proficiency levels and native speakers differ in processing formulaic sequences within sentence contexts. Specifically, the study explores the processing advantage of formulaic sequences in each group, examines contextual effects on FS processing, and analyzes similarities and differences in FS processing across L2 proficiency levels.

Literature review

The concept, formation and processing of formulaic sequences

A formulaic sequence is a word string that is processed like a morpheme: it is stored and retrieved holistically from memory at the time of use, without recourse to form-meaning matching of any of its sub-parts, and it is not decomposed or analyzed by the language grammar (Wray 2002, 2008). Such strings are common in language and are often associated with specific conditions of use. Formulaic sequences form primarily through everyday language use and are shaped by linguistic context: speakers combine two or more components into a string of words to express thoughts and feelings in pragmatically appropriate ways. Unlike grammar-dependent structures, formulaic sequences rely on their overall meaning rather than on strict grammatical rules. They serve as preconstructed, indivisible lexical units whose meaning cannot be deduced by analyzing their individual components or surface form. In contrast, non-formulaic sequences, which take longer to recognize and process, follow grammatical rules and derive their meaning from their constituent parts. In general, formulaic sequences are frequently used strings that may deviate from strict grammatical regularity or semantic logic; when a strong, invariant association exists between the form and meaning of a string, the string remains unanalyzed. Early research on the acquisition and processing of formulaic sequences focused primarily on children's first language development, and the evidence regarding their role in adult second language acquisition and processing is less conclusive than that for children (Wood 2015).

L2 learning involves numerous incremental developments over an extended period. Consequently, researchers commonly use quantitative methods to examine learners' knowledge and to track changes in their acquisition of formulaic sequences over time (Chen 2019; Eskildsen 2015; Serrano et al. 2015; Toomer and Elgort 2019). At the same time, researchers have investigated various factors that may influence L2 learners' processing and acquisition of formulaic sequences. These factors include the types of FSs and their linguistic features, statistical properties of multi-word sequences, differences between the learners' first and second languages, duration of instruction, L2 learners' proficiency levels and vocabulary size, receptive knowledge of the L2, and classroom pedagogical approaches (Ding and Reynolds 2019; Kim and Nam 2017; Nguyen and Webb 2017; Pan et al. 2018; Pulido and Dussias 2020; Wolter and Yamashita 2018; Yeldham 2020; Yi, Lu and Ma 2017). In contrast to these receptive and processing measures, productive data from L2 learners' spoken or written language directly reveal how they understand and use formulaic sequences; such studies have focused mainly on the oral production or speaking proficiency of multi-word sequences (Chan 2019; Chen 2020; Saito 2020; Tavakoli and Uchihara 2020; Xu 2018; Yan 2020). Previous studies show that diverse processing patterns of formulaic sequences consistently emerge in L2 research. As a result, how FSs are processed and produced in L2 contexts has become a prominent research question.

FS processing in native and non-native speakers

Formulaic sequences are preconstructed and stored as complete units in memory, rather than being generated or analyzed by the rules of the language grammar at the time of use (Wray 2002). In recent years, because of the significant role FSs play in language acquisition and use, researchers have shown increasing interest in collecting experimental data on the cognitive processing of formulaic sequences by native speakers (NSs) and non-native speakers (NNSs). These studies investigate whether FSs have a processing advantage and whether they are processed holistically or analytically, a question that has been studied for a long time (Van Lancker 2012; Wray 2002, 2013). Under the holistic processing approach, formulaic sequences are treated as complete units: multi-word sequences are stored in the mind as wholes and are not understood by analyzing their individual components. From this perspective, individuals learn and store FSs as integrated units and retrieve and use them holistically to convey their intended meaning. Under the analytical processing approach, by contrast, FSs are broken down into their constituent units and comprehended word by word by applying grammatical rules. Numerous behavioral experiments have supported a holistic processing advantage for FSs in both native and non-native speakers, showing that FSs are processed faster than non-FSs in English. On this account, holistic processing makes FSs faster and less effortful to process, giving them priority over non-formulaic sequences.

This hypothesis is supported by a substantial body of experiments with native speaker participants. In behavioral research, FSs elicit shorter response times in lexical decision tasks (Arnon 2010), faster self-paced reading than other sequences (Conklin and Schmitt 2008, 2012; Tremblay and Baayen 2010), and faster reading aloud and shorter silent reading times (Siyanova-Chanturia et al. 2011; Underwood et al. 2004; Vilkaitė 2016). In contrast to these consistent findings for native speakers, the results for non-native speakers are highly contradictory. To gather further evidence on the processing priority hypothesis, researchers continue to conduct behavioral experiments on FS processing in NNSs (Conklin and Schmitt 2008; Ellis et al. 2008; Jeong and Jiang 2019; Jiang and Nekrasova 2007). However, there is currently no consensus on whether non-native speakers, like native speakers, process formulaic sequences holistically (Schmitt et al. 2004; Schmitt and Underwood 2004; Underwood et al. 2004). Holistic, analytical, and hybrid processing accounts of formulaic sequences have all been proposed, and further cross-linguistic experiments are needed to test them. Although FSs are common units in languages worldwide, existing research has focused primarily on English FSs and offers insufficient evidence from other languages.

Contextual effects in word recognition and processing

Linguistic context helps individuals identify words rapidly and accurately in various ways. Words presented in a congruent sentence context were responded to more quickly than those in a neutral or incongruent context, both in a naming task (West et al. 1983) and in a lexical decision task (Becker 1980; Eisenberg and Becker 1982). Individuals also need less information to recognize a word when adequate context is available (Grosjean 1980). Moreover, a word is recognized faster when it occurs later in a sentence, where more contextual information is available, than when it occurs early (Marslen-Wilson and Tyler 1980). Notably, sentence context also plays a significant role in determining the meaning of a word presented as an ambiguous stimulus (Connine et al. 1991). Electrophysiological experiments have further indicated that contextual effects on word identification occur early (Sereno et al. 2003).

Natural language use is always intricately connected to contextual linguistic information. Context is the most direct and efficient facilitator of word recognition and processing, and researchers have used various experimental paradigms to incorporate contextual variables and examine their impact on language processing. McDonald and Shillcock (2001) conducted two experiments on lexical representation and processing to examine the effect and priority of contextual distinctiveness (CD). The first experiment showed that CD was a significantly better predictor of lexical decision latencies than occurrence frequency, suggesting that CD is the more psychologically relevant variable. They also examined the relationship between CD and six subjectively defined measures and found that, of these, only ambiguity was reliably related to CD. Cervera and Rosell (2015) evaluated the effects of linguistic context on word recognition in noise by elderly listeners, using high- and low-predictability sentences selected from the Spanish Sentence Lists; the results showed that participants benefited from linguistic context in word recognition. Moreover, Baayen et al. (2011) found that contextual frequency affects how quickly nouns are recognized. Given the indispensable function of linguistic context, communication without context is hard to imagine: context supplies irreplaceable information for understanding meaning accurately and quickly.

In a study conducted by Zheng et al. (2016), intermediate L2 learners of Chinese participated in a processing experiment involving formulaic sequences within a sentence context. The results indicated that contextual linguistic information significantly enhanced their reading speed and accuracy. As contextual information plays a crucial role in word recognition and processing, it raises the question of whether the processing of FSs as word equivalents is also influenced by context. From the perspective of contextual information’s impact on word processing, context provides language cues that aid in word identification and processing, thereby facilitating the acquisition of target language vocabulary for L2 learners. However, it remains unclear how context affects the processing of FSs and whether it can enhance learners’ FSs processing. Further exploration is needed to examine the mediating role of contextual information in FSs processing and to verify its effects across different types of lexical units. If contextual effects are found to apply to FSs as well, teachers can consider utilizing context as a potential strategy for teaching FSs in classroom settings.

The present study

To investigate the processing approach of formulaic sequences in Chinese within a sentence context, this study addresses the following two research questions:

  1. Do the participants employ a holistic approach in processing FSs?
  2. Does sentence context influence the participants' processing of FSs?

For the first research question, it was hypothesized that irrespective of their native language background, all participants would process the formulaic sequences holistically, based on previous research in this area. Regarding the second research question, it was speculated that sentence context would have an impact on non-native speakers’ processing of the materials due to their limited knowledge of Chinese. This study specifically focused on L2 learners of Chinese as participants to examine the processing of Chinese FSs within a sentence context. The results can provide valuable insights for the acquisition of Chinese FSs as a second language and can be compared with the processing of FSs in English as a second language. One limitation of this study is that it does not encompass testing L2 learners in multiple target languages; however, future research can consider including participants from various second languages for a broader perspective.

Methodology

Participants

In this study, two groups of participants were included: native and non-native speakers of Chinese. The NSs group consisted of undergraduate students (n = 20; 11 males, 9 females) from colleges in China, with an age range of 18–23 (M = 20.45, SD = 1.39). The NNSs group comprised participants at three proficiency levels in learning Chinese as a second language. The elementary participants (n = 20; 7 males, 13 females) had an age range of 18–28 (M = 21.65, SD = 2.37). The intermediate participants (n = 20; 9 males, 11 females) ranged in age from 20–28 (M = 23.10, SD = 2.51). The advanced participants (n = 20; 5 males, 15 females) were aged 23–29 (M = 26.70, SD = 2.00).

All L2 participants in this study were enrolled in Chinese colleges and had passed the HSK level corresponding to their proficiency group. The HSK is a standardized Mandarin proficiency test for non-native speakers administered under the Ministry of Education of China. Participants freely chose which examination level to take to certify their Chinese proficiency. To pass a given HSK level, candidates need to achieve a score of 60% or above.

Before participating in the linguistic experiment, all NNS participants underwent a Chinese character recognition test to ensure their readiness for the experiment. This test covered the Chinese characters used in the experiment and aimed to confirm their ability to complete the experiment successfully. Prior to the experiment, all participants had signed an Informed Consent Form, indicating their understanding of the experiment and their voluntary participation.

Materials

Identifying formulaic sequences is challenging, and related research has used various approaches to address this. One popular, objective method uses frequency statistics or mutual information (MI) derived from large corpora. Psycholinguistic or acoustic features of a sequence can also offer valuable insights into its potential formulaicity. In addition, a checklist of the characteristics and usage patterns of multi-word strings can guide judges, whether they are native speakers or experts working with language data (Wood 2015). To ensure objectivity and rigor, our study adopts a two-step approach. First, we use statistical information, namely frequency and MI, obtained from a large Mandarin Chinese corpus to select candidate formulaic sequences. Second, we use psycholinguistic judgments to screen these stimuli, enhancing the validity and reliability of our findings.
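Corpus-based MI is often computed as pointwise mutual information over raw counts. The sketch below is a minimal illustration assuming the common formula MI = log2(f(xy) · N / (f(x) · f(y))), where N is the corpus size in tokens; the function name and all counts are hypothetical, and the exact MI variant and corpus used in this study are not specified here.

```python
import math


def pointwise_mi(f_xy: int, f_x: int, f_y: int, corpus_size: int) -> float:
    """Pointwise mutual information (base 2) for a two-part sequence xy.

    f_xy: corpus frequency of the sequence xy
    f_x, f_y: corpus frequencies of its parts x and y
    corpus_size: total number of tokens N in the corpus
    """
    if min(f_xy, f_x, f_y, corpus_size) <= 0:
        raise ValueError("all counts must be positive")
    # Frequency expected for xy if x and y occurred independently.
    expected = f_x * f_y / corpus_size
    # Positive MI: xy co-occurs more often than chance predicts.
    return math.log2(f_xy / expected)


# Hypothetical counts for illustration only (not from the study's corpus).
print(pointwise_mi(f_xy=500, f_x=1000, f_y=2000, corpus_size=1_000_000))
```

Higher MI indicates a stronger association between the parts of a sequence, which is why candidate FSs would be expected to score higher than character-substituted non-FS counterparts.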

A total of 40 sequences were selected for this study: 20 formulaic sequences and 20 non-formulaic sequences. The sequences were drawn from a frequently used corpus, and frequency analysis (t = 3.76, df = 19.0, p = 0.001) and mutual information analysis (t = 6.38, df = 19.0, p < .001) confirmed significant statistical differences between the two contrasting sets of materials. The non-FSs were created by substituting one or two characters in the matched FSs; for instance, the FS “不一定” (buyiding, meaning ‘uncertain’) was transformed into the non-FS “不充分” (buchongfen, meaning ‘insufficient’) by replacing the last two characters. The difference in stroke count between the characters of the two types of materials was not significant (t = 0.483, df = 19.0, p = 0.635, d = 0.108).
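The df of 19.0 for 20 matched pairs points to paired-samples t-tests, and the reported effect sizes appear consistent with Cohen's d_z, since d_z = t/√n: 0.483/√20 ≈ 0.108 for stroke count and 7.98/√20 ≈ 1.78 for sentence context. This is our inference from the reported numbers rather than a procedure stated by the authors. A minimal sketch, with a hypothetical helper name:

```python
import math
from statistics import mean, stdev


def paired_t_and_dz(a, b):
    """Paired-samples t statistic, degrees of freedom, and Cohen's d_z.

    a, b: equal-length sequences of matched measurements (e.g. the MI of each
    FS and of its matched non-FS). Hypothetical helper for illustration.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    d_z = mean(diffs) / stdev(diffs)  # mean difference / SD of differences
    t = d_z * math.sqrt(n)            # same as mean / (SD / sqrt(n))
    return t, n - 1, d_z


# Back-calculate d_z from the reported t values for n = 20 pairs.
for t_reported in (0.483, 7.98):
    print(t_reported, round(t_reported / math.sqrt(20), 3))
```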

To ensure the effectiveness of the selected materials, several measures were taken. Firstly, a comparison was made between these materials and the vocabulary outline for the HSK proficiency test. This ensured that elementary participants would be able to recognize and process most of the characters in the sequences. Additionally, three linguistic researchers (all native speakers of Chinese) were invited to intuitively evaluate the FSs and non-FSs used in the experiment. Furthermore, five native speakers (all college students) conducted psycholinguistic judgments on the materials, ensuring a comprehensive assessment of their suitability.

The presentation of formulaic sequences in isolation, as typically done in laboratory settings, fails to consider the authentic use of language, where context plays a crucial role. In order to examine the potential impact of contextual effects on participants’ representation and processing of FSs, this study incorporated all FSs and non-FSs into grammatically correct and meaningful sentences. Each formulaic sequence and its matched non-formulaic sequence were embedded within the same sentence.

To ensure the suitability of the sentences, a readability test was conducted with native speakers and with non-native speakers of different proficiency levels, none of whom took part in the subsequent response time experiment. The sentences were revised based on their feedback. The final set comprised 20 sentences containing FSs, 20 sentences containing non-FSs, and 20 sentences containing neither. A quantitative sentence context value was calculated for each sentence, and the values differed significantly between sentences containing FSs and those containing non-FSs (t = 7.98, df = 19.0, p < .001, d = 1.78).

The traditional way to quantify how typical a target unit's context is uses a questionnaire in which native speakers rate the context according to their language intuition, thereby quantifying semantically related information. Although native speaker ratings carry some credibility, they are also somewhat subjective. Moreover, the cognitive processing of a language unit within a linguistic string does not begin only when the target unit is encountered. Instead, it can be modeled as a Markov process, in which the perception of the Nth language unit depends on its conditional probability given the preceding N-1 units; in computational linguistics, this is known as an N-gram model.

In this study, sentence context refers to readers' ability to infer the subsequent words in a sentence from the words that precede them. In computational linguistics, such contextual probabilities are estimated from corpus frequencies using conditional probability. We applied the following formula to quantify sentence context:

$$P\left(w_3 \mid w_1, w_2\right) = \frac{c\left(w_1, w_2\right)}{c\left(w_1\right)} \times \frac{c\left(w_2, w_3\right)}{c\left(w_2\right)}$$

In this formula, $c(w_1)$ is the frequency of word $w_1$ in the corpus, and $c(w_1, w_2)$ is the co-occurrence frequency of $w_1$ and $w_2$ in the corpus. When comparing the stimulus sets, we used the logarithm of this quantitative sentence context value.
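The formula can be sketched in code from unigram and adjacent-bigram counts. This is a minimal illustration under stated assumptions: the toy corpus and its word segmentation are hypothetical, no smoothing is applied (an unseen bigram yields a probability of 0), and the natural logarithm is assumed because the base of the logarithm is not specified. Note that the product on the right-hand side corresponds to P(w_2 | w_1) · P(w_3 | w_2) under a first-order Markov (bigram) assumption.

```python
import math
from collections import Counter


def ngram_counts(tokens):
    """Unigram counts c(w) and adjacent-bigram counts c(w_i, w_j)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def context_score(w1, w2, w3, unigrams, bigrams):
    """c(w1, w2)/c(w1) * c(w2, w3)/c(w2), without smoothing."""
    if unigrams[w1] == 0 or unigrams[w2] == 0:
        return 0.0  # a word never seen in the corpus gives no estimate
    return (bigrams[(w1, w2)] / unigrams[w1]) * (bigrams[(w2, w3)] / unigrams[w2])


def log_context_score(w1, w2, w3, unigrams, bigrams):
    """Natural log of the score (assumed base); -inf for zero probability."""
    p = context_score(w1, w2, w3, unigrams, bigrams)
    return math.log(p) if p > 0 else float("-inf")


# Hypothetical, pre-segmented toy corpus.
tokens = ["我", "不", "一定", "去", "我", "不", "一定", "来", "你", "不", "充分"]
uni, bi = ngram_counts(tokens)
print(context_score("我", "不", "一定", uni, bi))  # computed as 2/2 * 2/3
print(context_score("我", "不", "充分", uni, bi))  # computed as 2/2 * 1/3
```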

The test materials were divided into two lists, each containing 10 sentences with FSs, 10 sentences with non-FSs, and 10 sentences with neither. Each FS and its matched non-FS were assigned to different lists, so that no participant read both members of a pair; this kept conditions controlled and minimized potential biases.
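One simple way to build two such lists is to alternate, pair by pair, which list receives the FS and which receives its matched non-FS, and to split the filler sentences evenly. The sketch below assumes this alternation scheme and represents each sentence by a hypothetical (type, pair index) tuple; the study does not specify how pairs were allocated.

```python
def build_lists(n_pairs=20, n_fillers=20):
    """Split matched FS/non-FS pairs and fillers across two lists.

    Pair i contributes its FS sentence to one list and its non-FS sentence
    to the other, so no list contains both members of a pair.
    """
    list_a, list_b = [], []
    for i in range(n_pairs):
        fs, non_fs = ("FS", i), ("nonFS", i)
        # Alternate the assignment so each list gets half of each type.
        first, second = (list_a, list_b) if i % 2 == 0 else (list_b, list_a)
        first.append(fs)
        second.append(non_fs)
    for j in range(n_fillers):
        (list_a if j % 2 == 0 else list_b).append(("filler", j))
    return list_a, list_b


list_a, list_b = build_lists()
for name, lst in (("A", list_a), ("B", list_b)):
    counts = {kind: sum(1 for k, _ in lst if k == kind) for kind in ("FS", "nonFS", "filler")}
    print(name, counts)
```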

Procedures

In this response time experiment, a moving-window paradigm was used to present the test sentences. The sentences were displayed word by word with masking in E-Prime, which was also used to collect the data. Participants pressed a key to reveal each successive word, and the software recorded their reading times.
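The display logic of a moving window can be sketched as follows, assuming a non-cumulative window: at each key press only the current segment is shown and every other segment is replaced by mask characters of the same length, and the reading time for a segment is the interval between successive key presses. This is an illustration only, not the E-Prime implementation; the mask character, the space separator between segments, and the timestamps are assumptions.

```python
def moving_window_frames(segments, mask_char="-", sep=" "):
    """Screen contents after each key press in a non-cumulative moving window.

    Only the current segment is visible; every other segment is replaced by
    mask characters of equal length, preserving the sentence's layout.
    """
    frames = []
    for current in range(len(segments)):
        shown = [
            seg if i == current else mask_char * len(seg)
            for i, seg in enumerate(segments)
        ]
        frames.append(sep.join(shown))
    return frames


def segment_reading_times(keypress_ms):
    """Reading time of each segment = gap between consecutive key presses.

    keypress_ms[0] is assumed to mark the onset of the first segment.
    """
    return [later - earlier for earlier, later in zip(keypress_ms, keypress_ms[1:])]


# Hypothetical sentence segmented into words, with an FS shown as one segment.
for frame in moving_window_frames(["他", "不一定", "来"]):
    print(frame)
print(segment_reading_times([0, 350, 900, 1200]))
```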

In the Chinese language, words can consist of varying numbers of characters, including single-character words, two-character words, three-character words, and four-character words, among others. Formulaic sequences, which are made up of multiple Chinese characters, were treated as individual words in this study. The experimental sentences were presented word by word, allowing participants to comprehend the meaning of the entire sentence by understanding the words and their relationships within the context.

To ensure that participants understood the stimuli, they answered a Yes/No question after reading each sentence. The questions were derived from the content of the sentences and focused on key information. Participants whose accuracy on these questions fell below 70% were excluded, and another participant was recruited in their place to ensure an adequate amount of reliable data. In addition, before the response time experiment, elementary participants completed a character recognition test, and those whose recognition rate did not reach 90% did not qualify for the experiment. The steps of a trial are shown in Fig. 1.
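The two screening rules can be written as simple threshold checks. A minimal sketch: the 70% and 90% cut-offs come from the text, while the function names and the per-question Boolean data format are hypothetical. Because response times were analyzed for correct responses only, a helper that keeps correctly answered trials is included as well.

```python
ACCURACY_CUTOFF = 0.70     # minimum comprehension-question accuracy (from the text)
RECOGNITION_CUTOFF = 0.90  # minimum character-recognition rate (from the text)


def qualifies_for_experiment(recognition_rate):
    """Elementary learners must recognize at least 90% of the characters."""
    return recognition_rate >= RECOGNITION_CUTOFF


def keep_participant(answers_correct):
    """answers_correct: one Boolean per Yes/No comprehension question."""
    if not answers_correct:
        raise ValueError("no comprehension answers recorded")
    accuracy = sum(answers_correct) / len(answers_correct)
    return accuracy >= ACCURACY_CUTOFF  # below 70% -> excluded and replaced


def correct_trial_rts(trials):
    """trials: (rt_ms, answered_correctly) pairs; keep RTs of correct trials."""
    return [rt for rt, correct in trials if correct]


print(keep_participant([True] * 7 + [False] * 3))
print(correct_trial_rts([(640, True), (812, False), (575, True)]))
```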

Fig. 1

The illustration of a trial in this experiment.

Results

The response time data were analyzed with two statistical approaches. The first analyzed the raw response times for FSs and non-FSs without considering quantitative sentence context, using a two-way repeated-measures analysis of variance (ANOVA) with participant group (four levels) as the between-subjects factor and stimulus set (two levels) as the within-subjects factor. The dependent variables were accuracy on the comprehension questions and response times for correctly answered trials.
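The 4 × 2 design corresponds to a split-plot (mixed) ANOVA, which can be computed directly from sums of squares. The sketch below assumes complete data, with each participant contributing one mean response time per stimulus set, and uses a hypothetical dictionary format; it is not the software used in the study, which is not specified. With four groups of 20 participants and two stimulus sets, the interaction has df = (4 - 1)(2 - 1) = 3 and the within-subjects error has df = (80 - 4)(2 - 1) = 76, matching the reported F(3, 76). With only two within-subjects levels, no sphericity correction is needed.

```python
from statistics import mean


def mixed_anova(data):
    """Split-plot ANOVA: one between-subjects and one within-subjects factor.

    data: dict mapping group name -> list of subjects, where each subject is a
    list with one score per within-subjects condition (e.g. [FS_rt, nonFS_rt]).
    Returns {effect: (F, df_effect, df_error)}.
    """
    groups = list(data.values())
    g = len(groups)
    k = len(groups[0][0])                     # number of conditions
    n_subj = sum(len(grp) for grp in groups)  # total number of subjects
    scores = [x for grp in groups for subj in grp for x in subj]
    grand = mean(scores)

    ss_total = sum((x - grand) ** 2 for x in scores)
    ss_subjects = k * sum((mean(subj) - grand) ** 2 for grp in groups for subj in grp)
    ss_groups = k * sum(
        len(grp) * (mean(x for subj in grp for x in subj) - grand) ** 2 for grp in groups
    )
    cond_means = [mean(subj[c] for grp in groups for subj in grp) for c in range(k)]
    ss_cond = n_subj * sum((m - grand) ** 2 for m in cond_means)
    ss_cells = sum(
        len(grp) * (mean(subj[c] for subj in grp) - grand) ** 2
        for grp in groups
        for c in range(k)
    )
    ss_inter = ss_cells - ss_groups - ss_cond
    ss_subj_within = ss_subjects - ss_groups                     # between-subjects error
    ss_err_within = ss_total - ss_subjects - ss_cond - ss_inter  # within-subjects error

    df_g, df_sw = g - 1, n_subj - g
    df_c, df_i, df_e = k - 1, (g - 1) * (k - 1), (n_subj - g) * (k - 1)

    def f_ratio(ss_eff, df_eff, ss_err, df_err):
        return (ss_eff / df_eff) / (ss_err / df_err)

    return {
        "group": (f_ratio(ss_groups, df_g, ss_subj_within, df_sw), df_g, df_sw),
        "stimulus": (f_ratio(ss_cond, df_c, ss_err_within, df_e), df_c, df_e),
        "interaction": (f_ratio(ss_inter, df_i, ss_err_within, df_e), df_i, df_e),
    }


# Hypothetical example: two groups of three participants, [FS_rt, nonFS_rt].
example = {
    "NS": [[520, 600], [540, 610], [505, 590]],
    "NNS": [[700, 880], [690, 860], [720, 905]],
}
print(mixed_anova(example)["interaction"])
```

With two within-subjects levels, the interaction F equals a one-way between-groups ANOVA on each participant's FS/non-FS difference score, which offers a quick cross-check.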

The second approach treated quantitative sentence context as a covariate, analyzing response times for FSs and non-FSs with a two-way repeated-measures analysis of covariance (ANCOVA) with the same between-subjects and within-subjects factors. Each two-way analysis was followed by a one-way analysis. In the one-way ANOVA, which did not include contextual factors, the independent variable was stimulus type, with two levels (FSs and non-FSs), and the dependent variable was response time. The one-way ANCOVA used the same dependent variable, with stimulus type as a fixed factor and quantitative sentence context as a covariate.
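A one-way ANCOVA with a single covariate can be computed by comparing the residual sum of squares of a covariate-only regression with that of a model that adds the group term under a common slope. The sketch below assumes homogeneous regression slopes and a hypothetical data format of (covariate, response) pairs per stimulus type. With 40 observations in two groups, the error df is 40 - 2 - 1 = 37, the same as in the reported F(1, 37); this is consistent with an analysis over 40 observations per participant group, although the text does not say how those observations were formed.

```python
from statistics import mean


def _centered_sums(xs, ys):
    """Sxx, Syy, Sxy around the means of xs and ys."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def one_way_ancova(groups):
    """Group effect adjusted for one covariate (common-slope model).

    groups: dict mapping group name -> list of (covariate, response) pairs.
    Returns (F, df_effect, df_error).
    """
    pooled = [pair for pairs in groups.values() for pair in pairs]
    n, g = len(pooled), len(groups)

    # Reduced model: response ~ covariate (no group term).
    sxx_t, syy_t, sxy_t = _centered_sums([x for x, _ in pooled], [y for _, y in pooled])
    sse_reduced = syy_t - sxy_t ** 2 / sxx_t

    # Full model: separate group intercepts, one pooled within-group slope.
    sxx_w = syy_w = sxy_w = 0.0
    for pairs in groups.values():
        sxx, syy, sxy = _centered_sums([x for x, _ in pairs], [y for _, y in pairs])
        sxx_w += sxx
        syy_w += syy
        sxy_w += sxy
    sse_full = syy_w - sxy_w ** 2 / sxx_w

    df_effect, df_error = g - 1, n - g - 1
    f = ((sse_reduced - sse_full) / df_effect) / (sse_full / df_error)
    return f, df_effect, df_error


# Hypothetical data: (log context score, response time in ms) per observation.
fs = [(-2.0 + 0.1 * i, 620 - 15 * i + (i % 4) * 9) for i in range(20)]
non_fs = [(-3.0 + 0.1 * i, 700 - 15 * i + (i % 5) * 8) for i in range(20)]
print(one_way_ancova({"FS": fs, "nonFS": non_fs}))
```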

These analyses allowed us to assess the effects of formulaic sequences, non-formulaic sequences, and quantitative sentence context on response times. We analyzed the response time data of 20 NSs and 60 NNSs grouped into three proficiency levels (Table 1).

Table 1 Response times (ms) to the stimuli by participant group.

When the data were analyzed without quantitative sentence context, a 4 × 2 two-way repeated-measures ANOVA revealed a significant interaction between participant group and stimulus set (F(3, 76) = 24.9, p < .001, η² = 0.002). Post-hoc comparisons of the stimuli showed that response times for FSs were significantly shorter than those for non-FSs (p < .001). Post-hoc comparisons of participant groups also showed significant differences in response times across groups (p < .001): NNSs took longer than NSs to process both FSs and non-FSs, and NNSs' processing times decreased as their Chinese proficiency increased.

When quantitative sentence context was included as a covariate, a 4 × 2 two-way repeated-measures ANCOVA also revealed a significant interaction between participant group and stimulus set (F(3, 74) = 24.49, p < .001, η² = 0.002). Post-hoc analyses showed that response times for FS stimuli were significantly shorter than those for non-FS stimuli (p < .001), and post-hoc comparisons between participant groups showed significant differences between NSs and NNSs (p < .001).

The one-way ANOVAs showed that response times were significantly shorter for FSs than for matched non-FSs in every participant group: elementary L2 group (F(1, 19) = 125, p < .001, η² = 0.319), intermediate group (F(1, 19) = 89.0, p < .001, η² = 0.303), advanced group (F(1, 19) = 80.8, p < .001, η² = 0.244), and native group (F(1, 19) = 49.0, p < .001, η² = 0.101).
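With 20 participants per group and two stimulus types, each one-way repeated-measures ANOVA has df (1, 19), as reported. With only two within-subjects levels, this ANOVA is equivalent to a paired t-test on the same data, with F = t². The sketch below illustrates this on hypothetical response times; the function names are illustrative.

```python
import math
from statistics import mean, stdev


def rm_anova_two_levels(a, b):
    """One-way repeated-measures ANOVA for two conditions on the same subjects.

    a, b: per-subject scores (e.g. mean RT for FSs and for non-FSs).
    Returns (F, df_effect, df_error).
    """
    n = len(a)
    scores = list(a) + list(b)
    grand = mean(scores)
    subj_means = [(x + y) / 2 for x, y in zip(a, b)]
    ss_total = sum((x - grand) ** 2 for x in scores)
    ss_cond = n * ((mean(a) - grand) ** 2 + (mean(b) - grand) ** 2)
    ss_subj = 2 * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj  # condition x subject residual
    return (ss_cond / 1) / (ss_error / (n - 1)), 1, n - 1


def paired_t(a, b):
    """Paired-samples t statistic."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical response times (ms) for five participants.
fs_rt = [520, 610, 480, 700, 555]
non_fs_rt = [600, 650, 530, 760, 640]
f, df1, df2 = rm_anova_two_levels(fs_rt, non_fs_rt)
print(f, (df1, df2), paired_t(fs_rt, non_fs_rt) ** 2)
```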

In the one-way ANCOVAs, which included sentence context, processing times for the two stimulus types differed significantly in the elementary group (F(1, 37) = 7.19, p = 0.011, η² = 0.147) and the intermediate group (F(1, 37) = 6.19, p = 0.017, η² = 0.130). The differences in the advanced group (F(1, 37) = 4.46, p = 0.042, η² = 0.075) and the native group (F(1, 37) = 4.362, p = 0.044, η² = 0.105) were also significant, but weaker, with p values just below .05 and smaller effect sizes.

Discussion

The verification of FSs holistic processing hypothesis

When sentence context factors were not taken into account, response times were consistently shorter for FSs than for matched non-FSs at every proficiency level of the non-native speakers, and the same pattern held for native speakers. Thus, even when only the reading responses to the target structures (FSs and non-FSs) within the sentence context are considered, both L2 learners of Chinese and NSs processed FSs faster than matched non-FSs, suggesting that participants recognize FSs more rapidly when they are presented within a context. Furthermore, when the target structures were analyzed in isolation from context, L2 learners and native speakers showed a similar trend in processing Chinese FSs. This suggests that FSs have fewer internal components to parse, allowing these target structures to be processed as wholes.

In this experiment, when contextual possibility was not considered, elementary L2 learners showed significant differences in reading response times between FSs and matched non-FSs. Given their limited proficiency in the target language, learners at this level may struggle to distinguish the structural differences between FSs and non-FSs; they tend to adopt a form-to-meaning approach, processing sequences word by word and integrating the meanings of the strings. However, the sentence materials used in this experiment provided rich linguistic information and cues that helped participants comprehend the structure of the stimuli.

Sentence context offered elementary learners additional clues for quickly identifying and processing the target structures: by understanding the words preceding and following a sequence, these learners could process FSs faster. When context was not considered, intermediate and advanced learners of Chinese also showed significant differences in response times. Like elementary learners, intermediate participants made effective use of information clues within sentences to recognize and process the given sequences. Advanced learners, on the other hand, appeared to treat FSs as prefabricated, integral units, so even within sentences, the linguistic information the sentences provided may have been less crucial for their processing of FSs. Regardless of proficiency level, L2 learners of Chinese tend to use the linguistic information and cues available to them to grasp the meaning of the target language. Consequently, they may process sequences primarily on the basis of this information and the characteristics of FSs, which improves their processing speed.

A similar pattern emerged among native Chinese speakers. When the influence of context was not taken into account, their response times for FSs were significantly shorter than for matched non-FSs. Both L2 learners and native speakers thus showed significant differences in response times between FSs and matched non-FSs, indicating that the processing priority effect influenced readers whether FSs were presented in isolation or within a context. This finding supports the view that the holistic processing hypothesis of FSs holds regardless of participants' native language background and applies to the cognitive processing of both L2 learners and native speakers.

Contextual effects in processing FSs

In real-life communication, language is always embedded in context. Whether or not individuals consciously use context to aid their understanding of linguistic information, language never occurs outside some contextual setting. The experiment was designed with this in mind: it aimed not only to examine whether isolated sequences showed a processing priority effect but also to investigate how FSs are processed within sentence contexts.

When contextual possibility was entered as a covariate in the analysis of covariance model, differences in response times to the stimuli emerged for both NNSs and NSs. Given the significant effect of the context factor on response times, the degree of quantified contextual possibility clearly affected the response times of all participants: the higher the contextual possibility, the less time participants needed to process FSs.
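The covariance analysis described above treats contextual possibility as a continuous covariate alongside the FS/non-FS condition. As a rough illustration only, the following Python sketch fits such a model by ordinary least squares and tests each term by comparing the full model with a reduced model that omits it. It assumes a single two-level condition factor, one numeric contextual-possibility score per trial (assumed here to lie on a 0 to 1 scale), no interaction term, and no random effects. The function names (`ols`, `rss`, `ancova`) and the eight data points are hypothetical; they are not the study's data or its exact model. A negative coefficient for the covariate corresponds to the pattern reported above, in which higher contextual possibility goes with shorter response times.

```python
# Minimal ANCOVA sketch: RT ~ intercept + condition (FS vs non-FS) + contextual possibility.
# Hypothetical data and model; not the study's actual analysis.


def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y,
    solved with Gauss-Jordan elimination and partial pivoting."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # pivot row
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


def rss(X, y, b):
    """Residual sum of squares for coefficients b."""
    return sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
               for row, yi in zip(X, y))


def ancova(rt, is_fs, context):
    """Fit the full additive model and compute an F statistic for each predictor
    by dropping it and comparing residual sums of squares (1 numerator df each)."""
    X = [[1.0, float(f), float(c)] for f, c in zip(is_fs, context)]
    b = ols(X, rt)
    rss_full = rss(X, rt, b)
    df_err = len(rt) - len(X[0])
    mse = rss_full / df_err

    def f_without(col):
        Xr = [[v for j, v in enumerate(row) if j != col] for row in X]
        return (rss(Xr, rt, ols(Xr, rt)) - rss_full) / mse

    return {"b_condition": b[1], "b_context": b[2],
            "F_condition": f_without(1), "F_context": f_without(2),
            "df": (1, df_err)}


if __name__ == "__main__":
    # Hypothetical trials: 1 = FS, 0 = matched non-FS; context scores on a 0-1 scale.
    is_fs = [1, 1, 1, 1, 0, 0, 0, 0]
    context = [0.2, 0.4, 0.6, 0.8, 0.2, 0.4, 0.6, 0.8]
    rt = [765, 715, 675, 645, 855, 825, 785, 735]  # response times in ms
    print(ancova(rt, is_fs, context))
```

The hypothetical response times were constructed so that the fitted coefficients come out at about -100 ms for FSs and -200 ms per unit of contextual possibility. Converting the F statistics to p-values would require an F distribution (for example from SciPy), which the Python standard library does not provide.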

Contextual possibility had a particularly strong influence on elementary and intermediate learners of Chinese. Although learners at these levels are developing their Chinese skills rapidly, the experimental materials may still contain words and linguistic information that they cannot fully recognize or comprehend. In such cases, these learners may draw on their acquired knowledge of Chinese to make educated guesses about unfamiliar content from the known information in the materials, and the context offers them ample, effective clues for doing so.

In this experiment, sentences served as a scaffold for processing sequences, helping elementary and intermediate learners overcome recognition difficulties and complete the task. By supplying linguistic information and reading cues, the context played an essential role for these L2 learners. Contextual possibility also promoted faster response times for FSs among advanced learners and native Chinese speakers. Compared with elementary and intermediate learners, advanced learners have a broader knowledge base and stronger processing ability in Chinese, and they use the language more frequently in speaking and writing. As a result, they show stronger processing abilities: even when sequences are presented in isolation, they process them faster and understand the intended meaning more accurately.

Sentence context plays a crucial role in language processing. Neither L2 learners nor native speakers understand and process language entirely without reference to context. Accordingly, context affected the recognition and processing of FSs for all participants in this experiment. Its influence was, however, weaker for advanced learners than for elementary and intermediate learners, presumably because of their higher language proficiency.

Our findings differ from those of McDonald and Shillcock (2001), a difference that may reflect our division of all L2 participants into three proficiency levels based on their HSK scores and our analysis of their processing of Chinese FSs at each stage. L2 learners at different proficiency levels have different amounts of Chinese linguistic knowledge and language ability, which leads to differences in how they process FSs and how they understand and use context. Context does affect their processing of FSs, but the size of this influence depends on their proficiency in the target language.

Our experimental design also differed from that of Cervera and Rosell (2015). Instead of a listening-based paradigm, we used a self-paced masking experiment, collecting response times and judgment accuracy to explore how participants processed FSs.

The influence of contextual possibility on the processing of FSs in sentences was not equal across participants but varied to some degree between groups. When reading sentences containing FSs and non-FSs, context had a significant impact on the response times of both L2 learners and NSs, but this influence was unbalanced: elementary and intermediate learners were more affected, whereas advanced learners and native Chinese speakers were less affected. Moreover, for L2 learners of Chinese, the facilitative effect of context on reading was more pronounced for FSs than for non-FSs.

Conclusions and implications

The holistic processing hypothesis for formulaic sequences was supported by a response-time experiment with both L2 learners of Chinese and native Chinese speakers. Importantly, the processing advantage of formulaic sequences over non-formulaic sequences appeared not only among L2 learners but also among native speakers: whether presented in isolation or within a context, formulaic sequences were consistently processed faster than non-formulaic sequences. Contextual effects also played a significant role in the recognition and processing of sequences, although their extent varied across participant groups.

Elementary and intermediate learners were strongly influenced by contextual effects, and formulaic sequences also showed a stronger psychological reality for these learners than for advanced learners. The experimental data further suggest that advanced L2 learners' ability to process formulaic sequences is gradually approaching that of native speakers. Whether L2 learners can process a second language like native speakers and produce native-like language remains an open question. Nevertheless, the results suggest that formulaic sequences could serve as a valuable tool for assessing L2 learners' language development in second language testing and evaluation.

In second language classrooms, teachers should follow the holistic processing principle when teaching formulaic sequences, because FSs are best understood, remembered, retrieved, and used as holistic units. Since contextual effects play an essential role in L2 learners' processing of FSs, learners can use context as a supportive scaffold for L2 learning. Further research on the development of formulaic sequences is warranted, as the topic matters for both general linguistic theory and L2 pedagogy.

From a theoretical perspective, investigating the linguistic characteristics and patterns of change in formulaic sequences can deepen our understanding of the processes involved in L2 acquisition. Pedagogically, approaches to teaching formulaic sequences deserve attention, given the significant role these sequences play in L2 use. A comprehensive understanding of the linguistic and statistical features of formulaic sequences, and of how L2 learners process and acquire them, should help non-native speakers use formulaic sequences more accurately and effectively.