A computational model of TE-dominant noticing, repetition, prior knowledge and grammatical knowledge acquisition

Computer-assisted textual enhancement (CATE) technology has been widely used to improve the syntactical and grammatical learning of English as a foreign language (EFL) learners. Visual attention, repetition, and prior knowledge are known to be vital factors in CATE-assisted knowledge acquisition; however, a model that describes how those factors intrinsically cooperate in CATE-based knowledge acquisition is still lacking. Therefore, this paper built a computational model (PESE) that uses those factors as variables. By fitting and predicting the data collected from empirical experiments with an average accuracy of 78%, PESE tested and complemented the assumptions proposed by previous studies. PESE suggested that although the efficacy of CATE is mainly determined by learners' prior knowledge of the targets, the interactive effects of visual attention, repetition, and inductive activity could partly compensate for the effect of prior knowledge, and that the efficacy ceiling of repetition could be estimated from the 'easy-perceiving level' coefficient. At the end of this paper, three pedagogical implications are proposed for English teachers who wish to integrate CATE into their teaching activities.


Introduction
Improved syntactic and grammatical awareness can help students enhance their reading and writing skills in second language (L2) acquisition. However, it is very hard for young L2 learners to develop knowledge of the syntax strong enough to perform syntactic parsing as quickly as they do in their first language (L1). The major obstacle preventing L2 readers from developing responses to L2 syntactic features may be restricted exposure to L2 print (Park & Warschauer, 2016). Young L2 students, lacking L2 linguistic knowledge and insufficiently exposed to L2 print, have difficulty noticing and acquiring the formal features of written discourse, resulting in significantly low academic performance in reading and writing.
One of the approaches commonly used to help students pay sufficient attention to syntactic information is textual enhancement (Park & Warschauer, 2016), which is based on the 'Noticing Hypothesis' (Schmidt, 1990). The input features entail typographical cues so that the target form is enhanced and its visual appearance in the text is altered (italicised, bolded, or underlined; see Kim, 2006). The aim of the typographical change is to enhance the target form's perceptual salience, e.g., using colours to improve the perception of linguistic cues (Comeaux & McDonald, 2018), using automatic colour font to enhance the perception of articles (Ziegler et al., 2017), using colour, bold, and underlining to help acquire verbal morphology (Mayén, 2013), using underlining to enhance the perception of collocations (Szudarski & Carter, 2016) and the memorisation of target words (Boers et al., 2017), and using visual-syntactic text formatting (VSTF) technology to attract learners' attention not only to a single feature but also to a syntactic structure (Park & Warschauer, 2016). The syntactic features omitted by naturalistic L2 learners could be made prominent by reforming learners' attention distribution (Cintrón-Valentín & Ellis, 2016). Up to now, the effects of textual enhancement on grammatical learning or syntactic-knowledge acquisition have been widely acknowledged (e.g., Gascoigne, 2006; Lee & Huang, 2008; Smith, 1993). Moreover, with the fast development of information technology, computer-assisted textual enhancement (CATE) has become the dominant form for improving learners' apperception in syntactical and grammatical learning (e.g., Comeaux & McDonald, 2018; LaBrozzi, 2016; Park et al., 2012; Park & Warschauer, 2016; Szudarski & Carter, 2016; Winke, 2013).
Visual attention was reported as the cornerstone of applying textual enhancement technologies, given that it is the prerequisite for transferring L2 input to intake (e.g., see Comeaux & McDonald, 2018 for review). In addition to visual attention, many other factors were also reported to be of vital importance in deciding the efficacy of applying CATE to L2 knowledge learning, e.g., Lee and Huang (2008) claimed that CATE efficacy was a function of the learners' prior knowledge, and Szudarski and Carter (2016) implied that overloaded repetition would lead learners into a period of resistance to perceiving the CATE-based intervention.
Previous studies had researched those factors' dependent impacts on the efficacy of CATE, finding that the effectiveness of CATE was influenced by the interaction of those factors, e.g., Comeaux and McDonald (2018) suggested that the interaction of the type of the cues, the cues' appearing frequency, and the dominance of the cues affected the learning effects of CATE-based intervention. Actually, as early as in 2008, Han et al. (2008) had systematically discussed the possible interactions among CATE, prior knowledge and repetition based on empirical studies (for review, see Han et al., 2008). Their analysis laid the theoretical foundation of revealing the intrinsic cooperating mechanism of those factors that work within a CATE-based intervention.

CATE in L2 grammatical and syntactic knowledge acquisition
Grammatical and syntactic awareness is described as the ability to reflect on sentence structures and the related rules for using and arranging words (Park & Warschauer, 2016). In the last 10 years, the CATE technique has been widely developed and has become more flexible, because multimedia technology can make computer-mediated contexts more vivid by incorporating sound, animation and interactivity (Gascoigne, 2006; Liu & Leveridge, 2017; Meurers et al., 2010; Park & Warschauer, 2016). Most importantly, multimedia enables learners to generalise their abilities to exploit textual, aural, and visual cues in developing comprehension (Ulitsky, 2000).
Most successful cases of applying CATE appeared in grammatical and syntactic knowledge learning. The effects brought by CATE in grammar learning were widely acknowledged. For example, Park et al. (2012) suggested in their study that CATE could effectively increase the attention allocation on targets. Consequently, the participants in the team with CATE assistance outperformed their peers who were without CATE assistance. In addition to the single syntactic-feature acquisition, CATE was also used to improve the noticing of the syntactic structures (Park & Warschauer, 2016). In Walker et al.'s study (2005), they employed VSTF technology to visually separate and indent the phrases that were nested within a single sentence, and brought the meaning and underlying structures of a given sentence to a more prominent position. Their experimental results showed that low-proficiency students gained significant improvements in syntactic awareness. Nevertheless, it was also reported that CATE may impede reading comprehension, e.g., in Park et al.'s study (2012), they implied that the attention allocated to enhanced targets reduced the processing of comprehension. Winke (2013) similarly concluded the trade-off of using CATE in second language acquisition (SLA), that is "More attention to forms resulted in less to meaning." In sum, the successful cases mentioned above shed light on how to assist learners in noticing and comprehending syntactic knowledge that is hidden behind combinations of words, that is, the syntactic features omitted by naturalistic L2-learners could be prominent by reforming learners' attention distribution (Cintrón-Valentín & Ellis, 2016).

Visual attention, prior knowledge, and repetition in CATE
Exposing adult SLA beginners to intensive input to generate stable recognition patterns regarding L2 representation is advocated by Rast (2008), especially to generate awareness of noticing the syntactic features that are hard to be noticed by learners, such as morphological cues with poor semantic meanings (Ellis & Sagarra, 2011). Ellis suggested that, the remedy of the "fossilized" responses toward L2 input is to bring the information into the light of consciousness; therefore, interventions of consciousness raising or form-focus can help learners notice the cue in the first place (Ellis, 2002). The theory that supports this claim originated from Schmidt's Noticing Hypothesis (Schmidt, 1990), which holds that input cannot be transformed to intake for processing unless it is noticed, that is, consciously registered.
Attention could also be taught and learned (Indrarathne & Kormos, 2017). Ellis et al. (2011) testified the effectiveness of applying intensive pattern training for learners who studied abroad to help them notice target L2 syntactic features by reshaping their learned attention distribution in L1. The results were promising. However, they also admitted that not all learners have the opportunity to study abroad, suggesting that sufficient pedagogical interventions should be added into routine learning, such as providing feedback on errors and applying CATE technology (Ellis & Sagarra, 2010). Although the view that CATE could effectively attract learners' attention to the targets and improve the apperception of the syntactic knowledge has been widely acknowledged, e.g., see (Winke, 2013), its efficiency of how it transfers learned attention to feature acquisition remains unclear.
Prior knowledge of the targets may significantly affect the degree of acquisition of the enhanced forms (Lee & Huang, 2008; Webb et al., 2013). Prior knowledge is referred to by many studies as "what learners had already known about the target knowledge" (e.g., Lee, 2007; Winke, 2013). It was assumed that learners with intermediate knowledge of the target forms would receive the best intervention result, given that beginners had insufficient prior knowledge for understanding the new knowledge while advanced learners had abundant previous exposure to the targets (Park et al., 2012). Learners' prior knowledge was also presumed to be correlated with their understanding of the CATE-targeted knowledge, e.g., Szudarski and Carter (2016) reported that their participants who had rather low vocabulary knowledge lacked gains in the form-recall test, although they had noticed the targets; and Park and Warschauer (2016) reported that the enhanced grammatical structures of a sentence were too complicated for young L2 beginners to understand because they lacked the necessary prior knowledge.
Repetition refers to enhancing the salience of target knowledge with a manually increased frequency (Han et al., 2008). Similar to the training epoch in machine learning, repetition has been verified as an important factor for the effectiveness of input flood (Ellis, 2002; Rott, 2007; Yang et al., 2019). However, the effect of repetition is also reported to be limited, e.g., Szudarski and Carter (2016) suggested that overloaded repetition would not improve learners' performance but would lead them to resist perceiving the intervention; and Rott (2007) found that repeating the textually enhanced content four times showed no advantage over the CATE content without repetition, but that repetition followed by CATE showed a prominent efficacy. Moreover, some explicit instructions could override the effects of repetition (Eckerth & Tavakoli, 2012), such as the effect of 'Motivational-Cognitive Involvement' (Laufer & Hulstijn, 2001). Although numerous studies, e.g., Han et al. (2008), have already discussed the possible interaction among prior knowledge, repetition, and attention in diverging the effect of CATE-based intervention, questions such as how the effect of repetition varies with changes in attention and prior knowledge have rarely been discussed in the existing studies.

Visual-attention related computational models in L2 learning
Modelling humans' visual attention in an interactive environment has a long research history. In those computational models, humans are usually actively engaged in a task that is highly correlated with the using of visual-attention, and thus are strongly top-down driven (Borji et al., 2011;Borji & Itti, 2013). Computational models involving visual-attention in L2 learning not only revealed the mechanisms of how people dominated their visual attention but also strove to understand the causal mechanisms that could explain the relationship between a series of observed phenomena (Mareschal & Thomas, 2007). For example, Monaghan et al. (2017) employed a function of L2 language exposure to simulate the weakening of the frequency effect; Yang et al. (2022) built a computational model to simulate the effective boundary of audio-assisted reading with cognitive-load as the modulating variable. This type of computational models focuses on building an explanatory framework for language learning processes as well as interactive factors (for review, see Dijkstra et al., 2019), which even unifies bilingualism within a single infrastructure if the transparency of the model is sufficient (e.g., see Monaghan et al., 2017).
Another usage of building computational models in language learning is to work out unexpected implications of a theory, because "the world is highly interactive, even a simple process theory can lead to unforeseen behaviors." (Mareschal & Thomas, 2007, p. 2). For this usage, transparency is no longer a prerequisite of the model, as simulating and predicting the unexpected results based on the experience data become major concerns. For example, Wang et al., employed convolutional neural network (CNN; 2018) and Recurrent Neural Network (RNN; 2019) to predict the eye movements of readers reading a previously unseen text by using existing eye movements as training data. Their models implied that successful readers' visual attention pattern may not be suitable for the less-skilled readers, which stood on the opposite side of a popular pedagogical method that trains less-skilled readers with their successful peers' attention pattern to help the former achieve better reading outcomes. Although those models cannot explain how the outcome was generated because of the intervention, they provide researchers with unexpected implications based on experience data.
Summarising the references listed above, computational models involving visual attention in L2 learning can be classified into two types according to their usage: white-box-technique-based models and black-box-technique-based models. For the latter, deep learning models for natural language processing (NLP), e.g., CNN (Aloysius & Geetha, 2017) and Bidirectional Encoder Representations from Transformers (BERT; Devlin et al., 2019), were the first choices of researchers, as the predictive accuracy of those techniques was far superior to that of white-box techniques. However, those models are inappropriate for observing cause-effect relations between intervention and outcome unless rules can be extracted from the black boxes. Consequently, because of the limited explanatory ability of black-box techniques, most visual-attention-related computational models adopted white-box techniques as the basic infrastructure.

This study
As mentioned above, although visual attention, repetition of the CATE-based intervention, and the learners' prior knowledge of the targets are of vital importance in deciding the efficacy of applying CATE in L2-knowledge-acquisition, those factors' intrinsic cooperating mechanism, and their compound impacts on L2-knowledge learning remain unclear. Therefore, this study uses different content modes to represent environments in which CATE-based interventions took place, and uses different levels of prior knowledge, visual attention and repetition as variables.
By building a computational model, this paper aims to answer the following questions: 1. How do prior knowledge, visual attention, and repetition intrinsically cooperate with each other in CATE-assisted L2-knowledge acquisition? 2. How does the efficacy of CATE-based intervention vary due to the change of prior knowledge, visual attention, and repetition?

Participants
This study used a design of three comparison groups to represent three types of content modes in which CATE took place for young Chinese-English students; 38 Mandarin-speaking students aged 8 (mean = 8.03, sd = 0.56) participated in the experiment. All participants were in second grade at a primary school in Sichuan province of China. Small gifts were promised to each student as rewards for his/her participation. Thirteen of them were randomly assigned to group-1 (G1), in which they were intervened with the materials presented in one-by-one mode; fourteen of them were randomly assigned to group-2 (G2), in which they were intervened with the materials presented in batch mode (simulating the flood mode); the rest (n = 11) were assigned to group-3 (G3), in which they were intervened with the materials presented in a batch mode and at the same time attached with L1 lexical information.

Experimental design and data processing flow
This study used a quasi-experimental design. First, a 3 × 3 × 3 × 3 between-subjects MANOVA design was employed to observe the phenomena from a behavioural perspective, with prior-knowledge (poor; general; good), salience (poor; general; good), effort (poor; normal; good), and epoch (1; 2; 3) as the variables. Data on prior-knowledge were collected from the pre-test, which was implemented through a card-combination game, and data on effort were calculated from the participants' eye-gaze dwell time proportion. The three levels of salience correspond to the three experimental groups. The three levels of epoch refer to the different intervention rounds.
Second, in addition to analysing dwell time proportion, Kullback-Leibler (KL) divergence was used to compare the degree of similarity among the attention patterns of the three experimental groups. Finally, based on the findings revealed by those statistical analyses, a linear model was built to simulate the interactive effects of prior-knowledge, salience, effort, and epoch on learning outcome.

Targeted grammatical knowledge and involved materials
Most EFL teachers adopt usage-based learning strategies (Elgort, 2017; Lin, 2014) in their regular face-to-face classes, as most L1 teachers do. Possessive pronouns are not specifically taught in students' formal English classes but are repeatedly encountered during regular reading, and they belong to the grammar that forms the foundation of the students' future English learning. However, because of the lack of explicit instruction from teachers, students in grade 2 cannot generate an appropriate correspondence between the personal pronouns and their possessive pronouns in L2. Although participants in this study had little knowledge of the target forms, they had been repeatedly exposed to declarative sentences and were able to translate subject, verb, and object (SVO) structured sentences into Chinese, i.e., they had acquired SVO declarative sentences at the recognition level. Thus, this study chose the transformation from personal to possessive pronouns as the grammatical objective of the tasks to drive participants to deploy their visual attention, using the CATE technique (animation and colour) to make the target grammatical knowledge salient.
Therefore, the materials used in the experiment included two major types of sentence structures: declarative sentences using personal pronouns as subjects and the sentences with possessive pronouns followed by nouns as subjects. The syntactic structures of the sentences were basic ones following the subject, verb and object (SVO) structure. The words were those that had been repeatedly confronted by the participants, and the sentences were listed in "Appendix A".

Computer-Assisted Textual Enhancement (CATE) design
Approaches commonly used to help students raise attention on targets include discrete-item exercises to increase metalinguistic knowledge, simplifying texts to reduce syntactic complexity, and CATE to make language input salient (Park & Warschauer, 2016). Therefore, after controlling for the difficulty level of the materials and the possible influences from teachers, CATE technology was designed to guide participants' attention towards the target grammatical knowledge.
To reveal the nature of attention that may influence the acquisition of L2 grammatical knowledge from reading, this study designed three types of content presentation modes to simulate three typical reading scenes (see Fig. 1), in which the salience of the CATE differed. The first type of content mode presents sentences one by one, and the enhancement of the target knowledge in each sentence is implemented through animation-based CATE. This content mode was designed to simulate the reading situation in which students focus on a single sentence at a time, and the salience of the CATE is the most prominent of the three content modes. Thus, this content mode will be referred to later as the "good" level of the variable salience, and was applied in G1. The second type of content mode is called batch-content-presentation mode, and was designed to simulate another reading scene, in which participants intensely confront similar sentences within one reading content. In this reading situation, the salience of the CATE is degraded compared to the former mode, but this is the situation most commonly confronted in L2 reading. Thus, this content mode will be referred to as the "general" level of the variable salience, and was applied in G2. The third type of content mode was designed to simulate the reading scene in which readers label the reading contents with L1 lexical information, so that the content is crowded with L1 information. This reading mode makes the salience of the CATE the least prominent of the three content modes. Thus, this mode will be referred to as the "poor" level of the variable salience, and was applied in G3.
Fig. 1 a Content presentation mode for G1; b Content presentation mode for G2; c Content presentation mode for G3

Card-combination game
A card-combination game was used in the pre-test, as well as in all post-tests immediately after the intervention with the aim of evaluating participants' current knowledge of understanding the possessive formation of subject nouns. There were two reasons for choosing the card-combination game in this study: (1) it is a transformation of the sentence production process (Alanen, 1995) and was more like a game than a test for eight-year-old children, and (2) it can collect multi-faceted information to establish a knowledge database for each participant via recording their performance.
In the card-combination game, each participant was asked to generate two independent declarative sentences by selecting appropriate words from those given by a teacher. Figure 2 demonstrates the process of a student forming two sentences in the card-combination game. The first sentence was a declarative one with a personal pronoun as subject, and the second sentence required the student to use the possessive pronoun as a subject. Teachers gave scrabble cards to a participant and then asked him/her to use all given cards to form two separate, complete sentences. All participants' performances were video recorded. The sentence pairs used in the instant post-tests were: (1) He has an apple. His Dad has an apple. (2) She eats a cake. Her Mom eats a cake. (3) It is a ball. Its colour is red.
For each sentence pair answered correctly, a learner scored one point; otherwise, the learner scored zero for that pair. Therefore, the students' score range was [0, 3]. Here, the correct answer was defined as 'the participants can correctly use possessive pronouns followed by nouns as subjects in the sentences.' Student k was thus tested by three sentence pairs (it-its, she-her, and he-his), and the normalised performance in the pre-test and each post-test was logged as the proportion correct parameter $a_k$, where $a_k = \frac{\text{score}}{3}$, $a_k \in [0, 1]$.
Fig. 2 Screenshots of the beginning, middle, and end of a card-combination game
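The scoring rule above can be sketched in a few lines of Python. This is only an illustration: the function name `proportion_correct` and the representation of a test as three booleans (one per sentence pair) are assumptions made for the example, not part of the study's materials.

```python
# Hypothetical helper: names and the input representation are illustrative.
N_PAIRS = 3  # it-its, she-her, he-his


def proportion_correct(pair_results):
    """Return a_k = score / 3 for one participant in one test.

    pair_results: a sequence of three booleans, True when the sentence pair
    was formed correctly (possessive pronoun + noun used as subject).
    """
    if len(pair_results) != N_PAIRS:
        raise ValueError("expected one result per sentence pair")
    score = sum(1 for ok in pair_results if ok)  # one point per correct pair
    return score / N_PAIRS


if __name__ == "__main__":
    print(proportion_correct([True, False, True]))
```

A participant who forms two of the three pairs correctly therefore receives a proportion correct of two thirds.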

Eye-movement tracking
An eye-tracker (a Tobii T120 running Python packages) was employed to monitor students' attention distribution while they received interventions. In our experiment, participants were required to watch videos displayed on the screen of the eye-tracker. The areas of interest form a partition of the visual domain, as shown in Fig. 3: the union of all areas is the whole visual domain, and the areas do not intersect one another. The three areas are the area that covers the target syntactical knowledge (targeted area), the area that covers the rest of the content (un-targeted area), and the area that covers everything else (other area, i.e., the blank area and the area outside the screen). Thus, the attention distribution in each test was formed by participants' eye-gaze dwell proportion on each area. The effort (E1) was calculated by adding the attention distribution on the targeted area (Ef1) and on the un-targeted area (Ef2).
The original SEEV models defined effort as the physical movement of the human, which may inhibit scanning. Thus, effort in the SEEV models may negatively influence the allocation of attention. In many other studies (e.g., Gollan & Ferscha, 2016), effort was interpreted as the participants' effort to attend to or engage in a particular task. In these works, effort positively influenced attention allocation. In this paper, students had no physical interactions with the learning environment, and thus effort can be interpreted as their attention-control ability (attention distribution on the other area). We converted the effort from negative to positive by calculating the attention distribution spent on reading contents (Ef1 + Ef2), as the sum of the three proportions (Ef1 + Ef2 + eye-gaze dwell proportion on the other area) was equal to 1.
The eye-tracking data were converted into an attention distribution by summing all dwell times (> 300 ms) on the same area of interest within an intervention. The summed dwell time on each area of interest was normalised
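The computation described above can be illustrated with a minimal Python sketch. It assumes that each dwell is available as an (area, duration in ms) pair and that the summed dwell times are normalised by their grand total so that the three proportions add up to 1; the text does not spell out the normaliser, so that choice, like the function and area names, is an assumption of this example.

```python
# Illustrative sketch; names and the normalisation choice are assumptions.
MIN_DWELL_MS = 300  # only dwells longer than 300 ms are counted
AREAS = ("targeted", "untargeted", "other")


def attention_distribution(dwells):
    """Sum qualifying dwell times per area and normalise them to proportions.

    dwells: iterable of (area, duration_ms) pairs for one intervention.
    """
    totals = {area: 0.0 for area in AREAS}
    for area, duration_ms in dwells:
        if duration_ms > MIN_DWELL_MS:
            totals[area] += duration_ms
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no dwell longer than the threshold was recorded")
    # Assumed normaliser: total retained dwell time across the three areas.
    return {area: total / grand_total for area, total in totals.items()}


def effort(distribution):
    """Effort E1 = Ef1 + Ef2, i.e. 1 minus the proportion on the other area."""
    return distribution["targeted"] + distribution["untargeted"]


if __name__ == "__main__":
    sample = [("targeted", 600), ("untargeted", 900),
              ("other", 500), ("targeted", 200)]  # the 200 ms dwell is dropped
    dist = attention_distribution(sample)
    print(dist, effort(dist))
```

Because the three areas partition the visual domain, the effort value equals one minus the dwell proportion on the other area.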

Method of comparing attention distributions
Kullback-Leibler (KL) divergence, also called relative entropy, was employed in this study to compute the mutual similarity among the visual attention distributions of the three experimental groups. KL divergence is a commonly used method for comparing visual attention distributions. However, because it is asymmetrical, many studies in vision research extended it to a symmetrical KL divergence (Borji & Itti, 2013). Both the asymmetrical and the symmetrical KL are non-negative. The larger the KL value is, the more different the two distributions are, and the KL value between two identical distributions is 0. Specifically, symmetrical KL divergence is computed according to the formula given by Borji and Itti (2013), where $q$ is the number of parts into which the probability space of the attention is divided. In our experiment, the visual area was divided into three parts (targeted, un-targeted and other); thus, $q = 3$. $H$ and $R$ are two different distributions; therefore, $\sum_{k=1}^{q} H_k = 1$ and $\sum_{k=1}^{q} R_k = 1$.
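As an illustration of the measure, the sketch below computes a symmetrical KL divergence for two three-part attention distributions. It assumes the common symmetrised form, the sum of the two directed divergences, D(H, R) = Σ H_k log(H_k / R_k) + Σ R_k log(R_k / H_k); some authors halve this sum, and the exact variant used in the study is not recoverable from the text. The function names and the small `eps` guard against zero probabilities are likewise assumptions of this example.

```python
import math


def kl_divergence(h, r, eps=1e-12):
    """Directed KL divergence sum_k h_k * log(h_k / r_k).

    Terms with h_k == 0 contribute 0; r_k is floored at eps (an assumption
    of this sketch) so that the logarithm stays finite.
    """
    if len(h) != len(r):
        raise ValueError("distributions must have the same number of parts")
    return sum(hk * math.log(hk / max(rk, eps))
               for hk, rk in zip(h, r) if hk > 0)


def symmetric_kl(h, r):
    """Assumed symmetrised form: KL(h || r) + KL(r || h)."""
    return kl_divergence(h, r) + kl_divergence(r, h)


if __name__ == "__main__":
    # Hypothetical distributions over (targeted, un-targeted, other).
    g_a = [0.50, 0.45, 0.05]
    g_b = [0.30, 0.45, 0.25]
    print(symmetric_kl(g_a, g_b))
```

With this form, swapping the two arguments leaves the value unchanged, and two identical distributions give exactly 0.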

Procedure
The experiment was conducted in the participants' second semester of grade 2 and lasted about 3 months (the experiment was performed 3-4 days a week, each session completed within an hour). First, all participants were required to take the pre-test; participants who achieved full scores were excluded, and the remaining participants were randomly assigned to three groups. Second, the participants in the three groups were presented with reading contents in different modes, and their eye-movement data were recorded by the eye-tracker. Third, each participant was required to take the instant post-test. The second and third steps were repeated for three epochs. Each participant received the intervention and the instant post-test within 10 min, once per week, which is also an intervention frequency acceptable to English teachers. The procedure of the experiment is illustrated in Fig. 4.

Learning effects of the target knowledge
The learning effects (proportion correct a) were calculated from the card-combination games. Table 1 showed the results of a 3 × 3 × 3 × 3 between-subjects MANOVA with the variables prior-knowledge (poor; general; good), salience (poor; general; good), effort (poor; normal; good), and epoch (1; 2; 3). Prior knowledge was calculated from the participants' pre-test scores: if $a_k \in [0, 0.33]$, the prior-knowledge level of student k was evaluated as "poor"; if $a_k \in (0.33, 0.67]$, it was evaluated as "general"; and if $a_k \in (0.67, 1]$, it was evaluated as "good." Additionally, the effort scores were divided into three ranges in non-increasing order: effort = "good" if the effort score was located in (1, 0.98], effort = "normal" if the score was located in (0.98, 0.95], and effort = "poor" if the score was located in (0.95, 0.73], where the threshold value 0.95 was the average of all effort values, and the threshold 0.98 was the average of all effort values higher than 0.95. Figure 5 plotted the proportion correct a at pre-test and post-tests for the three groups, showing that the improvement for G3 (mean = 0.36) was higher than that for G1 (mean = 0.25) and G2 (mean = 0.14).
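The level assignment described above can be sketched as two small Python functions. The thresholds are the ones reported in the text; how a score lying exactly on an effort threshold is treated is an assumption of this example (a score of 0.98 or above is taken as "good", and a score of 0.95 up to 0.98 as "normal"), as are the function names.

```python
# Illustrative sketch; function names and boundary handling are assumptions.

def prior_knowledge_level(a_k):
    """Map a proportion correct a_k in [0, 1] to a prior-knowledge level."""
    if not 0.0 <= a_k <= 1.0:
        raise ValueError("a_k must lie in [0, 1]")
    if a_k <= 0.33:
        return "poor"      # [0, 0.33]
    if a_k <= 0.67:
        return "general"   # (0.33, 0.67]
    return "good"          # (0.67, 1]


def effort_level(effort_score):
    """Map an effort score (Ef1 + Ef2) to an effort level.

    0.95 is reported as the mean of all effort values and 0.98 as the mean
    of the values above 0.95; inclusion of the boundaries is assumed here.
    """
    if effort_score >= 0.98:
        return "good"
    if effort_score >= 0.95:
        return "normal"
    return "poor"


if __name__ == "__main__":
    for score in (0, 1, 2, 3):
        print(score, prior_knowledge_level(score / 3))
    print(effort_level(0.99), effort_level(0.96), effort_level(0.80))
```

With three sentence pairs, a score of 0 maps to "poor", scores of 1 and 2 both map to "general", and a score of 3 maps to "good".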

Attention distribution patterns
This section compares the mutual KL divergences among the three groups; the results are listed in Table 2. Table 2 revealed that the attention distribution of G3 is significantly different from those of G1 and G2, since $D_{KL}(G1, G3)$ and $D_{KL}(G2, G3)$ are almost 10 times $D_{KL}(G1, G2)$. L1 information almost reshaped the attention distribution by increasing the effort spent on searching for information. Table 3 showed the ANOVA analysis for the effort-related variables. It revealed that the content mode significantly influenced learners' physical effort in exploring information that may be related to the task. The influence on the effort spent on the targeted area was insignificant, F(2, 99) = 2.4, p = 0.093, but the influence on the effort spent on the un-targeted area was significant, F(2, 99) = 5.04, p = 0.008. In particular, when the contents were integrated with L1 information, the effort devoted was significantly higher than in the other two attention patterns.

Modelling the prior knowledge, visual attention, and repetition in grammatical knowledge acquisition
So far, we have verified the already known fact for most EFL teachers, that is, the prior knowledge effects as a major factor that influences the effectiveness of using CATE. But, how CATE-dominant effort, the salience of CATE and the repetition tune the learning effects remains unknown. Here, we model the prior knowledge (P), the CATE-dominant effort (E, Ef 1 and Ef 2 ), the salience of CATE (S), and repetition (E) as Formula (2) shows: where t refers to a particular grammatical knowledge targeted in this study and P t = prior grammatical awareness, which was updated according to the evolving of the epoch, i.e., the results of the last instant post-test in the last round were considered as the prior grammatical awareness of this round; Ef 1 = attention distribution on targeted area; Ef 2 = attention distribution on un-targeted area; S = Salience; E 2 = Epoch; b was the 'easy-perceiving level' coefficient, which was designed as an estimator of the target knowledge based on the novel assumption that the easier the target knowledge is, the faster the target knowledge be perceived by learners. In this formula, Prior knowledge (P) is considered as the dominant factor, and the interactive influences among Effort, Salience and Epoch (ESE) are considered as one Table 3 ANOVA analysis for the effort-related variables **Sig. < .01, Ƞ 2 = .01 (small effect); Ƞ 2 = .06 (medium effect); Ƞ 2 = .14 (large effect); Effort(E1) = attention distribution on targeted area(Ef1) + attention distribution on un-targeted area(Ef2) modulating parameter in the fitting. We employed logistic model which was also adopted by SEEV (Horrey et al., 2006) as the regressor and compressor to simulate the ESE effect. 
Although the interactive influences among variables were also taken into account in building the model, the level-2, degree-2 polynomial logistic regression model achieved the best fitting results, as its MSE (mean squared error) in predicting positive samples was minimal in both the training dataset and the cross-validation dataset. It should be noted that the PESE model is linear in nature, which follows the principle of transparency in building computational models that explain how humans learn. 80% of the samples from this study's experiment were used as training data, and the remaining 20% were used as cross-validation data. In Fig. 6a, we plotted the variability in Ef1 by Ef2 for the three content-presentation modes in this study, as well as the variability in Ef1 by Ef2 in acquiring the 'be'-question features. This figure represents a mutual relationship between Ef1 and Ef2, showing that the attention spent on targeted content and on un-targeted content traded off against each other, but that the variance of Ef1 by Ef2 was largely due to the salience of the targets. To evaluate the robustness of the PESE computational model, we employed another dataset to train and test the model; the data originated from an experiment on acquiring the features of English questions with the assistance of CATE. 80% of the samples from that experiment were used as training data, and the remaining 20% were used as cross-validation data. The experiment involved the same group of participants and was conducted one semester earlier (see footnote 2). The linguistic features involved in that experiment were those of English 'be'-questions; the experiment also employed animation to enhance the salience of the targets, and the content was presented in batch mode without L1 information. The accuracy proportion a_k was computed in line with the normalisation method used in this paper.
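The fitting procedure described above (a degree-2 polynomial logistic regressor, an 80/20 train/validation split, and MSE as the selection criterion) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the random seed, and the all-zero placeholder weights are hypothetical, and the learned coefficients of PESE are not reproduced here.

```python
import math
import random
from itertools import combinations_with_replacement


def poly2_features(x):
    """Degree-2 expansion: bias, linear terms, then squares and pairwise products."""
    linear = list(x)
    quadratic = [
        x[i] * x[j] for i, j in combinations_with_replacement(range(len(x)), 2)
    ]
    return [1.0] + linear + quadratic


def sigmoid(z):
    """Numerically stable logistic function, squashing z into the unit interval."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def ese_logistic(ef1, ef2, s, e2, weights):
    """Logistic response to the degree-2 features of (Ef1, Ef2, S, E2)."""
    feats = poly2_features([ef1, ef2, s, e2])
    if len(weights) != len(feats):
        raise ValueError(f"expected {len(feats)} weights, got {len(weights)}")
    return sigmoid(sum(w * f for w, f in zip(weights, feats)))


def split_80_20(samples, seed=0):
    """Shuffle a copy of the samples and split it 80% / 20%."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


def mse(predicted, observed):
    """Mean squared error between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


# Placeholder weights (hypothetical): 1 bias + 4 linear + 10 quadratic terms.
weights = [0.0] * 15
print(ese_logistic(0.6, 0.4, 1.0, 3, weights))

train, validation = split_80_20(range(10))
print(len(train), len(validation))
```

In practice the weights would be learned on the training portion, and the candidate model with the smallest MSE on both portions would be retained.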
After training the model on two different datasets from the two experiments, the final two models with learned coefficients achieved an average accuracy of 79% in fitting and predicting learners' learning outcomes for the targeted grammatical features when applying CATE. Figure 6b plotted the interactive efficacy among effort, salience and intervention rounds in acquiring possessive pronouns (0.97*Exp(Ef1, Ef2, S_t, E2)) and in acquiring 'be'-question features (0.13*Exp(Ef1, Ef2, S_t, E2)). It should be noted that all the results were outputs of the trained models; therefore, the a_k values for the first 3 rounds in Fig. 6c differ from the a_k values in Fig. 5. The outputs for epochs 4, 5, …, 10 were estimated by the two trained models by setting parameter E2 to 4, 5, …, 10. It can be seen from Fig. 6b, c that the estimate of the coefficient b could be used to label the 'easy-perceiving level' of the target grammar for the participants, suggesting that the more easily the target is perceived (small b), the sooner the ceiling of the ESE effect is reached, and vice versa (large b). Figure 6b shows that the effect of ESE regarding 'be'-question features reached the ceiling at the end of the first intervention round. Figure 6c plotted the proportion correct α predicted by one computational model as a function of intervention rounds (E2) for G1, G2 and G3 in the acquisition of possessive pronouns, and the proportion correct α predicted by the other computational model as a function of intervention rounds (E2) for G2 (content presented in batch mode) in the acquisition of 'be'-question features. As shown, the proportion correct of G3 was expected to reach 1, while the proportion correct of the others converged in the range [0.5, 0.65]. The series of curves suggested that, under most conditions, the efficacy of CATE has a 'resistance trait.'
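To illustrate how the coefficient b relates to the ceiling of the ESE effect over epochs, the following Python sketch scales a one-dimensional logistic curve by b and reports the first epoch after which one more round adds less than a tolerance. Only the values b = 0.13 and b = 0.97 come from the text; the logistic weights `w0` and `w1`, the tolerance `tol`, and the function names are assumptions made for illustration, with effort and salience held fixed, so the resulting epochs are not the values plotted in Fig. 6b.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def ese_effect(b, epoch, w0=-2.0, w1=1.0):
    """ESE contribution at a given epoch: b times a logistic curve in the epoch.

    w0 and w1 are hypothetical weights; effort and salience are treated as
    constants absorbed into w0.
    """
    return b * sigmoid(w0 + w1 * epoch)


def ceiling_epoch(b, max_epoch=10, tol=0.01):
    """First epoch after which one more round adds less than tol to the effect."""
    for epoch in range(1, max_epoch):
        gain = ese_effect(b, epoch + 1) - ese_effect(b, epoch)
        if gain < tol:
            return epoch
    return max_epoch


for label, b in [("'be'-questions", 0.13), ("possessive pronouns", 0.97)]:
    curve = [round(ese_effect(b, e), 3) for e in range(1, 11)]
    print(label, "b =", b, "ceiling epoch:", ceiling_epoch(b))
    print("  effect by epoch:", curve)
```

Under these assumed weights, the smaller b yields an earlier ceiling than the larger b, which mirrors the qualitative pattern described for Fig. 6b.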

Discussion
The PESE computational model achieved prominent fitting and predicting results (an average accuracy of 79%) on the data collected from two experiments, supporting a reasonable assumption about the efficacy of applying CATE in L2 grammatical-knowledge learning, with learners' prior knowledge of the targets, their TE-dominant attention effort, the salience of the targets, and CATE-based intervention epochs as the modulating variables. The experimental results showed that the efficacy of CATE is not only a function of general prior knowledge (Han et al., 2008), but also a function of the interactive effects of the other variables. The PESE model suggested that although prior knowledge of the target was the dominant factor in deciding the efficacy of the CATE-based intervention, the interactive effects of the other variables could partly compensate for the discrepancies in effect caused by a lack of prior knowledge. Moreover, the coefficient b, i.e., the easy-perceiving level of the target knowledge for learners, decided the effect ceiling of repetition; whether the intervention had a 'resistance trait' or not was decided by the inductive process that involved adequate information.
It has been widely accepted that differences in learners' prior knowledge may influence the efficacy of CATE, e.g., see Park and Warschauer (2016) and Han et al. (2008), and it has been suggested that learners' prior knowledge be controlled before using focused CATE technologies (Meurers et al., 2010). This study carefully controlled for differences in learners' prior knowledge by using the SVO structure as the basic form of the sentences and by using, in the intervention, words with which learners were already sufficiently familiar. However, the initial discrepancies in prior knowledge among the groups showed a significant influence on the intervention effect. It can be seen from Fig. 5 that the students in G1, who had intermediate prior knowledge of the targets, achieved the best intervention results at the end of the third epoch, which is in line with the results found by Park et al. (2012). Nevertheless, Fig. 6c illustrated that the interactive effects of ESE could partly compensate for this lower efficacy, as the convergent proportion correct of G2 was superior to that of G1; and the positive L1-L2 knowledge transfer quickly facilitated learners in conducting comprehension activities, which in consequence compensated for learners' inadequate prior knowledge of the targets. The effect of L1 information was simulated through variable S in the PESE model. The PESE model showed that the positive L1-L2 knowledge transfer initiated a new inductive process, which consequently led to significant improvements in the learners' awareness of the target knowledge (see Fig. 5). In contrast, the efficacy of the interventions without L1 information quickly peaked and entered a resistance period (see Fig. 6c).
This 'resistance trait' of CATE-based intervention has also been reported by previous studies; e.g., in Rott's study (2007), learners achieved their best performance after just one intervention, and in Szudarski and Carter (2016), participants' best performance appeared in the 6th epoch. Figure 6b showed that the increment in proportion correct produced by ESE in each epoch was decided by the 'easy-perceiving level' of the grammar, i.e., the best number of intervention rounds could be predicted through coefficient b, and b can in turn be estimated from empirical data or by teachers. In this study, the proportion correct of 'be'-question features converged at the third epoch, where b = 0.13; in contrast, the proportion correct of possessive-pronoun features converged at about the 5th epoch, where b = 0.97. Moreover, PESE could also simulate the inductive efforts that override the effects of repetition (Eckerth & Tavakoli, 2012). It can be seen in Fig. 6c that the inductive process initiated by the L1-L2 knowledge transfer expedited the convergence of the proportion correct, which converged at the 5th epoch; in contrast, the proportion correct of G2 converged at the 6th epoch.
It is noteworthy that the inductive efforts appeared not only in G3 but also in G1 and G2, as the attention effort invoked by learners was spent on both the targeted and un-targeted areas, suggesting that those grade-2 students actively exhibited motivational-cognitive involvement (Laufer & Hulstijn, 2001) in understanding the transformation of subject pronouns into possessive pronouns. CATE could not prevent them from searching for related information when they conducted this comprehension process (Table 2 revealed that the attention spent on the non-targeted area was almost the same as the attention spent on the targeted area). Although Eckerth and Tavakoli (2012) demonstrated that an input-output cycle superior to repeated input could be implemented by integrating inductive activities into the reading tasks, PESE adds that spontaneous motivational-cognitive involvement alone was not strong enough to help learners break the 'resistance trait' unless information that could initiate new inductive activities was also involved in the cycle, e.g., L1 information that could facilitate learners' inductive activities in understanding the target knowledge.

Pedagogical implications
Based on the conclusions above, three pedagogical implications are proposed for English teachers who may use CATE in lower-grade classrooms. First, a list of the 'easy-perceiving levels' of the knowledge that may be targeted with CATE should be built. It is recommended to obtain the b coefficients of the knowledge by fitting data collected from empirical experiments to the PESE model; alternatively, teachers could estimate the b values from experience by comparing the knowledge with the items referred to in this study.
Second, the best number of repetition rounds in applying CATE for young EFL learners in L2 knowledge acquisition should be decided according to the 'easy-perceiving level' of the target (exceeding 6 rounds is not recommended). After the efficacy ceiling of repetition is reached, new information or instructions should be added to the intervention to initiate a new inductive process of knowledge comprehension, i.e., to change the status of variable S.
Third, although prior knowledge at an intermediate level is the optimal prerequisite for applying CATE-based intervention, learners with insufficient prior knowledge could also benefit from it, i.e., the reduced efficacy of the CATE-based intervention could be compensated for by strengthening the interactive effects of ESE, for example, by opportunely increasing the number of repetitions, by adding new information to stimulate further inductive activities, and by improving learners' attention distribution if the attention they devoted to tasks was unsatisfactory.

Limitations and future work
This study employed a transparent linear model to simulate the interaction among prior knowledge, CATE-based visual attention, and repetition under the different content modes in which CATE took place. The model provided answers for EFL teachers to questions such as why the effectiveness of applying CATE in the classroom is unstable, or to what extent the interactive effects of ESE can compensate for a lack of prior knowledge according to the 'easy-perceiving level' of the target knowledge. However, the current model leaves room for improvement in accuracy. Therefore, one strand of our future work will study the possibility of employing deep learning techniques with a rule-extraction function as the basic infrastructure for building the model's ESE part. The model is expected to improve both in its accuracy in fitting and predicting the data collected from classroom experiments and in its explanatory ability. Secondly, although the b values learned from the experimental data corresponded with the difficulty levels of learning the targeted grammatical items, the mapping between the two remains rough, which may prevent teachers from accurately estimating b. Providing a precise correspondence between the coefficients and the difficulty levels of learning grammatical items is another part of our ongoing work.
1. The sentences used in the card-combination game did not begin with capital letters, given that the participants were less familiar with capital letters than with lower-case letters.
2. The details of how the experiment was conducted can be found in another manuscript of ours, which has been submitted to another journal and is currently under review.