Introduction

Generally speaking, there are three modes of feedback: teacher feedback, peer feedback, and automated writing evaluation (AWE) feedback. Each type of feedback has its own advantages and disadvantages. With the rise of technology-empowered education, researchers have suggested that teacher feedback and AWE feedback should be integrated to enhance the effectiveness of feedback for L2 writing, considering the affordances and limitations of the two feedback sources (Link et al., 2022; Ranalli, 2021; Thi & Nikolov, 2022). In our study, AWE-teacher integrated feedback was operationalized as the combination of AWE feedback and teacher feedback, wherein students received AWE feedback on the first drafts of their writing and then received teacher feedback on the revised drafts. While AWE-teacher integrated feedback has been highly recommended in technology-empowered contexts, few studies have focused on it. Consequently, it is not clear whether this integrated feedback can improve L2 learners’ writing performance, or how learners attend to such feedback (Thi & Nikolov, 2022; Xu & Zhang, 2022).

To better understand the influence of feedback on writing performance, it is necessary to explore how L2 learners engage with it. As previous researchers have argued (e.g., Cheng et al., 2023; Cheng & Zhang, 2024; 2023; Zhang & Hyland, 2022), feedback has the potential to improve students’ writing performance, but it does not contribute to students’ performance automatically. Instead, it is learners’ engagement that allows the potential of feedback to be fully realized. Specifically, by examining students’ engagement, we can gain deep insight into how students analyze, interpret, construct, and internalize feedback, which enables them to transfer feedback to their subsequent learning and then improve their learning outcomes (Yang & Zhang, 2023). This means that engagement mediates students’ learning and performance in writing (Mao & Lee, 2023; Sulis, 2022).

Considering the popularity of AWE-teacher integrated feedback and the importance of learners’ engagement, our study concentrated on how L2 learners engaged with such feedback affectively, behaviorally, and cognitively, and how their engagement would influence their writing performance in the Chinese EFL context.

Literature Review

Teacher Feedback and AWE Feedback in L2 Writing

Teacher feedback is the most traditional and important source of feedback. In the existing studies, a host of researchers have reported its beneficial effects on L2 writing, evidencing that such a practice can enhance L2 learners’ writing accuracy in both revised drafts and new pieces of writing (Bitchener & Knoch, 2010; Cheng & Zhang, 2021; Karim & Nassaji, 2020; Kim & Emeliyanova, 2021).

Despite the benefits of teacher feedback, we should pay attention to the limitations in the way that teacher feedback is provided in traditional L2 writing contexts. First, when giving feedback, L2 teachers direct much attention to local areas, particularly grammar in their students’ writing, disregarding the high-order issues in writing (e.g., content, organization, and genre) (Cheng et al., 2021; Mao & Crosthwaite, 2019). More importantly, it is a herculean task for L2 teachers to offer feedback. They are challenged by a range of contextual factors, including time constraints, tight teaching schedules, large classes, and learners with varied L2 proficiency (Cheng et al., 2021; Lee, 2017). These contextual constraints prevent L2 teachers from providing written feedback efficiently, so feedback may arrive too late to be timely, which undermines its efficacy. In this situation, it is imperative that measures be adopted to lower teachers’ workload in feedback provision and improve the effectiveness of teacher feedback, among which employing AWE feedback appears to be a viable strategy.

With the rapid development of educational technology and natural language processing techniques, AWE feedback has increasingly gained attention in L2 writing teaching and research. Currently, the pedagogical application of AWE feedback has become a common practice. Compared with teacher feedback, AWE feedback enjoys several advantages: immediacy, convenience, and multiple revision opportunities (Ranalli, 2021; Thi & Nikolov, 2022). Studies have explored the usefulness of AWE in L2 writing (Li et al., 2015; Wang et al., 2013; Xu & Zhang, 2022) and have reported that such feedback can contribute to L2 learners’ writing performance. In spite of the growing popularity of AWE feedback in pedagogical application, such feedback has several weaknesses (Thi & Nikolov, 2022). First, it predominantly focuses on local issues of writing, paying little attention to high-order dimensions. Second, AWE feedback tends to be fallible and is sometimes inappropriate. Finally, since AWE systems are designed based on large language models, AWE feedback, to some extent, lacks flexibility and tends to be one-size-fits-all. Given such pitfalls, AWE feedback cannot replace teacher feedback in L2 writing instruction.

According to the above discussion, both teacher feedback and AWE feedback have their own respective affordances and constraints. To minimize their individual disadvantages and enhance the utility of feedback in practice, some scholars have recommended that teachers should employ AWE-teacher feedback, integrating the two sources of feedback (e.g., Jiang et al., 2020; Link et al., 2022; O’Neill & Russell, 2019). As these researchers argued, the combination of AWE feedback and teacher feedback may reduce teachers’ pressure on providing written feedback and free them from providing low-level language-oriented feedback. Thus, teachers can center on global aspects of writing. It can be speculated that in such a feedback model, AWE feedback mainly focuses on language errors, while teacher feedback targets global issues in writing.

Unfortunately, AWE-teacher integrated feedback remains under-explored in the current literature. Much remains to be discovered about how L2 learners engage with it in order to understand its mechanism.

Student Engagement with L2 Writing Feedback

As an umbrella term, engagement refers to the extent to which students are devoted to their learning (Sulis, 2022). Underpinned by the previous studies (e.g., Cheng & Zhang, 2024; Ellis, 2010; Fredricks et al., 2004; Han & Hyland, 2015; Zhang & Hyland, 2018, 2022; Zheng & Yu, 2018), our study considered student engagement as a multifaceted construct with three intertwined elements: (1) affective engagement is defined as students’ attitudinal responses to feedback; (2) behavioral engagement refers to how students address feedback, including feedback operations and feedback strategies; (3) cognitive engagement concerns the awareness of feedback (noticing or understanding) and cognitive/metacognitive operations students deploy to process feedback and facilitate their revisions. The integration of the three dimensions enables us to have a nuanced understanding of how L2 learners respond to AWE-teacher integrated feedback in the Chinese tertiary EFL context.

Considering the significance of student engagement with feedback, there is a proliferation of studies on this issue in the current literature. Among these studies, scholars have focused on L2 learners’ engagement with teacher feedback. For instance, Zheng and Yu (2018) investigated how low-proficiency L2 learners attended to teacher WCF and reported that the students exhibited superficial engagement from affective, behavioral, and cognitive perspectives. Cheng and Liu (2022) further found that L2 learners’ engagement with teacher feedback hinged on their L2 proficiency and feedback focus. Specifically, for teacher WCF, learners with high L2 proficiency had effortful engagement while low-proficiency counterparts did not engage with it assiduously. Interestingly, the two groups of students were disengaged with global feedback.

Also, the existing studies have looked into L2 learners’ engagement with AWE feedback. Zhang (2017) investigated how an individual participant engaged with AWE feedback produced by Pigai in the Chinese EFL context and he found that student engagement was a complex process influenced by both individual and contextual factors. Ranalli (2021) drew upon various sources of data including screen-capture recordings, stimulated recalls, and interviews to examine how L2 learners attended to AWE feedback and what factors mediated their engagement. Results showed that the students’ engagement with AWE feedback was proofreading-directed rather than learning-oriented. Unlike previous studies stressing the mediating role of language proficiency, the study found that trust in AWE feedback was the decisive factor influencing engagement.

Aside from the above studies focusing on a single source of feedback, a few studies have also examined L2 learners’ engagement with multiple sources of feedback. For example, Zhang and Hyland (2022) investigated how Chinese EFL learners attended to integrated feedback, which included AWE feedback, peer feedback, and teacher feedback. Drawing upon a variety of data sources, their study showed that the integrated approach to feedback helped enhance L2 learners’ affective, behavioral, and cognitive engagement with feedback.

To sum up, while the above studies add to our understanding of L2 learners’ engagement with feedback, we still have little knowledge about how L2 learners would respond to AWE-teacher integrated feedback. Furthermore, the existing studies in this line mainly focused on students’ engagement with feedback based on one-off feedback practice, paying little attention to the outcome of their engagement after multiple rounds of feedback, namely their writing performance. It is students’ engagement with feedback rather than feedback per se that influences their writing performance (Zhang, 2022). As a result, examining the effects of different degrees of engagement with feedback on learners’ writing performance is warranted.

To fill these gaps, this study employed a quasi-experimental design to address the following questions:

  • RQ1: Is there any difference between L2 learners receiving AWE-teacher integrated feedback and those receiving teacher feedback in terms of their affective, behavioral, and cognitive engagement?

  • RQ2: If yes, how do the two groups of students differ in their writing performance as measured for content, organization, vocabulary, and language use?

Methods

Context and Participants

Utilizing a convenience sampling technique, we recruited two parallel intact classes (n = 72) from a medium-ranking university in central China. The two classes comprised second-year English major students and were randomly assigned to two groups: the treatment group (TG) (receiving AWE-teacher integrated feedback, n = 36) and the comparison group (CG) (receiving teacher feedback, n = 36). At the time of the study, all the participants attended English Writing Course 2, a compulsory course designed to increase students’ basic knowledge of English writing and improve their English writing proficiency. The students took the course weekly in two 45-min sessions over a 15-week semester, and they were taught by the same teacher, Yan, who had earned her master’s degree in linguistics and had 10 years of English teaching experience.

The demographic survey showed that 11 students were male and 61 were female, with ages ranging from 17 to 20. At the time of data collection, these students had learned English for over 10 years on average. None of them had overseas learning experience, nor had they previously received AWE-teacher integrated feedback.

Instruments

Writing Tests

Our study included a pretest and a posttest to investigate the changes in L2 learners’ writing performance after they engaged with different types of feedback. The genre of the writing was exposition, as it is a popular genre used to assess L2 learners’ writing proficiency. The topics (see Appendix A) for the writing tests were selected from the past TEM-4 (Test for English Majors Band 4) battery, a choice justified by two factors. First, as TEM-4 is a large-scale and well-established test, the difficulty of its writing topics remains largely consistent. Second, the topics of TEM-4 are drawn from students’ daily life, so they should be fair and familiar to all the students.

In each test, students were asked to complete their writing within 30 min in class, and the writing had to be no less than 200 words long. They did not have access to external resources while completing their writing tasks. Across the pretest and posttest, we kept the writing genre, time allocation, and procedures constant. The two tests were administered by the first author.

Questionnaires

To explore student engagement with feedback, we administered questionnaires to the two groups at the end of our study. The questionnaire was adapted from Zhang and Jiang’s (2022) validated scale on student engagement with written feedback. The participants’ responses to the questionnaire were recorded on a 5-point Likert scale, ranging from 1 (strongly disagree) to 5 (strongly agree).

Before the data collection, we checked the validity of the questionnaire. Specifically, we conducted an exploratory factor analysis to assess the psychometric properties of the questionnaire. The results showed a three-factor model with 19 items, which encompasses affective (6 items), behavioral (6 items), and cognitive (7 items) dimensions. The Cronbach’s alpha for our questionnaire was 0.89, with the values of all the three subscales exceeding 0.7 (affective: 0.85; behavioral: 0.81; cognitive: 0.77) (DeVellis, 2012).
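As an illustration of how a reliability coefficient such as the one reported above is obtained, the following minimal sketch computes Cronbach’s alpha from a respondents-by-items score matrix. The function name and the sample responses are hypothetical and are not the study’s data; the formula is the standard one, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores).

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Assumes every respondent answered all k items (no missing values).
    """
    k = len(responses[0])
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical 5-point Likert responses: 5 respondents x 3 items.
sample = [[4, 5, 4], [3, 3, 2], [5, 5, 5], [2, 3, 2], [4, 4, 3]]
alpha = cronbach_alpha(sample)
```

The same function could be applied to each subscale separately (affective, behavioral, cognitive) by passing only the columns belonging to that subscale.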

Semi-Structured Interviews

To deepen our understanding of student engagement with integrated feedback and teacher feedback, we also collected qualitative data from semi-structured interviews. We invited a total of 10 interviewees (5 students from each group) to participate in the interviews. We conducted the interviews individually in Chinese to avoid any confusion and gather more information, with each interview lasting 30–45 min. With the participants’ permission, we audio-recorded each interview for further analysis.

In addition, we collected the participants’ writing samples with teacher feedback from writing tasks 1–5 as complementary data to explore what feedback the teacher gave to the two groups of participants.

Intervention

The data in our study were collected from weeks 2–13 (see Table 1). The intervention started with the pretest in week 2 and was conducted every two weeks. Altogether, the participants received five sessions of intervention.

Table 1 Procedure of the study

In each intervention, students in TG were provided with AWE-teacher integrated feedback. As mentioned previously, our study operationalized such feedback as the integrated use of AWE feedback and teacher feedback. Specifically, the participants in TG received AWE feedback at the first stage and then they were offered teacher feedback (see Fig. 1). In other words, they received the two sources of written feedback at different stages. In our study, AWE feedback was provided by Pigai (http://www.pigai.org/), which is specifically designed for Chinese EFL learners. Pigai is similar to other AWE systems (e.g., Criterion) in that it generates holistic scores, offers general comments on vocabulary and sentence structure, and delivers corrective feedback on grammar, vocabulary, and wording. Also, it provides students with collocations and synonyms. Furthermore, Pigai offers students metalinguistic explanations for errors. After receiving AWE feedback, the students were asked to revise their writing and then submit the revised drafts to the course teacher. Next, after receiving the second drafts of students’ writing, the teacher reviewed them, detected the problems in them, and provided corresponding feedback. We did not ask the teacher to confine her feedback to local or global issues. The participants then revised their writing according to the teacher feedback and produced the final drafts.

Fig. 1 AWE-teacher feedback process

The participants in CG received teacher feedback alone from the course teacher. In CG, we likewise did not require the teacher to focus on global or local areas of writing in feedback provision. Subsequently, the participants revised their writing according to the teacher feedback and then handed in their revised texts.

Immediately after the intervention, we administered the posttest and the questionnaires and conducted semi-structured interviews.

Data Analysis

Analysis of Teacher Feedback in TG and CG

To better understand how the teacher feedback differed in TG and CG, we analyzed the teacher feedback in terms of feedback focus. First, we identified feedback points. In our study, feedback points refer to any type of written intervention by the teacher (Hyland, 2003). Following previous studies (e.g., Cheng et al., 2021; Mao & Crosthwaite, 2019), we categorized the teacher’s feedback from the two groups into local feedback and global feedback. After categorization, we tallied the number of local and global feedback points in each intervention in TG and CG.

  • Local issues: (1) grammar: errors in morphology and syntax; (2) vocabulary: errors in word choice and phrases.

  • Global issues: (1) content: feedback on conveyed information; (2) organization: feedback on structure of paragraphs and whole text, cohesion, and coherence.
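The categorization and tallying described above can be sketched as follows. This is only an illustration under stated assumptions: the tuple layout (group, task, subcategory), the names FOCUS and tally_feedback, and the sample feedback points are hypothetical, while the four subcategories and their local/global grouping follow the scheme listed above.

```python
from collections import Counter

# Mapping of the four subcategories to the two feedback foci (from the scheme above).
FOCUS = {
    "grammar": "local",
    "vocabulary": "local",
    "content": "global",
    "organization": "global",
}


def tally_feedback(points):
    """Count feedback points per (group, task, focus).

    points: iterable of (group, task, subcategory) tuples, one per feedback point.
    """
    counts = Counter()
    for group, task, subcategory in points:
        counts[(group, task, FOCUS[subcategory])] += 1
    return counts


# Hypothetical feedback points, not the study's data.
points = [
    ("TG", 1, "content"),
    ("TG", 1, "organization"),
    ("TG", 1, "grammar"),
    ("CG", 1, "grammar"),
    ("CG", 1, "vocabulary"),
]
counts = tally_feedback(points)
```

Keying the counter by group, task, and focus keeps the per-intervention breakdown available, so totals per group can be obtained later by summing over tasks.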

Analysis of Writing Tests and Questionnaires

We scored the writing samples in TG and CG. In total, 144 written texts were collected and rated based on Jacobs et al.’s (1981) ESL Composition Profile, which has been used widely to assess L2 learners’ writing proficiency (e.g., Huang & Zhang, 2020; Teng & Zhang, 2020). We chose this scheme because it was consistent with our research purpose: as our study investigated different dimensions of students’ writing performance, separate scores for global and local aspects were needed. To ensure the reliability of rating, we invited a PhD candidate in applied linguistics to be a co-rater. Approximately 20% of the writing samples were selected randomly and scored by the PhD candidate and the first author independently. The inter-rater reliability for the four dimensions was satisfactory (content: r = 0.86; organization: r = 0.90; vocabulary: r = 0.82; language use: r = 0.81). Moreover, the two raters resolved discrepancies in rating through discussion.
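For readers unfamiliar with the inter-rater index, the sketch below shows one way such a coefficient can be computed, assuming that r denotes Pearson’s correlation between the two raters’ scores on the same texts (the source reports r but does not name the coefficient). The function name and the scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical content scores from two raters for the same six texts.
rater_a = [22, 25, 18, 27, 20, 24]
rater_b = [21, 26, 17, 28, 21, 23]
r_content = pearson_r(rater_a, rater_b)
```

In practice the function would be run once per dimension (content, organization, vocabulary, language use) on the double-rated subsample.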

The writing scores and the questionnaire data were subjected to statistical analyses. First, the quantitative data were screened for outliers and missing values and checked for normality. After confirming that the data were normally distributed, we conducted independent samples t-tests and paired samples t-tests to explore the between-group and within-group differences in different aspects of writing performance. As for questionnaire data, we performed independent samples t-tests to detect whether there were significant differences in the three dimensions of engagement between the two groups. Cohen’s d was used to measure the effect sizes of independent and paired samples t-tests.
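The following minimal sketch illustrates the test statistics and effect sizes named above. It rests on assumptions the source does not state: the independent samples version uses a pooled standard deviation (equal variances), and the paired version divides the mean difference by the standard deviation of the differences, which is only one of several conventions for paired Cohen’s d. Function names and scores are hypothetical. Only t and d are computed here; p-values require the t distribution, which a statistics package such as SciPy (scipy.stats.ttest_ind, scipy.stats.ttest_rel) provides.

```python
from math import sqrt
from statistics import mean, stdev, variance


def independent_t_and_d(group1, group2):
    """Student's t and Cohen's d for two independent groups (pooled SD)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    pooled_sd = sqrt(pooled_var)
    diff = mean(group1) - mean(group2)
    t = diff / (pooled_sd * sqrt(1 / n1 + 1 / n2))
    d = diff / pooled_sd
    return t, d


def paired_t_and_d(pre, post):
    """Paired t and Cohen's d based on the SD of the difference scores."""
    diffs = [b - a for a, b in zip(pre, post)]
    sd_diff = stdev(diffs)
    t = mean(diffs) / (sd_diff / sqrt(len(diffs)))
    d = mean(diffs) / sd_diff
    return t, d


# Hypothetical scores, not the study's data.
t_between, d_between = independent_t_and_d([2, 4, 6], [1, 3, 5])
t_within, d_within = paired_t_and_d([1, 2, 3], [2, 4, 6])
```

With the toy data, the between-group effect is d = 0.5 and the within-group effect is d = 2.0, which shows how the two designs can yield very different magnitudes for similar raw differences.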

Analysis of Semi-Structured Interviews

The recordings of semi-structured interviews were transcribed verbatim and the transcripts were sent to the participants for member checking. The transcripts were processed manually to understand how the participants in the two groups engaged with AWE-teacher feedback and teacher feedback in the three dimensions. We began by reading and rereading the transcripts to acquire a general understanding of the data. Then, we followed Miles and Huberman’s (1994) approach to analyzing qualitative data. First, we conducted data reduction by focusing only on engagement-related information. After that, we employed thematic analysis with a deductive approach. Specifically, the interview data were examined recursively and coded along the three themes of engagement (affective, behavioral, and cognitive) and the codes under these themes, such as attitudes, revision strategies, and cognitive/metacognitive operations. For example, a participant in TG said that she recalled what previous teachers had taught in order to address teacher feedback on insufficiently developed ideas. We initially coded it as retrieval of previous knowledge, a type of cognitive operation, and then assigned it to cognitive engagement. Finally, we made cross-case comparisons and interpreted the data. The coding process was also open to new themes that might appear in the transcripts so as to avoid missing important themes.
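The deductive coding step can be pictured as a lookup from codes to the three predefined themes, with unmatched codes set aside for review so that new themes are not lost. The sketch below is purely illustrative: the codebook entries other than retrieval of previous knowledge, the function name, and the sample segments are hypothetical, and the actual coding in the study was done manually.

```python
from collections import defaultdict

# Hypothetical codebook mapping codes to the three engagement themes.
CODEBOOK = {
    "attitude toward feedback": "affective",
    "revision operation": "behavioral",
    "revision strategy": "behavioral",
    "retrieval of previous knowledge": "cognitive",
    "planning": "cognitive",
}


def group_by_theme(coded_segments):
    """Group (participant, code) pairs under their theme.

    Codes missing from the codebook go to 'uncategorized' so that
    potential new themes can be reviewed rather than discarded.
    """
    themes = defaultdict(list)
    for participant, code in coded_segments:
        theme = CODEBOOK.get(code, "uncategorized")
        themes[theme].append((participant, code))
    return dict(themes)


# Hypothetical coded segments.
segments = [
    ("Li", "retrieval of previous knowledge"),
    ("Hong", "planning"),
    ("Shi", "revision operation"),
    ("Xu", "emergent code"),
]
themes = group_by_theme(segments)
```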

To maintain the trustworthiness of coding, the same PhD candidate was invited as a co-coder. Roughly 20% of interview data were selected and coded independently by the co-coder and the first author. Any disagreements were addressed through discussion.

Results

Teacher Feedback in TG and CG

To contextualize the findings of our study, we analyzed and compared teacher feedback for the participants in TG and CG from writing tasks 1–5. According to Table 2, the TG participants received much more global feedback than local feedback from the same teacher. Among the global feedback, the teacher paid more attention to issues in content than those in organization. Such findings suggest that in AWE-teacher feedback, teacher feedback focused on global aspects of writing rather than local issues.

Table 2 Teacher feedback in TG

In comparison, teacher feedback in CG differed conspicuously from that in TG. As Table 3 shows, the CG participants predominantly received local feedback. This means that the teacher showed much more concern with local issues than global ones in feedback provision. In this sense, when providing teacher feedback alone, the teacher appeared to concentrate her attention on error correction.

Table 3 Teacher feedback in CG

TG and CG Students’ Engagement

Descriptive statistics for student engagement from the three perspectives are presented in Table 4. In terms of affective engagement, no significant difference was found between the two groups (p = 0.267). According to Table 4, TG and CG had deep engagement with integrated feedback and teacher feedback, respectively. Such affective engagement was evident in the qualitative data.

Table 4 Descriptive data of student engagement

In the interviews, the participants had a positive attitude towards the feedback they received and spoke highly of it. As Li (TG) noted, “I find the combination of AWE feedback and teacher feedback useful, since this feedback mode harnesses the advantages of AWE feedback and teacher feedback, ensuring both the timeliness and authority of feedback, which can improve its efficacy”. Another participant Juan noted that in the integrated feedback condition, AWE feedback and teacher feedback focused on local and global aspects of writing, respectively. Thus, she received both local and global feedback, which benefited her writing proficiency considerably.

In CG, the major reason for their favorable attitude was the reliability and authority of teacher feedback. This can be illustrated by Xu’s explanation:

I was really grateful to the teacher’s time and energy to provide feedback. I believed that teacher feedback was authoritative and more reliable than other sources of feedback.

Despite the generally positive attitude, TG and CG participants reflected on and identified the problems with the two feedback modes. For example, the TG participant Shi pointed out that AWE feedback in the integrated mode was sometimes inaccurate and unreasonable, which might mislead students. Zhao, a CG participant, described the limitation of teacher feedback as follows:

As an English major student, I believed that a good piece of writing was not only a cluster of error-free sentences, but contained well-developed ideas, strong logic, and reasonable organization. However, the teacher mainly focused on grammatical errors when providing feedback, paying little attention to other dimensions of writing.

According to Zhao, the teacher’s feedback practice was not in line with her belief about English writing. In other words, the teacher’s feedback, which predominantly targeted linguistic errors, seemed to offer her little help in achieving a balanced development in writing.

From the behavioral perspective, the between-subject comparison revealed that there was a significant difference between the two groups in terms of behavioral engagement (p < 0.001, d = 1.25). This result suggests that participants in TG had more intensive engagement with the integrated feedback, which aligned with the qualitative findings. According to the interview data, the students in CG (e.g., Lin, Yan, Zhao) mainly deployed two revision operations: correction and no correction. In contrast, their TG peers exhibited more operations. Shi was a case in point. As she articulated, in the AWE-teacher feedback condition, she received much feedback on local and global issues, so she adopted various revision operations such as correction, substitution, addition, and rewriting.

In terms of revision strategies, CG students (e.g., Xu, Zhao, Huan) reported that they only sometimes asked the teacher for help to address feedback and facilitate their revisions. As they explained, the teacher focused on grammatical errors, so in most cases it was not challenging for them to address such errors by themselves. In contrast, facing AWE-teacher integrated feedback, the TG participants (e.g., Jia, Shi, Hong, Juan) actively turned to more sources of external assistance including teachers, classmates, and other resources such as textbooks, dictionaries, and the internet. Furthermore, these students appeared to make the best use of outside help. For example, Hong shared her revision experience:

In writing task 3, I received a feedback point, which required me to add a causal connective to realize cohesion. I consulted a textbook to address it. Moreover, I accumulated and memorized those unfamiliar words and phrases that denote cause and effect in order to expand my knowledge base.

Similarly, Juan explained how she employed the external resources:

In writing tasks 3 and 4, I made errors on attributive clauses, which was identified by the feedback. Thus, I referred to a grammar book to deal with the feedback. During the process, I further reviewed the different types of clauses and took notes to differentiate them.

According to the excerpts, it seems that Hong and Juan used the feedback to inform their learning rather than merely to resolve immediate problems in writing. In this sense, they did not regard addressing feedback as text enhancement alone but considered it a learning process, which would contribute to their English learning in the long run.

As regards cognitive engagement, the independent samples t-test revealed a significant difference between TG and CG (p < 0.001, d = 1.46). In this sense, TG expended greater cognitive effort than CG to address the feedback they received. Thus, while CG had relatively superficial cognitive engagement with teacher feedback (M = 2.89), TG engaged with the integrated feedback substantially (M = 3.54). The quantitative results are echoed by the qualitative data. Regarding the awareness of feedback, the students in both TG and CG responded that it was not difficult to notice feedback and understand the majority of feedback. However, the participants admitted that it was somewhat demanding for them to understand some feedback. As Lin (CG) replied, she found it a little difficult to understand some feedback on sentence structures, particularly the indirect feedback on such issues. Hong in TG responded that she had difficulty in interpreting and understanding feedback on content and organization, especially when the issues were just highlighted without explanations.

More importantly, the two groups of students differed significantly in terms of using cognitive/metacognitive strategies. Based on the interview data, it appeared that CG invested limited cognitive effort. Zhao’s words were very typical. As she explained, since the teacher feedback mainly addressed linguistic errors and the teacher had a good mastery of grammatical knowledge in English, the teacher feedback was highly reliable. In this situation, she did not deploy many cognitive/metacognitive operations to check the accuracy of feedback and just incorporated it in revision. This means that she appeared to follow teacher feedback technically.

Students in TG utilized different cognitive/metacognitive strategies to regulate their mental effort and deal with the integrated feedback. Firstly, four students (Li, Shi, Jia, Juan) noted that they conducted ongoing analyses and evaluations of the feedback. Specifically, they analyzed the underlying rationales of the feedback and figured out how to address the issues according to the feedback. Also, the participants carefully assessed the accuracy of AWE feedback. As Jia elaborated:

AWE feedback tended to be fallible, so I had to analyze and evaluate it to see whether it was in line with what I wanted to express. If yes, I adopted it.

Another cognitive strategy was memorizing, which was mentioned by three students (Jia, Shi, Hong). Faced with the metalinguistic explanations about her errors provided by Pigai, Jia memorized them to gain deep insight into the nature of the errors and avoid them in follow-up writing tasks. Likewise, in the interview, Hong described her revision experience, such as taking notes to memorize the synonyms and different expressions offered by Pigai so that she could enlarge her vocabulary and improve the quality of her English expressions in writing.

Moreover, TG students (e.g., Li, Jia, Juan) connected addressing feedback with what their teachers had taught. Specifically, when getting stuck in revision, they retrieved prior knowledge to improve the quality of their writing. Li served as an example. In writing task 3, she was offered a feedback point on insufficiently developed ideas. To remedy this problem, she added a comparison in revision. As she explained:

In writing course, the teacher had informed us of different strategies to develop ideas, one of which was making comparison.

Finally, all five interviewees from TG employed the strategy of planning. In the AWE-teacher integrated feedback condition, the students did not receive AWE feedback and teacher feedback simultaneously, so they planned the revision focus at different times. For example, the interview data showed that Hong planned and determined what to revise at different stages. Specifically, she mainly corrected linguistic errors after AWE feedback and focused on global issues after teacher feedback. A similar metacognitive operation can be observed in Shi’s revision experience, in which she regulated her revision process by distributing her attention to different areas of writing in different drafts.

According to the above responses, the participants in TG did not treat the feedback passively. Instead, they exercised their agency and exerted great cognitive effort to process feedback and facilitate their revisions.

TG and CG Students’ Writing Performance

Descriptive data for content, organization, vocabulary, and language use are presented in Table 5. The independent samples t-tests showed that no significant differences were detected in the four dimensions of writing performance between the two groups in the pretest.

Table 5 Descriptive data of the sub-scores over time

In our study, we conducted both within-subject and between-subject comparisons to examine TG and CG students’ writing performance. As for the within-subject comparisons, a series of paired samples t-tests revealed that the students in TG experienced significant improvements in content (p < 0.001, d = 2.21), organization (p < 0.001, d = 2.43), vocabulary (p < 0.001, d = 1.61), and language use (p < 0.001, d = 2.04) from the pretest to the posttest. This means that TG students improved their writing performance significantly. However, no significant differences were found in CG between the pretest and the posttest in the first three aspects of writing. As an exception, language use improved significantly after the treatment of teacher feedback (p = 0.017, d = 0.68).

With regard to between-subject differences, the results of independent samples t-tests demonstrated that the students in TG outperformed those in CG in the posttest in all four areas (content: p < 0.001, d = 1.13; organization: p < 0.001, d = 2.17; vocabulary: p = 0.013, d = 0.75; language use: p = 0.025, d = 0.61). Interestingly, although the within-subject comparison showed a noticeable improvement in language use in CG from the pretest to the posttest, TG still outperformed CG in language use in the posttest. The between-subject comparisons suggested that AWE-teacher integrated feedback had more beneficial effects than teacher feedback alone in promoting L2 learners’ performance across the different writing dimensions.
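The effect sizes reported above can be illustrated with a minimal sketch. The study does not state which formula for Cohen's d was used, so the two common variants below are assumptions: mean gain divided by the SD of the gains for paired scores, and mean difference divided by the pooled SD for independent groups. All function names and scores are hypothetical and are not the study's data.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d_paired(pre, post):
    """Cohen's d for paired scores: mean gain divided by the SD of the gains."""
    gains = [after - before for before, after in zip(pre, post)]
    return mean(gains) / stdev(gains)


def paired_t(pre, post):
    """Paired samples t statistic: mean gain divided by its standard error."""
    gains = [after - before for before, after in zip(pre, post)]
    return mean(gains) / (stdev(gains) / sqrt(len(gains)))


def cohens_d_independent(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical sub-scores for five students per group (illustration only).
tg_pre = [10, 12, 11, 13, 12]
tg_post = [14, 15, 15, 16, 17]
cg_post = [12, 13, 12, 14, 13]

within_d = cohens_d_paired(tg_pre, tg_post)          # TG pretest vs. posttest
within_t = paired_t(tg_pre, tg_post)                 # t statistic for the same gains
between_d = cohens_d_independent(tg_post, cg_post)   # TG vs. CG in the posttest
print(f"within d = {within_d:.2f}, t = {within_t:.2f}, between d = {between_d:.2f}")
```

Under the paired variant, d and t are linked by t = d × √n, so a large within-group d can reflect consistent gains across students as much as a large raw gain. This is one reason within-subject and between-subject effect sizes are best read separately.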

Discussion

Our study examined how L2 learners engaged with AWE-teacher integrated feedback and how their engagement influenced their writing performance in different dimensions. The data revealed that they displayed profound engagement with the integrated feedback and that this engagement contributed to their writing performance in content, organization, vocabulary, and language use. These results showed a close relationship between student engagement and writing performance and lent empirical support to the claim that student engagement is a significant factor mediating the efficacy of feedback (Cheng & Liu, 2022; Zhang, 2022). As some prior studies have claimed (e.g., Mao & Lee, 2023; Zhang & Hyland, 2018, 2022), the provision of feedback does not necessarily bring about students’ writing development; rather, it is engagement that achieves this.

TG and CG Students’ Engagement

Our study investigated how TG and CG students engaged with the integrated feedback and teacher written feedback affectively, behaviorally, and cognitively. From an affective perspective, no significant difference was found between the two groups. That is, TG and CG had a positive attitude toward AWE-teacher feedback and teacher feedback, respectively. They further justified their attitude. One reason offered by TG students was that they received both local and global feedback, the former from the AWE system and the latter from the teacher, which benefited their writing performance. Together with Table 2, this result demonstrated that in the integrated condition, AWE feedback freed teachers from concentrating on local errors, saving their time and energy to address global issues of writing (O’Neill & Russell, 2019; Thi & Nikolov, 2022).

CG students attributed their supportive attitude to the authority and reliability of teacher feedback, which has been observed in previous studies (Cheng & Liu, 2022; Zheng & Yu, 2018). In these studies, L2 learners considered teacher feedback authoritative and were willing to receive it. Our result was not surprising given the research context. Deeply rooted in Confucian philosophy, the Chinese educational system emphasizes the role of teachers. Teachers are regarded as authority figures and are responsible for imparting content knowledge and transmitting ethical values. Thus, students are required to show respect for their teachers and are not allowed to challenge them (Cheng et al., 2021; Jin & Cortazzi, 2006). With such a well-established cultural norm, students formed a tenacious belief in the authoritative role of teachers and trusted teacher feedback in the Chinese EFL context.

Behaviorally, the quantitative and qualitative data showed that TG exhibited deeper engagement than CG. According to the interviews, CG participants did not engage intensively with teacher feedback, deploying two revision operations and utilizing limited revision strategies. In comparison, TG participants’ engagement with the integrated feedback was more assiduous. As elaborated in the interviews, participants adopted a variety of revision operations including correction, substitution, addition, and rewriting. The deployment of more revision operations was probably associated with the feedback they received. In TG, students received feedback focusing on both global and local issues, so they needed different operations to address local and global feedback. As Zhang and Hyland (2022) stated, L2 learners’ revision operations vary with feedback focus.

Moreover, TG students employed different external resources such as teachers, peers, the internet, and textbooks to address feedback and facilitate their revisions, as was the case in Cheng and Liu (2022) and Fan and Xu (2020). More encouragingly, it seemed that the students in TG took advantage of the outside assistance to inform their learning rather than merely to correct their texts. This suggests that the participants went beyond the text level and used the affordances of feedback to facilitate their learning, tapping into the long-term potential of feedback to further their learning (Yang & Zhang, 2023).

In terms of cognitive engagement, our study showed that while CG engaged with teacher feedback at a surface level, TG experienced profound engagement. Based on the interviews, we can see that the two groups displayed obvious differences in the employment of cognitive/metacognitive strategies. As regards CG, the participants did not deploy many cognitive/metacognitive operations to analyze and assess the accuracy of teacher feedback because of their trust in the feedback. It appears that a supportive attitude toward feedback does not naturally lead to deep cognitive engagement, indicating an inconsistency between affective and cognitive engagement. This result echoes the speculation that positive affective engagement (i.e., trust) may constrain L2 learners from proactively using cognitive/metacognitive strategies to process feedback (Koltovskaia, 2020; Ranalli, 2021).

In contrast, TG students employed many more strategies, including ongoing evaluations and analyses of feedback, memorization, activation of prior knowledge, and planning. The use of these cognitive/metacognitive strategies suggested that L2 learners exercised cognitive effort to address feedback, thus having substantial cognitive engagement. Interestingly, because TG participants received local feedback from the Pigai system and global feedback from the teacher, they were offered a large amount of feedback. However, they were not daunted by the revision task, since they employed various cognitive/metacognitive strategies in their revision. This finding may be related to our operationalization, in which students received the two sources of feedback separately. This helped the students break down their revision into different phases, which made the revision process less demanding and energy-consuming (Zhang & Hyland, 2022). As the TG participants reported in the interviews, they planned the revision focus in different drafts after AWE feedback and teacher feedback.

To summarize, the above discussion demonstrated that student engagement is a complex and nonlinear process (Fredricks et al., 2004). In our study, while both TG and CG students had positive affective engagement, they showed different behavioral and cognitive engagement with AWE-teacher feedback and teacher feedback, respectively. From an ecological perspective, feedback provides students with learning opportunities (i.e., affordances). However, the affordances cannot be realized automatically and require students to perceive and use them (van Lier, 2004). The process where students perceive and act upon the learning opportunities afforded by feedback is regarded as students’ engagement with feedback (Bitchener & Storch, 2016; Han, 2019). In general, student engagement is influenced by learner factors (e.g., language proficiency, learning styles, motivation, personal beliefs) and contextual factors (from textual to sociocultural levels) (Han & Hyland, 2015; Zhang & Hyland, 2018). To promote student engagement, it is crucial to align learner factors shaping learners’ agency with contextual factors mediating the learning opportunities embedded in context (Han, 2019). For CG, the students’ firm belief in teachers’ authority enabled them to trust teacher feedback, but may have constrained their willingness (i.e., agency), which prevented them from exerting cognitive effort to interpret and evaluate the teacher feedback that predominantly targeted language. Thus, the learning opportunities afforded by the teacher feedback in CG were willingness-inappropriate, which led to superficial behavioral and cognitive engagement.

In terms of TG, the integration of AWE feedback and teacher feedback generated a positive synergistic effect. Specifically, the integrated feedback could provide L2 learners with new learning experiences, an expanded audience, and balanced feedback (Zhang & Hyland, 2022). Compared with teacher feedback alone, the integrated feedback offered more diverse writing and revision experiences, which may cater for individual students’ different learning needs and then strengthen their agency, making them willing to invest effort in acting upon the feedback. Thus, TG students’ agency seemed to be in alignment with the learning opportunities afforded by the integrated feedback, which prompted them to perceive and use the affordances to facilitate their learning (Han, 2019). From this perspective, it is the combination of the two kinds of feedback rather than any single source of feedback that empowered students to participate in feedback actively.

TG and CG Students’ Writing Performance

TG students enhanced their performance in both content and organization over time and they outperformed their CG peers. The results are not surprising. In our study, TG participants not only received feedback on content and organization but also engaged with the feedback proactively. Thus, they perceived, used, and realized the affordances of global feedback to facilitate their writing performance. In contrast, CG participants, according to Table 3, were offered only a few feedback points on global issues. Thus, they had limited opportunities to engage with global feedback and reap its benefits. Without adequate feedback on global aspects of writing, it is challenging for L2 learners to identify and resolve problems in content and organization, let alone make progress in these areas (Lee, 2017). In addition, CG participants were offered a large number of feedback points on language. Commonly, L2 learners regard revision as error correction, in which they are devoted to correcting linguistic errors (Chen & Zhang, 2019). In this situation, they may concentrate on linguistic errors with little attention to global issues in their writing.

While TG improved their performance in both content and organization, it appears that the intervention was more effective for organization (organization: d = 2.43; content: d = 2.21). Understandably, it is relatively easy for students to improve the organization of their writing, especially the macrostructure and cohesion. They only needed to follow the “introduction-body-conclusion” format to structure their writing and add conjunctions such as “moreover”, “in addition”, and “although” to realize cohesion. Furthermore, the macrostructure of writing and the devices to achieve cohesion do not vary with topics. In this situation, the participants had opportunities to transfer the knowledge they had learnt from feedback on organization to new writing tasks. In contrast, generating and developing ideas is more taxing (Faigley & Witte, 1981) and relies on L2 learners’ personal topic knowledge. Thus, improving the content of writing consumes more of L2 learners’ cognitive resources.

In our study, TG participants experienced a notable improvement in vocabulary and language use. Regarding vocabulary, AWE feedback corrected errors in word choice and provided students with synonyms and collocations. In TG, students engaged with such feedback, which empowered them to improve lexical accuracy and variety (Xu & Zhang, 2022). For language use (i.e., grammar), it is unsurprising that both TG and CG improved in this aspect from the pretest to the posttest, since both of them received much feedback on grammar. However, TG showed better performance than CG in language use. Unlike teacher feedback, Pigai not only provided students with corrections but also offered them metalinguistic explanations of those corrections. The explanations may contribute to student engagement, prompting students to raise their awareness of rules and develop their explicit knowledge. Thus, the explanations may have facilitated the acquisition of grammar.

Our study found that the AWE-teacher integrated feedback prompted L2 learners’ engagement and the enhanced engagement improved their writing performance. Such results were related to the characteristics of the feedback. First, in the integrated feedback context, AWE feedback corrected students’ errors in language, which freed teachers from giving such feedback and allowed them to focus on global aspects of writing. Thus, students could receive local and global feedback from different sources at different stages. Second, such feedback helped L2 learners realize the importance of revision. In L2 writing classrooms, teachers tend to employ a product-oriented writing approach in which revision is underplayed (Zhang, 2022; Zhang & Cheng, 2021). The integration of AWE feedback and teacher feedback established a writing-feedback-revision process, where students needed to revise their writing after AWE and teacher feedback and generated multiple drafts. Through this experience, L2 learners may abandon their previous belief that writing is a one-shot task (Zhang & Hyland, 2022). Third, the addition of AWE feedback to traditional teacher feedback brings something new to classroom pedagogy. The integrated model involved human–machine interaction and a wider readership (teacher and machine), which may afford L2 learners new and fresh experiences. These experiences may arouse students’ motivation and agency to take up and use feedback, which could facilitate their learning of writing. Based on these features, AWE-teacher integrated feedback is a promising pedagogical approach. For one thing, it is effective in promoting L2 learners’ engagement and their writing performance. For another, it is helpful for teachers, since it reduces their burden of giving feedback.

Limitations and Implications

Unsurprisingly, this study is not without limitations. First, due to practical constraints, our study recruited English major students as the participants. English majors may have had more extensive training in writing and greater familiarity with English, and there were many more female participants than male ones. The participants’ characteristics might have, to some extent, influenced the results of our study. Another limitation was that we only used self-reported data to explore L2 learners’ behavioral engagement with feedback, which did not allow for data triangulation. Future studies therefore need to employ students’ writing samples and stimulated recall interviews to gather different sources of data and gain a deeper insight into their behavioral engagement.

Despite the limitations, our study documented the great potential of AWE-teacher integrated feedback, so teachers can apply this new feedback mode in technology-empowered teaching contexts, innovating L2 teachers’ feedback practices and refreshing L2 learners’ feedback experiences. In this mode, AWE feedback addresses students’ linguistic errors, while teacher feedback focuses on global issues. Doing so, for one thing, can reduce L2 teachers’ heavy burden of providing feedback and increase their efficiency of feedback provision. For another, it enables L2 learners to receive feedback on both local and global aspects of writing, which could contribute to students’ deep engagement and overall L2 writing proficiency. This means that the integrated feedback produces a positive synergy and can augment the advantages of AWE feedback and teacher feedback, enabling students to benefit greatly from feedback.

To facilitate the use of the integrated feedback, several recommendations can be provided to improve its pedagogical value. First, given that AWE feedback has several pitfalls such as fallibility and a one-size-fits-all approach, L2 teachers should inform their students of these limitations so that the students can avoid over-reliance on AWE feedback. Instead, L2 learners need to hold a critical attitude towards AWE feedback in this integrated mode, exercising agency to examine carefully whether the AWE feedback is appropriate before taking it up.

Moreover, since the mode combines AWE feedback and teacher feedback, L2 learners receive a larger amount of feedback than they would from any single source. In this situation, it is essential for L2 instructors to teach their students how to take advantage of the feedback and make revisions. Specifically, teachers can teach their students some cognitive and metacognitive strategies to promote their revisions, including how to plan their revision procedures, how to determine the revision focus, and how to monitor their revisions. Additionally, teachers can ask the students who do well in revision to share their experiences of addressing feedback, such as what resources they employ to process feedback and how they use these resources to deal with challenging feedback. With teachers’ instruction and peers’ experiences, L2 learners have opportunities to understand how to address the feedback from different sources more effectively and efficiently, and thereby use the feedback to inform their subsequent learning.