Introduction

Over the last decades, feedback activities have rapidly gained popularity as a means to enhance students’ writing performance in educational settings. An example of such a feedback activity is the situation in which dyads of students swap the first drafts of a writing assignment and provide (i.e., give) and process (i.e., receive) peer feedback in order to help their peer improve their writing. As a result of the improvement-oriented nature of feedback activities, errors, such as spelling errors, structure errors, and argumentation errors in writing, play a central role in feedback activities (e.g., Aben et al., 2019; Fong et al., 2018; Timms et al., 2016).

In order to enhance our understanding of this central role of errors in feedback processes, Aben et al. (2019) propose a theoretical model that visualizes what the process of dealing with errors may look like while providing and processing feedback. Their model suggests that dealing with errors while providing and processing feedback contains several cognitive sub-phases, such as error identification (i.e., locating an error), error decoding (i.e., labelling an error as an error of a particular kind; Akin et al., 1970), and error evaluation (i.e., thinking about how an observed error may be improved; Cowan, 2010; Tai et al., 2018). These sub-phases describe cognitive activities that a student may perform in the act of providing feedback on a text written by, for example, a peer.

Although Aben et al. (2019) seem to imply the existence of a universal pattern containing several cognitive sub-phases, it is still uncertain whether the cognitive sub-phases they theoretically distinguished can also be observed empirically. That is, empirical research investigating what the process of dealing with errors during the provision and processing of feedback looks like is scarce (Máñez et al., 2019). Therefore, the first aim of the current study is to investigate the extent to which the cognitive sub-phases distinguished by Aben et al. (2019) in the process of dealing with errors while providing and processing feedback can be observed empirically.

In addition, it is likely that the appearance of this process is affected by factors describing the relationship between the people involved in feedback activities (e.g., Aben et al., 2019; Esterhazy & Damşa, 2019; Winstone, 2017; Yu, 2021). For example, students’ prior experiences with each other may affect their perception of the quality of a fellow student’s text when providing feedback, and their perception of the adequacy of provided feedback when processing feedback (Strijbos et al., 2010; Strijbos & Müller, 2014; Winstone et al., 2017). This, in turn, may affect the way and extent to which they act upon errors when providing feedback, and act upon feedback that identifies errors when processing feedback. Therefore, the second aim of the current study is to investigate the extent to which the process of dealing with errors while providing and processing feedback is affected by interpersonal perceptions.

The appearance of the process of dealing with errors during the provision and processing of feedback, and the extent to which this process relates to interpersonal perceptions, is particularly relevant for explaining the learning gains of feedback processing (Aben et al., 2019; Esterhazy & Damşa, 2019). That is, such an understanding is necessary to improve our comprehension of how feedback is processed and of how the learning gains of feedback activities can be explained and, eventually, improved (Handley et al., 2011).

The topic was demarcated in three ways. First, we opted to investigate the process of dealing with errors in the context of the provision and processing of peer-feedback, because interpersonal perceptions appear to affect feedback processing particularly when students provide and receive peer-feedback (e.g., Alqassab et al., 2018b; Berndt et al., 2018; Strijbos et al., 2010). This might be because students may be more likely to doubt the expertise of a peer (horizontal constellation) than, for example, that of a teacher (vertical constellation) (Strijbos & Müller, 2014). Second, we studied the processes of peer-feedback provision and processing in the domain of argumentative writing, as this is a domain in which peer-feedback is frequently used to enhance students’ writing skills and performance (Double et al., 2020; Hoogeveen & Van Gelderen, 2013; Huisman et al., 2019). Third, we focused on 16- to 18-year-old secondary education students (grade 11, pre-university track) for two reasons: (a) social comparisons and personal relationships are especially influential in secondary education and the adolescent stage of development (Sebastian et al., 2008; Van der Aar et al., 2018), and (b) argumentative writing is particularly relevant for these students, because they will be extensively confronted with argumentative writing tasks in the remainder of their academic career.

Theoretical framework

The role of errors in feedback provision and processing

Errors, often defined as deviations from a norm (e.g., Gloy, 1987; Oser & Spychiger, 2005; Rach et al., 2012; Spychiger et al., 2006), can be considered fundamental prerequisites for learning (Kapur, 2016). After all, students who process feedback are likely to be confronted with (a) performance elements perceived as erroneous or improvable by the feedback provider, and (b) feedback elements identifying and criticizing these performance elements. Since error-making and problem-solving are crucial for learning, and feedback is likely to function as a scaffold that reduces the gap between the current and a desired performance (Ramaprasad, 1983), errors are viewed as opportunities for learning and play a central role in the provision and processing of feedback (Fong et al., 2018).

The theoretical model by Aben et al. (2019) assumes the existence of cognitive sub-phases in the process of dealing with errors while providing and processing feedback (see Fig. 1). Regarding the feedback provision phase, Aben et al. (2019) take a performance that potentially contains errors as their starting point. For example, in the context of (peer-)feedback on a written text, feedback providers may first identify an error, which refers to the moment an individual observes that a text element does not meet a norm. Subsequently, feedback providers may decode the identified error, i.e., assign meaning to the error by labelling it as an error of a particular kind (e.g., grammar error, spelling error, argumentation error). Third, this decoding may lead to the evaluation of an error, which refers to the assessment of what characteristics make a text element erroneous and/or to thinking about how an observed error may be improved. Error evaluation may be followed by the encoding of a feedback remark on the specific error, i.e., the translation of the interpretation and evaluation of the error into the production of verbal and/or nonverbal signs. Finally, this feedback remark may be sent to the feedback recipient.

Fig. 1

A simplified display of Aben et al.’s (2019) model visualizing the role of errors in feedback provision and processing

Feedback recipients, on the other hand, first have to read a provided feedback remark in relation to their written text. Thereafter, they have to decode the feedback remark, i.e., assign meaning to it in order to interpret the feedback (Akin et al., 1970). Similar to the feedback provision phase, the decoding of the feedback remark may lead to the evaluation of the feedback remark, which refers to the activity of deciding to what extent one agrees with the feedback and/or the potential acknowledgement that one made an error. Finally, an output (e.g., correction of an error of a particular kind, such as a grammar error, spelling error, or argumentation error) may be encoded, potentially (partly) based upon the feedback remark. This output may express disagreement with the feedback or may show the intention to act upon the feedback and to correct the error. If the feedback recipient acts upon feedback, this leads to a revised performance, which ideally implies that the initial erroneous performance is (partially) rectified (Aben et al., 2019).

The theoretical model by Aben et al. (2019) displays similarities with the feedback model by Timms et al. (2016), which also places errors at its center. The latter model visualizes how learners process feedback that may be automatically provided by digital learning environments. Timms et al. (2016) distinguish the activities ‘Learning environment gives feedback’, ‘Learner decodes feedback’, ‘Learner makes sense of feedback’, and ‘Learner corrects error’. These activities correspond to what Aben et al. (2019) refer to as, respectively, ‘Feedback Remark’, ‘Decoding Feedback Remark’, ‘Evaluation Feedback Remark’, and ‘Encoding Output’. Differences between the two models with respect to the visualization of activities are that (1) Aben et al. (2019) focus on dyads providing and processing feedback, whereas Timms et al. (2016) focus on students processing feedback provided by a computer, (2) Aben et al. (2019) include a visualization of the feedback provision phase, whereas Timms et al. (2016) do not, and (3) Timms et al. (2016) distinguish a ‘Feedback Identification’ activity in the feedback processing phase, whereas Aben et al. (2019) do not.

The existence of cognitive sub-phases in the process of dealing with errors while processing feedback was also empirically observed by Ahmadian et al. (2019). They collected think-aloud utterances related to dealing with errors from university students while they processed teacher feedback on writing performance, in the context of second language acquisition. The analyses indicated that their students displayed a recurrent pattern of dealing with grammatical errors. This pattern consisted of the successive phases of reading the sentence in their own text (e.g., “…type of diversity which reflect…”), reading the feedback (“third person”), encoding a revised output (“which reflects”), and explaining the grammatical error (“I should have used ‘s’ here for the third person”). This pattern differs from the sequence of sub-phases hypothesized by Aben et al. (2019) and Timms et al. (2016), in the sense that Ahmadian et al. (2019) found no evidence of a feedback decoding phase, and that in their results the evaluation sub-phase took the form of a retrospective error explanation.

Different configurations of the process of dealing with errors while providing and processing feedback may depend on the type(s) of errors that must be dealt with (Ahmadian et al., 2019; Kim & Bowles, 2019). Kim and Bowles (2019) performed a think-aloud study in the context of second language acquisition that aimed to capture feedback processing. They investigated, among other matters, whether the type of error identified by the feedback provider related to feedback recipients’ depth of feedback processing. The results indicated that students took more time to process, interpret, and evaluate feedback on errors related to their text’s higher-order concerns (e.g., content, argumentation, and paragraph structure) than feedback on errors related to their text’s lower-order concerns (e.g., spelling, grammar, and punctuation).

The central role that errors play in feedback provision and feedback processing may also be reflected in the way learners deal with pluses. In the current study, pluses are defined as performance elements perceived by either the feedback provider or the feedback recipient as meeting or surpassing a norm. For example, Máñez et al. (2019) compared the time that students took to process automatically generated feedback on errors with the time they took to process automatically generated feedback on pluses. During a reading task, students had to answer multiple-choice questions and select the relevant textual information on which they based their answer. They found that students spent more time processing the feedback provided when they did not execute the task successfully (i.e., errors) than the feedback provided when they did execute the task successfully (i.e., pluses). According to Máñez et al. (2019), these results may suggest that students are (intuitively) aware of the importance of understanding errors in feedback processing in order to enhance their skills and performance.

Interpersonal perceptions and peer-feedback processes

The process of dealing with errors in peer-feedback may not only contain cognitive sub-phases, but the order in which these occur, and whether they occur at all, may also be affected by interpersonal perceptions (Aben et al., 2019; Esterhazy & Damşa, 2019; Winstone et al., 2017). Interpersonal perceptions are views of others that are gradually shaped over time through past experiences of the actor with the same partner (Gibson, 1969; Upshaw, 1978). For example, students who process peer-feedback are members of social constellations, such as classrooms, for extended periods of time. Hence, they are likely to attend group discussions, to collaborate with the same peers on learning tasks, and to be aware of their peers’ skills or grades. Via their joint engagement in classroom activities, students collect pieces of information about their peers, which may, either consciously or unconsciously, contribute to the composition of a mental representation of their peers’ expertise (as well as of their own expertise in comparison to their peers’) and may, thus, potentially affect feedback processing (Aben et al., 2019; Strijbos & Müller, 2014; Winstone et al., 2017).

Although, to our knowledge, no studies have yet investigated the effects of interpersonal perceptions on the process of dealing with errors in particular, at least three interpersonal perceptions have received considerable attention regarding their role in peer-feedback provision and/or processing in general. First, several studies found that friendship could lead to biases in peer-feedback or peer-grading (e.g., Harris & Brown, 2013; Cheng & Warren, 1997; Panadero et al., 2013). Although one could argue that students may be less critical of a friend than of a non-friend, most studies found positive effects of friendship. For example, students may invest more effort in providing peer-feedback when they consider the peer a friend than when they do not (Finkelstein et al., 2017). This may be explained by the idea that students feel psychologically safer and therefore more comfortable sharing their thoughts: trust can contribute to a critical analysis of a peer’s performance (Panadero et al., 2013). Students themselves also report that they feel more comfortable and feel that they can be more honest when providing peer-feedback to peers they know well than to peers they do not know well (Van Heerden & Bharathram, 2021).

Second, the effort peers devote to providing and/or processing peer-feedback may play a role in peer-feedback processing. For example, Timmers et al. (2013) found that the effort invested in feedback processing was predicted by an individual’s task-value beliefs, such as the importance of and interest in the task. As these task-value beliefs are situation-dependent and consequently task-dependent (Eccles & Wigfield, 2002), this may imply that the effort a peer is perceived to have devoted to providing feedback may also affect the recipient’s peer-feedback processing.

Third, peer-feedback activities may be affected by students’ perceptions of their peers’ skills (e.g., Berndt et al., 2018; Patchan & Schunn, 2015; Strijbos et al., 2010). For example, Berndt et al. (2018) manipulated perceived language skills by giving their students scenarios containing essays written by a fictional student who was provided with fictional peer-feedback. They found that feedback provided by a peer with low language competence was perceived as less adequate than feedback provided by a peer with high language competence. Aben et al. (2023) also manipulated the perceived language skills of peer-feedback providers. Their results showed that the perceived language skills of the feedback provider significantly related to the proportion of textual revisions made by students based on feedback related to writing style: students revised their text in line with this type of feedback more often when they thought it was provided by a peer with stronger language skills than their own than when they thought it was provided by a peer with weaker language skills than their own.

The current study

Whereas errors play a central role in several conceptualizations of feedback models, empirical research investigating what the process of dealing with errors looks like, and how interpersonal perceptions affect the way in which students deal with errors during feedback processes, is currently scarce. As far as we know, previous studies have investigated the role of errors only during feedback processing, and only in the context of feedback provided by a teacher or a digital learning environment (i.e., Ahmadian et al., 2019; Timms et al., 2016). What the process of dealing with errors looks like in the context of peer-feedback provision and processing, and how it is affected by students’ interpersonal perceptions, is still unknown. Therefore, this study adds to the literature by investigating the following two research questions among 16- to 18-year-old secondary education students (grade 11, pre-university track) in the context of argumentative writing:

(1) To what extent can the cognitive sub-phases of identifying, decoding, evaluating, and encoding be distinguished during peer-feedback provision, and the cognitive sub-phases of reading, decoding, evaluating, and revising be distinguished during peer-feedback processing?

(2) To what extent is the process of dealing with errors while providing and processing peer-feedback affected by the interpersonal perceptions of friendship, perceived skills, and perceived effort?

Method

Design

We conducted a mixed-methods study in which exploratory data from think-aloud protocols and semi-structured interviews were collected. Dyads of students provided and processed peer-feedback on argumentative texts while thinking aloud, and reflected on these processes during an interview afterwards. After the data collection, the think-aloud utterances and interviews were transcribed by two research assistants and analyzed using two strands of analysis: first a quantitative content analysis, and second a qualitative thematic analysis. Our design contained two points of integration of the quantitative and qualitative strands (Guest, 2012). The first point was in the analysis, where results from our quantitative analysis partially informed the qualitative analysis. The second point was at the level of interpretation, where we integrated the quantitative and qualitative analyses to distinguish the sub-phases in the process of dealing with errors in the context of peer-feedback provision and processing, and the role of interpersonal perceptions in this process.

Context and participants

Data were collected at a public high school in the north of the Netherlands in the context of the subject ‘Dutch language and literature’, which focuses on writing, reading, presenting, and argumentation skills, among other skills. During a period devoted to writing skills, the teachers taught the students about the structure (i.e., introduction, body, conclusion), genre characteristics (e.g., defending a standpoint, providing arguments pro and contra the standpoint, rebutting counter-arguments), and quality criteria of argumentative texts (e.g., related to coherence, argumentation, content). The students had repeatedly been confronted with argumentative writing before in their academic track; hence, it was assumed that they had some basic knowledge of argumentative texts. The lessons contained activities such as reading, analyzing, and discussing argumentative texts, collecting new sources related to a particular topic students wanted to write about, and making an outline. Additionally, students produced an argumentative text, received peer-feedback, received teacher feedback, and rewrote their first drafts based on the feedback.

Students from all four 11th grade classes at this school (N ≈ 100, age range = 16–18) were asked by their four teachers whether they wanted to participate in a study about peer-feedback. Participation was voluntary. Twelve students from two of the 11th grade classes (class A: n = 7; class B: n = 5) agreed to participate. Among them were nine boys and three girls (age: M = 16.6 years; SD = 0.95; range = 16–18 years). All students voluntarily signed an active informed consent form.

In order to increase ecological validity, the data collection blended in with the high school’s planning: students provided and processed feedback on the argumentative texts they had to write as part of their regular school activities. Additionally, the data collection took place in the same week in which the non-participating 11th grade students had to hand in their written texts. The method was approved by the ethics committee of the University of Groningen prior to data collection.

Dyad composition, procedure, and instruments

Data collection took place on two research days. On the first research day, students filled out a questionnaire about ‘perceived language skills’, measured by asking: “How good are your classmates in the school subject Dutch, compared to you?” This question was followed by a list of all participating classmates, for each of whom the students had to indicate whether they perceived that classmate’s language skills as a lot worse than mine (1), a bit worse than mine (2), about as good as mine (3), a bit better than mine (4), or a lot better than mine (5), or they could indicate that they did not know who the student was.

Subsequently, we composed six dyads, as dyads are a typical constellation for peer-feedback activities in the Dutch classroom. We composed the dyads based on two criteria. The first criterion was that the students in a dyad knew each other, so that an interpersonal relationship could be assumed. Second, in order to optimize variability across dyads, the dyads were composed to represent different combinations of perceived Dutch language skills between their members. Table 1 shows a descriptive overview of the dyads (all names are pseudonyms).
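To make the second criterion concrete, the following toy sketch pairs students such that the chosen dyads span different combinations of perceived language skills. It is a minimal sketch under stated assumptions: all names and ratings are hypothetical, the greedy pairing rule is ours, and the actual dyads were composed by hand using both criteria.

```python
# Toy sketch of dyad composition aimed at variability in perceived-skill
# combinations. All names and ratings are hypothetical; the study composed
# its dyads manually, also honoring the acquaintance criterion.
from itertools import combinations

# ratings[(a, b)] = how student a rates student b's Dutch language skills
# relative to their own, on the questionnaire's 1-5 scale.
ratings = {
    ("Mary", "James"): 2, ("James", "Mary"): 4,
    ("John", "David"): 3, ("David", "John"): 3,
    ("Sarah", "Steven"): 5, ("Steven", "Sarah"): 1,
}
students = ["Mary", "James", "John", "David", "Sarah", "Steven"]

dyads, used, seen_combinations = [], set(), set()
for a, b in combinations(students, 2):
    # Characterize a candidate dyad by its order-free pair of mutual ratings;
    # unknown ratings default to 3 ("about as good as mine").
    combo = tuple(sorted((ratings.get((a, b), 3), ratings.get((b, a), 3))))
    if a in used or b in used or combo in seen_combinations:
        continue  # keep each student in one dyad and each combination unique
    dyads.append((a, b))
    used.update((a, b))
    seen_combinations.add(combo)

print(dyads)  # [('Mary', 'James'), ('John', 'David'), ('Sarah', 'Steven')]
```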

Table 1 Description of Dyads based on the Two Criteria for Dyad Composition

Before the second research day, the students handed in the argumentative texts they had produced. Students could choose to write about one of three topics: (1) language: nature or nurture?; (2) should all Dutch education be in English?; (3) should the literary canon be expanded? On the second research day, the two students forming a dyad were seated separately in two different rooms, each with one of two research assistants, who were advanced Master’s students in the domain of Pedagogical Sciences. The students simultaneously provided feedback on each other’s draft texts, using a laptop. Subsequently, the provided feedback was exchanged and the students processed the feedback that had just been provided. These sessions were audio-recorded, lasted between 80 and 90 minutes per dyad, and took place between four and six days after the students had to hand in their draft text for the writing assignment as part of their curriculum for the subject ‘Dutch language and literature’. The activities took place during two of the students’ regular adjacent school hours. The students’ regular teacher of the subject ‘Dutch language and literature’ was not involved in the data collection.

The research assistant welcomed the student to the room and instructed the student to “provide feedback on [their peer]’s text, to the best of [their] ability, with the aim to help the writer to improve the text”. The research assistant provided and explained a brief overview of potential quality criteria to provide feedback on (i.e., spelling, writing style, argumentation, text structure; see Supplemental materials, Sect. 1). Data were collected by means of a concurrent non-metacognitive think-aloud procedure rather than a concurrent metacognitive think-aloud procedure. This implied that students had to think aloud while providing and processing feedback (non-metacognitive), but did not have to reflect on their thinking (metacognitive). A non-metacognitive think-aloud procedure was chosen so that participants did not have to interrupt their thought process to justify their thinking. In this sense, a non-metacognitive think-aloud procedure is more valid than a metacognitive think-aloud procedure (Ericsson & Simon, 1998).

The students completed a practice round of the think-aloud procedure (Bowles, 2010) to become acquainted with providing feedback using Microsoft Word’s Track Changes and Comment functions. Hereafter, students had 25 minutes to provide feedback on their peer’s text (text length: M = 492 words, SD = 171 words) while explicating their thoughts. Every time they remained quiet for five seconds, the research assistant asked “what are you thinking at this moment?” (Bowles, 2010; Máñez et al., 2019). After 25 minutes, the research assistant saved the texts with feedback on USB drives and exchanged them. Then, the students had 25 minutes to process the feedback that had been provided by the peer, while explicating their thoughts. The students were told that they had the freedom to ignore feedback if they wanted to, as they remained the owners of their text. They were instructed to revise their text as if they had to hand in the final version at the end of the session.

The session ended with a semi-structured interview, conducted by the research assistant. Students were asked to reflect on the way they had been affected by the interpersonal relationship with their peer. The research assistant initially asked non-guiding open questions, giving the students the opportunity to mention potential effects of the interpersonal relationship themselves, before asking more guided questions. The interview contained six blocks of questions, with a total of 19 main questions and 16 follow-up questions (see Table 2 for a brief overview, and Supplemental materials, Sect. 2 for the full overview).

Table 2 Brief overview of the questions asked in the semi-structured interview

Data analysis

This procedure resulted in think-aloud transcripts and interview transcripts. Both types of data were analyzed using a quantitative content analysis and subsequently a qualitative thematic analysis.

Quantitative content analysis. Content analysis enables replicable and valid inferences from data to their context (Krippendorff, 2004), and the conversion of qualitative statements into quantitative data (Stemler, 2015). We applied multi-valued coding using Atlas.ti (2018) version 8, meaning that a quotation could receive codes from different semantic domains. We distinguished six semantic domains: (1) sub-phases in dealing with errors while providing feedback (identifying, decoding, evaluating, encoding); (2) sub-phases in dealing with pluses while providing feedback (identifying, decoding, evaluating, encoding); (3) sub-phases in dealing with errors while processing feedback (reading, evaluating, revising); (4) sub-phases in dealing with pluses while processing feedback (reading, evaluating, revising); (5) mentioning the dyad partner (‘he/she’, ‘[name dyad partner]’, ‘you’); and (6) effects of the interpersonal relationship (friendship, perceived language skills, perceived skills general, perceived effort, no effort) (see Table 3). For the think-aloud transcripts, all six semantic domains were coded. For the interview transcripts, only semantic domain 6 was coded. Semantic domains 1–4 (i.e., dealing with errors and pluses) were not coded in the interview transcripts, because our attempts to prompt students to talk about those domains revealed a mismatch between students’ descriptions of the way they dealt with errors and the behavior they actually displayed in the think-aloud data. In that sense, we deemed the observation of their actual behavior more reliable than their own description of it.
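As an illustration of what multi-valued coding yields downstream, the minimal sketch below tallies how often each code occurs per semantic domain from a flat export of coded quotations. The data structure, code names, and export format are hypothetical assumptions; the actual coding and counting were done in Atlas.ti 8.

```python
# Minimal sketch of frequency counting over multi-valued codes.
# Quotation contents and code names are hypothetical; the study used Atlas.ti 8.
from collections import Counter

# Under multi-valued coding, one quotation may carry codes from several
# semantic domains at once, here represented as (domain, code) pairs.
coded_quotations = {
    "q1": {(1, "identifying"), (5, "mentions partner")},
    "q2": {(1, "evaluating"), (1, "encoding")},
    "q3": {(3, "reading"), (6, "perceived effort")},
}

# Tally how often each (domain, code) pair occurs across all quotations.
frequencies = Counter(pair for codes in coded_quotations.values() for pair in codes)
for (domain, code), count in sorted(frequencies.items()):
    print(f"domain {domain} / {code}: {count}")
```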

Table 3 Description of the Semantic Domains and Codes used in the Quantitative Content Analysis

Semantic domain 5 (i.e., the number of times students explicitly mentioned the dyad partner) was not coded for the interview transcripts, as the number of references to the dyad partner is not an informative statistic in an interview context, where it is driven by the interviewer’s questions. Semantic domain 6 (i.e., effects of interpersonal perceptions) was coded to serve as a starting point for the subsequent qualitative thematic analysis.

The coding scheme was developed by the first author and revised after initial trials performed by the first author and the two research assistants. Two elements of the coding scheme require further elaboration. First, the error decoding and pluses decoding sub-phases could not be coded in the feedback processing phase, because it was problematic to distinguish the reading and decoding phases based upon think-aloud utterances. That is, we could only observe evidence of students reading the feedback, which made a valid empirical distinction between the two processes of reading and decoding impossible. Second, we had not initially conceived of semantic domain 5 as part of the coding scheme; however, we added this domain because it can be argued that linguistic references to the dyad partner form evidence of feedback providers’ awareness of the feedback recipient, and hence of the interpersonal relationship between the feedback recipient and the self.

A subsample of 16.7% of the complete data set was selected and coded by the first author and the two research assistants in order to estimate an interrater reliability score. This subsample contained two feedback provision think-aloud transcripts, two feedback processing think-aloud transcripts, and two semi-structured interview transcripts, from six individual students, that had not been used in the practice rounds. The interrater reliability between the three coders based on this subsample was sufficient for all six semantic domains (Krippendorff’s α range: 0.70–0.81). This was also the case when the interrater reliability was calculated between only the two research assistants, leaving out the first author’s codings (Krippendorff’s α range: 0.73–0.83). Subsequently, the two research assistants coded the remainder of the data independently.
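For readers who wish to reproduce this kind of reliability check, the sketch below computes Krippendorff’s alpha for nominal codes with the open-source Python package `krippendorff`. The coder-by-quotation matrix is fabricated for illustration; the study’s actual coding data are not reproduced here.

```python
# Minimal sketch of an interrater reliability estimate with Krippendorff's
# alpha (pip install krippendorff numpy). The matrix below is fabricated.
import numpy as np
import krippendorff

# Rows = the three coders (first author plus two research assistants);
# columns = coded quotations; np.nan marks a quotation a coder did not code.
reliability_data = np.array([
    [1, 2, 3, 3, np.nan, 4, 1, 2],
    [1, 2, 3, 3, 2,      4, 1, 2],
    [1, 2, 3, 1, 2,      4, 1, 3],
])

# The codes are categorical labels, so the nominal level of measurement applies.
alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha: {alpha:.2f}")
```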

Qualitative thematic analysis. Hereafter, a thematic analysis of the think-aloud data and interview data was performed by the first author and one of the research assistants. Thematic analysis is a method to identify, analyze, and report patterns within data (Braun & Clarke, 2006). By involving the same coders that were involved in the coding for the quantitative content analysis, we expanded the array of possible interpretations of the data and enriched the ‘crystal’ with more facets (Plano Clark & Ivankova, 2016). The aims of the qualitative analysis were to expand the results of our quantitative content analysis, and to build a nuanced interpretation of the data by analyzing what aspects of the data were not captured by the quantitative content analysis. That is, on the one hand, we paid attention to the order in which the cognitive sub-phases coded in the quantitative analysis occurred, and to whether all sub-phases were visible when errors were identified and when feedback was read. On the other hand, we derived patterns from the data that were not informed by the cognitive sub-phases coded in the quantitative analysis.

The first author and the research assistant re-listened to the audio-recordings and noted down their thoughts related to the process of dealing with errors and effects of interpersonal perceptions. They did so not only when these thoughts were based upon students’ utterances during the feedback provision, feedback processing, and interview activities, but also when they were based upon utterances before, after, or in between those activities, as students also produced utterances containing information about effects of interpersonal perceptions at those moments.

The first author and the research assistant first re-listened to two of the twelve recordings. The first author produced a list of four themes observed while re-listening to these audio-recordings: (1) Working procedure, (2) Occurrence of cognitive sub-phases in the process of dealing with errors, (3) Comparison to quality criteria, and (4) Effects of interpersonal perceptions. The first author and the research assistant discussed how the patterns manifested themselves within the two recordings. Hereafter, they independently re-listened to the remaining ten audio-recordings and noted down their thoughts. In the end, they each categorized their own notes according to which of the four themes they related to. The thematic analysis ended with a two-hour discussion between the first author and the research assistant, identifying and describing the main patterns within the four themes. The results of this discussion are reported in the results section.

Crystallization. Both analysis methods (i.e., quantitative content analysis and qualitative thematic analysis) had complementary value in answering the research questions. Whereas the quantitative content analysis provided insights into the extent to which sub-phases of the process of dealing with errors occurred, the qualitative thematic analysis not only provided descriptions of what the sub-phases looked like and the order in which they occurred, but also provided insights into patterns visible in the data that could not be captured with the quantitative content analysis. In this sense, the thematic analysis served partly as an additional perspective on the quantitative data, and partly as an independent data source.

Similarly, both types of data (i.e., think-aloud utterances and interviews) jointly contributed to answering the research questions. The think-aloud utterances revealed how often separate cognitive sub-phases occurred when students were providing and processing peer-feedback, and the interviews provided enriching information about potential effects of interpersonal perceptions on the process of dealing with errors that may have been only partly observable in students’ think-aloud utterances.

Results

The results are organized by analysis method. First, the results are provided for the quantitative content analysis, i.e., the number of occurrences is given for each of the codes within the six semantic domains, as counted in the think-aloud data. Hereafter, the results are provided for the qualitative thematic analysis, i.e., the descriptions are given for the four themes that summarized the patterns visible in the think-aloud data and interview data.

Quantitative content analysis

Sub-phases in dealing with errors and pluses while providing feedback (semantic domains 1 and 2). During the procedure of providing feedback, the think-aloud utterances showed that students focused more on errors than on pluses. On the one hand, the students verbalized identifying and decoding errors about as often as pluses. More specifically, seven students more often identified pluses, and five students more often identified errors. Additionally, all students hardly ever verbalized decoding errors or pluses. On the other hand, the majority of students verbalized evaluating errors more often than pluses, and nearly all encoded feedback based on errors more often than feedback based on pluses (Table 4).

Sub-phases in dealing with errors and pluses while processing feedback (semantic domains 3 and 4). During the procedure of processing feedback, too, the think-aloud utterances showed that students focused more on errors than on pluses. Table 5 shows that all students except Sarah more often voiced think-aloud utterances related to the sub-phases of dealing with errors (i.e., reading, evaluating, revising) than to the same sub-phases of dealing with pluses; Sarah made more utterances regarding reading, evaluating, and revising for pluses than for errors.

Table 4 Frequencies for Codes Related to Semantic Domains (1) Dealing with Errors while Providing Peer-feedback, and (2) Dealing with Pluses while Providing Peer-feedback
Table 5 Frequencies for Codes Related to Semantic Domains (3) Dealing with Errors while Processing Peer-feedback, and (4) Dealing with Pluses while Processing Peer-feedback
Table 6 Frequencies for Codes Related to Semantic Domain (5) Mentioning the Other, during the Provision and Processing of Feedback

Mentioning the dyad partner (semantic domain 5). With respect to explicit references to the other, Table 6 shows that eleven of the twelve students (all except Robert) explicitly referred in their think-aloud utterances to the feedback provider and feedback recipient by saying ‘he (his)’ or ‘she (her)’, ‘you’, and/or by mentioning the recipient’s name. Examples are: “I believe his point of view is not clear in the text.” (Linda, think-aloud); “I will write this down anyway: is ‘learnability’ a word? Maybe he can use another word” (Mary, think-aloud). Only Robert referred neither to the feedback recipient when providing feedback, nor to the feedback provider when processing feedback.

Effects of interpersonal relationships (semantic domain 6). Almost no effects of interpersonal perceptions were visible in the think-aloud data. Only Mary once made a remark that was coded as being related to ‘perceived effort’, and Thomas once made a remark that was coded as being related to ‘friendship’.

Qualitative thematic analysis

Working procedure

The provision of feedback, as conducted by the students in the sample, could be characterized as linear. Students opened the text document and immediately started reading the text from start to end. They wrote comments, related to particular text elements, at the moment they first encountered those text elements. When they had read the end of the text for the first time, some students started re-reading the text from beginning to end; others did the same when the research assistant stimulated them to continue thinking aloud. During this ‘second round’, all students produced new feedback remarks, based on text elements other than those dealt with during the ‘first round’.

The processing of feedback, as conducted by the students in the sample, was less linear than the provision of feedback. First, all students alternated between two strategies. When they revised their text taking the feedback as their starting point, they processed the feedback remarks one by one and decided for each of them whether they deemed any textual revision necessary. When they took their texts as their starting point, they read their text and switched to a feedback remark when they encountered one. Second, eight students (all except James, Anthony, Sarah, and Thomas) repeatedly did not deal with feedback immediately as they read it. When they encountered a feedback remark asking for textual adaptations that they agreed with but did not know how to implement, they decided to return to this remark later.

Occurrence of cognitive sub-phases in the process of dealing with errors

During the provision of feedback, students dealt with errors consistently in line with one of two patterns that could be described as ‘quick discovery of errors’ and ‘elaborate discovery of errors’. The ‘quick discovery’ pattern often occurred in relation to lower-order concerns, and was characterized by a phase of reading the text, immediately followed by the identification of an error, or even an immediate error correction. As such, the identification phase, potential decoding phase, and potential evaluation phase either did not occur, or occurred so quickly that students did not distinguish them in their think-aloud utterances. The following example illustrates the pattern of a quick discovery: “[reads the text:] ‘this is a joke about not being able to speak English very well and that you will learn it.’ At least a comma has been forgotten before ‘and’. And that you will – oh I need to write that down” (David, think-aloud). This example shows that the first time David read the sentence written by his dyad partner, he immediately identified the textual element as erroneous: while reading, he quickly discovered the error, and almost simultaneously provided a correction in the form of an added comma, which resulted in the encoding of a feedback remark.

The intuitive nature of the pattern of quick discoveries contrasted with the pattern of elaborate discoveries, which often occurred in relation to higher-order concerns, and was the result of a thorough interpretation process. During this interpretation process, students read a part of a text, realized that they did not fully grasp it, reread the same part, potentially in multiple iterations, and inferred what the author tried to say. This interpretation process led to the identification and evaluation of an error. For example:

[reads the text] This was the ehm, (4) counter-argument it looks like, but then, this is the conclusion. […] Ooh. It is placed again over here. Never mind. I missed the final page, but it is alright like this [rereads same part of the text] Yes, she uses the argument that one […] keeps learning Dutch […] But then one could use the same argument in favor of that one learns English in this way […] Ehm. I will write that down now (James, think-aloud).

In this example, the error, which could be characterized as an argumentation error, was identified after a thorough thinking process. First, James did not understand what he had read; he then reread the same part. Initially he believed the author was right, but only in the end did he identify and evaluate an error.

Also during the processing of feedback, students dealt with errors consistently in line with one of two patterns. First, students often immediately had an idea of whether they agreed with the feedback when they read it. For example: “He says it’s a weird sentence, but I actually don’t agree” (John, think-aloud). This immediate response to feedback displayed similarities with the ‘quick discovery’ pattern in the feedback provision phase, where students immediately recognized a text element as erroneous. Second, students often had to reread feedback remarks in order to interpret their meaning and decide whether they agreed with the feedback. For instance, in the following excerpt, Sarah processes feedback from Steven, who had marked the words ‘self fulfilling’ [sic] in her text in yellow:

Marking? (…) I don’t understand, ‘formatted’. I don’t understand why Steven marked this. Or is it because self-fulfilling should be written as one word? I don’t know. Is he [he = Microsoft Word’s spelling check] going to mark this as an error? [types ‘self-fulfilling’ in her text.] Ah, I see, I think he [he = Steven] means that it is one word. So then we can delete this comment (Sarah, think-aloud).

This process of simultaneously reading and interpreting a feedback remark displayed similarities with the ‘elaborate discovery’ of errors in the feedback provision phase, where students (iteratively) reread text elements in order to interpret them.

Comparison to quality criteria

During the provision of feedback, students compared the text with internalized representations of text quality criteria. That is, eight of the twelve students (Mary, James, John, Michael, David, Thomas, Daniel, Anthony) repeatedly used their knowledge about what an argumentative text should look like (i.e., quality criteria) during their processes of identifying and evaluating errors and encoding feedback remarks. For example:

[Teacher name; omitted] always comes up with three, ehm, rules that [the title of an argumentative text] should follow, being that a title should be covering, explaining, and catchy. Covering and explaining, that’s kind of fine, but catchy… I would, ehm, a bit more, ehm… (Anthony, think-aloud).

This excerpt illustrates that the comparison with criteria contributed to the identification of errors (in this example, an elaborate discovery of an error) by offering grounds for a proper evaluation.

Effects of interpersonal perceptions

With respect to the provision of feedback, there was a clear difference between the extent to which students declared in the interview to have been affected by interpersonal perceptions while providing feedback and the extent to which they seemed to have been affected during the think-aloud while providing feedback. On the one hand, ten of the twelve students (all except John and Anthony) strongly argued in the interviews that they were not affected by the interpersonal relationship with the feedback recipient while providing feedback. They shared the view that they ‘just’ provided feedback on a text, and that they would have done so in the same way had they provided feedback on a text written by another peer: “It’s a text, not a person what I read” (Robert, interview); “[The way in which I provide feedback] would not differ that much [from] if I had to provide feedback to someone else” (Thomas, interview).

On the other hand, half of the students (Mary, James, Steven, Sarah, Daniel, Anthony) showed awareness of the feedback recipient without being prompted by one of the research assistants. This happened before, in between, or after the think-aloud activities, during the interviews, and also during the think-aloud activities themselves. These students seemed to have taken into account the perceived language skills and/or the perceived effort of the feedback recipient. A few examples: James mentioned, just before he started providing feedback, that “Mary [Mary = the feedback recipient] has in general, let’s say, a higher level of writing skills, so [laughs]” (perceived language skills). Sarah stated, before she started providing feedback: “He [He = the feedback recipient] also just texted me and he said, like, ‘I wrote it late in the evening’, so, don’t worry too much about it” (perceived effort). Daniel said during the interview: “I know that he [he = the feedback recipient] is dyslexic himself. So you notice that, it pops up” (perceived language skills). And Mary said, while providing feedback: “James [James = the feedback recipient] told me that he wrote this essay just quickly within 23 minutes” (perceived effort).

With respect to the processing of feedback, there was a clear distinction between students who declared that they were affected by their relationship with the feedback provider while processing the peer-feedback (Mary, James, Robert, John, David, Thomas, Daniel, Anthony), and students who said that they were not or hardly affected by this relationship (Linda, Michael, Steven, Sarah). Six of the students who said they were affected by the interpersonal relationship mentioned perceived language skills (Mary, James, Robert, John, David, Thomas), and four students (David, Thomas, Daniel, Anthony) mentioned perceived effort as an important factor. For example:

I think that I – I think that I keep that [that = the identity of the feedback provider] into account. Especially because I, ehm, well yes, you always have got an idea about someone. Like, well, is this person, ehm, skilled in the Dutch language, let’s say (perceived language skills; Thomas, interview);

If I would think ‘this person has not seriously worked on this task’, then I would find it more difficult to take the criticism seriously, because I wouldn’t know in what way it would have been written (perceived effort; Daniel, interview).

The four students who did not show any signs of being affected by the interpersonal relationship with the feedback provider shared several characteristics: they (a) were all members of class B; (b) made fewer comparisons with text quality criteria than the eight students who said they were affected by the interpersonal relationship; (c) together formed two dyads; and (d) made fewer references to the other while processing feedback than almost all other students (see Table 6).

Discussion

The aim of the current study was to investigate what cognitive sub-phases could be distinguished in the process of dealing with errors, and how these sub-phases may be affected by interpersonal perceptions during the provision and processing of peer-feedback. With a think-aloud protocol, we explored students’ thoughts during the activities of peer-feedback provision and peer-feedback processing, and in semi-structured interviews students looked back on the peer-feedback activities.

Our think-aloud data showed that, both in the feedback provision phase and in the feedback processing phase, the process of dealing with errors followed one of two patterns. In the feedback provision phase, in the pattern of ‘quick discoveries’, the identification of errors seemed to happen simultaneously with the decoding, and thoughts related to an evaluation phase were often lacking. By contrast, in the pattern of ‘elaborate discoveries’, the identification of an error seemed to occur as a result of an interpreting/evaluating phase. In the feedback processing phase, we found two patterns that were comparable with those in the feedback provision phase with respect to the extensiveness of students’ thought processes.

The order in which the sub-phases of the process of dealing with errors occurred during the provision of peer-feedback partly deviated from the expected behavior based on the conceptualization by Aben et al. (2019). Whereas Aben et al. (2019) hypothesized a sequence of error identification, decoding, evaluation, and feedback encoding, our results showed that some sub-phases may be skipped, or may occur in a different order. With respect to the processing of peer-feedback, too, the patterns were only partly in line with the conceptualization by Aben et al. (2019). Whereas Aben et al. (2019) hypothesized a sequence of feedback reading, decoding, evaluating, and output revising, the reading and decoding phases could not be distinguished in students’ think-aloud utterances. Potential explanations for these findings are that some sub-phases may have followed each other so rapidly that they could not be distinguished with the current measurement method, or that they had become automatized, such that students could not report them when thinking aloud. It is also possible that the process is more fluid than described in the model, and that in practice the boundaries between sub-phases are blurred.

Although our analyses did not show relations between the types of errors and how those errors were dealt with during the processing phase (as found by Kim & Bowles, 2019), we did find evidence that, during the provision of peer-feedback, the process of dealing with errors had different appearances depending on the type of error. More specifically, the pattern of ‘quick discoveries’ mostly resulted in the identification of errors related to lower-order concerns, whereas the pattern of ‘elaborate discoveries’ mostly resulted in the identification of errors related to higher-order concerns. Hence, future research could continue to explore the relationship between the appearance of sub-phases during the provision of peer-feedback and different types of errors.

The analyses showed that the majority of students in our sample predominantly focused on errors, and comparatively less on pluses, while providing and processing peer-feedback. Whereas they verbalized identifying pluses about as often as errors while providing feedback, errors resulted more often in the encoding of a feedback remark than pluses. Furthermore, during the feedback processing phase, textual revisions only occurred as a result of feedback related to errors and not as a result of feedback related to pluses. The focus on errors is not surprising, as our instruction guided students to focus on spelling errors, writing style errors, argumentation errors, and text structure errors. Besides, students are primarily involved in peer-feedback activities with the aim of improving their (peers’) learning (performance), rather than complimenting their (peers’) current learning states (Liu & Carless, 2007). As the necessity to bridge a gap in relation to a standard is absent in the case of pluses, this likely enhances students’ error-oriented focus in feedback activities (Narciss, 2017). This is also in line with Máñez et al. (2019), who found that students spent more time on errors than on pluses while processing feedback.

With respect to the influence of interpersonal perceptions on feedback processes, we detected differences when we compared the feedback provision and processing phases. Our findings suggest that interpersonal perceptions influenced students only implicitly during the feedback provision phase: almost all students said in the interviews that they were not affected by the interpersonal relationship, whereas half of them showed awareness of the recipient’s perceived language skills and/or perceived effort before, after, or during the think-aloud activities, or in the interview. By contrast, during the feedback processing phase, interpersonal perceptions seemed to affect the majority of the students explicitly. Two-thirds of the students declared in the interviews that they took the provider’s perceived language skills and/or perceived effort into account while processing their feedback. These results are in line with previous research illustrating that peer-feedback processing may be affected by perceptions of language skills (Berndt et al., 2018; Strijbos et al., 2010) and perceived effort (Timmers et al., 2013).

Regarding interpersonal perceptions, the difference between the feedback provision and feedback processing phases is remarkable. On the one hand, it could be that students want to portray themselves as objective, and/or strive to be objective, when providing feedback. Research shows that avoiding bias is highly valued when making decisions or assessing (Irwin & Real, 2010), which is also reflected in the wide range of attempts to compose descriptions of performance quality criteria, such as rubrics (Panadero & Jonsson, 2013). On the other hand, during feedback processing, the feedback recipient’s self-interest may be too great to ignore knowledge of the interpersonal relationship that potentially conveys information about the accuracy of the provided feedback.

Strengths and limitations

Our mixed-methods design provided a richer account of the way students dealt with errors and of the role of interpersonal perceptions during the provision and processing of peer-feedback than either analysis alone would have provided. Findings from both analyses were consistent, for example, in showing that students focus more on errors than on pluses when providing and processing feedback, and that interpersonal perceptions play a role in the processing of peer-feedback. Simultaneously, the analyses provided complementary information. For example, the qualitative analysis revealed that the occurrence of sub-phases, as identified in the quantitative analysis, followed different patterns. Additionally, the qualitative analysis showed that students seemed to be explicitly affected by interpersonal perceptions in the feedback processing phase, and implicitly in the feedback provision phase. This emphasizes the value of combining multiple data sources and analytical techniques in order to better understand the provision and processing of feedback.

Simultaneously, it is important to bear in mind potential limitations of this study. First, one could argue that social desirability played a role in students not reporting that the perceived Dutch language skills of their peer may have affected their provision of feedback. However, as students had no problems mentioning that interpersonal perceptions had affected their feedback processing behavior, our data rather seemed to indicate that most students were not aware of effects of interpersonal perceptions on their feedback provision behavior, whereas most of them were aware of those effects during the processing of feedback.

Second, there are known limitations of think-aloud studies in general. For example, Sachs and Polio (2007) found that learners who did not have to think aloud made more accurate revisions while processing teacher feedback on their written texts than learners who did have to think aloud. In fact, students in our sample mentioned that they were inexperienced in thinking aloud, which may have affected their feedback provision and processing behavior positively (“Actually I think my mind would wander much quicker if I would not have to talk aloud”; Mary, interview) or negatively (“Well I – yeah, it was a bit difficult, because of course you only had ehm, yeah, five seconds let’s say, to think”; Anthony, interview). This last remark in particular warrants caution, as it shows that Anthony did not understand the rationale behind thinking aloud.

Third, our analyses indicated that students’ patterns of dealing with errors were related to the type of error identified (i.e., errors related to either higher- or lower-order concerns). Nevertheless, we need to be careful with linking specific types of errors to specific patterns of dealing with errors, as we executed neither a content analysis of the errors in the texts produced by the students, nor a content analysis of the actual revisions made by the students based on the feedback they received.

Practical implications

Despite the limitations, the results have clear implications for educational practice. First, earlier research emphasized the crucial role of teacher instruction for optimal learning gains from peer-feedback activities (e.g., Min, 2005; Van Steendam et al., 2010). With respect to the process of dealing with errors, this implies that the explanation of criteria, and of deviations from those criteria (i.e., errors), may also play a central role in instructions for peer-feedback activities. Our results showed that the students of one class referred to criteria more often and more efficiently than the students of the other class. Although our sample was too small to investigate whether this was due to a classroom or teacher effect, it suggests that engagement with text quality criteria may be either a result of, or a prerequisite for, efficiently dealing with errors while providing and processing peer-feedback on writing performances. Further studies could be undertaken to explore this idea.

Second, instructions prior to peer-feedback activities in educational settings should also aim to teach students to view their texts in a holistic manner. Instead of first reading the whole text before encoding feedback (feedback provision phase) and reading all the feedback before revising the text (feedback processing phase), the students in our sample typically encoded a remark immediately while reading the text (feedback provision phase) and, often, immediately started revising their text after reading a feedback remark (feedback processing phase). As such, they did not appear to look at their texts from a bird’s-eye view, and therefore did not treat the text and feedback as coherent entities. This also became clear from the fact that most of the students identified a completely new set of micro-level errors when they reread the text while providing feedback. The development of a holistic view on texts improves the chances of identifying errors related to higher-order concerns, which in turn may lead to more significant text improvement (Lerchenfeldt et al., 2019).

Third, the fact that interpersonal perceptions seemed to influence feedback provision and feedback processing behavior implies that the composition of a dyad may also influence potential learning gains. Especially when processing peer-feedback, students may, for example, spend less time on processing feedback when they perceive the effort invested in the provided peer-feedback as low. This implies that future research could aim to establish whether peer-feedback activities could be optimized when students, as well as teachers, are aware of the potential role that interpersonal perceptions may play in peer-feedback provision and processing.