Introduction

The ability to synthesise information from different sources into a new meaningful text, a synthesis text, is an important skill in higher education. However, many students find writing synthesis texts challenging. This is not surprising given the cognitively demanding nature of this task (Martínez, Mateos, Martín, & Rijlaarsdam, 2015; Mateos, Martín, Villalón, & Luna, 2008; Solé, Miras, Castells, Espino, & Minguela, 2013).

The process of source-based writing, such as synthesis writing, involves both reading and writing, which led Spivey & King (1989) to label it as a hybrid task. The complexity of synthesis writing does not call for a simple “reading-then-writing” strategy. Rather, it involves a complex interplay of reading and writing sub-processes. During the writing process, students alternate between reader and writer roles as they read sources, select relevant information from the sources, compare and contrast the information from the different source texts to each other, and write and revise the actual text. Key to synthesis writing is the integration process which encompasses connecting the ideas from the different source texts by organising and structuring them around a central theme in the target text (Solé et al., 2013; Spivey & King, 1989).

A small body of research looks into the use of sources during the writing process and its relation to text quality, and found that a more complex process leads to a better quality text (Martínez et al., 2015; Mateos & Solé, 2009; Solé et al., 2013). Generally, more complex processes involve recursive activities and mediation between the different process activities. A recursive process implies that the writer adopts a non-sequential or non-linear approach. Instead of a simple sequence of reading and then writing, the process is marked by a sequence of recurring and alternating reading and writing. Thus, in a recursive process, the reading and writing activities occur repeatedly throughout the process. A more complex process also implies mediation between the reading and writing activities. Mediation involves interaction between reading and writing activities. Reading entails writing and vice versa. Mediation between the sources and the synthesis text being produced occurs when, for example, the writer takes notes during reading (main goal = reading) or when the writer goes back to the sources while writing to look for additional information (main goal = writing). In the latter case, these reading activities play an intermediary role during the writing activity and act as a mediator. Martínez et al. (2015), Mateos & Solé (2009), and Solé et al. (2013) found a positive correlation between the quality of a synthesis text and the amount of recursion and mediation during the process in audio–video recordings of students’ synthesis processes. These findings seem in accordance with the very nature of the synthesis task, which involves sub-processes such as comparing and contrasting the information from the different sources, linking the sources to one another and integrating the information in a new independent text. In order to successfully accomplish these goals, the writing process should be marked by recursion and mediation.

When examining the relation between process and product, it is not only important to take into account the (frequency of) various process activities and the interaction between them, but also the moment at which they occur and their variation over the writing process (i.e., temporal distribution). Rijlaarsdam & van den Bergh (1996) and Breetvelt, van den Bergh, & Rijlaarsdam (1994) showed that the influence of a certain writing activity on text quality depends on the phase of the process in which the activity takes place. A cognitive activity that has a positive effect on text quality at one phase of the writing process may show a negative effect during another phase (Breetvelt et al., 1994).

A limited number of studies on source-based writing took into account the temporal dimension of writing. Research by Lenski & Johns (1997), Martínez et al. (2015), Mateos & Solé (2009), and Solé et al. (2013), for example, stressed the importance of activities returning at various points during the writing process. Adopting a more general approach to the temporal dimension of writing, the authors showed that a recursive approach led to a higher quality text than a sequential approach. These studies, however, did not specify at which moment certain activities should take place to be beneficial for text quality. Some other studies on source-based writing adopted a more specific view on the temporal distribution of activities in the process. Breetvelt et al. (1994), Escorcia, Passerault, Ros, & Pylouster (2017), and Leijten, Bernolet, Schrijver, & Van Waes (2019) showed that a long initial reading time is positively correlated with text quality, though Escorcia et al. (2017) remark that an intense focus on the sources in the first phase of the writing process leads to less attention to content elaboration, spelling and grammar. The research by Breetvelt et al. (1994) showed that generating ideas and structuring content have a positive effect when occurring during the second phase of the process. The study by Leijten et al. (2019) confirms this. They found that switching between the sources, needed for generating and structuring the ideas, is negatively correlated with text quality in the first phase of the process, but has a positive effect in the second phase of the process. Writing and rereading proved to have a positive effect in the last phase of the writing process (Breetvelt et al., 1994).

Research, however, also shows that it may be the combination of specific activities, that is, specific patterns of writing activities in different phases of the writing process which determine text quality (van den Bergh, Rijlaarsdam, & Van Steendam, 2016). Nevertheless, no study to date has looked into effective source use patterns for different types of synthesis texts.

Aim of the present study

Given the importance of sources in synthesis writing, our research objective is to get a clear picture of an effective use of sources during the writing process, in other words, source use resulting in a good quality synthesis text. More specifically, we would like to study effective source use patterns for two genres of synthesis texts, which have up to now remained largely unexplored.

The insights of this study will serve as a base for developing process-oriented feedback on students’ synthesis writing. Though previous research (van den Bergh et al., 2016) has proven the link between writing process and text quality, feedback aiming to improve the writing process is scarce. In practice, teachers usually give feedback on the writing product (i.e., text quality). However, given that it is the writing process that generates the product, feedback on the writing process might be extremely valuable and should be taken into account as well. To develop such feedback, it is necessary to gain a full understanding of how students deal with sources when writing a synthesis and which source use patterns lead to a good quality synthesis.

To reach our goal, we will first describe source use in synthesis writing of 294 upper-secondary students as registered with keystroke logging software. More specifically, we will study the temporal distribution of various source-related activities over the writing process. We do this for two different synthesis genres (argumentative and informative synthesis), as writing strategies differ according to genre (Beauvais, Olive, & Passerault, 2011). Secondly, we explore the relation between a set of source-related writing activities and text quality, and consider the effect of temporal distribution on this process–product relation. Finally, we build two models (one for argumentative and one for informative synthesis tasks) taking into account the combination of various source-related activities in the different phases of the writing process to identify patterns of source use resulting in a successful synthesis. More specifically, we will answer the following three research questions:

  1. To what extent does source use vary during the writing process? And does it differ according to genre (argumentative versus informative synthesis)?

  2. Do the individual source-related process indicators relate to text quality? And does this differ according to genre?

  3. What are effective patterns of source use for each of the two synthesis genres?

Method

Participants

A total of 300 Dutch students participated in the study. The data of six students could not be analysed due to technical issues, resulting in a final sample of 294 students whose data were included in the study (95 males, 199 females, average age 16.41). Students varied in grade level from grade 10 to grade 12 of pre-university secondary education and were registered at twelve different Dutch schools. Table 1 provides an overview of the distribution of participants over the grades.

Table 1 Distribution of participants over grades

Materials

Writing task

Two different synthesis tasks on the human-wildlife conflict in Africa were developed: one argumentative task and one informative task. Contrary to previous research that mainly focused on a single genre, we wanted to know whether there would be an effect of genre on the writing process and therefore opted to use the two existing synthesis genres. Within each genre, four versions of the task were created, crossing two factors. Factor 1 was the relation between the source texts (two levels: complementary/contradictory information), and factor 2 the amount of irrelevant information in the source texts (two levels: little/a lot). The synthesis task was based on five source texts. The task construction is visualised in “Appendix A”.

The variety in tasks within each genre allows us to generalise over the different synthesis tasks (within the two synthesis genres). Previous studies (Schoonen, 2012; Van den Bergh, De Maeyer, van Weijen, & Tillema, 2012) showed that a variety of tasks is needed in order to draw conclusions concerning genre-related writing. For this reason, four task versions were used in the current study.

The different versions of the task were distributed randomly over the participants. The distribution of the participants over the two genres did not differ significantly regarding gender (χ2(1) = 1.07, p = 0.30), grade (χ2(2) = 0.238, p = 0.888) or age (t(292) = 0.988, p = 0.324).

Students received textual instructions for the task. Instructions included: (1) a short explanation of what a synthesis text is, (2) a short explanation of the characteristics of an argumentative/informative synthesis text, dependent on the task at hand, (3) instructions on how to deal with the sources, (4) instructions on the audience they had to keep in mind for their text, (5) instructions on style, (6) instructions on text length, and (7) instructions on time. “Appendix B” presents the instructions in detail. The instructions were identical for all versions of the task; the only difference was whether the characteristics of an argumentative or an informative synthesis were explained, depending on the genre in which the students had to write their text.

Equipment

Students wrote their texts on laptops on which keystroke logging software Inputlog (Leijten & Van Waes, 2013) was installed. Inputlog is a tool that logs and analyses different aspects of the writing process by registering mouse movements, keystrokes and window switches. The source texts were provided as PDF files on the laptops. No paper versions of the sources were available, as we wanted to register all source use with Inputlog.

Procedure

Students participated in the study at their own school in groups of ten to twenty students from the three different grades. Data collection was led by two researchers. Students were first informed of the goal and procedure of the study. After reading and signing the consent forms, they were walked through the synthesis task instructions so they knew what the writing tasks would entail. They were given the opportunity to ask questions if the instructions were unclear to them. Then, students received a short instruction on the use of Inputlog. Once all students were familiar with the task instructions and the use of Inputlog, they opened (without reading) all sources belonging to the version of the synthesis task assigned to them. They were instructed to use only the provided sources for their text. Internet connection was disabled. Students then made sure Inputlog started recording their writing process and had 50 min to carry out the task.

Analysis of text quality

Bouwer, Béguin, Sanders, & van den Bergh (2015) emphasise that the writing performance depends on the textual genre. Therefore, we set up two assessments, one with 145 informative and another with 149 argumentative synthesis texts.

The texts were rated with D-PAC, an online tool for comparative judgement. This method is based on the assumption that comparing two performances to one another is easier for the rater than assigning a score to a particular performance or product (Lesterhuis, Verhavert, Coertjens, Donche, & De Maeyer, 2016). Twenty-six raters were involved in the rating procedure. On average, each synthesis text was compared 14.66 times. This led to a reliable rank order ranging from the lowest to the highest scoring text (SSR reliability coefficient of 0.76 for the argumentative texts and 0.73 for the informative texts).
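To illustrate how pairwise judgements can be turned into a rank order, the sketch below fits a Bradley-Terry model, a model commonly used to scale comparative judgement data. It is a minimal illustration only: whether the estimation inside D-PAC corresponds to this procedure is an assumption, and the function name `bradley_terry` and the toy comparisons are hypothetical.

```python
from collections import defaultdict


def bradley_terry(comparisons, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Illustrative sketch, not the D-PAC implementation. Uses the classic
    iterative update and normalises the strengths so that they sum to 1.
    Assumes every text wins at least one comparison.
    """
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        items.update((winner, loser))

    strength = {item: 1.0 / len(items) for item in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            # Sum over all texts j that were compared with text i.
            denominator = sum(
                pair_counts[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items
                if j != i and frozenset((i, j)) in pair_counts
            )
            updated[i] = wins[i] / denominator if denominator > 0 else strength[i]
        total = sum(updated.values())
        strength = {item: value / total for item, value in updated.items()}
    return strength


# Hypothetical judgements: each tuple is (preferred text, other text).
judgements = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("B", "C"), ("B", "C"), ("C", "B"),
    ("A", "C"), ("A", "C"), ("C", "A"),
]
scores = bradley_terry(judgements)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Sorting the texts by estimated strength yields the kind of rank order described above; the reliability of that order (the SSR coefficient) is a separate computation that is not sketched here.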

Raters evaluated the texts holistically, thus assessing the global quality of the texts. Because we wanted the raters to acknowledge the relevant features of a synthesis text, they received information on the four key synthesis text quality aspects: (1) relevance and correctness of the information, (2) integration of the sources into a new text with its own structure and overarching theme, (3) coherence and cohesion, and (4) language use. We based these criteria on previous research on synthesis writing (Boscolo, Arfé, & Quarisa, 2007; Mateos et al., 2008; Mateos & Solé, 2009; Solé et al., 2013). So the texts were assessed holistically, but the rating was done by raters who were informed of the relevant quality aspects of a synthesis text.

Analysis of writing processes

All 294 writing processes were analysed using the source analysis of version 7.1.0.47 of Inputlog (Leijten & Van Waes, 2013). Based on the data Inputlog generated, we created three process indicators: (1) relative time in the sources, (2) number of transitions between the sources per minute, and (3) number of transitions between the target text (i.e., the student’s own synthesis text) and the sources per minute. All three indicators were relative in nature, allowing us to compare the different writing processes (as some students finished earlier than the given 50 min time on task) and to generalise the findings.
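The three indicators can be sketched as follows. This is a minimal illustration that assumes a simplified log of window-focus events; the event format, the label `"text"` for the target document and the function name `source_indicators` are hypothetical and do not reflect Inputlog's actual output format. It also assumes that switches in both directions (text to source and source to text) count as transitions between the target text and the sources.

```python
def source_indicators(focus_events, end_time):
    """Compute three relative source-use indicators.

    focus_events: list of (start_second, window) tuples sorted by time,
    where window is "text" for the target text or a source name.
    end_time: second at which the writing process ended.
    """
    time_in_sources = 0.0
    between_sources = 0
    text_and_sources = 0

    for index, (start, window) in enumerate(focus_events):
        # A window stays in focus until the next event (or the end of the process).
        if index + 1 < len(focus_events):
            stop = focus_events[index + 1][0]
        else:
            stop = end_time
        if window != "text":
            time_in_sources += stop - start

        if index > 0:
            previous = focus_events[index - 1][1]
            if previous != "text" and window != "text" and previous != window:
                between_sources += 1
            elif (previous == "text") != (window == "text"):
                text_and_sources += 1

    total_seconds = end_time - focus_events[0][0]
    minutes = total_seconds / 60
    return {
        "relative_time_in_sources": time_in_sources / total_seconds,
        "transitions_between_sources_per_min": between_sources / minutes,
        "transitions_text_sources_per_min": text_and_sources / minutes,
    }


# Hypothetical five-minute process.
events = [(0, "source1"), (60, "source2"), (120, "text"), (240, "source1"), (270, "text")]
indicators = source_indicators(events, end_time=300)
```

Dividing by the total process time is what makes processes of different lengths comparable, which is the reason given above for using relative indicators.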

Each writing process was divided into three equal intervals. We opted for three intervals, as this division (beginning–middle–end) is easily interpretable and therefore transferable to future feedback. Thus, as each of the three source-related process variables was taken into account in each of the three process intervals, nine process variables were available per text.
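The division into equal intervals can be sketched for one indicator, relative time in the sources. The segment format and the function name `time_in_sources_per_interval` are hypothetical; the sketch assumes that a focus segment crossing an interval boundary is split proportionally between the two intervals, which the text above does not specify.

```python
def time_in_sources_per_interval(segments, parts=3):
    """Return the share of each interval that was spent in the sources.

    segments: list of (start_second, stop_second, window) tuples covering
    the whole process; window is "text" or a source name.
    """
    process_start = segments[0][0]
    process_end = segments[-1][1]
    width = (process_end - process_start) / parts

    shares = []
    for k in range(parts):
        lower = process_start + k * width
        upper = lower + width
        in_sources = 0.0
        for seg_start, seg_stop, window in segments:
            # Only the part of the segment that falls inside this interval counts.
            overlap = min(seg_stop, upper) - max(seg_start, lower)
            if overlap > 0 and window != "text":
                in_sources += overlap
        shares.append(in_sources / width)
    return shares


# Hypothetical ten-minute process.
process = [(0, 150, "source1"), (150, 400, "text"), (400, 450, "source2"), (450, 600, "text")]
shares = time_in_sources_per_interval(process)
```

Applying the same clipping to the two transition counts would give the remaining six of the nine process variables.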

Preparing the data for analysis

To examine the importance of the temporal distribution of source-related activities (Research question 1), we conducted linear mixed models analyses. The process–product relation (Research question 2) was explored using Hayes moderation analyses. Polynomial regression analyses were used to identify effective source use patterns (Research question 3). Depending on the specific analyses to be conducted, some preparatory analyses were needed.

First, both exploration of the data via boxplots and tests of skewness and kurtosis showed that the process data were not normally distributed. Therefore, we log-transformed the process scores by taking the natural log of each of the scores. These transformed variables were used to conduct the mixed models analyses, as in these analyses the process variables were the dependent variables. As there is no normality assumption for independent variables, the log transformation was not needed for the other analyses.
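The effect of such a transformation on a right-skewed variable can be sketched as follows. The scores are invented for illustration, and the function names are hypothetical. How zero scores were handled is not stated above; the sketch simply assumes strictly positive values, because the natural log of zero is undefined.

```python
import math


def skewness(values):
    """Population skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    second_moment = sum((v - mean) ** 2 for v in values) / n
    third_moment = sum((v - mean) ** 3 for v in values) / n
    return third_moment / second_moment ** 1.5


def log_transform(values):
    """Natural-log transform; assumes every value is strictly positive."""
    return [math.log(v) for v in values]


# Invented right-skewed scores (each value is double the previous one).
raw_scores = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2]
skew_before = skewness(raw_scores)                # clearly positive
skew_after = skewness(log_transform(raw_scores))  # approximately zero
```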

Secondly, the data had to be prepared to perform a polynomial regression analysis. This type of regression analysis not only takes into account linear, but also quadratic relations. To decide whether or not to include a quadratic variable in the regression analysis, the existence of a quadratic relation was explored by performing a curve estimation analysis on each of the nine process variables. Given that curve estimation and regression are highly sensitive to outliers in the data, the extreme outlying scores (i.e., cases with values more than three times the interquartile range) were filtered out. In total, eleven cases were filtered out: five in the case of argumentative synthesis texts, six in the case of informative synthesis texts. Since the regression contains polynomial terms, the process variables were standardised by centring them around the mean (Mortelmans & Dehertogh, 2007). The standardised variables were then used to create a quadratic term of those process variables that proved to be curvilinear after performing the curve estimation analysis.
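The preparation steps for one process variable can be sketched as below. The sketch assumes that "extreme" means more than three interquartile ranges below the first quartile or above the third quartile, which is one common reading of the criterion; the quartile method, the function names and the example scores are likewise assumptions.

```python
import statistics


def drop_extreme_outliers(values, factor=3.0):
    """Keep values within `factor` interquartile ranges of the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - factor * iqr
    upper = q3 + factor * iqr
    return [v for v in values if lower <= v <= upper]


def centre_and_square(values):
    """Centre a variable around its mean and build the quadratic term."""
    mean = statistics.fmean(values)
    centred = [v - mean for v in values]
    squared = [c ** 2 for c in centred]
    return centred, squared


# Invented scores with one extreme case.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 100]
kept = drop_extreme_outliers(scores)
linear_term, quadratic_term = centre_and_square(kept)
```

Centring before squaring reduces the correlation between the linear and the quadratic term, which is the usual motivation for this step in polynomial regression.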

Results

Distribution and variation of source use for two synthesis genres

The first aim of this study was to explore the effect of temporal distribution and genre on source use during the writing process, as this would give us more insight into the distribution of source-related activities over the writing process intervals and the possible influence of genre on this distribution.

The effect of interval, genre and their interaction (interval × genre) on relative time in sources, transitions between sources per minute and transitions between synthesis and sources per minute were examined using linear mixed models. For each of the three process variables, four models were compared (see “Appendix C” for an overview of the models with estimates of fixed effects). The first model took into account the random variance in each of the three process variables across the participants. In the second model, interval was added as a fixed effect. This model examined whether the process variables were equally distributed over the writing process or whether they differed according to the interval. In the third model, genre was added as a fixed factor. Finally, the fourth model included the interaction effect of interval and genre. This last model allowed us to explore whether the effect of interval was the same for both genres, or whether genre had an influence on the variation of the source-related process variables across the intervals.

To assess the adequacy of fit of the different models, all four models were compared taking into account the change in log-likelihood ratio. A chi-square test statistic was then used to determine the model with the best fit (Curran, Obeidat, & Losardo, 2010) (see “Appendix D”). In the case of time in sources (χ2(2) = 795.253, p < 0.001) and transitions between sources (χ2(2) = 243.841, p < 0.001), the model with the best fit is the one that includes the effect of interval (Model 2, “Appendix D”). For transitions between synthesis and sources, the model including the interaction effect (interval × genre) proved to be the best fit (χ2(2) = 12.015, p = 0.002) (Model 4, “Appendix D”).
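The model comparison rests on a likelihood ratio test: the drop in deviance (−2 log-likelihood) between two nested models is referred to a chi-square distribution with as many degrees of freedom as parameters were added. The sketch below is a minimal illustration with hypothetical function names. To stay free of external libraries it uses the closed-form tail probability that holds for an even number of degrees of freedom, which covers the two-parameter steps reported here.

```python
import math


def chi2_sf_even_df(statistic, df):
    """Upper-tail probability of a chi-square distribution with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = statistic / 2
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(deviance_simple, deviance_complex, extra_parameters):
    """Compare two nested models via the drop in deviance (-2 log-likelihood)."""
    statistic = deviance_simple - deviance_complex
    return statistic, chi2_sf_even_df(statistic, extra_parameters)


# Reported statistic for transitions between synthesis and sources (Model 4).
p_value = chi2_sf_even_df(12.015, 2)  # about 0.0025
```

For the reported χ2(2) = 12.015 this gives a p-value that rounds to 0.002, in line with the value stated above.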

We studied the mixed model analyses in more detail to specify how the temporal distribution (i.e., interval) affected source use and how the distribution of the transitions between the synthesis and the sources was different for the two genres.

Effect of interval

Table 2 presents an overview of the observed distribution of the source variables (relative time in the sources, number of transitions per minute between the sources, and number of transitions per minute between the synthesis text and the sources) over the three intervals for both argumentative and informative genre.

Table 2 Descriptive statistics of the three source-related process measures as observed over the three intervals for both genres: mean (standard deviation)

Table 2 shows that the relative time spent in the sources was the highest in the first phase of the process, where students spent half of their time consulting the sources. In the second and third interval, time in sources was considerably lower, being the lowest in the last phase of the writing process. These tendencies were noticeable in both argumentative and informative synthesis texts. Linear mixed models analysis proved that the time spent in the sources was significantly different in each of the three process intervals (F(2, 412.976) = 1259.89, p < 0.001). Pairwise comparisons confirmed that all three intervals differed significantly from each other (p < 0.001), with time in sources gradually declining over the course of the writing process.

The same holds for the number of transitions per minute from one source to another source (Table 2). Results of the linear mixed models analysis showed a statistically significant effect of interval (F(2, 389.087) = 166.942, p < 0.001). All three intervals differed significantly from each other (p < 0.001), with the number of transitions per minute between the sources being the highest in the first interval and the lowest in the third interval.

Concerning the distribution of the number of transitions per minute between the synthesis text and the sources, we observed the highest number of transitions in the middle of the process (Table 2) in both genres. There was a significant effect of interval (F(2, 361.843) = 41.655, p < 0.001) as affirmed by the results of linear mixed model analysis. Pairwise comparisons showed that the number of transitions between synthesis and sources in the second interval differed significantly from those in the first (p < 0.001) and the third interval (p < 0.001). There was, however, no significant difference between the first and the third interval (p = 0.147).

Effect of genre

Results showed that there was no statistically significant difference between the two synthesis genres regarding time in sources (F(1, 269.884) = 0.526, p = 0.469) and the transitions between the sources (F(1, 290.114) = 2.792, p = 0.096). Nor was there an interaction effect of interval and genre (F(2, 413.386) = 1.264, p = 0.284 in the case of time in sources, and F(2, 389.593) = 1.038, p = 0.355 in the case of transitions between sources).

For the transitions between the synthesis text and the sources, however, linear mixed models analyses showed that not only the interval, but also the interaction between interval and genre had an effect (F(2, 359.224) = 6.115, p = 0.002). For genre, no main effect was found (F(1, 265.504) = 0.540, p = 0.463). So although the process of both genres was characterised by a similar distribution of transitions between synthesis and sources over the intervals, with the highest number of transitions in the second interval and the lowest in the third interval, the degree to which the students switched differed according to genre. Estimates of fixed effects showed that the number of switches per minute in the second interval was significantly higher in the case of argumentative synthesis texts (t(231.997) = − 2.576, p = 0.011) compared to the informative synthesis texts.

This confirms the findings of a study by Beauvais, Olive, & Passerault (2011), stating that different genres call for different writing strategies during the process. As a result, in our following analyses, the writing processes of the two synthesis genres were analysed separately from one another.

Relations between source use during the writing process and text quality

For the purpose of creating meaningful feedback, we needed to get an insight into the relation between the use of sources during the writing process and text quality. To explore the interaction effect of the source-related process measures and interval on text quality, Hayes’ process analysis was performed (Hayes, 2013). We explored whether the relation between text quality and source use was moderated by interval.

For the argumentative synthesis texts, three interaction effects were explored (“Appendix E”), one for each of the three source-related process measures. First, we explored the relation between time in sources and text quality, and the moderation of this relation by interval. Results showed that the relation between time in sources and text quality was moderated by interval (p = 0.03). The simple slopes analysis indicated that the effect was situated in all three intervals (B = − 1.37, p < 0.001 in interval 1, B = − 2.26, p < 0.001 in interval 2, B = − 3.15, p < 0.001 in interval 3). Secondly, for transitions between the sources, no interaction effect was found (p = 0.910). Thirdly, results showed an interaction effect of transitions between synthesis text and sources × interval on text quality (p < 0.001). The simple slopes analysis indicated that text quality was influenced by the transitions between synthesis and sources in the first (B = 0.19, p = 0.010) and in the third interval (B = − 0.15, p = 0.020).
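In a moderation model with a linear interaction term, the simple slope of a predictor at a given moderator value equals the predictor coefficient plus the interaction coefficient multiplied by that value. The sketch below illustrates this with the time-in-sources slopes reported above. The two coefficients are not reported in the text: they are back-calculated from the evenly spaced slopes under the assumption that interval was coded 1, 2 and 3, and should be read as illustrative values only.

```python
def simple_slope(b_predictor, b_interaction, moderator_value):
    """Conditional effect of the predictor at one value of the moderator."""
    return b_predictor + b_interaction * moderator_value


# Assumed coefficients, inferred from the reported slopes (-1.37, -2.26, -3.15)
# under the assumption that interval is coded 1, 2, 3; not taken from the paper.
B_TIME_IN_SOURCES = -0.48
B_INTERACTION = -0.89

slopes = {
    interval: round(simple_slope(B_TIME_IN_SOURCES, B_INTERACTION, interval), 2)
    for interval in (1, 2, 3)
}
```

Under these assumptions the slope becomes 0.89 more negative with each interval, which is what a significant interaction with interval expresses: the negative relation between time in sources and text quality grows stronger towards the end of the process.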

For the informative synthesis texts the same three interaction effects were explored (“Appendix E”). There was an interaction effect interval × time in sources (p = 0.01). The relation between time in sources and text quality proved to be significant in the second (B = − 1.66, p < 0.001) and third interval (B = − 2.67, p < 0.001), but not in the first interval. No significant interaction of transitions between sources (p = 0.270) or between synthesis text and sources (p = 0.490), and interval on text quality was found.

These results affirm the importance of the temporal distribution (i.e., the factor interval). Thus, process-oriented feedback aiming at improving students’ synthesis writing performance should take into account the moment at which source-related activities take place. The results of Hayes’ process analysis indicate which source-related process activities influence text quality at which moment in the process.

Regression model: source use predicting text quality

To form a more integrated perspective on source use during the writing process of a synthesis text, regression analyses were performed.

Given the small number of previous studies on the synthesis writing process, the selection of variables to be included in the regression could not be based on research-grounded hypotheses. However, Hayes moderation analysis provided us with an indication of possibly relevant predictors. The variables that proved significant when performing moderation analysis were selected as a set of possible predictors of text quality (Table 3). This selection was broadened with one variable approximating significance in Hayes’ process analysis (i.e., transitions between sources, interaction effect with p = 0.07). The selected variables and their quadratic terms (significance explored via curve estimation analysis) were included in the first step of the regression analysis. Based on the first model, some corrections were made to optimise the model. The predictor variables were reduced (based on significance and weight in the model) and a limited number of outlier participants was eliminated (detected via Casewise Diagnostics, with a limit of 2.5 SD). Separate models were built for argumentative and informative synthesis texts.

Table 3 Variables to be included in the first regression model as possible predictors of text quality (based on Hayes moderation process analysis) for both genres

Argumentative synthesis texts

In the case of the argumentative synthesis texts, the first regression model contained eight variables (Table 4, Model 1). Results indicated that the model was significant (adjusted R2 = 0.203, F(8, 135) = 5.547, p < 0.001). It was found that the relative time spent in the sources in the first interval (quadratic term) significantly predicted text quality (β = − 0.34, p < 0.001). The number of transitions per minute between the synthesis and the sources in the first interval approximated significance (p = 0.082) and had a β-value of 0.19, indicating that its effect was quite strong. Though not significant, time spent in sources in interval 3 (β = − 0.20) also had a strong effect as its β-value was quite high compared to the β-values of the other predictors. To improve this first model, a selection of four predictor variables was made, that is, the significant predictor and the two near significant predictors. The fourth predictor consisted of the linear term of the significant quadratic predictor, as polynomial regression demands the presence of the lower-order term when including a higher-order one. Model 2 (Table 4) was significant and had a higher adjusted R2 than the first model (adjusted R2 = 0.215, F(4, 139) = 10.819, p < 0.001), indicating a better fit. This second model contained two significant predictors: time in sources in the first (quadratic term) (β = − 0.33, p < 0.001) and in the third interval (β = − 0.25, p = 0.001). Via Casewise Diagnostics, four outliers were identified (limit of 2.5 SD). After removing these outliers, the regression analysis was repeated (Table 4, Model 3), resulting in the final model (adjusted R2 = 0.246, F(4, 135) = 12.339, p < 0.001). Results indicated that the four predictors explained 24.6% of the variance in text quality. 
Two predictors significantly predicted text quality of an argumentative synthesis text, namely, time in sources in the first interval (quadratic term) (β = − 0.36, p < 0.001), and time in sources in the third interval (β = − 0.27, p < 0.001). The third predictor, transitions between synthesis and sources in the first interval, was not significant, but contributed to the overall power (i.e., the adjusted R-squared) of the model. The linear term of time in sources in the first interval was included because the quadratic term proved significant.
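The reported fit statistics are linked by standard formulas: R2 can be recovered from the F statistic and its degrees of freedom, and the adjusted R2 then follows from the number of cases and predictors. The sketch below, with hypothetical function names, applies these formulas to the values reported for the argumentative models as a consistency check. The number of cases is derived from the degrees of freedom (residual df + number of predictors + 1) rather than taken from the text.

```python
def r_squared_from_f(f_value, df_model, df_residual):
    """Recover R-squared from an overall F test of a regression model."""
    return (f_value * df_model) / (f_value * df_model + df_residual)


def adjusted_r_squared(r_squared, n_cases, n_predictors):
    """Penalise R-squared for the number of predictors in the model."""
    return 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)


def adjusted_from_f(f_value, df_model, df_residual):
    """Adjusted R-squared implied by F(df_model, df_residual)."""
    n_cases = df_residual + df_model + 1
    r_squared = r_squared_from_f(f_value, df_model, df_residual)
    return adjusted_r_squared(r_squared, n_cases, df_model)


model_1 = adjusted_from_f(5.547, 8, 135)   # first model, eight predictors
model_3 = adjusted_from_f(12.339, 4, 135)  # final model, four predictors
```

Rounded to three decimals, the results are 0.203 and 0.246, matching the adjusted R2 values reported for Model 1 and Model 3. The comparison also shows why the smaller model fits better after adjustment: dropping four weak predictors costs little explained variance while reducing the penalty.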

Table 4 Regression models: source use predicting text quality of argumentative synthesis texts

Informative synthesis texts

The first regression model for the informative synthesis texts contained eight variables and proved to be overall significant (adjusted R2 = 0.150, F(6, 132) = 5.065, p < 0.001) (Table 5, Model 1). Three predictors significantly contributed to text quality, namely, time in sources in the third interval (β = − 0.39, p = 0.001), transitions between sources in the first interval (β = 0.29, p = 0.001), and transitions between sources (quadratic term) in the third interval (β = − 0.26, p = 0.022). The other predictors were not significant. Moreover, their standardised β-values were low compared to those of the significant predictors; hence, the non-significant predictors contributed less to the model. For these reasons, a second model (Table 5, Model 2) was tested with only the three significant predictors, completed with the linear term of the quadratic variable. The overall model was significant (adjusted R2 = 0.143, F(4, 134) = 6.745, p < 0.001). Two of the four variables were significant: time in sources in the third interval (β = − 0.30, p = 0.001) and transitions between sources in the first interval (β = 0.29, p = 0.001). Via Casewise Diagnostics (limit of 2.5 SD) three outliers were identified. Model 3 (Table 5) was the result of the regression analysis on the dataset without the outliers. Results indicated that the four predictors explained 16.2% of the variance in text quality of informative synthesis texts (adjusted R2 = 0.162, F(4, 131) = 7.543, p < 0.001). Three predictors were significant, namely, time in sources in the third interval (β = − 0.29, p = 0.001), transitions between sources in the first interval (β = 0.33, p < 0.001), and transitions between sources (quadratic term) in the third interval (β = − 0.22, p = 0.046). The non-significant variable, the linear term of transitions between sources in the third interval, was included because of the significant quadratic term.

Table 5 Regression model: source use predicting text quality of informative synthesis texts

Discussion

In our search for effective patterns of source use to design process-oriented writing feedback, we first explored the effect of interval by mapping the use of sources over the writing process. This way, we gained insight into the distribution and variation of different source-related activities over the writing process. Secondly, we related source use to the quality of the synthesis text. Both when exploring the temporal distribution and when relating the writing process to text quality, we included the effect of genre by comparing argumentative synthesis texts with informative synthesis texts. This approach allowed us to achieve our main research objective, that is, identifying effective source use patterns resulting in a high-quality argumentative or informative synthesis text. By conducting polynomial regression analyses, we were able to take into account the various source-related process activities and their temporal distribution in an integrated way. This resulted in two patterns related to a successful synthesis: one for the argumentative genre and one for the informative genre.

To answer our first research question, “To what extent does source use vary during the writing process? And does it differ according to genre?”, we mapped which source-related activities took place at which moment of the process. Results showed a general pattern in which the first process interval was marked by a focus on the sources, in both genres. The students spent approximately half of their time in the sources during the first interval. Moreover, they alternated frequently between the sources during the first interval, indicating that they compared and contrasted the information from the different sources at the beginning of the writing process. After the initial phase with a focus on reading the sources, the focus shifted towards text production. The middle of the process was marked by mediation; the students alternated frequently between their synthesis text and the sources. They interrupted their writing to look for information in the sources. This was the case in both genres. The students writing an argumentative text, however, switched significantly more often than the ones writing an informative text. In other words, the participants returned to the sources more frequently when writing an argumentative text. It thus seems that, in the case of the argumentative synthesis genre, students had to check the sources more frequently in order to plan the content of their text, or (if they had already outlined the content in the first phase) to specify their plan and to support their arguments in a more detailed way. The last interval was characterised by a focus on one’s own text in production: the time spent in the sources was low, and both the transitions between the sources and the transitions between synthesis text and sources were less frequent.
These results confirm the findings of previous research, stating that it is important to take into account the temporal distribution of activities over the writing process (Breetvelt et al., 1994; Rijlaarsdam & van den Bergh, 1996). In addition, we could observe two typical aspects of the synthesis process: mediation between reading and writing, and recursion, as the reading and writing activities return at different moments throughout the process (Martínez et al., 2015; Mateos & Solé, 2009; Solé et al., 2013). Moreover, our analyses confirm that the writing process differs according to genre (Beauvais et al., 2011).
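The interval-based measures discussed above (relative time in the sources, transitions between sources per minute, and transitions between the synthesis text and the sources per minute) can be derived from a log of window-focus events. The sketch below is an illustration only: the event format, the window labels (`"text"`, `"src_a"`, `"src_b"`) and the function name are hypothetical, and it does not reproduce Inputlog's output or the exact operationalisation used in this study. It assumes three intervals of equal duration and counts each transition in the interval in which it occurs.

```python
from dataclasses import dataclass

@dataclass
class Focus:
    start: float  # seconds since the start of the session
    end: float
    window: str   # "text" = synthesis document; any other label = a source

def interval_measures(events, n_intervals=3):
    """Per-interval source-use measures from a chronological focus log."""
    t0 = events[0].start
    width = (events[-1].end - t0) / n_intervals
    raw = [{"src_time": 0.0, "src_src": 0, "text_src": 0} for _ in range(n_intervals)]

    # Time in sources: split each source-focus span across interval boundaries.
    for e in events:
        if e.window == "text":
            continue
        for i in range(n_intervals):
            lo, hi = t0 + i * width, t0 + (i + 1) * width
            overlap = min(e.end, hi) - max(e.start, lo)
            if overlap > 0:
                raw[i]["src_time"] += overlap

    # Transitions: classify each change of window by the two windows involved.
    for prev, cur in zip(events, events[1:]):
        if prev.window == cur.window:
            continue
        i = min(int((cur.start - t0) / width), n_intervals - 1)
        if "text" in (prev.window, cur.window):
            raw[i]["text_src"] += 1
        else:
            raw[i]["src_src"] += 1

    minutes = width / 60.0
    return [
        {
            "rel_time_in_sources": r["src_time"] / width,
            "src_src_per_min": r["src_src"] / minutes,
            "text_src_per_min": r["text_src"] / minutes,
        }
        for r in raw
    ]

# Toy nine-minute session (invented), split into three 3-minute intervals.
log = [
    Focus(0, 60, "src_a"),
    Focus(60, 150, "src_b"),
    Focus(150, 240, "text"),
    Focus(240, 270, "src_a"),
    Focus(270, 540, "text"),
]
measures = interval_measures(log)
for number, m in enumerate(measures, start=1):
    print(number, m)
```

In this toy log, the first interval is dominated by the sources (150 of 180 seconds), the second contains two switches between text and sources (mediation), and the third is spent entirely in the writer's own text, which is the general shape of the pattern described above.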

To provide an answer to the second research question, “Do the individual source-related process indicators relate to text quality? And does this differ according to genre?”, we related the various writing process variables to the quality of the text. Surprisingly, the relative time spent in the sources during the first interval correlated negatively with text quality for argumentative synthesis texts. It seems doubtful that this result implies that reading the sources carefully before starting to write is a bad idea, as previous studies showed the importance of the initial reading time (Breetvelt et al., 1994; Escorcia et al., 2017; Leijten et al., 2019). It is more likely that students’ writing performance was hindered by their low reading ability (Plakans, 2009). The low scoring students may have struggled with reading and interpreting the sources and with selecting information, and may therefore have spent too much time in the sources. This suggests that the relation between time in sources and text quality is not simply linear. In the middle and at the end of the process, we also found a negative relation between time in sources and text quality for both the argumentative and the informative genre. This implies that, after the initial reading, the focus should be on the writing itself. For the number of transitions between the sources per minute, we could not find significant relations with the quality of the synthesis text for either of the two synthesis genres. Regarding the number of transitions between the synthesis text and the sources per minute, relations with text quality were found for the argumentative genre. Switching between the text in production and the sources had a positive effect on text quality when done in the first interval, but a negative effect when it occurred during the last interval of the process.
The first finding seems plausible as the alternation between sources and synthesis can point to note-taking while reading (i.e., mediation) (Mateos & Solé, 2009; Slotte & Lonka, 1999). The second finding is less straightforward, as it seems logical that while writing and revising in the third interval, the students go back to the sources to select or to check information before including it in their own text. We assume that this result is related to the low scoring writers who struggle with selecting information and integrating it into their own text (Altemeier, Jones, Abbott, & Berninger, 2006; Chan, 2017). Again, this finding might indicate that the process–text quality relation is not a linear one.

From the results obtained, it could be inferred that relating each individual process measure to text quality is not sufficient to capture the complexity of the writing process, even after having taken into account the temporal distribution (i.e., interval). The writing process involves a complex interplay between the various writing activities and the moment at which they occur. Moreover, it does not seem sufficient to capture only linear relations. These insights were crucial to reach the main objective of our study, namely formulating an answer to the third research question: “What are effective patterns of source use for each of the two synthesis genres?” By conducting polynomial regression analyses, we were able to take into account the various source-related process activities and their temporal distribution in a more integrated way, instead of looking at each process variable individually. This method also allowed us to capture more complex process-product relations by including curvilinear relations. It resulted in one model for predicting the quality of an argumentative synthesis text, and one model for predicting the quality of an informative synthesis text.

Table 6 presents a visual overview of the pattern related to a high-quality argumentative synthesis text, with three predictors explaining 24.6% of the variance in text quality. The process of a successful argumentative synthesis text is marked by a considerable amount of time spent in the sources at the beginning of the process. Spending either very little or very much time in the sources at the beginning of the process is associated with a poor-quality text. This seems plausible: on the one hand, it is crucial to read and understand the sources in the first phase of the writing process, so spending little time reading is not beneficial (Breetvelt et al., 1994; Escorcia et al., 2017; Leijten et al., 2019). On the other hand, spending too much time in the sources probably indicates problems with understanding the source texts (Plakans, 2009). Besides spending a considerable amount of time in the sources, it is also beneficial to switch frequently between the sources and the synthesis text in production. In this case, mediation between the sources and the synthesis most probably involves note-taking or drafting while reading. Students writing a good quality argumentative text generally select information from the sources and write this down while reading in the first phase of the process. During the last part of the writing process, spending much time in the sources has a negative effect on text quality. This seems plausible, as in the last phase of the process the focus should be on the writing and revising of the text, not on reading and selecting information from the sources (Breetvelt et al., 1994).
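A relation in which both very little and very much initial time in the sources go together with lower text quality is what a quadratic term with a negative coefficient describes. As a sketch under stated assumptions (the coefficients below are invented for illustration and are not the estimates from our models), the predicted optimum of an inverted-U relation y = b0 + b1·x + b2·x² lies at x = −b1 / (2·b2):

```python
def turning_point(b1, b2):
    """x-value at which b0 + b1*x + b2*x**2 peaks (b2 < 0) or bottoms out (b2 > 0)."""
    if b2 == 0:
        raise ValueError("no turning point: the relation is linear")
    return -b1 / (2 * b2)

def predict(b0, b1, b2, x):
    """Predicted outcome for predictor value x under the quadratic model."""
    return b0 + b1 * x + b2 * x ** 2

# Hypothetical coefficients; x = relative time in sources in the first interval.
b0, b1, b2 = 2.0, 8.0, -8.0
x_opt = turning_point(b1, b2)

# Predicted quality is highest at the turning point and falls off on both sides.
for x in (0.1, x_opt, 0.9):
    print(f"x = {x:.2f} -> predicted quality = {predict(b0, b1, b2, x):.2f}")
```

With these made-up values the turning point falls at half of the interval spent in the sources; the location of the optimum in real data depends entirely on the estimated coefficients and on how the predictor is scaled or centred.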

Table 6 Process pattern of a successful argumentative synthesis

Table 7 shows the pattern resulting in a successful informative synthesis text. This pattern consists of three predictors explaining 16.2% of the variance in text quality. Switching between the different sources during the first part of the writing process has a positive impact on the quality of the informative synthesis text. Thus, at the beginning of the writing process, the high performing students not only read the sources but also compare and contrast them in search of an overarching theme. At the end of the process, spending much time in the sources proved to have a negative effect on text quality. In this stage, the students should focus on text production rather than on reading (Breetvelt et al., 1994). Moreover, the processes leading to a successful informative synthesis show a moderate amount of switching between the sources at the end. Text production is alternated with moments of comparing and contrasting the sources. This process is crucial to link the information from the different sources together and thus to produce a synthesis in which the information from the sources is integrated (Martínez et al., 2015; Mateos & Solé, 2009; Solé et al., 2013). However, switching too much between the sources at the end of the process has a negative impact on text quality. In the case of excessive switching, the students are most likely struggling to integrate the information and therefore lose focus on production.

Table 7 Process pattern of a successful informative synthesis

The patterns of effective source use clearly differ according to text genre (Beauvais et al., 2011). We assume these differences are inherent to the genre. For example, when writing an argumentative text, students were given a position they had to support. This may imply a more focused reading from the start; in other words, the writer reads with a goal. This is reflected in the first interval of the writing process where, in an effective process, the number of transitions between synthesis and sources per minute is relatively high, suggesting that the writer takes notes while reading. The beginning of an effective process of an informative synthesis task, by contrast, is marked by a high number of transitions between the sources per minute. Writing an informative synthesis is less structured at the start, as the writer has no clear objective apart from providing an integrated view on the theme. This implies that during the first phase of the process, the writer has to compare and contrast the sources to look for relevant information and an overarching theme.

Our study has a few limitations. First, although the use of keystroke logging software allows us to capture the writing process in a non-obtrusive way and offers useful and detailed analyses, interpretation of the findings is not that straightforward. Our interpretations of the results are based on the findings present in our data and on empirical evidence from prior studies. However, we cannot be absolutely sure that, for example, the negative correlation between time spent in sources in the first interval and text quality indicates that the writer struggled to read and understand the sources. In some cases, it would be advisable to combine keystroke logging with other research methods. For example, we found that participants switched more between the synthesis text and the sources when writing an argumentative text (compared to when writing an informative text). To provide an explanation for this, video recordings or analyses of case studies could be valuable. This could provide insights into the cognitive processes behind the observed behaviour. Secondly, in this study we focused on source use during the writing process and did not take into account text production during the process. This might be an explanation for the absence of decisive predictors in the middle of the effective process patterns. The middle of the process is identified as the phase in which the generated ideas are to be translated into text (Hayes & Flower, 1980). For a more complete view on the writing process, it is necessary to combine the source use measures with variables capturing text production. Future research could try to identify patterns of effective source use combined with an effective production process. Another possibility for future research is to replicate our study with data containing multiple synthesis texts per participant; this could improve the power of the current findings (van den Bergh et al., 2012; Van Steendam, 2017).

Despite its limitations, our study contributes to the emerging field of synthesis writing studies by providing insights into various aspects of source use at different phases of the writing process for different genres of synthesis tasks. We did not only look at the individual process measures, but presented a more integrated perspective on source use, resulting in two models predicting the quality of the text. Given that synthesis writing is a frequently required task in higher education that poses a challenge to many students, it is crucial to provide them with meaningful feedback. Our findings concerning source use are used to develop process-oriented feedback. When students’ writing processes are logged with Inputlog, we can present them with personalised information on the source-related activities of their writing process. Not only is it possible to provide students with personalised process feedback, we can also show them processes of higher scoring students whose processes reflect the successful source use patterns identified in this study. By comparing their own process to the exemplary processes, students gain insight into their own writing process and receive feed-forward that helps them to improve their synthesis writing. Another option is to add instruction to the feedback. In a follow-up study, we test the effects of modelling synthesis writing strategies, thereby focusing on the source use in the first part of the process. Instruction videos show the importance of spending time reading the sources at the beginning of the process. Instruction is adapted to the synthesis genre. In the case of argumentative synthesis texts, the video model shows the importance of note-taking during writing (and thus switching between the synthesis text and the sources). For the writing process of the informative synthesis texts, the focus of the instruction is on finding the overarching theme by comparing and contrasting the sources (seen as switches between the sources).