Writing a coherent and readable text involves handling many (meta)cognitive activities, such as generating and planning ideas, translating ideas into language and making revisions when needed, while keeping rhetorical goals in mind and staying aware of the intended audience (Flower and Hayes 1980; Hayes and Flower 1980; Hayes 1996). The writing process is not a linear chain of actions in which, for example, planning, generating, text production and revising are carried out in consecutive phases. Rather, it is a recursive process, in which cognitive activities may be re-applied during any phase of the writing process (Hayes 1996). Sometimes a fully developed content plan is subsequently translated into text; in other cases, the content plan develops in tandem with the text as it is written down. In the latter case, writing is an act of discovering what to say (Galbraith 1996; Hayes 1996). In short, there are numerous possible configurations of (meta)cognitive activities (Rijlaarsdam and Van den Bergh 1996; Van den Bergh and Rijlaarsdam 2001; Van Weijen et al. 2008).

Torrance et al. (1994) present two extreme writing styles, with different configurations of writing activities. They identified planners, “who planned extensively and then made few revisions” on the one hand, and revisers, “who developed content and structure through extensive revision” on the other hand. In addition, they identified so-called mixed strategy writers, who applied both planning and revising activities extensively. Similar strategies are found in Torrance et al. (1999, 2000), although they are labeled differently, together with a number of additional strategy types. Biggs et al. (1999) present a typification of writing strategies similar to the planner/reviser distinction. On the one hand, engineers plan extensively before commencing with text production. On the other hand, sculptors start text production in a relatively early stage of the writing process, without much planning preceding it. The content plan develops as the text develops. The produced text is subsequently revised until it fits what the writer wants to say. Kieft et al. (2006, 2008), finally, also use these two writing styles in their research. They quote Galbraith and Torrance (2004) to describe the planning strategy as a strategy “in which writers concentrate on working out what they want to say before setting pen to paper, and only start to produce full text once they have worked out what they want to say” and the revising strategy as a strategy “in which writers work out what they want to say in the course of writing and content evolves over a series of drafts” (Kieft et al. 2008, p.380).

Planners and revisers, then, by definition have different configurations of planning activities, text production activities and revision activities. Other cognitive activities might be expected to have different distributions, too. Reading the assignment, for example, probably occurs earlier in the writing process for extreme planners, but in later stages of task execution for typical revisers. After all, typical planners will think about whether the text matches the assignment during planning stages, while typical revisers will think about this once a text has been produced. The moments of occurrence of all of these activities are, of course, interrelated: if planning happens early on in the writing process, for example, revising cannot occur at that same early stage.

In short, planners and revisers apply the various (meta)cognitive activities at different moments during the writing process. This leads us to observations made by Rijlaarsdam and Van den Bergh (1996) and Van den Bergh and Rijlaarsdam (1996). They demonstrated that the occurrence of (meta)cognitive activities varies across task execution. Structuring activities (a subcomponent of planning), for example, are on average more likely to occur a short while after the start and towards the end of task execution, but less likely to occur during middle stages of the writing process. They also showed that the distributions of cognitive activities differ between individual writers. Some writers, for example, follow the average distribution of structuring activities, while others tend towards a different distribution: one in which structuring activities are hardly used at the start of the task, a little more during middle stages, and mostly during the final phases of the writing process. Finally, Rijlaarsdam and Van den Bergh (1996) and Van den Bergh and Rijlaarsdam (1996) demonstrated that the relation between cognitive activities and text quality varies across task execution. Structuring activities, for instance, were shown to be more effective when they occurred during early stages of task execution (i.e. the correlation between structuring and text quality is at its highest at the start of the writing process) and less effective when they occurred towards the end of task execution. They therefore advocated a temporal analysis of activities over the writing process: analyses of cognitive processing during writing should take the moment(s) at which cognitive activities occur into account. The validity of this temporal approach was confirmed by Breetvelt et al. (1994), Van den Bergh and Rijlaarsdam (2001), Van den Bergh et al. (2009) and Van Weijen et al. (2008), who also demonstrated that differences between writers in terms of distributions of cognitive activities explain differences in the quality of the texts produced. The temporal approach to writing processes has become a dominant view in writing research (Leijten and Van Waes 2006; Olive et al. 2008).

The configuration and temporal distribution of (meta)cognitive activities is an online characteristic: we can establish it by measuring what happens during the process of task execution. A common method for measuring the process of task execution (during writing, but also during other tasks, for example reading tasks or mathematics problems) is the use of think aloud techniques (cf. Cromley and Azevedo 2006; Rijlaarsdam and Van den Bergh 1996; Roca de Larios et al. 2008; Van den Bergh and Rijlaarsdam 1996; Van Weijen et al. 2008; Veenman and Spaans 2005). Another method for concurrent measurement of writing processes is keystroke logging (e.g. Leijten and Van Waes 2006; Strömqvist et al. 2006). Online measurements have been shown to have predictive value for the quality of the output of task execution (Cromley and Azevedo 2006; Torrance et al. 1999; Van der Stel and Veenman 2008; Van Weijen et al. 2008; Veenman et al. 2003). This output may be text quality (for writing), but also test scores (in other domains, such as reading and mathematics).

However, there are also numerous studies in which writing behaviour is measured independently from the writing process. Questionnaires about different aspects and/or configurations of writing processes (Kieft et al. 2006, 2008; Lavelle et al. 2002; Torrance et al. 1994, 1999, 2000) are an example of such offline measures. Offline measurements have been criticised as inaccurate reflections of the underlying process. Russo et al. (1989), for example, found the contents of retrospective protocols to be incomplete and partly fabricated. Their opinion is shared by Veenman et al. (2003) and Cromley and Azevedo (2006). In both studies, offline reports were related to online data, the latter in the form of total or relative frequencies of strategy-related verbalisations in concurrent data (Cromley and Azevedo 2006; Veenman et al. 2003) or of proportions of indicated strategy use in a concurrent multiple-choice tool (Cromley and Azevedo 2006). They found that relations between offline reports and online task execution were weak or absent.

However, analyzing online metacognition by establishing frequencies of metacognitive verbalisations runs counter to the idea that the quality of online task execution is determined by the temporal distribution of (meta)cognition across the writing process. This may explain the absence of (substantial) relations between offline and online data in these studies.

Torrance et al. (1999) indeed showed a correspondence between questionnaire outcomes and online data, the latter being analyzed in terms of distributions. Participants in their study completed a questionnaire about their writing behaviour. On the basis of this questionnaire, participants were categorised into one of three possible strategy groups. The questionnaire outcomes in this study predicted online writing behaviour. This online behaviour was analyzed in terms of distributions of (meta)cognitive activities, such as planning, translating ideas into language and revising. Torrance et al.'s (1999) study, then, suggests that offline reports of writing behaviour can, at least to some extent, be used as predictors of a general tendency towards a particular online distribution or configuration of cognitive activities.

A Writing Style Questionnaire developed by Kieft et al. (2006, 2008) measures reported degrees of planner- or reviser-type writing behaviour within individuals. Contrary to Torrance et al. (1999), Kieft et al. (2006, 2008) do not categorize writers, i.e. writers are not either planners or revisers. Rather, the two dimensions (i.e. the planner and the reviser dimension) are seen as scales, each expressing the degree to which that dimension applies to an individual writer. Kieft et al. (2008) provide some evidence which seems to suggest a degree of validity for this questionnaire. They tested students' writing style by means of the Writing Style Questionnaire. Subsequently, all students participated in a lesson series on writing. One group of students (consisting of both students for whom planning was the dominant writing style and students for whom revising was the dominant writing style) received instruction that matched a planning style, and another group of students (again consisting of students of both dominant styles) received instruction that matched a revising style. They found that study outcomes (i.e. the quality of the texts which the students wrote) improved if the writing lessons matched the most dominant writing style in students' responses to the questionnaire. This is, however, indirect evidence for the assumption that the Writing Style Questionnaire is a predictor of online writing behaviour. There has, to date, been no research to test whether higher or lower degrees of reported planner- or reviser-type behaviour do indeed predict different online configurations and temporal distributions of (meta)cognitive activities. This is investigated in the present study. It may be assumed, for example, that 'high planners' (according to the Writing Style Questionnaire) perform more planning activities at the start of task execution than 'low planners'. After all, we know that planning activities are most effective during initial stages of the writing process (Van den Bergh and Rijlaarsdam 2001). Similarly, it may be assumed that 'high planners' will apply fewer planning activities than 'low planners' at stages in the writing process during which planning activities are less effective: towards the end of task execution. 'High revisers', on the other hand, will generally apply more planning activities at the end of task execution than 'low revisers'. After all, typical revisers use text production to arrive at a plan of what to say. As a consequence, 'high revisers' will also apply more revision activities at the end of task execution than 'low revisers'. In the same vein, we predict that different scores on the planning and revising dimensions in the Writing Style Questionnaire (Kieft et al. 2006, 2008) are related to different distributions (across task execution) of the other (meta)cognitive activities which occur during writing.

Method

Participants

The participants were fourteen- and fifteen-year-old students (N = 20; 10 female and 10 male). They were from three different third-year forms at the same school for pre-university secondary education. They were recruited by means of a call for volunteers, which was distributed by their Dutch language teacher. All participants were native speakers of Dutch. They received a small financial compensation for their participation. Parental consent was obtained.

Tools and procedures

The students completed four writing tasks. In addition, they completed an offline questionnaire to measure reported writing behaviour. They performed all tasks individually in a university room, in the presence of a test leader.

Writing tasks

All students wrote four argumentative essays in Dutch, their native language, on topics such as 'camera surveillance in inner city areas' or 'legalisation of soft drugs'. They completed all their essays during one session, with a short break of about 15 min between assignments. The sequence of topics was systematically balanced across participants.

The assignments consisted of a brief statement of topic, audience (peers), medium (the school paper) and purpose (to convince the readers of your point of view), followed by a series of quotes (factual information as well as opinions) related to the topic, two of which had to be used in the essay. All assignments were tested with third-year students of pre-university secondary education during a pilot study in 2005. They were also successfully used by Van Weijen et al. (2008, 2009). The essays had to be about half a page in length (about 250–300 words). An example of an assignment can be found in Appendix A.

The available time for each essay was 30 min. The mean writing time was 20.13 min (SD = 5.89, Min. = 7.80, Max. = 32.15). The time spent on each task was related to the order in which tasks were completed (χ² = 7.23, df = 1, p < 0.01). The mean writing time for the first essay in the session was 21.74 min, while the mean writing time for the last essay was 18.93 min. That students spent less time on the last task in the session than on the first can probably be explained, in large part, by the fact that students generally needed less time for reading the assignment during later tasks: a fairly large portion of the instruction text (e.g. the description of audience, medium and purpose) was identical in all tasks.

The students wrote the essays on a computer using Microsoft Word. They had to think aloud during the process of task execution. All writing sessions were video-taped. The writing sessions were also recorded by means of keystroke logging (Inputlog: Leijten and Van Waes 2006), in order to obtain more detailed information on text production and revision activities.

Writing style questionnaire

The students also completed Kieft et al.’s (2006, 2008) Writing Style Questionnaire. This questionnaire measures reported degrees of planning and revising style. It is specific to the domain of argumentative writing, in that participants are asked how they would handle writing an argumentative essay about the tobacco industry. This ‘tobacco’ task, which was not actually carried out by the participants, is very similar to the four writing tasks performed by the students in the present study in terms of text type and intended medium.

The questionnaire consisted of thirty-six statements about writing strategy. Thirteen of these items described planning-type behaviour, twelve described revising-type behaviour, and the remaining eleven were fillers. Students had to indicate to what extent each statement applied to them, by checking a box on a five-point scale. On the basis of their questionnaire responses, participants received scores for both the planning dimension and the revising dimension. They could therefore score equally high or low on both dimensions, or one of the two dimensions could be dominant. Figure 1 features all questionnaire items, sorted according to the dimension they pertain to. In the actual questionnaire, the items were presented in random order. For the present study, the questionnaire was in Dutch, the students' native language.
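As an illustration of this scoring procedure, the sketch below computes mean dimension scores from a set of responses. The item numbering is hypothetical (the actual items are listed in Fig. 1); only the counts per dimension follow the description above.

```python
# Minimal scoring sketch for the Writing Style Questionnaire.
# Item numbers are hypothetical placeholders, not Kieft et al.'s actual ordering;
# the remaining 11 (filler) items are simply ignored.
PLANNER_ITEMS = [1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34, 36]  # 13 items
REVISER_ITEMS = [2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35]      # 12 items

def dimension_scores(responses):
    """responses: dict mapping item number -> rating on the 1-5 scale.
    Returns (mean planner score, mean reviser score)."""
    planner = sum(responses[i] for i in PLANNER_ITEMS) / len(PLANNER_ITEMS)
    reviser = sum(responses[i] for i in REVISER_ITEMS) / len(REVISER_ITEMS)
    return planner, reviser
```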

Fig. 1 Items in the Writing Style Questionnaire (Kieft et al. 2006, 2008), sorted according to which dimension they measure

Analyses

All think aloud data were transcribed and segmented; the transcripts were complemented with the Inputlog recordings. A new segment in the protocols reflected a switch to a different (meta)cognitive activity within a participant's writing session (Van den Bergh and Rijlaarsdam 2001).

All segments were coded according to a coding scheme (adapted from Breetvelt et al. 1994). One think aloud protocol (number of segments = 518) was coded by two researchers. The intercoder agreement (Kappa) was 0.85. The coding scheme (see Table 1) consists of fourteen categories. Five of these categories reflect planning activities, namely monitoring, goal setting, generating content, structuring (which involves the selection and evaluation of propositions which have been generated but not (yet) translated into text) and metacomments. One category involved subcodings, namely 'Revising'. Revisions were subcoded as 'automated corrections' when they involved corrections of typographic errors. These are typically errors that result from the use of a keyboard and seem to be corrected almost automatically. An example would be: a writer types 'almots', and immediately corrects this error by pressing backspace twice and typing 'st', so that it now reads 'almost'. All revisions which were not 'automated corrections' were subcoded as 'conceptual revisions': they involve alterations which are made by the writer in a non-automated way, that is, actual alterations at the level of spelling or content. In the remainder of this article, we mean conceptual revisions whenever we use the term 'revision' or 'revising'.

Table 1 Coding categories in the coding scheme

As different degrees of reported planner- or reviser-style behaviour entail different configurations of the complete writing process, we expect different distributions for all of the activities listed in Table 1: reading the assignment, planning, text production, reading own text, evaluating own text, and revision. An exception is formed by the activities categorized as 'OTHER', for which there is no conceptual link with planner and reviser styles. The same applies to the subcategory of 'automated corrections'. These five activities (pauses, interactions with the test leader, physical activity, navigation and automated corrections) were therefore not included in the analysis.

Figure 2 shows an example of (a part of) a protocol and illustrates how the think aloud data and the Inputlog data were integrated. All information in the column labelled 'Typing' is derived from Inputlog. This column contains 'text production' and 'revising' activities.

Fig. 2 Part of a completed protocol

The protocols consisted of seven columns. The column labeled 'Reading' was used to indicate whether any reading activities (reading the assignment—RA—or reading own text—ROT) were taking place. The 'Verbalizations' column contained everything which was said out loud by the student, except interjections, such as "uhm". The 'Typing' column contained all text production as registered by Inputlog. There were three categories in this column, namely the production of new text, revisions (indicated by Inputlog as [BS] for backspace or [DEL] for the delete key), and navigation (by means of mouse movements or arrow keys). The 'Pausing' column contained all silences and interjections. The column labeled 'Other' mostly contained descriptions of physical activities, for example 'takes a sip of his drink'. One row is one protocol segment. As such, this transcription method allows for parallel actions: text production and verbalizations, for example, often occur simultaneously, in which case both the 'Typing' and the 'Verbalizations' columns are filled for the same segment. The codings in the last column were in reality numbers: code 01, for example, stood for 'reading the assignment' and code 02 for monitoring. English translations of Dutch verbalizations are given in italics.
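To make the structure of these protocols concrete, the sketch below models one protocol row as a record. The field names are our own hypothetical reconstruction of the columns described above, not the original transcription template.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    """One protocol row; None means the column is empty for this segment."""
    reading: Optional[str]        # 'RA' (reading assignment) or 'ROT' (reading own text)
    verbalization: Optional[str]  # what was said aloud, minus interjections such as "uhm"
    typing: Optional[str]         # Inputlog output: new text, [BS]/[DEL] revisions, or navigation
    pausing: Optional[str]        # silences and interjections
    other: Optional[str]          # e.g. 'takes a sip of his drink'
    code: int                     # activity code, e.g. 1 = reading the assignment, 2 = monitoring
```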

Due to technical problems, fewer than four writing sessions were available for analysis for three participants. For one participant, two writing sessions were included in the analysis; for two other participants, three writing sessions were included. For the remaining seventeen participants, all four writing sessions were available.

Modelling (meta)cognitive activities across task execution

The first step in the analysis is to model the (online) occurrence of (meta)cognitive activities temporally, that is, as a function of the moment in the writing process. We constructed this time variable by splitting each protocol into five equally long episodes in terms of numbers of segments (cf. Roca de Larios et al. 2008; Van den Bergh and Rijlaarsdam 1996). A protocol of 330 segments, for example, would be analyzed as five episodes consisting of 66 segments each. By using episodes we achieved standardisation: it allowed us to compare different writing processes between (and within) individuals in terms of start, middle and end of task execution. After all, episode 3, for example, reflects the middle part of the writing process for each protocol, no matter whether it contains segments 133–198 in a protocol of 330 segments, or segments 101–150 in a protocol of 250 segments. The (meta)cognitive activities which are the dependent variables in our analysis (i.e. reading the assignment, planning, text production, reading own text, evaluating own text, and revision) were all expressed as proportions of the total number of segments for each episode. For instance, if an episode consisted of 80 segments (which would mean that the entire writing process consisted of 400 segments), and 10 of these segments were coded as 'reading the assignment', then the proportion for reading the assignment in that episode would be 0.125.
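A minimal sketch of this episode construction (our reimplementation of the procedure just described; the function name is ours):

```python
# Split a coded protocol into five equally long episodes and compute, per
# episode, the proportion of segments carrying a given activity code.
def episode_proportions(codes, activity, n_episodes=5):
    """codes: list of activity codes, one per protocol segment, in order."""
    n = len(codes)
    bounds = [round(k * n / n_episodes) for k in range(n_episodes + 1)]
    return [
        sum(1 for c in codes[bounds[k]:bounds[k + 1]] if c == activity)
        / (bounds[k + 1] - bounds[k])
        for k in range(n_episodes)
    ]

# A protocol of 330 segments yields episodes of 66 segments each; 10 segments
# coded 'RA' within an 80-segment episode give a proportion of 10/80 = 0.125.
```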

A multilevel regression model was applied to model the occurrence of the (meta)cognitive activities at each episode, as episodes are nested within writers (Van den Bergh et al. 2009). The analysis was conducted with MLwiN software for multilevel models. In effect, a longitudinal model is in operation, as it concerns changes in occurrence during the writing process: proportions of the applied (meta)cognitive activity may be different during each new episode. Therefore, the occurrence of each of the six online activities (A) had to be described as a function of episode, i.e. A = f(episode). Note, however, that this function f does not need to be identical for all individuals i: A = f_i(episode).

This function, f, can take many forms (Goldstein 1979; Healy 1989). For this study, polynomial models were preferred because of their flexibility. Depending on the number of coefficients (and their numerical values), polynomials can take almost any shape. As such, they can be used to model various kinds of growth patterns.

Growth across task execution is not necessarily linear. For instance, text production activities may occur relatively infrequently during the first and last episodes of the writing process, but frequently during the middle part of task execution (e.g. during episode 3). Therefore, non-linear terms (e.g. quadratic or cubic terms) can also be included in the model: the occurrence of an activity (at each episode) is described as powers of episode (episode⁰, episode¹, episode², …). The number of parameters needed to describe the observed activities (in each episode) is considered an empirical matter. That is, a next power of episode is only included in the model if it has a significant contribution to the description of an activity and if all lower powers are significant as well (see Van den Bergh and Rijlaarsdam 1996). For example, 'episode' to the second power can only be added to the model if the linear term (episode¹) has reached significance.
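In our notation (a sketch consistent with the description above; the exact specification used is given in Appendix B), a polynomial model with terms up to the quadratic would read

$$\operatorname{logit}(\pi_{ij}) = \beta_0 + \beta_1\,\text{episode}_{ij} + \beta_2\,\text{episode}_{ij}^2,$$

where $\pi_{ij}$ is the probability that the activity occurs for writer $i$ in episode $j$ (the analyses were carried out in logits; see the Results section).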

In order to meet the requirement that the function f is allowed to differ between individuals, not only are the regression coefficients of the powers of episode estimated, but also the variance of these parameters. That is, the variance of the intercept (writers differ in occurrence at episode = 0), the variance of the linear component (writers differ in linear change over the writing process), et cetera.

These variance components are in fact the variances of residuals which characterize the occurrence of activities of a specific writer. Therefore, the differences between individuals can be explained by individual characteristics like their offline planner and reviser scores. Adding these offline scores, then, is the final step in the construction of a multilevel regression model, in which episode and the individual planner or reviser scores were the explanatory variables. Of course, the effect of these offline scores is not (necessarily) constant across task execution. (For instance: we expect differences in process execution due to higher planner scores to be larger at the start of task execution than during later parts.) Therefore, interaction effects between the offline scores and the time variable (episode) on the dependent variable were also calculated. The complete multilevel regression model, as constructed in MLwiN and as used for explaining the occurrence of each of the six (meta)cognitive activities, can be found in Appendix B.
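As an illustration of this final step, a hedged sketch of such a model for the planner dimension (our notation, with only a linear episode term shown; the actual specification is in Appendix B) could read

$$\operatorname{logit}(\pi_{ij}) = \beta_{0i} + \beta_{1i}\,\text{episode}_{ij} + \gamma_{1}\,\text{Planner}_i + \gamma_{2}\,\text{Planner}_i \times \text{episode}_{ij},$$

with writer-specific coefficients $\beta_{0i} = \beta_0 + u_{0i}$ and $\beta_{1i} = \beta_1 + u_{1i}$, where the variances of the writer-level residuals $u_{0i}$ and $u_{1i}$ express the individual differences described above, and $\gamma_2$ captures whether the effect of planner scores varies across episodes.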

Results

Internal consistency was calculated for the Writing Style Questionnaire. Cronbach’s alpha is .72 for the items on the planner dimension and .64 for the items on the reviser dimension. These reliabilities, which are similar to the reliabilities found by Kieft et al. (2006, 2008), justify aggregating the items for each dimension to calculate mean scores per dimension per student. As the two dimensions are only moderately correlated (r = .39), planning- and revising-type behaviour can be identified separately in the Writing Style Questionnaire data (see also Kieft et al. 2006, 2008).
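For readers who wish to reproduce this reliability check, a minimal sketch of the standard Cronbach's alpha computation (our code; not part of the original analysis):

```python
import numpy as np

def cronbach_alpha(items):
    """items: 2-D array-like, rows = respondents, columns = scale items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    return k / (k - 1) * (1 - item_variances / total_variance)
```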

Table 2 features descriptive information about the data. It shows that writing processes are on average 346.68 segments long, but that there is great variation (SD = 137.88). In addition, Table 2 shows that text production, which on average takes up about 131 segments (about 38% of the segments), occurs far more often than the other activities in the analysis, which on average occur in 0.5% to 5% of the segments. However, the data show a relatively larger range for the more infrequent activities than for text production. The standard deviations for 'reading the assignment', 'planning', 'reading own text', 'evaluating own text' and 'revising' are in most cases larger than their means. For these activities, then, there seems to be a lot of variation due to episode (i.e. moment during the writing process) and student. This variation is, of course, what the regression analyses are intended to explain.

Table 2 Descriptive statistics for the Writing Style Questionnaire scores and the (meta)cognitive activities in the concurrent protocols (absolute numbers and proportions)

The results of the regression analyses show that, for the six online activities analyzed, the average occurrence indeed varies across task execution, i.e. with 'episode' (see Appendix C for parameter estimates). The proportions with which reading the assignment, planning, text production, reading own text, evaluating own text and revising are applied differ significantly when episode is the explanatory variable.

Pseudo R² was calculated for the six models which were constructed to explain the occurrence of online activities with the 'episode' variables. The outcome is presented in Table 3. R² is low for only one of the activities (reading own text: R² = 0.18). As this is not a crucial activity in our analysis, this is a relatively minor problem. For the other five activities (reading the assignment, planning, text production, evaluating own text and revising), R² proved to be satisfactory.
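The text does not spell out which pseudo R² statistic was used; a common choice in multilevel modelling, which we assume here purely for illustration, is the proportional reduction in residual variance relative to an intercept-only (empty) model:

$$R^2 = 1 - \frac{\hat{\sigma}^2_{\text{full}}}{\hat{\sigma}^2_{\text{empty}}}$$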

Table 3 Pseudo R² values describing the fit of the constructed regression models

Figure 3 shows the average distributions of all six activities. The relation between online activities and episode was analyzed in logits. As logits are hard to interpret, we transformed them into proportions, so that the results can be read as probabilities of occurrence. In the figures below, then, we present the distributions of activities in proportions.
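The back-transformation from logits to proportions is the standard inverse-logit:

$$\pi = \frac{e^{\text{logit}}}{1 + e^{\text{logit}}}$$

A logit of 0, for example, corresponds to a proportion of .50, and a logit of −2 to a proportion of about .12.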

Fig. 3 Average distributions across the writing process for six activities: Reading the Assignment; Planning; Text Production; Reading Own Text; Evaluating Own Text; Revision

Figure 3 illustrates that the probability that 'reading the assignment' occurs is highest in episode 1, that is, at the start of task execution. Thereafter, its probability of occurrence declines (by slightly different amounts) with each subsequent episode. To put it more simply: reading the assignment happens most often at the start of the writing process (episode 1), and least often at the end of task execution (episode 5). The same pattern applies to planning activities. Text production activities are distributed differently across task execution. They are already quite likely to occur at the start of task execution, with an estimated probability of almost .35. After the start of task execution, the probability increases, reaching its peak at the middle stage of the writing process. After episode 3, there is a slight decrease towards the end of task execution. Text production activities, in short, occur quite frequently across the entire writing process, but are most frequent during the middle part of task execution. Both reading own text and evaluating own text occur least at the start and most at the end of task execution, although even there they remain quite unlikely to occur. This is even more the case for revision activities: although they are very slightly more probable during episode 5, revision activities on average occur infrequently at any moment during writing.

The next question is whether these distributions vary according to offline planner and reviser scores. Table 4 gives an overview of whether significant effects were found. The second and third columns show whether there is a significant main effect of planner or reviser scores on the number of activities applied. Such an effect would mean that differences in offline scores are related to differences in the number of times that an activity occurs during the entire writing process. The fourth and fifth columns show whether there are significant interaction effects of planner/reviser scores and episode. The existence of such an effect would mean that the effect of higher or lower offline scores varies across episodes. It would mean, in other words, that distributions are different for different planner or reviser scores. (See Appendix D for parameter estimates.)

Table 4 Overview of the effects of offline planner and reviser scores and episode on six (meta)cognitive activities measured online

Table 4 shows that there is a main effect of planner scores on four of the six online activities, namely reading the assignment, planning, text production and revising. Significant main effects of reviser scores exist for the activities planning and reading own text. Interaction effects could be established in two cases: (1) for online planning activities, the effect of planner scores is different for various episodes, and (2) for reading the assignment, the effect of reviser scores is different for various episodes. The direction of the effects can be inferred from the regression weights and is illustrated in Fig. 4. Again, the logits were transformed into proportions to facilitate the interpretation of the graphs. Figure 4 shows the variations in the distributions of (meta)cognitive activities according to variations in offline planner and reviser scores. Each graph contains three lines: one (P or R) reflecting the distribution of the specific activity for students with average planner or reviser scores, one (+sd) reflecting the distribution for students with a planner or reviser score one standard deviation above the average, and one (−sd) reflecting the distribution for students with a planner or reviser score one standard deviation below the average. No effects, in any form, were found on the activity 'evaluating own text'; this activity is therefore not featured in Fig. 4. For text production, reading own text and revising, we observed an effect of either only the planner or only the reviser scores; therefore, only one graph is displayed for each of these three activities. The graphs marked with +++ illustrate the variations due to episode and planner/reviser scores as observed, although only the main effects (i.e. effects of only the planner or reviser score) were significant.

Fig. 4 Observed distributions of (meta)cognitive activities over the writing process for students with average planner (P; left) and reviser (R; right) scores and for students with planner/reviser scores one standard deviation above (+sd) or below (−sd) the average. The graphs marked with +++ illustrate the variations due to episode and planner/reviser scores as observed, although only the main effects (i.e. effects of only the planner or reviser score) were significant

Students with a higher planner (+sd) score generally read the assignment on fewer occasions than average. Students with a lower planner (−sd) score read the assignment more frequently than average. Although the difference between high and low planners seems to become smaller as the writing process progresses, there was no significant interaction effect of planner score and episode. In other words, no evidence could be found that the effect of a higher or lower planner score is different for different episodes. We have to assume that it is stable across task execution. The effect of reviser scores on the occurrence of reading the assignment does vary over time, i.e. there is a significant interaction between the variable ‘reviser score’ and the variable ‘episode’. At the start of task execution, students with lower reviser scores (−sd) are (slightly) more likely to read the assignment than students with higher reviser (+sd) scores. From episode 2 onwards, however, this effect is reversed. From that moment on, students with lower reviser scores are less likely to read the assignment than students with higher reviser scores. The difference (in terms of the probability that ‘reading the assignment’ occurs) between students with higher and lower reviser scores grows larger towards the end of task execution.

For planning activities, the effect of higher or lower planner scores varies across task execution. Students with higher planner scores (+sd) are more likely to apply planning activities at the start of task execution than students with lower planner scores. This changes fairly soon after the start of task execution, so that towards the end of task execution, students with higher planner scores are less likely to apply planning activities than students with lower planner scores. The effect of higher or lower reviser scores was not found to vary across task execution for planning activities. An overall effect was established: the higher the reviser score, the more planning occurred in total during the writing process.

For text production, there was a significant main effect of planner scores: the higher the offline planner score, the more text production activities occur. For reading own text, there was a significant main effect of reviser scores: the higher the offline reviser score, the less ‘reading own text’ occurs. For revision activities, the difference between high and low planners seems larger at the start and the end of task execution. This interaction effect was, however, not significant. There was a significant main effect of planner scores on revising: the higher the offline planner scores, the more revising activities occur.

Discussion

We predicted that differences in reported planner and reviser styles as measured by Kieft et al.'s (2006, 2008) Writing Style Questionnaire were related to different distributions of various (meta)cognitive activities over the course of the writing process. Results indicate that the occurrence of six (meta)cognitive activities—reading the assignment, planning, text production, reading own text, evaluating own text and revising—varies across task execution. Activities are more likely to occur during some episodes than during others. Different distributions due to reported writing style were found for two of the six activities analyzed, namely reading the assignment and planning. For three other activities, namely text production, reading own text and revising, the effect of different degrees of reported planner or reviser styles did not vary across task execution, but a main effect of planner and reviser scores was established nonetheless: the higher the offline score, the more (or less) frequently these activities occur during task execution.

The variation in distributions found for planning activities fits the available theory about the planner style. Students who report a higher degree of planner-type behaviour apply more planning activities at the start of task execution, but less towards the end of the writing process. This is in line with the idea that planners do most of their planning before they write anything down. The variation in distributions found for reading the assignment also fits the available theory about the reviser style. Students who report a higher degree of reviser-type behaviour read the assignment more often towards the end of task execution. This makes sense, because revisers think about a content plan after text production, that is, during later stages of task execution. It follows that typical revisers will also mostly think about the match between the produced text and the assignment during these later stages.

Various explanations come to mind for the fact that different distributions due to differences in reported writing styles could not be established for text production, reading own text and revising (i.e. there were main effects, but no interaction effects), and that no effects were found at all for evaluating own text. One explanation is that, except for text production, these are low-frequency activities. The second explanation has to do with the nature of the activities in this specific age group. This seems to pertain particularly to text production, which is a frequent activity. Although its probability of occurrence differs between episodes, this variation between different moments in the writing process is quite a bit smaller than it is in more experienced writers. (Cf. Van Weijen (2009), who used the same assignments in a group of first-year university students and found that text production activities were not likely to occur at all at one stage of task execution, but very likely to occur during other stages; temporal variation was, in short, much larger.) There is, in other words, less variation to explain in the first place.

Cromley and Azevedo (2006) and Veenman et al. (2003) found that offline reports had little or no predictive value for online task execution. In the present study, relations between self-reports and online task execution have been established, as is the case in the study by Torrance et al. (1999). There are two main differences between these two sets of studies which may explain the different findings. The first is that in the present study and the study by Torrance et al. (1999), the data are analyzed temporally, which is not the case in the studies by Cromley and Azevedo (2006) and Veenman et al. (2003). However, the absence of a temporal analysis in the latter studies cannot be the sole explanation for the absence of relations between offline and online data. First of all, the fact that main effects on overall frequencies were found in the present study for text production, reading the assignment and revising demonstrates that a temporal analysis is not always needed. In addition, Cromley and Azevedo (2006) and Veenman et al. (2003) studied reading tasks, whereas the present study and the study by Torrance et al. (1999) deal with writing processes. The reported writing styles—planner and reviser styles—imply variation in distributions, whereas this is not so much the case for the offline measures used by Cromley and Azevedo (2006) and Veenman et al. (2003). While there is evidence that the occurrence of cognitive activities can also vary across the reading process (Janssen et al. 2005), this is not what offline measures of reading tasks generally focus on. It follows that the predictive value of these particular offline measures cannot be analyzed temporally.

It is striking that there were fewer effects due to reported reviser behaviour (three) than due to reported planner behaviour (five). Although this may be a chance finding, taken together with the fact that the reviser dimension in the Writing Style Questionnaire had lower reliability than the planner dimension (.64 versus .72), it raises some doubts as to the usability of the reviser dimension for less proficient writers, such as the participants in the present study. This idea is supported by the observation that revising, and also reading and evaluating own text, which are associated activities, are extremely infrequent activities in this age group. In addition, it might be the case that the definition of revisers is not that clear-cut. It seems that two definitions are simultaneously in operation: one which focuses on the tendency to rely on revision, and one which focuses on how revisers use text production as a means to arrive at a content plan. Actually, the tendency to revise might be a side-effect of revisers' use of text production to get an idea of what they want to say. After all, their initial text production serves planning purposes and the resulting text is therefore likely to need some work. Possibly, the items in the Writing Style Questionnaire which deal solely with the amount of revision need reconsideration, as these might not be central to the definition of a reviser writing style. An example of such an item would be this statement: "When my text is ready, I elaborately read through it and make improvements: a lot can still be changed at that point". A Writing Style Questionnaire item which typically represents the part of the definition focusing on using text production to construe a content plan is: "For me, writing is a way to get my thoughts clear". On the basis of the present results, then, it seems that the planner dimension of the Writing Style Questionnaire can better predict different online configurations than the reviser dimension.

In this study the planning activity comprised five subcategories (monitoring, goal setting, generating, structuring and metacomments). In future research, however, the validity of the analysis could possibly be increased by modeling the occurrence of these subcomponents separately. Hayes and Nash (1996), for example, distinguish between 'content planning' and 'non-content planning'. Goal setting, generating and structuring might arguably be instances of content planning, whereas monitoring and metacomments are more process-oriented activities and might therefore be seen as instances of non-content planning. Ideally, the relation between reported planner style and online planning activities should be analyzed separately for different types of online planning. This was not possible in the present study, due to the low frequency with which the subcomponents occur.

To conclude, it seems that questionnaires can have predictive value for online task behaviour. Kieft et al.'s (2006, 2008) Writing Style Questionnaire, and particularly its planner scale, seems to be a valid predictor of writing processes. In addition, a temporal analysis of (meta)cognitive activities across task execution seems to be a valid and sensitive reflection of online processing, particularly for writing. Whether a temporal analysis is also suitable for bringing out relations between offline and online measurements for other types of tasks, such as reading and mathematics tasks, is an issue for future research.