Source use and argumentation behavior in L1 and L2 writing: a within-writer comparison

The aim of this study was to test whether Cummins’ Linguistic Interdependence Hypothesis (LIH) might also apply to writing, by determining to what extent writers’ text quality, source use and argumentation behavior are related in L1 and L2, how effective writers’ behavior is, and whether their L2 proficiency influenced the relations between them. To answer these questions, twenty students each wrote four short argumentative source-based essays in L1 (Dutch) and four in L2 (English). A within-writer cross-linguistic comparison of their texts revealed that their L1 and L2 writing competencies appear to be related. Furthermore, writers’ source use behavior differed to some extent between languages, but the strong positive correlations found between source use features suggest that in most cases this was more a person than a language effect. Similarly, for argumentation behavior, results showed that some features were learner-specific (e.g. inclusion of titles and reference lists), while others differed between languages (e.g. the inclusion of both arguments and counter-arguments). Effects of the different source use and argumentation features studied on text quality were limited, and no clear effect of L2 proficiency on writers’ behavior or on its influence on text quality was found.

Daphne van Weijen (D.vanWeijen@uva.nl), Gert Rijlaarsdam (G.C.W.Rijlaarsdam@uva.nl), Huub van den Bergh (H.vandenBergh@uu.nl)
1 Research Institute of Child Development and Education, University of Amsterdam, Nieuwe Achtergracht 127, 1018 WS Amsterdam, The Netherlands
2 University of Antwerp, Antwerp, Belgium
3 Umeå University, Umeå, Sweden
4 Utrecht University, Utrecht, The Netherlands
Read Writ (2019) 32:1635–1655, https://doi.org/10.1007/s11145-018-9842-9


Introduction
In higher education, students are often required to write argumentative essays as part of their academic coursework using relevant sources. During argumentative essay writing, writers must adhere to strict requirements related to argumentative structure (Coirier, Andriessen, & Chanquoy, 1999) and distribute their attention effectively between what they want to say, how they want to say it, and how to apply their reasoning skills (Kellogg, 2008, p. 2; see also Albrechtsen, 2004; Ferretti & Fan, 2016; Van Wijk, 1999). In addition, selecting and integrating external information effectively in a text, through direct quotations or by paraphrasing, can be a cognitively demanding skill to master (Cumming, Lai, & Cho, 2016; Pecorari & Shaw, 2012; Strobl, 2015), because it requires both advanced reading and writing skills and effective interaction between them (Mateos & Solé, 2009). Thus argumentative writing is clearly "an intellectually challenging problem" (Ferretti & Fan, 2016), and it is not surprising that young or inexperienced writers find writing argumentative texts more difficult than writing narrative ones (McCutchen, 2011). An added complicating factor is that specific characteristics of genres such as argumentative writing can differ between cultures and languages (cf. Graham & Rijlaarsdam, 2016), although this depends to some degree on the extent to which languages are related.
Inappropriate source use is a common problem in source-based writing, and it tends to occur more frequently in L2 than in L1 writing (Keck, 2014). However, why students use information from sources inappropriately, i.e. without correct attribution to the original author, intentionally or unintentionally, is often unclear. Plagiarism occurs when students intentionally attempt to present the words of other authors as their own (Pecorari & Shaw, 2012). But what if students' behavior is unintentional? Then they may be "patchwriting" rather than plagiarizing. Patchwriting occurs when student writers incorporate chunks of verbatim source text in their essays to support their writing, without rephrasing the content in their own words (Howard, 1995; Pecorari, 2003). Some consider this to be lazy or dishonest behavior, which should be considered plagiarism and penalized accordingly. But Howard (1995) proposed that novice writers might also use patchwriting unintentionally to cope with their lack of L2 proficiency or source use skill. If this is true, then teachers should support students through this developmental phase and help them to become better writers.
For argumentative source-based essays, it is clear that, in addition to general writing skills and sufficient language proficiency (particularly in L2), students must acquire two types of domain-specific practices: argumentation or reasoning practices and source use practices (see Ferretti & Fan, 2016; see also De La Paz, Ferretti, Wissinger, Yee, & MacArthur, 2012). Potential ways to help students master these practices include permitting students to translate texts from L1 to L2 (e.g. Wolfersberger, 2003), use their L1 during L2 writing (Van Weijen, Van den Bergh, Rijlaarsdam, & Sanders, 2009), and use patchwriting while learning about appropriate source use (Pecorari, 2003). However, another option worth investigating is whether it is possible to teach students argumentation and source use practices in one language (L1 or L2) in a way which might enable them to apply their new behaviour in other languages or school subjects as well. If so, this could save both students and teachers a lot of time and effort. Whether this option has potential will be investigated in this study, by testing whether Cummins' Linguistic Interdependence Hypothesis applies to writing as well.

Cummins' Linguistic Interdependence Hypothesis (LIH)
To be able to succeed in education and read and write at an advanced level, students must develop their Cognitive Academic Language Proficiency (CALP). Cummins proposed that all language learners acquire Basic Interpersonal Communication Skills (BICS) informally, while developing CALP requires determined effort and formalized schooling. A person's BICS can be seen, according to Cummins, as the tip of the iceberg, while one's CALP is the less visible submerged part. Subsequently, Cummins proposed that in multilingual learners, the L1 and L2 CALP areas overlap between languages to form a 'Common Underlying Proficiency' (CUP). In later studies he named this idea the linguistic interdependence hypothesis (cf. Cummins, 2016), which "assumes that two languages are distinct but are supported by shared concepts and knowledge derived from learning, experience, and the cognitive and language abilities of learners" (Chuang, Joshi, & Dixon, 2012, p. 98). Nearly four decades after first proposing his hypothesis, Cummins concluded that "The common underlying proficiency makes possible transfer of concepts, skills, and learning strategies across languages" (Cummins, 2016, p. 940). However, most studies which investigated the LIH in relation to L1 and L2 literacy focused on reading rather than on writing (see Dressler & Kamil, 2006, cited in Cummins, 2016; see also Chuang et al., 2012; Van Gelderen, Schoonen, Stoel, De Glopper, & Hulstijn, 2007), although the link between reading and writing (e.g. Fitzgerald & Shanahan, 2000) and the possibility of transfer between languages for writing have been studied to some extent as well. Cumming, Rebuffot, and Ledwell (1989), for example, investigated whether writing expertise might, in part, be a language-independent cognitive ability, and concluded that writers appear to use "fundamentally similar" thinking processes when performing a summarizing task in L1 and L2 (Cumming et al., 1989, pp. 213-214).
Berman (1994) found signs of transfer from Icelandic (L1) to English (L2). Furthermore, Schoonen et al. (2003) found that L1 writing proficiency was highly correlated with L2 writing proficiency, while Van  concluded that L1 writing proficiency had an indirect influence on L2 text quality. Finally, Cumming et al. (2016) concluded that it was very difficult to make clear distinctions between L1 and L2 writing, i.e. they are similar in many ways. These findings offer some support for Cummins' LIH and thus for a degree of interdependence between L1 and L2 writing. If so, then it might be possible for writers to access and apply writing-related (metacognitive) knowledge and practices learnt in one language in other languages as well.
It is important to mention, however, that earlier studies also suggested that language proficiency might influence the potential transfer of writing-related practices from L1 to L2 or vice versa. Some stressed that writers can only apply their L1 knowledge in L2 if their L2 proficiency has risen above a certain threshold (Berman, 1994; Schoonen et al., 2003; Wolfersberger, 2003; see also Breuer, 2014, p. 70; Tillema, 2012, p. 79), while Cummins (1979) proposed that L1 knowledge can only be applied in L2 if it is sufficiently developed in the first language. Therefore, it is important to take writers' language proficiency levels into account as well.

Source use and argumentation in academic writing
Earlier research on source-based writing in L1 and L2 often focused on L2 writing only (e.g. Plakans & Gebril, 2013; Weigle & Parker, 2012), or on comparing a group of L1 writers to a group of L2 writers (e.g. Keck, 2006, 2014; Shi, 2004), usually with single tasks per writer. However, determining whether source use and argumentation behavior are learner-specific practices, and thus transferable between languages, requires a within-writer comparison of L1 and L2 writing across multiple tasks. In their recent synthesis on source-based academic writing, Cumming et al. (2016) were surprised to find that the number of studies based on within-writer comparisons of L1 and L2 source-based writing was rather low, and that these studies often included small student samples and only one or two tasks at most (e.g. Kobayashi & Rinnert, 2008). They argued that within-writer cross-language comparisons are "… needed to disentangle the differential effects of language proficiency and the ability to write from sources" (Cumming et al., 2016, p. 53). This is necessary because research has shown that writing processes vary within writers, for example due to topic differences (cf. Van Weijen, 2009; Tillema, 2012), which means that to establish L1-L2 effects, multiple tasks must be collected per writer in each language.
Although earlier research has compared the way writers execute cognitive processes such as planning and formulating in L1 and L2 using multiple tasks per language (e.g. Van Weijen, 2009; Tillema, 2012), a comparison of source use and argumentation behavior in the same way has yet to be carried out. Such a comparison makes it possible to (a) compare variation in source use behavior and argumentation behavior not only between languages, but also within writers across multiple tasks, and (b) determine the effects of language proficiency and source use knowledge on source-based argumentative writing in both languages. Weigle and Parker (2012), for example, found that "only a small percentage of students borrowed extensively from source texts" (p. 118), though "a few students borrowed substantially more than average, skewing these mean figures" (p. 124). Similar results were found by Keck (2014). Perhaps, as these studies suggest, some participants show unique source use patterns, different from the rest of the group, because their patterns are to some extent learner-specific. The same might also hold for text structure and argumentation (see Sanders & Schilperoord, 2006).

Aims and predictions
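The aim of this study was to test whether Cummins' Linguistic Interdependence Hypothesis (LIH) might also apply to writing, by answering four questions: (1) to what extent is writers' text quality related in L1 and L2; (2) to what extent are writers' source use and argumentation behavior related between languages; (3) to what extent is writers' source use and argumentation behavior related to text quality; and (4) to what extent are these relations moderated by L2 proficiency? First, in line with Cummins' linguistic interdependence hypothesis, and earlier results for reading (Chuang et al., 2012; Van Gelderen et al., 2007) and writing (Schoonen et al., 2003), we predict that we will find significant positive correlations between writers' text quality scores in L1 and L2 (question 1). Writers who are better writers in L1 are likely to be relatively better writers (compared to others in the group) in L2 as well.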
Second, we predict that writers' source use and argumentation behavior will be language independent, i.e. to a large extent learner specific (question 2). For source use, this means for example that some writers may have a clear preference for using larger quotes from the sources, while others may prefer to paraphrase source content in their texts in both languages. Regarding argumentation, we also expect to find similarities rather than differences between writers' argumentation choices (opinion for, against or nuanced, alternating between arguments for or against) and text characteristics (e.g. providing a clear conclusion) between tasks and languages. This means we expect to find significant positive correlations between writers' behavior for these aspects across languages, and/or a lack of significant differences between them. Thus, if we find no significant differences in writers' source use and argumentation behavior between languages and tasks, that would suggest that there might be a common underlying source of writing related knowledge or practices available for writers to use in multiple languages.
Third, we predict that the effectiveness of writers' source use and argumentation behavior (question 3) will be language independent. This means, for example, that if a writer's source use behavior is similar in L1 and L2 then the quality of his or her texts is likely to be similar across languages as well (relative to that of others in the group). However, in line with earlier findings (e.g. Berman, 1994; Cummins, 1979; Schoonen et al., 2003) we predict that the effectiveness of writers' source use and argumentation behavior might be moderated by their language proficiency to some extent (question 4). In other words, if the effectiveness of writers' source use and argumentation behavior is found to differ between languages, this might be due to a lack of L2 proficiency. For example, less proficient L2 writers might be able to integrate sources correctly in their L1 texts, by paraphrasing source content in their own words, while in their L2 texts they may resort to patchwriting, i.e. copying chunks from the source texts without attribution, due to their lack of L2 proficiency. Thus, if we find significant differences in writers' behavior between languages, they might still have a common underlying source of knowledge and practices, but their potential to access it might be moderated by their language proficiency.

Method
Data
Twenty students, all first-year Bachelor English Majors (M = 18 years old), each wrote four short argumentative essays in their L1 (Dutch) and four in their L2 (English). They all volunteered to take part and were given a small financial reward (50 Euros) for their efforts. The texts analyzed in this study were initially collected for another study. The main focus of that study was an analysis of think-aloud data to determine the role of L1 use during L2 writing. However, students' source use and argumentation behavior have not yet been analyzed or reported on, which is why we considered it appropriate to analyze these texts in the current study.
Each assignment consisted of an issue, such as organ donation or camera surveillance, and six short sources. Participants had to provide their opinion on the issue, convince their readers of their point of view, support their opinion with arguments in a well-structured essay and include at least two of the sources provided in their text in a meaningful way (see Van Weijen et al., 2009 for an example of an assignment and an overview of the issues).
Each student wrote eight texts during four individual sessions, two texts per session, over a 6-8 week period during their first semester at university. To avoid language and task order effects, the issues were evenly distributed across participants, languages, and sessions, so that each participant wrote only once on each issue and each task was completed equally often in L1 and L2 and across sessions.

Text quality
To answer the first research question, the quality of students' texts was assessed via two procedures, globally and analytically, by two different teams of five raters, one for each language. Texts were first assessed analytically, using a rating scheme which consisted of 18 criteria related to argumentation, content, text structure and conclusions. Raters were asked to rate the extent to which the text matched each of the criteria provided on a 5-point scale, ranging from not at all (1) to meets the set criterion in every way (5). The scheme focused on text characteristics related to argumentation and structure, which might be similar across a writer's texts (cf. Sanders & Schilperoord, 2006).
Second, texts were assessed for global quality two weeks later, by comparing them to an average benchmark essay. For this procedure, raters were also provided with a brief description of the type of global text structure considered acceptable for this type of short essay (for more details on both assessment procedures, see Van Weijen, Van den Bergh, Rijlaarsdam, & Sanders, 2008; Van Weijen et al., 2009). The agreement between raters was satisfactory for the analytical (L1: α = .88; L2: α = .93) and global ratings (L1: α = .82; L2: α = .83). Furthermore, there were strong correlations between the two assessment methods in both L1 (r = .74; r = .87 when corrected for attenuation) and L2 (r = .69; r = .79 when corrected for attenuation). Therefore, we calculated a single quality score per essay in each language by adding up the global and analytical scores. Standardized scores (z-scores) per language were used to facilitate comparison across topics within each language. Overall, the average z-scores per topic across eight topics ranged from -.25 to .30 in L1 and -.32 to .31 in L2. Task order did not influence text quality (see Van Weijen et al., 2009). Subsequently, correlations between writers' L1 and L2 text quality scores were calculated as an indication of relatedness.
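The correction for attenuation mentioned above can be illustrated with a minimal sketch. It assumes the standard Spearman formula (observed correlation divided by the square root of the product of the two reliabilities) and uses the rater reliabilities reported for the analytical and global ratings; the function name `disattenuate` is hypothetical, and the text does not state which formula was actually applied.

```python
from math import sqrt

def disattenuate(r_xy, rel_x, rel_y):
    """Correct an observed correlation for unreliability in both measures
    (Spearman's correction; assumed here, not confirmed by the text)."""
    return r_xy / sqrt(rel_x * rel_y)

# Reported observed correlations and rater reliabilities per language.
l1 = disattenuate(0.74, rel_x=0.88, rel_y=0.82)  # analytical vs. global, L1
l2 = disattenuate(0.69, rel_x=0.93, rel_y=0.83)  # analytical vs. global, L2
print(round(l1, 2), round(l2, 2))
```

Under this assumption the sketch yields .87 for L1 and .79 for L2, which agrees with the corrected values reported above.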

Source use behavior
WCopyFind, a plagiarism detection tool, was used to determine the extent to which students integrated information from the sources in their essays, and whether this varied between tasks and languages (Bloomfield, 2011; question 2). To find indications of copying from sources in the students' texts, the program filtered out all chunks of four source words or more copied verbatim from the sources. The chosen threshold was in line with earlier studies in which all meaningful chunks had at least four words (Shi, 2004; Weigle & Parker, 2012).
The program produced a comparison between each student's text and the corresponding assignment and sources. Two examples of such comparisons are presented in Fig. 1A, B. In both cases the student's text is shown on the left and the assignment text on the right. The chunks of four or more words which were found in both the student's text and the assignment are underlined in both panels. Figure 1A shows a text written by a student who used six short chunks in her text, while Fig. 1B shows a text written by another student on the same assignment who incorporated three longer extracts in her text.
Six variables were used to indicate the extent to which writers' source use varied between languages and tasks. First, we checked whether writers adhered to the source use requirement of at least 2 sources, by calculating the number of unique sources each writer included on average per text (see De La Paz et al., 2012). Second, we determined how heavily students relied on literal copying from the sources, by calculating two related measures: (a) the percentage of "shared words" (exact copies) from sources in each text (Keck, 2014), i.e. the percentage of the student's text which consists of source text words, taking text length into account; and (b) the number of chunks of four or more words writers copied from sources, identified by WCopyFind, a measure used in previous research as well (Shi, 2004; Weigle & Parker, 2012; see also Keck, 2014). Third, we determined to what extent writers integrated the copied words in their texts, operationalized as writers' average chunk length (number of words per chunk) per text, calculated by dividing the number of copied words by the number of chunks. The longer the average chunk length, the higher the chance that writers used only a few longer chunks/direct quotes. Finally, we determined whether the chunks writers copied were mainly exact chunks or examples of patchwriting. Therefore, we calculated (a) the number of exact copied chunks per text, defined as the number of chunks containing full sentences or clauses (see Fig. 1A); and (b) the number of patchwriting chunks per text, defined as the number of chunks containing incomplete clauses or short phrases (see Fig. 1B). The exact copied and patchwriting chunks were coded manually from the WCopyFind data. For each of these six measures we calculated the mean per student for L1 and L2, as well as the standard deviation across four tasks within L1 and L2, to determine the variation between tasks in each language (question 2).

Fig. 1 A Example of a student's text (left) including six shorter chunks copied from the task's sources (right) marked in bold and underlined. B Example of a student's text (left) including several long chunks copied from the task's sources (right) marked in bold and underlined
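To make the automatically derived measures concrete, the sketch below computes the number of copied chunks, the percentage of shared words and the mean chunk length for one text. It is not WCopyFind's algorithm: it is a simple greedy longest-match search written for illustration, the function names are hypothetical, and the two example sentences are invented rather than taken from the study's materials.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def find_chunks(student, source, min_len=4):
    """Return verbatim word runs of at least min_len words shared with the source."""
    s, t = tokenize(student), tokenize(source)
    chunks, i = [], 0
    while i < len(s):
        best = 0
        for j in range(len(t)):
            k = 0
            while i + k < len(s) and j + k < len(t) and s[i + k] == t[j + k]:
                k += 1
            best = max(best, k)
        if best >= min_len:
            chunks.append(s[i:i + best])
            i += best  # continue after the copied chunk
        else:
            i += 1
    return chunks

def source_use_measures(student, source, min_len=4):
    """Chunk count, percentage of shared words, and mean chunk length for one text."""
    chunks = find_chunks(student, source, min_len)
    n_words = len(tokenize(student))
    copied = sum(len(c) for c in chunks)
    return {
        "n_chunks": len(chunks),
        "pct_shared_words": 100 * copied / n_words if n_words else 0.0,
        "mean_chunk_length": copied / len(chunks) if chunks else 0.0,
    }

# Invented example texts, for illustration only.
source = "Camera surveillance in public places reduces street crime according to the police."
student = ("I think that camera surveillance in public places is useful "
           "because it reduces street crime according to experts.")
print(source_use_measures(student, source))
```

Distinguishing exact copied chunks from patchwriting chunks is not attempted here, since that step was coded manually from the WCopyFind output.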

Argumentation behavior
Students' argumentation behavior was compared across tasks and languages by examining a number of features of argumentative essays for each text, to determine whether they were stable within writers between languages (question 2). We chose to focus on two types of text features: (1) four basic text features included in the analytical rating scheme, and (2) three features related to writers' use of sources containing arguments for or against the issue at hand.
First, we determined whether each essay contained four basic text features: a title, the writer's opinion, a clear conclusion, and a reference list, and checked whether writers consistently included these elements in their eight different texts. In addition, we checked whether the opinion and conclusion provided were clearly for the statement, against it, or contained a nuanced view (weighing both sides).
Second, we focused on how writers used the six different sources provided in each assignment to support their argumentation in their texts. There were three types of sources: sources containing only arguments for the issue, sources with only arguments against the issue, and mixed sources containing arguments which could be used to support both standpoints. Thus, based on the sources a writer chose to incorporate in his or her text, we could determine whether the direction of the sources used supported the writer's opinion provided in each text. We did this by checking (a) whether the direction of the first source a writer used in the text matched the writer's opinion, (b) whether the direction of the last source used in the text matched the writer's opinion, and (c) how many times the direction of the sources used switched in the text. For example, one text might include only three sources containing arguments for the issue, which means there would be no switches (for-for-for), while another text might start with a source for the issue, followed by a source against the issue, and conclude with another source containing an argument for the issue, which would be counted as 2 switches (for-against-for). Thus the number of switches in source argument direction can be seen as an indication that writers used both arguments and counter-arguments in their texts, and thus as an indication of argumentation complexity. The higher the number of switches in source direction in a text, the higher the chance that the text contains arguments and counter-arguments, and thus a more complex argumentation structure.
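The three source direction features described above can be sketched in a few lines. The labels "for", "against" and "mixed" and the function names are assumptions made for illustration; in particular, treating a mixed source as its own direction (so that it never matches a for or against opinion and always counts as a switch next to a for or against source) is a simplification that the text does not confirm.

```python
def count_switches(directions):
    """Number of times consecutive sources differ in argument direction."""
    return sum(1 for prev, cur in zip(directions, directions[1:]) if prev != cur)

def code_text(opinion, directions):
    """Code one text; 'directions' lists the sources used, in order of appearance."""
    return {
        "first_matches_opinion": bool(directions) and directions[0] == opinion,
        "last_matches_opinion": bool(directions) and directions[-1] == opinion,
        "switches": count_switches(directions),
    }

# The two examples from the text: no switches versus two switches.
print(count_switches(["for", "for", "for"]))
print(count_switches(["for", "against", "for"]))
```

Applied to the two examples given in the text, the function returns 0 for for-for-for and 2 for for-against-for.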
Finally, we wanted to determine how writers incorporated the content of each source they used in their texts. As mentioned in the section on source use above, the WCopyFind data were coded to determine how many exact copies or patchwriting chunks writers used per text, regardless of the source(s) they came from. However, the extent to which writers incorporated each source's content in their own words, using paraphrasing, was unclear. To check this, we manually coded for each source whether writers included its content in their texts using exact copies, patchwriting, paraphrasing or a mix of these three.
All source use and argumentation text features were coded by the first author, while a second trained coder also coded all texts for one of the eight assignments in L1 and L2 (n = 20). The intercoder reliability was sufficient for all these measures (Cohen's kappa: median = .67, mode = 1.00, range = .41-1.00), except for determining source use language for the first source writers included in the text, for which Cohen's kappa was lower (.38) but percentage agreement was sufficient (.60). The intraclass correlation between coders was also found to be satisfactory for the number of unique sources included in each text (ICC = .77). The first and second coders discussed the differences in their coding choices, after which the first coder coded the features of all the remaining texts in the same way.
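Because both Cohen's kappa and percentage agreement are reported above, a small sketch may help show how the two differ: kappa discounts the agreement expected by chance, so it can be noticeably lower than raw agreement. The function names and the example codes are hypothetical and are not taken from the study's coding data.

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Proportion of items that both coders labeled identically."""
    return sum(x == y for x, y in zip(coder_a, coder_b)) / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders on nominal codes.
    Undefined (division by zero) when chance agreement equals 1."""
    n = len(coder_a)
    p_o = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    categories = set(coder_a) | set(coder_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Invented codes for four source extracts, for illustration only.
coder_1 = ["copy", "copy", "paraphrase", "paraphrase"]
coder_2 = ["copy", "paraphrase", "paraphrase", "paraphrase"]
print(percent_agreement(coder_1, coder_2), cohens_kappa(coder_1, coder_2))
```

In this toy case the coders agree on three of four items (.75), while kappa is .50 because half of that agreement would be expected by chance.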

L2 proficiency
Students' L2 proficiency was measured using three different instruments, to determine whether their proficiency influenced the effectiveness of their source use and argumentation behavior (question 4). First, students were given a timed English vocabulary test, which consisted of 64 English sentences, ranked in order of difficulty, in which a missing word, provided in Dutch, had to be filled in (see Van Weijen et al., 2009 for more details). The test's internal consistency was satisfactory (α = .81). On average students answered 77% of the items correctly (M = 46.25 points, SD = 6.77).
The second instrument was an L2 revision test, in which students were asked to improve the quality of a section of text in a flyer aimed at recruiting high school girls for team sports. The text contained 20 spelling and grammatical errors and points were awarded per error spotted and per successful correction. The task was based on a revision task developed by Hayes, Flower, Schriver, Stratman, and Carey (1985) and adapted for use in Dutch by Broekkamp and Van den Bergh (1996). On average students successfully spotted and corrected 72.13% of the errors in the text (mean score = 28.85, SD = 6.23).
The third and final instrument used to measure students' L2 proficiency was their score on the Written English examination they had to take at the end of the semester. The exam consisted of 120 multiple-choice questions on the following topics: Writing (Spelling, Style, Punctuation, Syntagmatics; 30 questions), Argumentation (30 questions), Grammar (30 questions) and Vocabulary (30 questions). On average, students answered two-thirds of the questions correctly (mean score = 6.77 on a 10-point scale, SD = 1.85).
A reliability analysis confirmed that there were sufficient grounds for combining students' scores for these three instruments into a single score which represented their L2 proficiency (α = .64). Therefore the mean score across these three instruments was calculated for each student and then converted to standardized scores for use in further analyses (range: -1.50 to 2.01).
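The reliability check and the conversion to standardized scores can be sketched as follows. The sketch assumes that the reported reliability is Cronbach's alpha computed over the three instruments as "items"; the function names are hypothetical and the score lists in the usage lines are invented, not the study's data.

```python
from statistics import mean, stdev, variance

def cronbach_alpha(items):
    """items: list of k score lists, one per instrument, in the same student order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))

def standardize(values):
    """Convert raw values to z-scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Invented scores for four students on three instruments, for illustration only.
vocabulary = [40, 45, 50, 55]
revision = [22, 30, 27, 35]
exam = [5.5, 6.0, 7.5, 8.0]
alpha = cronbach_alpha([vocabulary, revision, exam])
proficiency_z = standardize([mean(s) for s in zip(vocabulary, revision, exam)])
print(round(alpha, 2), [round(z, 2) for z in proficiency_z])
```

Standardized scores have a mean of zero by construction, so their range necessarily spans negative and positive values.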

Analyses
Each writer in this study wrote eight tasks, four in L1 and four in L2, which means the tasks were nested within writers. Therefore, to answer research questions 2, 3 and 4, we applied multilevel analyses via SPSS mixed models where possible, with topics and subjects as random factors. To assess relations between writers' source use and argumentation behavior, we implemented a model with language (L1, L2) as fixed factor. For nominal variables we ran chi-square tests to determine whether they differed between languages or not. To answer the third and fourth research questions we set up models with text quality as dependent variable, two factors (language and the explanatory source use or argumentation variable) and L2 proficiency as covariate, and included the two-way interaction between language and the explanatory variable, and the three-way interaction between language, L2 proficiency and the explanatory variable.
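As a reading aid, the model for questions 3 and 4 can be written out as a single equation. This is a sketch under assumptions, not the authors' SPSS specification: all symbols are introduced here for illustration, and the explanatory variable is shown as one predictor although it may have been entered as a categorical factor. Q_{ij} is the quality of text i by writer j, L_{ij} the language of that text, X_{ij} the source use or argumentation variable, P_j the writer's L2 proficiency, u_j a random writer effect, v_{t(i,j)} a random effect of the topic t of that text, and e_{ij} the residual.

```latex
Q_{ij} = \beta_0 + \beta_1 L_{ij} + \beta_2 X_{ij} + \beta_3 P_j
       + \beta_4 \,(L_{ij} \times X_{ij})
       + \beta_5 \,(L_{ij} \times P_j \times X_{ij})
       + u_j + v_{t(i,j)} + e_{ij},
\qquad u_j \sim N(0, \sigma_u^2),\;
       v_t \sim N(0, \sigma_v^2),\;
       e_{ij} \sim N(0, \sigma_e^2)
```

In this notation, a language difference in the effectiveness of a feature corresponds to \beta_4, and moderation by L2 proficiency corresponds to \beta_5.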

Text quality and length
To answer our first question, we calculated the mean z-scores per topic in each language and correlations between writers' mean text quality scores and text length within and between languages. Results revealed a positive correlation between text quality in L1 and L2 (r = .73). Second, the variation in text quality across four tasks (SD of mean text quality per person) was calculated as a measure of stability of text quality per language. A cross-linguistic comparison revealed a negative correlation between writers' average L1 text quality and the variation in their L2 text quality scores (r = -.64). Third, an analysis of text length revealed a strong positive correlation between writers' average text length in both languages (r = .90), while further analysis confirmed that students' texts did not differ in length, on average, between languages (F(1, 157) = .089, p = .77; L1: M = 292.39 words, SD = 79.03; L2: M = 296.27 words, SD = 84.54). Finally, multilevel analysis revealed longer texts were generally rated with higher scores (F(1, 147.60) = 41.28, p < .001), but there was no significant difference between languages (F(1, 137.10) = 1.88, p = .17).
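The writer-level statistics used in this section (each writer's mean and SD across four tasks, and the correlation of those values between languages) can be sketched as below. The function names and the two-writer example are hypothetical and serve only to show the shape of the computation, not to reproduce the reported results.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of values."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def writer_summary(scores):
    """scores: {writer: [z-score per task]} -> {writer: (mean, SD across tasks)}."""
    return {writer: (mean(v), stdev(v)) for writer, v in scores.items()}

# Invented z-scores for two writers, four tasks per language; illustration only.
l1_scores = {"w1": [0.5, 0.7, 0.6, 0.8], "w2": [-0.4, -0.1, -0.6, -0.3]}
l2_scores = {"w1": [0.4, 0.5, 0.6, 0.5], "w2": [-0.9, 0.2, -0.5, 0.4]}
l1_summary, l2_summary = writer_summary(l1_scores), writer_summary(l2_scores)
for writer in l1_scores:
    print(writer, l1_summary[writer], l2_summary[writer])
```

With the full sample, `pearson_r` would be applied to the lists of writer means (for the L1-L2 quality correlation) or to L1 means and L2 SDs (for the stability comparison).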

Source use behavior in L1 and L2
The second question was to what extent writers' source use and argumentation behavior were related between languages. The WCopyFind output was analyzed to determine the variation within and across languages for six source use measures. First, we carried out multilevel analyses to determine whether the extent to which writers used these features in their texts differed between L1 and L2 (see Table 1).
Results revealed that the number of sources included, the percentage of words and chunks copied from sources, and the number of patchwriting chunks per text, differed between languages, while writers' average chunk length, and the number of exact copied chunks included per text did not (see Table 1). On average, writers included slightly more sources in L1 than in L2, although they included at least two source extracts per text in both languages (range: 0-5), in line with the minimum required by the assignment. Furthermore, writers included a higher percentage of copied words, more copied chunks, and more patchwriting chunks on average in their texts in L2 than in L1.
Second, to gain insight into the relations between features and the extent to which these relations were stable across languages, we calculated correlations for these measures within and between languages (see Appendix Table A1 for an overview). Within languages, there appear to be stronger relations between features in L2 than in L1. In L1, each source use feature correlated positively with at least two other features (r range = .59-.81). The only negative correlation in L1 was found between the number of unique sources writers used and mean chunk length (r = -.53); texts with fewer unique sources tend to contain longer chunks. We also found positive correlations in L1 between the number of unique sources per text, the number of chunks, and the number of patchwriting chunks per text; texts with more unique sources are likely to include more patchwriting chunks. In L2, each feature was positively correlated with at least four others (r range = .51-.86), except for the number of unique sources used, which correlated only with mean number of chunks and mean number of patchwriting chunks, as was the case in L1. Finally, a cross-language comparison revealed that all of the features were related to their counterparts across languages (r range = .48-.85), except for the number of unique sources used and patchwriting chunks, for which no significant L1-L2 correlations were found.

Argumentation behavior in L1 and L2
Results of the argumentation behavior analysis revealed that writers' use of general text features such as titles and reference lists did not differ between languages (see Table 2). On average, writers included titles in the majority of their texts, but were much less likely to provide reference lists. However, a significant difference between languages was found for the inclusion of clear conclusions in texts, which appeared more often on average in L2 texts than in those written in L1.
Next, we checked whether each text contained the writer's opinion, and whether the opinion and conclusion provided were clearly for the statement, against it, or provided a nuanced view (weighing both sides). An overview of choices for text features related to writers' opinions in L1 and L2 is provided in Table 3.
Results of the comparison between languages for these features revealed no significant differences in writers' chosen opinion, or their choices for source direction between languages, i.e. only for, only against or a mix of both. Furthermore, we investigated whether the direction of the sources used supported the writers' opinion provided in each text, and found that writers' L1 texts were more likely to contain switches in source direction than their L2 texts (see Table 4). Finally, we checked how writers chose to incorporate the content of each source in their texts: using exact copies, patchwriting, paraphrasing or a mix of these three, and found no significant differences between languages (χ² = 5.68, df = 3, p = .13; see Table 5).
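As a quick consistency check on the chi-square result reported above, the p-value can be recovered from the test statistic (5.68) and the degrees of freedom (3). The sketch below uses a closed-form expression for the upper-tail probability that holds only for df = 3; the function name `chi2_sf_df3` is introduced here for illustration and is not part of the original analysis.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df = 3.

    Closed form valid for df = 3 only:
    erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


# Statistic as reported in the text; df = 3 is built into the formula.
p_value = chi2_sf_df3(5.68)
print(f"p = {p_value:.2f}")
```

The result rounds to .13, in agreement with the reported value, so the non-significant difference between languages is consistent with the statistic given.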

L2 proficiency, source use behavior and text quality
In the last step of the analysis we investigated to what extent writers' source use and argumentation behavior were related to text quality (question 3) and to what extent these relations were moderated by L2 proficiency (question 4). Multilevel analyses were carried out to determine the influence of source use text features and language (L1 or L2) on text quality; the results are summarized in Table 6. They revealed that average chunk length is positively related to text quality (F(1, 153.331) = 5.596, p = .017), as is the number of exact copied chunks (F(1, 152.515) = 8.629, p = .004), although these relations did not differ significantly between languages. Thus, writers who used longer and more exact chunks were likely to write texts of better quality, regardless of the language written in. No significant differences between languages were found for the other features we studied (see Table 6). Subsequently, multilevel analyses revealed that writers' L2 proficiency level did not influence or moderate the relations between writers' source use behavior and text quality for any of the variables studied, as no significant interactions were found (see Table 6).

L2 proficiency, argumentation behavior and text quality
In line with the source use analysis we used multilevel analyses to determine whether writers' use of these argumentative text features was related to the quality of students' texts across languages (see Table 7).
Positive effects on text quality were found for providing a title (F(1, 154.357) = 6.125, p = .014), a clear conclusion (F(1, 143.133) = 11.301, p < .001) and a reference list (F(1, 136.721) = 4.784.931, p < .001), but these effects differed between languages only for the last feature. Providing a reference list was positively related to text quality in L1 (r = .34, p = .002), but not in L2 (r = .02, p = .84). No significant relations were found between the direction of sources used and text quality, or between writers' choices for integrating sources in their texts and text quality, across languages.
Finally, we checked whether writers' L2 proficiency level influenced these relations in any way. Unfortunately, besides an effect of L2 proficiency on the relation between writers' method for integrating source content in their texts and text quality (F(1, 133.647) = 6.319, p = .013), no other influence of proficiency on writers' behavior was found.

Conclusion and discussion
The aim of this study was to test whether Cummins' Linguistic Interdependence Hypothesis (LIH) might also apply to writing, by examining the extent to which writers' text quality, source use and argumentation behavior differed between languages and whether their L2 proficiency influenced the relations between them. First, in line with Cummins' LIH, and earlier results for reading (Chuang et al., 2012; Van Gelderen et al., 2007) and writing (Schoonen et al., 2003), we found positive correlations between languages for both text quality and text length. Because these findings are based on four tasks per writer per language, they enable us to draw conclusions about writers' expertise in both languages. Thus better L1 writers seem to be better writers in L2 as well, and those who tend to write longer texts in their L1 are likely to write longer texts in their L2. However, there are also indications that writers' L1 writing competency influences their L2 writing expertise to some extent: writers' L2 text quality appears to be less stable between tasks for weaker L1 writers than for better ones.
Second, we predicted that writers' source use and argumentation behavior would be language independent, i.e. to a large extent learner specific. Multilevel analyses revealed that writers' source use behavior differed significantly between languages for four of the six variables studied. On average, they included slightly more unique sources, a higher percentage of copied words, more copied chunks, and more patchwriting chunks in their texts in L2 than in L1, which suggests that they relied more strongly on source content in their L2 writing than in L1.
Further analyses revealed that all of the source use features were strongly related to their counterparts across languages, except for the number of unique sources used and patchwriting chunks, for which no significant L1-L2 correlations were found. Writers who tend to copy more words from sources, in longer and more exact chunks in L1, are also likely to do so in L2, which is in line with findings from earlier between-writer comparisons (Keck, 2014; Weigle & Parker, 2012). Furthermore, the fact that writers' use of patchwriting chunks was not correlated between languages suggests that it might be influenced by L2 proficiency to some extent. Thus, although we found a language effect for four out of six variables, the strong positive correlations found between source use features across languages suggest that in most cases this was more a person than a language effect.
A cross-language comparison of writers' argumentation behavior revealed that whether writers chose to include titles or reference lists in their texts was similar between tasks across languages and thus largely learner specific. For example, only two of the writers in our study included a rudimentary reference list in every single text they wrote, while the others did not. But writers tended to include clearer conclusions in their texts on average in L2 than in L1, which suggests they might be inclined to draw more straightforward or less nuanced conclusions in L2. Moreover, we found that writers were more likely to incorporate both sources for and against the issue at hand and switch between the two points of view in their texts in L1 than in L2. This suggests that when writing in L2, writers were more likely to revert to a more straightforward canonical argumentation structure, in which they only provide arguments from sources which clearly support their chosen opinion, rather than including and refuting counter-arguments as well (cf. Sanders & Schilperoord, 2006). Overall, these results provide a first indication that writers' source use and argumentation behavior might be quite similar between languages and that, as predicted by Cummins' LIH, there might be a common underlying source of writing related knowledge or practices available for writers to use in multiple languages (Cummins, 2016).
Third, we predicted that the effectiveness of writers' source use and argumentation behavior (question 3) would be language independent as well. Results of the analysis revealed that only writers' average chunk length and the number of exact copied chunks used were positively related to text quality and these relations did not differ significantly between languages. This suggests that writers' chunk length and use of exact chunks were effective in both L1 and L2. This might be due in part to the positive correlation between text length and text quality, as including longer exact chunks often results in longer texts. No direct relations or differences between languages were found between the other source use features and text quality. We were somewhat surprised to find no effect of the number of unique sources used on text quality, as earlier research by De La Paz et al. (2012) found that better writers tended to use more sources in their texts, although this might be due to the fact that students in their study wrote argumentative historical essays for which source use can be far more complex.
Similarly, for argumentation behavior, we only found positive effects on text quality for the inclusion of titles and conclusions, which is in line with De La Paz et al. (2012) who also found a small positive effect for the inclusion of conclusions. Texts with titles and clear conclusions were judged to be higher in quality than those lacking these features, regardless of the language written in, while inclusion of a reference list only had a positive effect on text quality in L1, not in L2. Surprisingly, the inclusion of sources including both arguments and counter-arguments did not appear to influence text quality directly, which might be due to the influence of L2 proficiency on these relations, as proposed in earlier studies (e.g. Berman, 1994;Cummins, 1979;Schoonen et al., 2003).
Therefore, in the final step of the analysis we checked whether writers' L2 proficiency level influenced their source use and argumentation behavior and their effectiveness in any way. For source use, we found that L2 proficiency did not influence or moderate any of the relations between source use behavior and text quality studied. However, there was a trend visible for the number of unique sources included, which suggests further research on the influence of L2 proficiency on source use behavior might be warranted. Finally, results for argumentation behavior revealed no clear effect of writers' L2 proficiency on any of the source use or argumentation behavior variables studied, except on the method for integrating source content.
When taken together, all these findings seem to be in line with Cummins' LIH. Although they are somewhat tentative, our findings, based on a within-writer study in L1 and L2 using multiple tasks, appear to confirm those of earlier studies (Berman, 1994; Cumming et al., 2016; Cummins, 1979; Keck, 2014; Kobayashi & Rinnert, 2008; Schoonen et al., 2003; Weigle & Parker, 2012) and thus provide some additional support for the idea that writers might have a common underlying source of writing related knowledge and practices, such as source use and argumentation behavior, which they can apply in multiple languages. However, the role of L2 proficiency in this process remains unclear and requires further investigation.

Limitations and suggestions for further research
A clear strength of the current study was the fact that we carried out a within-writer cross-language comparison, as proposed by Cumming et al. (2016) with eight texts per writer, four in each language. However, as in any study, the present study also has a number of limitations. First of all, the absence of an L1 proficiency measure, besides L1 text quality, made it impossible to investigate the influence of L1 proficiency on differences between writers' behavior across languages to a greater extent.
Second, while the sample size in our study was reasonable and the within-writer comparison we carried out was novel, the fact that we used eight very similar tasks was perhaps less well chosen: our findings are bound to the specific type of argumentative task we used. If we had chosen to use two different types of argumentative tasks, or tasks with two different contexts, but had still found similarities in writers' behavior between tasks, that would have provided stronger support for the validity of Cummins' LIH.
Third, the fact that we chose to compare two closely related languages, Dutch and English, which share the same orthography and similar basic argumentative structures, is also a possible limitation of our study. If we had chosen to compare two very different languages, such as English and Chinese, then it is likely our results would have been somewhat different, although we would be surprised to find huge differences, because earlier studies for reading (Chuang et al., 2012) and writing (Kobayashi & Rinnert, 2008) found strong relations between languages with different orthographies as well.
Fourth, writers' argumentation behavior is multifaceted and we were not able to cover all its aspects in this study. Therefore, in a future study, we would like to investigate writers' opinions, the number and type of arguments they integrated in their texts and whether these differed between languages in more detail. Finally, although our conclusions are somewhat tentative, we do still think that it might be possible for students to learn writing related practices in one language in such a way that they can apply them in a second language as well. Therefore, future research might focus on intervention studies comparing students' writing behavior in bilingual versus monolingual schools, which could generate pedagogical implications to help students improve the effectiveness of their source based writing in multiple languages at the same time.