The 98 participants of this study were first-year university students on an English Language Education programme at an English-medium university in Turkey. At the outset of the academic year, they were given a participant information sheet providing information about the research study; the sheet included no specific examples of MWCs. The students gave the researcher informed written consent to use their academic assignments during their first year. The participants also completed a questionnaire requesting information concerning their first language, gender, previous residency in an English-speaking country, proficiency in other languages, and the medium of instruction at their secondary school. The first language of all the participants was Turkish, and they were aged between 17 and 22 years (M = 18.41, SD = 1.16). The majority of the participants (83%, n = 81) were female, and 17% (n = 17) were male. None of the participants had resided in an English-speaking country for more than one month or possessed advanced proficiency in another language. For all the participants, the medium of instruction at secondary school was Turkish. Throughout their undergraduate education at the English-medium university, they submitted their assignments in English, except for two course units that were taught in Turkish. The participants had four compulsory course units per semester, and the classroom contact time varied between two and three hours per course unit per week. They took the ‘Academic Writing’ and ‘Study and Research Skills’ course units in the first and second semesters of their first year, respectively, and they read resources on academic writing processes and strategies and on analytical academic writing. There was no explicit teaching of MWCs in the academic writing classes, as reported by the lecturers and students in the interviews (Candarli, 2020).
The course units focused on developing essay structure, analysis and synthesis of academic sources, paraphrasing, and citation conventions.
Before commencing their studies, the students were required to pass the university’s English proficiency test with a good score, equivalent to an overall band of 6.5 in IELTS (Academic), with no less than 6.5 in writing, or at least 79 in TOEFL iBT. These minimum scores correspond to a borderline B2/C1 level of the Common European Framework of Reference for Languages (CEFR) (Taylor, 2004). The students’ scores ranged from the equivalent of 6.5 to 7.5 in IELTS (M = 6.63, SD = 0.28), and most of the students scored 6.5 in IELTS Academic or in an equivalent test. Hence, the participants can be regarded as advanced L2 learners. Since these students had attended secondary schools at which the medium of instruction was Turkish, their first year at an English-medium university represented a transitional stage from secondary school to university.
This study used L1 Turkish university students’ academic assignments submitted for their assessed coursework at an English-medium university in Turkey. These assignments were collected from the same participants at three stages during one academic year: the beginning of November (Month 3), the end of January (Month 5), and the beginning of June (Month 9). The assignments, all of which received passing grades (at least 50 out of 100), were checked for plagiarism using ‘Turnitin’, to which the students submitted their assignments. The discipline-specific, untimed written assignments featured similar topics, including gender differences in education and social media use in education (see S1 in supplementary material for the assignment prompts). The participants were free to consult any reference materials while writing, but no data were collected on this, since source text use was beyond the scope of this study. The students wrote these essays for their assignments in their academic subject rather than for research purposes, which increased the ecological validity of this study. It is worth noting that the researcher was not a lecturer at the university where the data were collected and had no control over the topics of the assignments or the reading lists of the L2 writers at the university. These assignments can be regarded as ‘analytical exposition’ (Coffin, 1996) essays, which require students to engage with the extant literature, to evaluate and synthesise the arguments therein, and to present their own position. Analytical exposition essays fall within the ‘essay’ genre family in Nesi and Gardner’s (2012) taxonomy of the genres of discipline-specific student writing in UK higher education, and they constitute a hybrid genre of ‘exposition’ and ‘discussion’. As in Li and Schmitt’s (2009) study, the list of references and direct quotations were removed from all the essays.
As Table 1 illustrates, the number of tokens in the L2 writers’ essays increased over time, especially at Month 9, because the suggested word limit for the final assignment was 1500 words instead of the 500 words at Months 3 and 5. Eight of the students’ essays were absent at Month 9, since these students did not submit an essay in June for a variety of reasons, including dropping out of university and mitigating circumstances that allowed submission at a later time.
The input corpus comprised the compulsory readings of the compulsory modules that the participants of this study, one cohort of first-year university students, took during their first year at university. This corpus was built in order to determine whether the frequency and dispersion of MWCs in the reading materials that the students encountered would predict the frequency of MWCs in the L2 writers’ essays (see S3 in supplementary material for the reading resources). Although these academic texts were arguably an estimation of students’ academic reading input and a potential source of target-like academic MWCs, this study does not argue that they constituted the only input for students. The participants attended lectures in English, and it is likely that they read English materials and watched television series in English in their free time; however, it is not possible to capture all the input that students were exposed to. Within usage-based approaches to language learning, input is mainly operationalised in two ways: (1) an L1 corpus (e.g., Ellis & Ferreira-Junior, 2009); (2) textbooks that students use for their classes (e.g., Bi, 2020). Given that “English neither functions for intranational communication purposes, nor is used for basic communicational goals in Turkish society” (Selvi, 2020, p. 4), the course readings of the L2 writers at university served as the main input in the context of this study (see Bi, 2020). It was also ecologically more valid to determine what kind of input L2 learners were exposed to in non-immersion settings, since “frequency in a general corpus, even one constructed from second language learner speech, is not necessarily the frequency with which a particular learner experiences the form” (Larsen-Freeman, 2015, p. 238).
The reading materials comprised mostly book chapters and one research article, and electronic versions of these resources were obtained wherever possible. When a soft copy was not available, book chapters were scanned, and optical character recognition (OCR) was applied to convert them to plain text files, using the tesseract package (Ooms, 2018) in R (R Core Team, 2019). Any OCR errors were checked and corrected using Notepad++, a text editor. All reference lists were removed from each individual file. As seen in Table 2, the participants of this study were assigned 74 texts to read by Month 9 (June) for their compulsory modules, and each reading source in the list (a book chapter or a journal article) was operationalised as a text. There were 22 individual texts by Month 3, 35 individual texts (22 + 13) by Month 5, and 74 texts (22 + 13 + 39) by Month 9, reflecting the L2 writers’ cumulative exposure to academic English during their first year; this corpus served as a proxy for the L2 writers’ academic reading input. It is worth noting that the students may or may not have referred to these sources at the time of writing their assignments. Due to the laborious nature of scanning book chapters and checking OCR errors, the input corpus only included texts that were assigned as ‘compulsory’ in the reading lists of the participants’ compulsory modules during their first year.
Identification of MWCs
The MWCs in the L2 writers’ essays and in the input corpus were identified using three empirically derived lists of MWCs (see S2 in supplementary material for the list of MWCs): (a) Biber et al.’s (2004) list of lexical bundles (four-word sequences) that occurred at least 10 times per million words in academic prose; (b) Liu’s (2012) list of the most frequently used MWCs in academic writing (excluding two-word sequences); (c) Simpson-Vlach and Ellis’ (2010) written academic formulas (three- to five-word sequences). Although different terms (lexical bundles, MWCs, and academic formulas) were used in these studies, they all refer to multi-word sequences that have certain discourse functions in context and occur frequently in academic writing. The three lists were used to identify MWCs in this study for three reasons. First, the corpus of L2 writers’ essays in this study was too small to extract MWCs from the corpus itself, especially at Month 3 and Month 5, since Cortes (2013) argued that a corpus of at least one million words is required to extract lexical bundles from a corpus itself. Second, the MWCs in these lists could minimise topic effects (Yoon, 2016), since they were extracted from large corpora and were not topic-bound sequences. Third, the MWCs in the lists served as a proxy for target-like academic MWCs, since frequency of occurrence and range in academic prose were identification criteria for the MWCs in these lists. In order not to inflate token frequencies, the lists of MWCs were adapted in several ways. First, in cases where MWCs of different lengths in the lists fully overlapped, such as ‘as well as’ and ‘as well as the’, only the shorter MWC was counted in the corpora, and the longer ones were removed from the lists, except in the case of ‘on the other hand’, which was selected instead of ‘on the other’.
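The pruning of fully overlapping MWCs described above can be sketched as follows. This is an illustrative Python sketch only; the study reports no code for this step, and the function and parameter names (including the `drop`/`keep_long` sets that encode the single reported exception) are invented here:

```python
def prune_overlapping_mwcs(mwcs,
                           drop=frozenset({"on the other"}),
                           keep_long=frozenset({"on the other hand"})):
    """Keep only the shorter of two fully overlapping MWCs.

    'drop' and 'keep_long' encode the one reported exception:
    'on the other hand' was kept instead of 'on the other'.
    """
    candidates = set(mwcs) - drop
    shorter_first = sorted(candidates, key=lambda m: len(m.split()))
    kept = []
    for mwc in shorter_first:
        # a longer MWC is removed if it contains an already-kept shorter one
        contains_kept = any(f" {k} " in f" {mwc} " for k in kept)
        if mwc in keep_long or not contains_kept:
            kept.append(mwc)
    return sorted(kept)
```

For instance, given the list `['as well as', 'as well as the', 'on the other', 'on the other hand']`, the sketch retains only ‘as well as’ and ‘on the other hand’, mirroring the adaptation described in the text.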
When there were partial overlaps of MWCs, such as ‘more likely to’ and ‘is more likely’, the concordance lines were checked to see whether they occurred within the same co-text, and the token frequencies were adjusted accordingly. For example, when ‘more likely to’ occurred 30 times in the corpus and ‘is more likely’ occurred seven times, the occurrences of the combined span ‘is more likely to’ (n = 3) were identified, and this frequency was subtracted from that of ‘is more likely’ (n = 7), so that ‘is more likely’ was recorded with a frequency of four (see Chen & Baker, 2016). Place names, such as ‘in the United States’, were excluded from the list of MWCs. Lastly, in Liu’s (2012) list of MWCs, constructions of two words with a schematic representation, such as ‘NP suggest that’ and ‘according to (det + N)’, were excluded. All the MWCs compiled from the abovementioned empirically derived lists were searched for in both the L2 writers’ essays and the input corpus, using a free corpus tool, #LancsBox version 3.03 (Brezina, McEnery, & Wattam, 2015). Within #LancsBox, the Whelk tool provided frequencies of each MWC for each text. The frequencies of each MWC were then recorded on a spreadsheet for each text.
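The adjustment for partially overlapping MWCs can be expressed as a simple subtraction. The sketch below is a hypothetical Python illustration of the worked example in the text (the function name and the assumption that the combined span is counted under the first MWC are this sketch’s, not the study’s):

```python
def adjust_partial_overlap(freq_a, freq_b, freq_combined):
    """Adjust token counts for two partially overlapping MWCs.

    freq_a        -- raw frequency of, e.g., 'more likely to'
    freq_b        -- raw frequency of, e.g., 'is more likely'
    freq_combined -- frequency of the combined span 'is more likely to'

    Occurrences of the combined span are assumed to be counted once
    (under freq_a), so they are subtracted from freq_b.
    """
    return freq_a, freq_b - freq_combined
```

Applied to the example above, `adjust_partial_overlap(30, 7, 3)` records ‘more likely to’ with 30 tokens and ‘is more likely’ with four.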
Analysis of MWCs
In terms of analysis, all the MWCs identified in both the L2 writers’ essays and the input corpus were coded structurally, employing the taxonomy of previous studies (Biber et al., 2004; Chen & Baker, 2016), as shown in Table 3. Table 3 also shows the number of different structural types of MWCs that occurred in the L2 writers’ essays over time. Due to the different sizes of the corpora of the L2 writers’ essays, only the token frequencies of the MWCs were investigated in this study; the same applies to the discoursal categories of MWCs.
The MWCs were also coded according to their discourse functions in both the L2 writers’ essays and the input corpus. Several taxonomies have been proposed for the discourse functions of MWCs (e.g., Biber et al., 2004; Hyland, 2008). This study employed an adaptation of Biber et al.’s (2004) taxonomy of the discourse functions of lexical bundles for two reasons: first, it is widely used in the literature on academic discourse (Cortes, 2013); second, Simpson-Vlach and Ellis’ (2010) classification scheme and Liu’s (2012) semantic functional categories of MWCs draw on and show similarities with Biber et al.’s (2004) taxonomy of functional categories. Biber et al. (2004) classified lexical bundles into three main categories: (a) referential expressions, which introduce abstract and concrete entities and frame propositions; (b) discourse organisers, which signal causative, inferential, and transitive relations in a text; (c) stance expressions, which convey the (un)certainty of the writer, express the writer’s attitudes, and indicate obligations or ability. Biber et al.’s (2004) taxonomy was adapted in two ways. ‘Descriptive’ MWCs (Cortes, 2004) that indicate abstract and concrete entities (e.g., ‘the concept of’) were added to the main category of ‘referential expressions’. ‘Inferential/resultative signals’ were added to the main category of ‘discourse organisers’ to capture cause-effect relations (e.g., ‘as a result’) in a text (Hyland, 2008).
All the MWCs were coded according to the functional taxonomy presented in Table 4 by examining the concordance lines and wider co-text of each MWC in WordSmith Tools 6.0 (Scott, 2012). Table 4 also shows the number of different discoursal types of MWCs that occurred in the L2 writers’ essays over time. When an MWC possessed multiple discourse functions, the predominant function of the MWC in the data was coded as its functional category (e.g., Chen & Baker, 2016). In order to assess inter-coder agreement, about 25% of the MWCs (n = 59) identified at Month 9 were coded separately by another researcher in applied linguistics; the Cohen’s kappa value was 0.90, which indicated “almost perfect agreement” according to Landis and Koch’s (1977) guidelines. The differences were then resolved through discussion. The MWCs that did not fit into any of the structural or discoursal categories were coded as ‘others’ and excluded from further analysis.
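Cohen’s kappa corrects raw percentage agreement for the agreement expected by chance from each coder’s marginal label distribution. A minimal Python re-implementation of the statistic is sketched below (the study does not report how kappa was computed; the example labels are hypothetical):

```python
from collections import Counter

def cohens_kappa(coder1, coder2):
    """Cohen's kappa for two coders' nominal labels of the same items."""
    n = len(coder1)
    # observed proportion of agreement
    observed = sum(a == b for a, b in zip(coder1, coder2)) / n
    # chance agreement from the coders' marginal label distributions
    counts1, counts2 = Counter(coder1), Counter(coder2)
    labels = set(counts1) | set(counts2)
    expected = sum(counts1[lab] * counts2[lab] for lab in labels) / n ** 2
    return (observed - expected) / (1 - expected)
```

For example, two coders who agree on five of six hypothetical functional labels (referential, discourse organiser, stance) obtain a kappa of about 0.74, lower than the 5/6 raw agreement because chance agreement is discounted.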
In addition to the frequency analysis of the structural and discoursal categories of MWCs, a dispersion measure of MWCs was calculated in the input corpus in order to investigate whether the dispersion of MWCs in the reading materials would predict their frequency in the L2 writers’ essays. As a dispersion measure, Gries’ (2008) normalised deviation of proportions (DPnorm), as refined in Lijffijt and Gries (2012), was calculated, since DPnorm can handle differently sized corpus parts and provides a value between 0 and 1, which is easy to compare across studies. Each book chapter or journal article in the reading lists was a corpus part in this study. DPnorm was calculated in the following way (Gries, 2008; Lijffijt & Gries, 2012): (1) the size of each corpus part was computed as a percentage of the whole corpus, yielding the expected percentages of an MWC; (2) the token frequencies of an MWC within each corpus part were computed as observed percentages; (3) the absolute pairwise differences between (1) and (2) were summed and divided by 2. DPnorm takes a value between 0 and 1, where 0 indicates perfectly even dispersion and 1 maximally uneven dispersion. In this study, dispersion was operationalised as the normalised dispersion of MWCs in the input corpus, while frequency was operationalised as the normalised token frequencies of MWCs.
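The three-step computation above can be sketched in Python as follows. This is an illustrative re-implementation under the definitions in Gries (2008) and Lijffijt and Gries (2012), not code from the study itself:

```python
def dp_norm(part_sizes, part_freqs):
    """Normalised deviation of proportions (DPnorm) for one MWC.

    part_sizes -- token count of each corpus part (book chapter or article)
    part_freqs -- token frequency of the MWC in each corresponding part
    """
    total_tokens = sum(part_sizes)
    total_freq = sum(part_freqs)
    # (1) expected percentages: the relative sizes of the corpus parts
    expected = [size / total_tokens for size in part_sizes]
    # (2) observed percentages: the MWC's distribution over the parts
    observed = [freq / total_freq for freq in part_freqs]
    # (3) absolute pairwise differences, summed and divided by 2
    dp = sum(abs(e - o) for e, o in zip(expected, observed)) / 2
    # normalisation from Lijffijt and Gries (2012): divide by (1 - s_min),
    # where s_min is the relative size of the smallest corpus part
    smallest = min(part_sizes) / total_tokens
    return dp / (1 - smallest)
```

An MWC spread proportionally over two equal-sized parts yields 0 (perfectly even dispersion), while one confined to a single such part yields 1 (maximally uneven dispersion).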
In this study, linear mixed-effects models (LMMs) were employed to analyse changes in the frequencies of MWCs. Mixed-effects models were preferred over traditional ANOVA, since they quantify both group-level and individual-level patterns within a single analysis, taking into account sources of random variation (e.g., Gries, 2015; Linck & Cunnings, 2015; Murakami, 2016). Mixed-effects models are also robust in handling missing data (Linck & Cunnings, 2015).
In order to answer the first research question, individual essays served as the unit of analysis. The frequencies of each main structural category of MWCs (NP-based, PP-based, and VP-based MWCs) and each main discoursal category of MWCs (referential expressions, discourse organisers, and stance expressions) were recorded for each essay and were then normalised per 500 words per text, so that each text received a normalised frequency count for each category of MWCs. Recording the frequencies for each text in a learner corpus enables generalisations about learners’ language systems (see Durrant & Schmitt, 2009). Two separate LMMs were built to depict the trajectories of the structural categories (model 1) and discoursal categories (model 2) of MWCs. There were not enough data to build models for the subcategories of the structural and functional categories of MWCs. The dependent variable for both models was the normalised frequency of each category of MWCs (a unique dependent variable for each category of MWCs at each time point in the long data format). Time (months in the academic year: 3, 5, and 9; a categorical variable) was added as a fixed effect. The variables ‘structural_category’ (NP-based, PP-based, and VP-based MWCs; categorical) and ‘discoursal_category’ (referential expressions, discourse organisers, and stance expressions; categorical) were added as the second fixed effect in the models for the structural and discoursal categories of MWCs, respectively. The L2 writers’ English proficiency test scores were also added as a fixed effect in both models. The random effects, i.e., those that account for individual variation, were L2 writers, with by-writer random intercepts and random slopes of time, ‘structural_category’ (model 1)/‘discoursal_category’ (model 2), and their interactions.
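The per-500-word normalisation that produces the dependent variable can be sketched as follows (a hypothetical Python illustration; the function name and the example counts are invented for this sketch):

```python
def per_500_words(raw_count, text_length):
    """Normalise a raw token count per 500 words of text.

    raw_count   -- token frequency of one MWC category in one essay
    text_length -- length of that essay in words
    """
    return raw_count / text_length * 500
```

For example, six NP-based MWC tokens in a hypothetical 1,500-word Month 9 essay yield a normalised frequency of 2.0, directly comparable with counts from the shorter 500-word essays at Months 3 and 5.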
All the models in this study were fit with the lme4 package version 1.1-21, using the lmer function (Bates, Mächler, Bolker, & Walker, 2015), in R version 3.6 (R Core Team, 2019). Post-hoc tests using the Tukey adjustment were then conducted to estimate changes in each category of MWCs across time with the lsmeans package (Lenth, 2016). The next section will only present the post-hoc tests that estimate the longitudinal trajectories of each category of MWCs rather than pairwise differences between different categories of MWCs at each time point.
In this study, the text length and the frequencies of MWCs were missing for eight students out of 98 at Month 9, because these students had dropped out of university or submitted their essays at a later time. Out of 882 data points in the long data format, only 2.7% (n = 24) of the frequency data points and 2.7% (n = 24) of the text-length data points were missing for the structural and discoursal categories of MWCs, respectively. Since Schafer (1999) argued that missing data points of 5% or less are inconsequential, all the data were included in the models, which discarded only the missing data points at Month 9 rather than all the data points of a learner.
Individual MWCs served as the unit of analysis in order to address the second research question. Two separate LMMs were built to determine to what extent time, frequency, and dispersion of MWCs in the input data would predict the frequencies of the structural categories (model 3) and discoursal categories (model 4) of MWCs in the L2 writers’ essays. The dependent variable for both models was the normalised frequency of each MWC (per 500 words per text) in the L2 writers’ essays. The fixed effects were as follows: (1) the normalised frequency of MWCs in the input corpus (per 500 words per corpus); (2) time (months in the academic year: 3, 5, and 9; a categorical variable); (3) DPnorm (normalised DP values for MWCs in the input corpus); (4) the L2 writers’ English proficiency test scores; (5) the variables ‘structural_category’ (NP-based, PP-based, and VP-based MWCs; categorical) and ‘discoursal_category’ (referential expressions, discourse organisers, and stance expressions; categorical) for the models for the structural and discoursal categories of MWCs, respectively. MWCs and L2 writers were included as crossed random effects (see Gries, 2015). The random effects structures initially involved random intercepts and slopes of time, ‘structural_category’ (model 3)/‘discoursal_category’ (model 4), and their interactions for both MWCs and writers. The random effects structures had to be simplified for both models due to model convergence issues, even with optimisers (see Bates et al., 2015).
For all four models, the optimal random effects structure was selected first, followed by the optimal fixed effects structure (see Durrant & Brenchley, 2019; Gries, 2015). To achieve this, the Akaike information criterion (AIC), which provides a relative goodness of fit of different models, was used: the smaller the AIC value, the better the fit the model provides for the data (Murakami, 2016). In terms of model selection, a backward selection heuristic was followed, commencing with the most complex model, with both fixed effects (all possible fixed effects and their interactions) and maximal random effects structures (random intercepts and slopes for all possible predictors and their interactions for models 1 and 2) (Barr, Levy, Scheepers, & Tily, 2013). The model complexity was reduced until a further reduction yielded a larger AIC value (Murakami, 2016). P values were derived from the models using the lmerTest package (Kuznetsova, Brockhoff, & Christensen, 2017). The effect sizes were calculated using the MuMIn package version 1.43.15 (Bartoń, 2019). The significance of the random effects was evaluated using a parametric bootstrap test with the pbkrtest package (Halekoh & Højsgaard, 2014), because parametric bootstrapping can provide more accurate results than the likelihood ratio test (e.g., Bates et al., 2015). Finally, the models met the assumptions of mixed-effects models (see Durrant & Brenchley, 2019) with regard to the normal distribution of residuals and random effects, the linear relationship between residuals and predicted values, and the homogeneity of residual variance; these were checked via plots with the performance package (Lüdecke, Makowski, & Waggoner, 2019) in R. Also, no multicollinearity was found between predictors, and there were no outliers.
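The AIC comparison underlying the backward selection heuristic can be sketched as follows. This is an illustrative Python sketch of the criterion itself, not the study’s R code; the model names, log-likelihoods, and parameter counts are hypothetical:

```python
from math import inf

def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2*logL."""
    return 2 * n_params - 2 * log_likelihood

def best_by_aic(models):
    """Pick the model with the smallest AIC from (name, logL, k) triples."""
    best_name, best_aic = None, inf
    for name, log_lik, k in models:
        candidate = aic(log_lik, k)
        if candidate < best_aic:
            best_name, best_aic = name, candidate
    return best_name
```

With hypothetical fits such as a full model (logL = -100, k = 10 parameters, AIC = 220) and a reduced model (logL = -101, k = 6, AIC = 214), the reduced model is preferred despite its slightly lower log-likelihood, because the smaller AIC rewards parsimony.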