Development of vocabulary sophistication across genres in English children’s writing

Durrant, Philip; Brenchley, Mark

doi:10.1007/s11145-018-9932-8

Development of vocabulary sophistication across genres in English children’s writing

Open access
Published: 22 December 2018

Volume 32, pages 1927–1953, (2019)
Cite this article

Download PDF

You have full access to this open access article

Reading and Writing Aims and scope Submit manuscript

Development of vocabulary sophistication across genres in English children’s writing

Download PDF

7696 Accesses
19 Citations
18 Altmetric
1 Mention
Explore all metrics

Abstract

This paper aims to advance our understanding of how children’s use of vocabulary in writing changes as they progress through their school careers. It examines the extent to which a model of lexical sophistication as use of low-frequency, register-appropriate words adequately captures development in vocabulary use across the course of compulsory education in England. We find that the received model needs elaborating in a number of important ways. Specifically: (1) the average frequency of words in the repertoire used by older children is no lower than that of younger children. However, younger children’s writing is characterized by extensive repetition of high frequency verbs and adjectives and of low frequency nouns (the latter being a product of a focus on entities which are rarely discussed in adult writing). The role of repetition in this finding implies that lexical sophistication is inseparable from lexical diversity, a construct which is usually treated as distinct. (2) Younger children’s writing shows a preference for fiction-like vocabulary over academic-like vocabulary. As they mature, children come to make greater use of academic vocabulary in both their literary and non-literary writing, though this increase is greatest in their non-literary writing. Use of fiction vocabulary remains constant across year groups but decreases sharply in non-literary writing, showing an enhanced sense of register appropriateness. This development of register appropriate word use can be captured by relatively simple frequency-based measures that could readily be employed by teachers and researchers to track writers’ development in this aspect of word use.

Reconsidering the Evidence That Systematic Phonics Is More Effective Than Alternative Methods of Reading Instruction

Article Open access 08 January 2020

Jeffrey S. Bowers

Reciprocal relationships among reading and vocabulary over time: a longitudinal study from grade 1 to 5

Article Open access 30 March 2024

Callula Killingly, Linda J. Graham, … Pamela Snow

Quantitative Methodology

Introduction

This paper aims to advance our understanding of how children’s use of vocabulary in writing changes as they progress through their school careers. Specifically, it elaborates on existing models of the features of word use which distinguish the writing of older children from that of younger children. Methodologically, it belongs to a tradition going back to at least the 1930s of studying children’s writing development through quantitative analysis of linguistic features. This approach offers a useful complement to qualitative analyses (e.g., Christie & Derewianka, 2008) in that it enables reliable analysis of large numbers of texts, so allowing patterns to emerge which may not be obvious in smaller samples and supporting robust generalizations. The systematicity required of the approach and the reliance on quantitative analysis to identify patterns also enables a distancing of the analyst from the text which can bring out patterns that may not be obvious to the naked eye.

While the majority of studies in this tradition has focused on syntactic development, the last 15 years have seen growing interest in features of vocabulary (e.g., Crossley, Weston, Sullivan, & McNamara, 2011; Malvern, Richards, Chipere, & Duran, 2004; Massey, Elliott, & Johnson, 2005; Olinghouse & Leaird, 2009). Vocabulary development is particularly well-suited to this type of analysis, both because the units of analysis (words) are more numerous than the units of syntax and because they are more easily identified by automated means, so allowing relatively reliable analysis.

The focus on vocabulary has clear practical importance given the emphasis on this as an aspect of writing development in Anglophone school curricula (Australian Curriculum and Assessment Reporting Authority, 2014; Department for Education, 2014; National Governors Association Center for Best Practices, 2010). It is also especially salient given contemporary concerns about the existence of a “vocabulary gap” that is preventing a significant proportion of students from achieving their full potential (Harley, 2018; Quigley, 2018). Such concerns underline the value of explicit descriptions of vocabulary development, both as a means of clarifying what a “vocabulary gap” might actually entail and for ensuring it is effectively targeted.

Quantitative measures of vocabulary development in children’s writing

Previous work distinguishes three main types of measure of vocabulary development: measures of lexical density, lexical diversity, and lexical sophistication (Read, 2000). Density refers to the proportion of a text which is made up of lexical words (usually defined as verbs, nouns, adjectives and adverbs). This is known to be an important distinguisher of text genres (e.g., Biber, 1988); however, research has shown it to be of little developmental interest (e.g., Berman & Nir, 2010; Uccelli, Dobbs, & Scott, 2013). Diversity refers to the repertoire of different words which a writer uses. This is perhaps the most commonly-used measure of vocabulary development and findings have overwhelmingly supported the conclusion that diversity increases with age (e.g., Berman & Nir, 2010; Crossley et al., 2011; Malvern et al., 2004; Olinghouse & Wilson, 2013; Uccelli et al., 2013).

The literature on lexical sophistication is more wide-ranging and offers fewer clear conclusions. Researchers rarely state exactly what they mean by the term, but Read’s (2000) definition captures most of what it has been construed as covering. For him, sophistication is the “selection of low-frequency words that are appropriate to the topic and style of the writing, rather than just general, everyday vocabulary” (2000, p. 200).

One operationalization of Read’s definition is found in studies which count the proportion of words in a text which are not found on a list of high-frequency vocabulary. Some studies have found this proportion to increase with age (Finn, 1977; Olinghouse & Graham, 2009; Olinghouse & Wilson, 2013; Sun, Zhang, & Scardamalia, 2010), although Malvern et al. (2004) did not find an increase from ages seven to 14, and Lawton (1963) found an increase between 12 and 14 for working class children but not middle-class children. While this method provides an easily-understood measure of sophistication, it is somewhat ‘blunt’ in that each word receives only a binary score: present or missing from the reference list. A great deal of potentially meaningful variation between more-and-less frequent words on both sides of that divide is thereby lost.

Crossley et al. (2011) take a more comprehensive approach by retrieving from a reference corpus a frequency count for each word in a text and taking the mean of these frequencies to define an overall score for the text as a whole. Using this method, they found no significant difference between ninth and eleventh graders, although college writers did exhibit lower averages than school-level writers. Crossley et al.’s approach has the virtue of finer gradation, it suffers from the fact that word frequencies follow a highly skewed distribution. This is likely to be reflected in strongly skewed frequency profiles within each text, implying that mean frequencies may not provide a good summary of the range of vocabulary a particular writer uses. This may be the reason for the lack of a significant difference between school year groups. Another explanation may be that the study deals only with the top of the range of school years—it is possible that measurable development in vocabulary sophistication has levelled off by ninth-grade.

Fewer studies have focused on the second part of Read’s (2000) definition, which refers to appropriateness to the topic and style of writing. Partially relevant here is research which has looked at children’s use of Greek- and -Latin-based words (Corson, 1985; Berman & Nir-Sagiv, 2007) and their use of words taken from the Academic Word List (Sun et al., 2010), both of which were found to increase with age. These studies indicate an overall movement towards greater use of vocabulary which is typical of an academic or ‘learned’ style. However, there is no real attempt to establish whether this shift is appropriate to the different kinds of texts that children are writing or to address vocabulary typical of other topics or styles.

In conclusion, while research on lexical diversity and lexical density point to fairly clear conclusions—the former increases as children mature, the latter does not—work on lexical sophistication is more ambiguous. The model which casts vocabulary sophistication as use of lower frequency, more register-appropriate, words, has strong intuitive appeal but research has not been able to establish that it adequately captures development in children’s writing. Results regarding frequency are inconsistent and hampered by overly-simple binary methods which ignore much of the potential variation between texts. Furthermore, the few studies which can be construed as relating to appropriateness have focused on a single style (characterized by academic and Greco-Latin words) and have not attempted to relate use to the different kinds of texts that children write. The present study aims to move work in this area forward by measuring development in vocabulary sophistication across the course of compulsory education in England and exploring how the existing model might be elaborated to provide a more accurate understanding of children’s vocabulary development.

Methodology

Corpus

This study is based on a new corpus of children’s writing. Texts in the corpus are educationally authentic, in that they were produced as part of children’s regular schoolwork, rather than being elicited for research purposes. Schools from across England were contacted by the project team, briefed as to the nature of the project, and invited to participate. All writing was obtained subject to the students’ voluntary informed consent, with additional consent obtained from the head teacher, the relevant subject teachers, and the students’ legal guardians. The corpus, and related materials, are available for download from the project website https://gigcorpus.com.^{Footnote 1}

We aimed to collect a set of texts that captures the broad range of writing that students are currently producing during the statutory, or key, stages of the English school system. Accordingly, texts were sampled at four points: the ends of Key Stage (KS) 1 (Year 2, when children are 6–7 years old) and KS2 (Year 9, when children are 10–11 years old), encompassing the primary phase of the school system, and the ends of KS3 (Year 9, when children are 13–14 years old) and KS4 (Year 11, when children are 15–16 years old), encompassing the secondary stage. Key stages are intended to constitute coherent educational programmes of learning, with formal assessments undertaken at the end of each. Although the specifics of each stage vary according to both discipline and school, all stages are cued to an overarching ‘national curriculum’ which specifies the “statutory programmes of study and attainment targets for all subjects” (Department for Education, 2014). Collected between September 2015 and December 2017, the present texts were all produced under the version of this curriculum introduced in 2014 (Department for Education, 2014).

Texts were classified into genres on the basis of their overall purpose. Although various schemas were available for this task (e.g., Nesi & Gardner, 2012; Rose & Martin, 2012), following both a review of the texts and extensive discussion with national curriculum specialists at the university where the research was conducted, we decided to use a bespoke classification. This had three benefits. First, it could be efficiently applied to a large number of texts. Second, it could be consistently applied across the three disciplines within the corpus. Third, it could be consistently applied across the four developmental stages within the corpus. The last point was especially valuable, since it allowed texts to be classified in line with their overarching purpose even if the student was not yet able to demonstrate all generic features required by other schemes.

Our classification is based on a two-way distinction between ‘literary’ and ‘non-literary’ tasks. A ‘literary’ text is one which can be evaluated as successful or unsuccessful without considering any kind of propositional or directive relationship to the world. That is, its contents do not need to be judged as either factually accurate or making a persuasive argument in order for the text to be successful. The primary purpose of a literary text is to be appreciated on its own terms as a piece of stylised writing. Within the present corpus, prototypical examples were creative fiction and literary imitations.

‘Non-literary’ texts, on the other hand, do need to bear a propositional or directive relationship to an external world in order to be considered successful. Their primary purpose is to (a) accurately depict a particular state-of-affairs, (b) evaluate a particular state-of-affairs, or (c) argue for a particular state-of-affairs to be the case. Prototypical non-literary texts included autobiographies, historical accounts, complaint letters, literary criticism, experimental reports, and persuasive speeches.

Texts were sampled across three disciplines: English, Science, and the Humanities (i.e. History, Geography, and Religious Studies). As can be seen (Tables 1, 2), this approach did not yield a balanced corpus. Partly this reflects the practical difficulty of accessing Science and Humanities departments. However, it also reflects the general distribution of writing across the curriculum, at least in terms of ‘continuous prose’, which was the intended focus of the corpus. Thus, the predominance of English texts plausibly reflects the marked emphasis of this discipline on the production of continuous prose; the lack of Year 2 Science texts plausibly indicates a tendency of continuous prose to be a later-developing feature of school Science; and the lack of ‘literary’ Humanities and Science texts reflects these disciplines’ emphasis on dealing with the external world (see below for definitions and discussion of our genre categories).

Table 1 Corpus makeup—distribution of texts across year groups, genres and disciplines

Development of vocabulary sophistication across genres in English children’s writing

Abstract

Similar content being viewed by others

Reconsidering the Evidence That Systematic Phonics Is More Effective Than Alternative Methods of Reading Instruction

Reciprocal relationships among reading and vocabulary over time: a longitudinal study from grade 1 to 5

Quantitative Methodology

Introduction

Quantitative measures of vocabulary development in children’s writing

Methodology

Corpus

Reference data

Processing the study corpus

Inferential methods

Analysis

Preliminary analysis: vocabulary diversity across year groups

Frequency profiles

Appropriateness

Discussion and conclusions

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Electronic supplementary material

Supplementary material 1 (DOCX 53 kb)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation