Participants
We sampled first-grade students from eight public schools (one class in each school) in western Norway. Of the 143 students in these classes, three were absent for testing. For our main analysis we omitted students who wrote fewer than four words in one or both of the handwritten and typed narrative writing tasks. We adopted a four-word threshold for two reasons. First, as part of the writing prompt, students were given three words to use in their narratives, so writing four words indicated that a student had added at least one word of their own. Second, four words are sufficient to form a minimal narrative that fulfils the writing brief (e.g., Greina knakk. Jenta datt. ‘The branch broke. The girl fell’; Isen datt. Pusen smiler. ‘The ice cream fell. The cat smiles’). Our final sample consisted of 102 students, with a mean age of 6 years, 2 months (SD = 3.5 months) at the first data collection point. Data collection was carried out between September and November 2018. The study was approved by the Norwegian Centre for Research Data, the ethical oversight agency in Norway; it is part of the DigiHand project (Gamlem et al., 2020) and follows the ethical guidelines of the National Committee for Research Ethics in the Social Sciences and Humanities.
Educational context
Before starting school, 97% of Norwegian three- to five-year-old children attend kindergarten (Norwegian Directorate for Education & Training, 2018). A survey completed by parents of children in our sample (88% response rate) indicated that all children had attended. Kindergarten does not, however, include any formal literacy instruction. Norwegian children start school in August of the calendar year of their sixth birthday, and this is when formal teaching of letters begins. Although they have had no formal instruction, most students can recognise and name a few letters when they start school (Sigmundsson et al., 2017). All of the students in our sample received literacy instruction using digital tablets in parallel with handwriting instruction, and every student had a personal digital tablet provided by the school. Typing on this tablet was done with the fingers on an on-screen touch keyboard whose font was lower case by default. In the parent survey, only nine parents reported that their child did not have access to a digital tablet at home.
We surveyed the teachers of all participating classes about their writing instruction using a questionnaire. In all classes, students learned two or three letters a week. Students took part in activities that involved writing letters by hand and finding letters on the tablet. Teachers reported that students wrote and drew short texts both by hand and by typing. The students were thus familiar with both handwriting and typing.
Written composition assessment
Writing tasks were teacher-administered three months after the start of school, following instructions provided by the research team. All classes completed the two writing tasks within the same week, on two consecutive days. The students were introduced to a teddy bear who loves stories and who would be the audience for their texts. Students were then briefly introduced to the story genre. Examples of narratives were mentioned, and the students were given the following explanation: A narrative is a story about something happening; it can be something exciting, scary, sad or funny. They were then asked to write a story about a picture, answering the question: What has happened, and what will happen next? Two pictures were used as tasks: one showed a boy about to drop his ice cream, and the other showed a girl about to fall from a tree. Students were given three important words corresponding to the pictures (is ‘ice cream’, gut ‘boy’, pus ‘cat’, and jente ‘girl’, tre ‘tree’, ball ‘ball’). Tasks and modality were counterbalanced across classes. The students were allowed to spend 45 min on the task (including the introduction). Students who finished their composition early were instructed to read quietly in a book. As our aim was to investigate modality effects on measures of text quality, it was important to ensure that all students had enough time to complete their composition. To support the students in the writing process, the teachers were instructed to encourage the students to do their best, but not to help them with, for example, spelling or punctuation. When writing digitally, the students could use speech synthesis to listen to the sounds, words, and sentences corresponding to what they wrote.
The handwritten texts were first transcribed according to a transcription manual (see Appendix 1 for examples of the transcriptions). Inverted letters were corrected as long as it was clear which letter was intended (<b>/<d> substitutions were not corrected). If any characters were hard to identify, a second rater was consulted. In the handwritten texts, a gap between words had to be larger than the distance between the characters within the words to be recognised as a space. All verbal text, including numbers, was transcribed, while drawings in the handwritten texts were kept out of the transcriptions and analyses. Similarly, graphical illustrations such as pictures and emoticons in the digital texts were excluded.
After all texts were digitized, they were scored to give the following measures.
Text length
Text length was measured by counting the number of words written by the child. A word was defined as a character string representing a phonologically plausible spelling of a Norwegian word that children might plausibly know. If a character string represented two or more plausible Norwegian words, spaces were inserted; spaces were only inserted so as to create maximally long words. Character strings that could not be identified as words were coded as non-words and excluded from the text length measure.
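The space-insertion rule can be sketched as a greedy longest-match segmentation. This is our illustration only, assuming a lexicon of plausible Norwegian word forms; the function and the example lexicon below are hypothetical and not part of the study’s procedure.

```python
def segment(string, lexicon):
    """Split a run-together character string into words, preferring the
    longest lexicon match at each step (an approximation of 'spaces were
    only inserted to create maximally long words')."""
    words, i = [], 0
    while i < len(string):
        for j in range(len(string), i, -1):  # try the longest substring first
            if string[i:j] in lexicon:
                words.append(string[i:j])
                i = j
                break
        else:
            return None  # remainder cannot be segmented into known words
    return words

# Hypothetical example: a child writes "jentadatt" ('the girl fell') without a space.
print(segment("jentadatt", {"jenta", "datt"}))  # ['jenta', 'datt']
```

Note that a greedy left-to-right match is only an approximation; the study’s manual coding could weigh alternatives globally.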
Spacing accuracy
Spacing accuracy reflects the ability to produce text that is orthographically segmented into discrete words. Space use was counted and categorized as correct spaces or incorrect spaces (missing spaces and overgeneralized spaces, i.e., separation of simplexes or compounds). Punctuation was accepted as correct segmentation. Spacing accuracy was scored as the proportion of spaces used correctly.
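Scored this way, spacing accuracy is a simple proportion. The sketch below reflects our reading of the measure; the function name and the zero-decisions convention are our own assumptions, not taken from the study.

```python
def spacing_accuracy(correct: int, missing: int, overgeneralized: int) -> float:
    """Proportion of spaces used correctly: correct spaces divided by all
    space decisions (correct, missing, and overgeneralized spaces)."""
    total = correct + missing + overgeneralized
    if total == 0:
        return 0.0  # assumption: a text with no space decisions scores zero
    return correct / total

# Example: 5 correct spaces, 2 missing, 1 inserted inside a compound.
print(spacing_accuracy(5, 2, 1))  # 0.625
```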
Punctuation (correct use of sentence terminators)
All sentence terminators (period, question mark, exclamation mark, colon) were counted as either correctly inserted terminators after a sentence or incorrect terminators (terminators wrongly inserted, e.g., in the middle of a sentence, and terminators missing after a sentence). The measure was converted into a binomial variable: more correct terminators than incorrect ones, versus an equal number of correct and incorrect terminators or more incorrect than correct ones. Of the 204 texts, 57 used one or more terminators, and of these only 14 had more correct than incorrect terminators. From this we conclude that the students in our sample had not yet learned the writing conventions related to terminators. We therefore did not include this measure as an outcome variable in our analyses.
Spelling accuracy
Spelling accuracy was operationalised as the total number of correctly spelled words, where a correct spelling is understood as a correct character string. Separation of compounds was not regarded as a spelling error (rather, it was measured as failure to segment correctly; cf. spacing accuracy). Each text was corrected against whichever of the two written standards of Norwegian (Bokmål or Nynorsk) gave the fewest errors.
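The least-errors rule can be sketched as scoring each text against both standards and keeping the higher count of correct words. This is our illustration; lexicon membership stands in for “a correct character string”, and all names and example words are hypothetical.

```python
def spelling_accuracy(words, bokmal_lexicon, nynorsk_lexicon):
    """Total correctly spelled words, corrected against whichever written
    standard (Bokmål or Nynorsk) yields the fewest errors."""
    correct_bm = sum(w in bokmal_lexicon for w in words)
    correct_nn = sum(w in nynorsk_lexicon for w in words)
    return max(correct_bm, correct_nn)  # fewest errors = most correct words

# Hypothetical example: 'gut' is the Nynorsk form (Bokmål 'gutt'), so the
# text scores higher against the Nynorsk standard.
print(spelling_accuracy(["jente", "gut"], {"jente", "gutt"}, {"jente", "gut"}))  # 2
```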
Vocabulary sophistication
All lexical lemmas from the 280 texts were extracted, in total 270 types. A sample of 21 teachers and trainee teachers completed an online-survey in which they were asked to respond, for each word, at what age they would expect that word to appear in children’s writing, on a scale from 5 to 14 years. Our measure was, therefore, similar to age-of-acquisition (Carroll & White, 1973; Gilhooly & Logie, 1980) but with a focus on written rather than spoken acquisition.
Inter-rater reliability was relatively low, as is common in subjective ratings of age-of-acquisition (Barrow et al., 2019; mean pairwise inter-rater correlation = 0.48, SD = 0.12, Krippendorff’s α = 0.29). Taken together, however, ratings showed high internal consistency (Cronbach’s α = 0.93, 95% CI [0.93, 0.95]).
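For readers who wish to reproduce the internal-consistency check, Cronbach’s α over raters can be computed with the standard formula, treating each rater as an “item” and each word as an observation. This is a minimal sketch under that assumption; the study does not specify its software or exact computation.

```python
def cronbach_alpha(ratings):
    """Cronbach's alpha; `ratings` is a list of per-rater score lists,
    aligned so that ratings[r][w] is rater r's rating of word w."""
    def variance(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

    k = len(ratings)                 # number of raters ("items")
    n = len(ratings[0])              # number of words rated
    item_variances = sum(variance(r) for r in ratings)
    totals = [sum(r[w] for r in ratings) for w in range(n)]
    return (k / (k - 1)) * (1 - item_variances / variance(totals))

# Two perfectly consistent raters (one always rates one year higher):
print(cronbach_alpha([[5, 7, 9], [6, 8, 10]]))  # 1.0
```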
Our age-of-acquisition-in-writing score for each word was, therefore, the mean score across all individual ratings for that word, and our vocabulary-age score for each text was the mean of these scores across lexical-lemma types within the text.
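The two-step averaging just described can be made concrete as follows (our sketch; the lemmas and ratings are invented for illustration).

```python
def word_aoa(ratings):
    """Age-of-acquisition-in-writing for one word: mean across raters."""
    return sum(ratings) / len(ratings)

def text_vocabulary_age(lemma_types, aoa_by_lemma):
    """Vocabulary-age score for one text: mean AoA over its lemma types."""
    scores = [aoa_by_lemma[lemma] for lemma in lemma_types]
    return sum(scores) / len(scores)

# Invented ratings for two lemmas occurring in a text:
aoa = {"katt": word_aoa([5, 5, 6]), "sykkel": word_aoa([6, 7, 8])}
print(text_vocabulary_age(["katt", "sykkel"], aoa))
```

Because the text-level score averages over lemma *types*, repeating a word does not change a text’s vocabulary-age score.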
Syntax (clause construction)
Each text received a syntax score based on the number of clauses, the kind of clause (main or subordinate), and whether each clause was syntactically correct or contained one or more syntactic errors. The score was calculated according to these rules: 1 point for every syntactically correct main clause, 0.5 points for every main clause with one or more syntactic errors, 2 points for every syntactically correct subordinate clause, and 1.5 points for every subordinate clause with one or more syntactic errors.
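These rules can be sketched as a small lookup table (our illustration; the tuple representation of coded clauses is our own, not the study’s coding scheme).

```python
# Points per clause: (kind, has_error) -> points, per the rules above.
POINTS = {
    ("main", False): 1.0,
    ("main", True): 0.5,
    ("subordinate", False): 2.0,
    ("subordinate", True): 1.5,
}

def syntax_score(clauses):
    """Sum the points for a text's coded clauses."""
    return sum(POINTS[(kind, has_error)] for kind, has_error in clauses)

# Two correct main clauses and one subordinate clause with an error:
print(syntax_score([("main", False), ("main", False), ("subordinate", True)]))  # 3.5
```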
Story grammar
The global narrative structure of the texts was measured through a basic version of story grammar (Labov & Waletzky, 1967) comprising the three stages orientation, complication and resolution. A text was scored zero if it did not contain any of the stages, one point if it contained two stages (orientation and complication, or complication and resolution), and two points if all three stages were present.
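A minimal sketch of this scoring follows. The rule as stated does not mention texts with exactly one stage; we assume these also score zero, and mark that assumption in the code.

```python
def story_grammar_score(has_orientation, has_complication, has_resolution):
    """0/1/2-point story-grammar score from the three stage indicators."""
    stages = sum([has_orientation, has_complication, has_resolution])
    if stages == 3:
        return 2
    if stages == 2:
        return 1
    return 0  # assumption: zero or one stage scores zero

# Complication and resolution present, no orientation:
print(story_grammar_score(False, True, True))  # 1
```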
The first author coded all of the texts for story grammar, while the second and third authors coded 50 texts each. Pearson’s r indicated good interrater reliability, with r = 0.89 and r = 0.88 for the two rater pairs, respectively.
Basic narrative features (event count)
The basic unit of story structure is the event, as a story is usually a chain of events linked in time (Labov & Waletzky, 1967). The number of events in each text was counted as a measure of the use of simple story structure.
The first author coded all of the texts for events, while the second and third authors coded 50 texts each. Pearson’s r indicated good interrater reliability, with r = 0.99 and r = 0.98 for the two rater pairs, respectively.
Advanced narrative features
At the local level, narrative structures other than the event include: problem, solution, reaction, effect, comment from the narrator, and title (Martin & Rose, 2008). These features have in common that they relate to other features, meaning that students who used them were able to connect content on a basis other than time, for example through causal relations. Each of these structures was identified and counted in the students’ texts.
The first author coded all of the texts for local narrative structure, while the second and third authors coded 50 texts each. Pearson’s r indicated good interrater reliability; correlations for the individual features ranged from r = 0.82 to r = 0.95, except for the feature solution, with r = 0.60. For this feature, raters discussed and resolved cases where ratings disagreed.
See Appendix 2 for explanation and examples of the coding of the advanced narrative features.
Holistic quality rating
Each text received a holistic score between 0 and 5, based on criteria described in a rubric (see Appendix 3). The texts were scored by the first and second authors. Before scoring, the raters practiced on a set of 20 texts and discussed disagreements in depth. Pearson’s r indicated very good interrater reliability, r = 0.99.
Literacy-related measures
Students completed a series of literacy tests in their second to fifth weeks of school. Students were tested individually by members of the DigiHand project team. All tasks, apart from the spelling test, were completed on a digital tablet. Testing sessions lasted approximately 20 min, and testing was carried out in a quiet room at each student’s local school.
Grapheme-to-phoneme mapping
Students saw 24 letters from the Norwegian alphabet, in upper case, in random order, one at a time (Sunde et al., 2019). They were asked to give the sound of the letter. If they named the letter instead, they were prompted for the sound. They were given one point for each correctly sounded letter.
Phoneme isolation
Children’s phonological segmentation ability was measured in a 10-item task in which students were asked to produce the first sound of each of 10 words (Solheim et al., 2018; Haaland et al., 2021). The words denoted common objects, such as ball ‘ball’, ost ‘cheese’, and eple ‘apple’. The researcher started with two practice trials, saying: Dette er en ørn. Den første lyden i ørn er /ø/. Hva er den første lyden i ørn? ‘This is an eagle. The first sound in ørn is /ø/. What is the first sound in ørn?’, after which the student repeated the first sound. After the two practice trials, the researcher only named the items, and the student had to identify the first sound. The test was stopped if the child made two consecutive errors. Students were given one point for each correctly isolated phoneme.
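The discontinuation rule used here (and in several of the tests that follow) can be sketched as below; the function name and the boolean encoding of responses are our own, not from the test manuals.

```python
def score_with_stop_rule(responses):
    """Score a list of trial outcomes (True = correct), stopping the test
    after two consecutive errors, as in the tasks described here."""
    score, consecutive_errors = 0, 0
    for correct in responses:
        if correct:
            score += 1
            consecutive_errors = 0
        else:
            consecutive_errors += 1
            if consecutive_errors == 2:
                break  # test is discontinued; later items are not given
    return score

# One correct answer, then two consecutive errors: testing stops at 1 point.
print(score_with_stop_rule([True, False, False, True]))  # 1
```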
Phoneme blending
Children were asked to blend a series of phonemes into a word (Solheim et al., 2018). The student was shown four images while a prerecorded voice named all the pictures (e.g., hus, mur, mus, pus ‘house, wall, mouse, cat’). The student then heard a segmented version of one of the picture names (e.g., /p/, /u/, /s/) and was asked to point to the corresponding picture. There were two practice trials and then eight ordinary trials of increasing difficulty. All words were regular words consisting of three to six phonemes. The test was stopped after two consecutive errors. The maximum score was eight points.
Word reading
Single-word naming accuracy was measured by asking participants to read aloud 10 single words (Haaland et al., 2021; Solheim et al., 2018). The words were regular frequent Norwegian words. A word appeared on the screen and the student was asked to read the word. Words were presented with increasing difficulty. If the student gave the letter names or unblended phonemes, the researcher asked “Yes, which word is that?”. The test was stopped after two consecutive errors. Students were given one point for each correctly read word.
Spelling
Children’s spelling ability was assessed as the ability to write single words from dictation with pencil on paper (Haaland et al., 2021; Solheim et al., 2018). The words were regular, frequent Norwegian words, starting with two- and three-letter words and ending with five-letter words. The researcher read a sentence and repeated the word that the student should write. One practice item was modelled by the researcher, and the child was asked to write the same word. There were then ten ordinary items of increasing difficulty. The test was stopped after two consecutive errors. Words were scored as correct or incorrect. Recognisable attempts at shaping the correct letter were accepted, and inverted letters were accepted as long as the result was not a lower-case <d> or <b>. The distribution of this variable was positively skewed, with a large proportion of students scoring zero (54%). The variable was therefore dichotomized (0 = students who scored zero, one or two [67%]; 1 = students who scored three or more).
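The dichotomization can be sketched in one line (our illustration; the function name is ours).

```python
def spelling_group(raw_score):
    """Dichotomized spelling variable: 1 if three or more words were
    spelled correctly, 0 if two or fewer (including zero)."""
    return 1 if raw_score >= 3 else 0

# Raw scores 0 and 2 fall in the low group; 3 and 7 in the high group.
print([spelling_group(s) for s in [0, 2, 3, 7]])  # [0, 0, 1, 1]
```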
Vocabulary
Children’s productive vocabulary was assessed using a short version of the Norwegian Vocabulary Test (Størksen et al., 2013); this short version has been used in previous research (Solheim et al., 2018). The students were presented with a picture of an object on the tablet screen and asked to name it. All students completed the 20 items. One follow-up question was allowed: if, for example, the student gave a less precise name, such as “bird” instead of the correct “ostrich”, the researcher could ask “Do you know what kind of bird?”. Each correct answer gave one point.