Participants were children diagnosed with developmental dyslexia (34 boys, 18 girls) and typically developing children (52 boys, 53 girls). All participants spoke Dutch as their primary language and all parents gave active consent to use the collected data for research purposes.
The children with dyslexia were diagnosed between 2009 and 2013 following the protocol by Blomert (2006), which is in line with the definition of dyslexia of the International Dyslexia Association (2002). The Dutch protocol for diagnosis of developmental dyslexia (Blomert, 2006) states that teachers have to demonstrate that the reading problems are persistent (resistance to treatment) and severe (weak performance on word reading and spelling over one and a half school years). In the subsequent diagnosis, a certified clinical psychologist needs to evidence a phonological deficit and exclude other explanations of the reading or spelling problems. After formally being diagnosed with dyslexia, children received an in-service phonics through spelling intervention in a clinic for assessment and intervention for children with learning difficulties. The mean age of this group of children at the start of the assessment was 8.97 years (SD = .94). Children were in grade 2 (n = 17), grade 3 (n = 23), grade 4 (n = 9), grade 5 (n = 2), and grade 6 (n = 1). All children scored within the normal range on semantics, both in total scores (M = 108.35, SD = 12.18) and in standardized subtest scores (Information: M = 10.50, SD = 2.42; Similarities: M = 12.25, SD = 2.83; Vocabulary: M = 11.31, SD = 2.32; Comprehension: M = 11.58, SD = 2.41).
The mean age of the children in the control group was 8.88 years (SD = .89). Children were in grade 2 (n = 25), grade 3 (n = 52), and grade 4 (n = 28). All children scored within the normal range on semantics, both in total scores (M = 101.67, SD = 13.84) and in standardized subtest scores (Information: M = 10.40, SD = 2.63; Similarities: M = 10.98, SD = 2.81; Vocabulary: M = 10.46, SD = 2.82; Comprehension: M = 9.34, SD = 2.81). The fact that both groups scored within the normal range on the four measures of semantic ability seems to converge with the aptitude-achievement discrepancy model, which defines dyslexia as a discrepancy between rather normal intelligence and weak reading scores (Fletcher, Lyon, Fuchs, & Barnes, 2007).
With respect to the recruitment of the control group, five schools for mainstream primary education throughout the Netherlands were asked by letter and telephone to participate in the present study. When a school agreed to participate, parents gave active consent to let their child participate in the present study.
With respect to the group of children with dyslexia, the current study used existing data collected by a clinic for assessment and intervention of children with learning disorders. All children were tested between 2009 and 2013 on two consecutive mornings by clinicians on rapid automatized naming, phonological awareness, verbal working memory, pseudoword decoding, word decoding, spelling, and semantics. Two or three weeks after the assessment, the phonics through spelling intervention started. After the intervention, all participants received the posttest, which included the pseudoword decoding, word decoding, and spelling measures. All children from the control group were tested once, during one school day.
Spelling was measured with the standardized “PI word dictation” (Geelhoed & Reitsma, 1999). In this task, children were asked to write single Dutch words correctly. The dictation consisted of 135 words, divided into 9 blocks of 15 words. For each item, a sentence was first read aloud, after which the target word was repeated. The test was terminated when a child wrote fewer than eight words correctly within one block. The score was the number of correctly written words. Two versions of the test were available (version A and version B). The reliability of both versions differs per age but is at least .90 (Geelhoed & Reitsma, 1999).
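The termination rule can be illustrated with a minimal sketch. It assumes that correct words in the block that triggers termination still count toward the score, which is not stated explicitly above. The function name and the input format are hypothetical.

```python
BLOCK_SIZE = 15               # words per block, as described above
MIN_CORRECT_TO_CONTINUE = 8   # fewer correct words than this ends the test


def score_dictation(correct_flags):
    """Return (number of correct words, number of blocks administered).

    correct_flags is a list of booleans in presentation order, one per word.
    Assumption: correct words in the terminating block are still counted.
    """
    score = 0
    blocks_given = 0
    for start in range(0, len(correct_flags), BLOCK_SIZE):
        block = correct_flags[start:start + BLOCK_SIZE]
        blocks_given += 1
        correct_in_block = sum(block)
        score += correct_in_block
        if correct_in_block < MIN_CORRECT_TO_CONTINUE:
            break  # termination criterion reached; later blocks are not given
    return score, blocks_given
```

For example, a child who writes all 15 words of the first block correctly and only 7 words of the second block correctly would stop after two blocks under this sketch.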
Phonological awareness was measured with two subtests from the “Screening Test for Dyslexia” (Kort et al., 2005a). First, during “Phoneme Deletion,” children were asked to omit a phoneme from an orally presented word and say the remaining word (e.g., “dak” [roof] minus “k” [f] is “da” [roo]). Testing was terminated after four consecutive mistakes. Second, during “Spoonerism,” children had to switch the first sounds of two words (e.g., “John Lennon” becomes “Lohn Jennon”). Testing was terminated after five consecutive mistakes. The reliability of these subtests differs per age but is at least .60.
Rapid automatized naming
RAN was measured using two subtests of “Continuous Naming and Reading Words” (van den Bos & Lutje Spelberg, 2010). During “Naming Letters,” children had to name 50 letters aloud. During “Naming Digits,” they were asked to name 50 digits aloud. Children were asked to name these visual stimuli as quickly as possible. The time in seconds needed to finish each subtest was used for analysis, which means that a higher score reflects a weaker performance on RAN. The reliability of this measure differs per age but is at least .75.
Verbal working memory
Verbal working memory was measured using the backward task of the Number Recall subtest from the Wechsler Intelligence Scale for Children-III (WISC-IIINL) (Kort et al., 2005b). In this task, the experimenter pronounced sequences of digits that the child was asked to repeat in backward order. Testing was terminated after two consecutive mistakes. The score was the number of correctly recalled sequences. The reliability of this measure differs per age but is at least .50.
Semantics were measured by adding the z-scores of four subtests from the WISC-IIINL (Kort et al., 2005b). Based on the manual, the child received zero, one, or two points for each item. Testing was terminated after four or five (Information) consecutive mistakes. The reliability differs per age but is between .64 and .77 (Kort et al., 2005b). First, during “Information,” the child had to answer orally presented questions testing general knowledge about events, objects, places, and people. Second, during “Similarities,” the child had to name the similarity between two concepts. Third, during “Productive vocabulary,” the experimenter pronounced a word and the child had to define it. Fourth, during “Comprehension,” the experimenter asked questions about social situations or common concepts. Kaufman (1975) showed that these four measures together form a factor named “verbal comprehension,” which is also the case in the current sample (van Rijthoven et al., 2018).
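As an illustration only, a composite of summed z-scores could be computed as in the sketch below. It assumes that each subtest is standardized against the sample at hand, using the sample standard deviation. The reference distribution is not specified above, and the function names and subtest keys are hypothetical.

```python
from statistics import mean, stdev

SUBTESTS = ("information", "similarities", "vocabulary", "comprehension")


def z_scores(values):
    """Standardize a list of scores against its own mean and sample SD."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def semantic_composite(children):
    """Sum the four subtest z-scores for each child.

    children is a list of dicts mapping each subtest name to a score.
    Assumption: z-scores are computed within this sample.
    """
    standardized = {
        name: z_scores([child[name] for child in children])
        for name in SUBTESTS
    }
    return [
        sum(standardized[name][i] for name in SUBTESTS)
        for i in range(len(children))
    ]
```

A child who scores exactly at the sample mean on all four subtests receives a composite of zero under this sketch.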
Spelling error classification
The PI word dictation consisted of 135 words. However, for most children, testing was terminated earlier. We therefore selected the first four blocks (60 words). All possible types of errors within these 60 words were listed and labeled (e.g., phoneme addition, end d/t, ei/ij). Following Tops et al. (2014), Vanderswalmen et al. (2010), and Worthy and Viise (1996), these error types were divided into three categories: phonological, morphological, and orthographic errors (see Appendix Table 6). Some types of errors could not be classified exclusively as phonological, morphological, or orthographic. Words containing these types of errors were removed from the dataset (21.66% in version A and 18.33% in version B; see Appendix Table 7). After the removal of these words, the total number of possible errors within each category was calculated from the error descriptions. This was done separately for each version of the PI word dictation (versions A and B) to correct for any differences between the two versions.
Next, the dictation tasks of all participants were screened for the number and types of errors made by the child. Following Tops et al. (2014), the error classification was based on the end product and not on the strategy used by the child. For each child with dyslexia, two dictations were screened (pre- and posttest). For typically developing children, a single dictation was screened (always version A). The inter-rater reliability of the two MSc students who did the screening was good: Cohen’s kappa was .84 (version A) and .88 (version B).
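For readers unfamiliar with the statistic, Cohen’s kappa corrects the observed agreement between two raters for the agreement expected by chance. The sketch below shows the standard formula for two raters and nominal labels. The function name and the example labels are hypothetical, and this is not the software used for the reported values.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty item set.")
    n = len(rater_a)
    # Proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("Kappa is undefined when chance agreement equals 1.")
    return (observed - expected) / (1 - expected)
```

Here each list would hold one label per scored error (for instance "phonological", "morphological", or "orthographic"), in the same order for both raters.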
For each dictation task, all errors were entered in a dataset in which each error was assigned to its type. One word could contain multiple errors, following the descriptions in Appendix Table 6. In case of early termination of the task (after eight errors in a block of 15 words), all possible errors in non-written words were entered in the dataset, based on the assumption that the upcoming words would be too difficult for the child. This follows the procedures of other tests such as the WISC-IIINL (Kort et al., 2005b) or the PPVT (Dunn & Dunn, 1997). Subsequently, the total number of phonological, morphological, and orthographic errors was calculated by adding the types of errors as classified in Appendix Table 6. Finally, the percentage of errors for each category was calculated relative to the total number of possible errors for that category in the version concerned.
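The scoring procedure described above can be sketched as follows. The data layout (one dictionary of error counts per word) and the function name are hypothetical, and the example counts in the usage note are invented for illustration. Non-written words are assigned all of their possible errors, as described in the text.

```python
CATEGORIES = ("phonological", "morphological", "orthographic")


def error_percentages(possible_per_word, errors_per_written_word):
    """Percentage of possible errors made, per category.

    possible_per_word: one dict per retained word (in test order) mapping
        category -> number of possible errors in that word.
    errors_per_written_word: one dict per word the child actually wrote,
        mapping category -> number of errors made; may be shorter than
        possible_per_word when testing was terminated early.
    """
    totals_possible = {
        c: sum(word.get(c, 0) for word in possible_per_word) for c in CATEGORIES
    }
    made = {c: 0 for c in CATEGORIES}
    for index, possible in enumerate(possible_per_word):
        if index < len(errors_per_written_word):
            observed = errors_per_written_word[index]
        else:
            # Non-written word: every possible error is counted as made.
            observed = possible
        for c in CATEGORIES:
            made[c] += observed.get(c, 0)
    return {
        c: 100 * made[c] / totals_possible[c] if totals_possible[c] else 0.0
        for c in CATEGORIES
    }
```

With three retained words of which the child wrote only the first two, the third word contributes all of its possible errors to the numerators, while the denominators remain the per-category totals for the version concerned.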
Phonics through spelling intervention
A phonics through spelling intervention aims to reach a functional level of technical reading and spelling by combining reading and writing in one intervention, following the protocol by Blomert (2006). A distinguishing feature of this type of intervention is that reading and spelling instruction and exercises are equally balanced in terms of time spent (50–50), whereas most studies include less or even no spelling instruction or exercises. Children had a weekly 45-min session with a clinician. The mean length of the intervention was 27.06 weeks (SD = 4.79). The length of the intervention varied because of variation in the post-intervention assessment schedule (for instance, due to holidays or personal circumstances) and in the time children needed to reach 80% accuracy and improved fluency, as described below. During the sessions, the clinician tailored the intervention as much as possible to each child’s needs, providing explicit direct instruction, guided exercises, and feedback accordingly. The continuity of quality during assessment and intervention was safeguarded through supervision by certified clinical health psychologists. The intervention included two stages:
The intervention started with practice of the phonological base of reading and spelling by learning the grapheme-phoneme correspondences (GPCs). After learning the GPCs, children learned to use this letter knowledge in reading and writing words and sentences/texts by using an explicit strategy. Once children had mastered the basic levels, they also learned to read and write words based on syllables. Accuracy was trained first, followed by efficiency, while the words and sentences/texts increased in difficulty. Feedback was given on accuracy and later also on efficiency.
Dutch is a rather transparent language, but morphological rules and orthographic patterns still need to be learned in order to write and read words (mostly polysyllabic words) correctly. The morphological rules and orthographic patterns can be found in Appendix Table 6 and were taught according to each child’s needs.
In order to rehearse the above-mentioned spelling and reading knowledge, children had to do home exercises for reading and spelling. Parents were asked to practice with their child four times a week for 15 min using prescribed exercises. All parents confirmed that they had complied with this. Parents reflected on the home exercises in a day-to-day logbook. When a child reached an accuracy of 80% during practice (reading or writing 80% of the words correctly) and had improved significantly in fluency (more fluent compared to the first time the words were read), the clinician moved on to the next topic of the intervention. This formative testing was sustained throughout the entire intervention, which explains the variation in the length of the program.