Improving the reading fluency of dyslexic and poor readers is a major educational goal (e.g., Report of the National Reading Panel, 2000). Reading fluency is a complex, multifaceted construct defined as accurate, fast and effortless reading with good comprehension (Hudson, Pullen, Lane, & Torgesen, 2009; see also Kuhn, Schwanenflugel, & Meisinger, 2010). Some models of the development of word reading fluency proposed a continuous process characterized by several phases (Adams, 1990; Ehri, 1998, 1999, 2005); other models emphasized the increments that occur in the acquisition process (cf. Hinton, McClelland, & Rumelhart, 1986; Perfetti, 1992). The complexity of a writing system affects the development of word reading fluency: children learning to read an alphabetic language with a transparent orthography do so faster and more efficiently than children learning an opaque orthography such as English (cf. Aro & Wimmer, 2003; Ellis et al., 2004; Seymour, Aro, & Erskine, 2003; Wimmer & Goswami, 1994). Moreover, the nature of the orthography influences the types of reading difficulties children may experience (Caravolas, 2007): children reading opaque orthographies make more reading errors than children reading transparent orthographies (Aro & Wimmer, 2003; Frith, Wimmer, & Landerl, 1998; Patel, Snowling, & de Jong, 2004; Seymour et al., 2003). A prime characteristic of reading disorders in languages with a transparent orthography is impaired reading speed (de Jong & van der Leij, 2003; Landerl, 2001; Serrano & Defior, 2008).

This study examines the development of word reading fluency in Dutch, a language with a transparent orthography. We focus on the reading of words with a phonological consonant–vowel–consonant (CVC) structure. With the exception of a few loan words, these words are orthographically fully transparent in reading; that is, they can be read by applying only context-free grapheme–phoneme correspondence (GPC) rules. Many polysyllabic words, in contrast, are more complex to decode because they require additional contextual, graphotactic or morphological rules. Verhoeven and van Leeuwe (2009) investigated the development of CVC word reading fluency in Dutch typical and poor readers. Their study showed that gains in accuracy occur very rapidly from the beginning of reading instruction and taper off thereafter; subsequent growth of decoding skills, in typical and poor readers alike, was largely a matter of increased reading speed. Remediation has been shown to improve accuracy in Dutch poor readers, but their reading speed tends to remain low compared to that of normally developing children (van der Leij & van Daal, 1989; see also Martens & de Jong, 2008; Scheltinga, van der Leij, & Struiksma, 2010; Yap & van der Leij, 1993). Torgesen, Rashotte and Alexander (2001) showed that slow identification of single words is the most important factor accounting for individual differences in reading fluency (see also Jenkins, Fuchs, van den Broek, Espin, & Deno, 2003; Schwanenflugel, Meisinger, Wisenbaker, Kuhn, Strauss, & Morris, 2006; Vadasy & Sanders, 2009). Our research on remedial intervention therefore focuses on the identification speed of single words. The present study investigates the differential effects of training methods that target the word reading speed of poor readers.
The study contrasts a training method focused on words read correctly (successes) with a training method focused on words read incorrectly (failures), and investigates whether the effects of these methods interact with whether or not children are informed about the focus of the training material.

To improve the identification of single words, repetition has been proposed as a key element, alongside other beneficial measures such as immediate and corrective feedback, direct instruction, and scaffolding (see e.g., Chard, Vaughn, & Tyler, 2002; Kuhn & Stahl, 2003; Meyer & Felton, 1999; Swanson, 1999; Swanson, Hoskyn, & Lee, 1999). Poor readers require more practice than normally developing children to identify words quickly (Reitsma, 1983). Berends and Reitsma (2006) argued that it is more beneficial to train repeatedly on a few words than to read many words once. If word repetition is important, the question arises which words should be repeated: words that are read correctly, or words that appear to pose some difficulty because they are read incorrectly. That is the central question of this study. In this study, focusing on successes is operationalized by removing from the training set the words that were read incorrectly during flashcard training sessions. In the training focusing on failures, words read correctly during the training are removed from the training set. Arguments favoring each of these procedures are discussed below.

Following Ehri’s (1998, 1999) framework, Wolf and Katzir-Cohen (2001) emphasized that instruction for fluency development should begin by focusing on accuracy at the word level and its underlying representations. In line with this view, the recommended practice in the remedial teaching of poor readers is to focus on words that are read incorrectly (e.g., Bender, 2004; Martens, Witt, Daly, & Volmer, 1999). Insofar as reading errors offer a window on the processes children use to read words (Allen, 1976; Au, 1977; Goodman, 1969; Savage, Stuart, & Hill, 2001; Singleton, 2005; Weber, 1970), the analysis of errors and miscues may be important in determining directions for reading instruction (McKenna & Picard, 2006), and it is therefore advocated in popular textbooks on reading instruction in primary school (e.g., Beard, 1990; Graham & Kelly, 1997; Roberts, 1989). Hall (2003) showed that studying reading errors can be very instructive and helps teachers understand children and their difficulties. Such considerations and findings put errors at the center of reading practice and form the foundation of an intervention focused on failures.

There are, however, arguments in favor of the opposite approach, focusing on successes, in which children reread words they have successfully decoded. One argument is that, to achieve fast and effortless reading, children must be able to apply GPC rules automatically or to identify words automatically. According to the self-teaching hypothesis (cf. Share, 1995, 1999), each successful identification of a given word increases the likelihood of successfully reading that word again, because the reader acquires word-specific orthographic information. The self-teaching hypothesis holds that fast and accurate word identification depends on how often a child has been exposed to a particular word. Words that are read repeatedly are likely to be recognized visually, with minimal phonological processing, from the very earliest stages of reading acquisition. Repeatedly reading successes should thus accelerate the transition from reading words by phonological recoding to reading words that have become increasingly lexicalized.

A second argument stems from our earlier study (Steenbeek-Planting, van Bon, & Schreuder, 2011), in which we investigated the instability of errors in Dutch CVC words, that is, how often words were read correctly at one time and incorrectly at another. That study showed that typically developing readers in first and second grade, and reading-level-matched poor readers, did not repeatedly misread the same items. Of the words that the children read inaccurately, only a fourth were read incorrectly twice; the other three-fourths were read incorrectly at one time but correctly at the other. Errors were thus unstable to an important degree, and they were most probably caused not by a lack of GPC rule knowledge but by inattentiveness or other stochastic processes. If errors are determined largely by random factors, a focus on such errors will not be effective for enhancing reading speed, because errors are too unreliable a basis for word selection. Moreover, the words that beginning readers consistently misread were not representative of the CVC orthographic type as such: they were characterized by a low word frequency, a low bigram frequency and a rather small, low-frequency neighborhood. A training focused on failures thus inevitably uses words that are orthographically unlike most other CVC words, with sublexical components that are atypical for CVC words. If, however, children practice on successes, they practice on typical CVC words, and therefore on the common GPC rules, which can be expected to bring about greater transfer to CVC word reading than a training focusing on failures.

The effects of a training focusing either on failures or on successes may be influenced by motivational factors. Children experiencing learning problems often feel discouraged by their failure to learn (Riddick, 2010), and a reading intervention focusing on failures may discourage them even more. Being repeatedly confronted with failure may affect their self-esteem and would consequently be counterproductive (Shute, 2008). Some training theorists have therefore suggested that learning should focus on successful events paired with positive reinforcement (Latham, 1989). On the other hand, confronting readers with their performance through a training procedure that focuses on failures might, through negative reinforcement, urge them to avoid making errors and to overcome their failures by working harder and more efficiently (Kluger & DeNisi, 1996; Locke & Latham, 2002). To investigate whether improvement is additionally affected by positive or negative reinforcement, the children in our study are either informed or not informed about the focus of the training. Children who are informed are expected to show greater learning effects, through either positive reinforcement (in the training focused on successes) or negative reinforcement (in the training focused on failures). Children who are not informed about the focus of the training are expected to show these effects to a lesser degree or not at all, and thus to show smaller learning effects, because they do not know that their practice is associated with either their failures or their successes.

This research seeks to answer the following questions: Are reading speed and accuracy differentially affected by a training focused on failures and a training focused on successes? Are such training effects strengthened by informing children about the training focus? The questions are answered by using a randomized controlled trial, in which children are randomly assigned to one of four flashcard training conditions:

  1. SI: Focus on successes (S) and informed (I) about this focus.
  2. SN: Focus on successes (S), but not informed (N) about this focus.
  3. FI: Focus on failures (F) and informed (I) about this focus.
  4. FN: Focus on failures (F), but not informed (N) about this focus.

In every condition, all children receive immediate feedback that qualifies each oral reading response as correct or incorrect.

Studies accounting for individual differences in response to intervention (see Bracht, 1970; Shavelson, Berliner, Ravitch, & Loeding, 1974) have resulted in systematic analyses of what has been termed Aptitude-Treatment Interactions (ATIs) (Cronbach & Snow, 1977; Snow, 1991; see also Kanfer & Ackerman, 1989). The data will therefore also be used to investigate whether some poor readers benefit more from a training focused on failures and others more from a focus on successes, and whether training outcomes are related to the children’s initial reading level.

Method

Participants

Poor readers (n = 83, of whom 50 male) were selected from four primary schools for special education. Children attend this type of school (see Eurybase, 2008) because of learning disabilities, mild mental retardation or mild behavioral problems, and the majority of these children (73%) are poor readers (van Bon, Bouwmans, & Broeders, 2006). To ensure a sample of typical IQ, children with mental retardation or behavioral problems were excluded from participation. Formal reading instruction in these and other Dutch primary schools is based on phonics and starts in Grade 1.

Four children left the study due to illness or moving to another school. Therefore, results on 79 children (of whom 47 male) will be reported. Children were defined as poor readers because they scored below the 10th percentile for their grade level on at least two of the following standardized screening tests that also served as pretests: Lexical Decision Tests (LDT1, LDT2), Word Decoding Tests (WDT1, WDT2, WDT3) or a Nonword Reading Test (NRT). All children were able to sound out the graphemes according to the Dutch GPC rules, had Dutch as their first language, and did not have any diagnosed neurological problems, nor a speech, hearing or visual impairment. The teachers also classified these children as poor readers and considered the reading problems of these children as not caused by behavioral problems. Parents were informed about the participation of the children and had given written consent for participation.

Children were randomly assigned to one of the four experimental training conditions SI, SN, FI or FN (see end of introduction). Boys and girls were evenly distributed over experimental groups. The training groups were evenly distributed across schools and classes to reduce school-specific or teacher-specific effects.

To assess whether ATI effects of initial reading level apply to our data, children were classified as having either a high initial reading level (HI) or a low initial reading level (LI), based on their reading composite score on the WDT1 prior to intervention. Thirty-seven children with a reading composite score at or below the median (35 words read correctly per minute) were considered LI children; the 42 children with a reading composite score above the median formed the group of HI children (see Footnote 1).
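The median split described above can be sketched in a few lines of Python. The child IDs and scores below are invented for illustration; only the rule (at or below the median is LI, above it is HI) comes from the text.

```python
from statistics import median


def split_by_initial_level(scores):
    """Classify children by initial reading level with a median split.

    `scores` maps a hypothetical child ID to a WDT1 composite score
    (words read correctly per minute). Scores at or below the sample
    median are labelled "LI"; scores above it are labelled "HI".
    Returns the labels and the median used as the cut-off.
    """
    cutoff = median(scores.values())
    labels = {child: ("LI" if score <= cutoff else "HI")
              for child, score in scores.items()}
    return labels, cutoff


# Invented example data, not study data
example = {"c01": 20, "c02": 35, "c03": 35, "c04": 48, "c05": 61}
labels, cutoff = split_by_initial_level(example)
```

Because ties at the median are assigned to the LI group, the two groups need not be equal in size.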

The training groups did not differ in age (F < 1), in number of months of formal reading instruction (F < 1), or on any pretest, for either the composite scores or the disaggregated component scores for accuracy and speed (F < 1 for all tests). Nor did the training groups differ on these variables when analyses were conducted separately for LI children and HI children (Fs < 1). The HI children performed better than the LI children on all pretests, on the composite scores (F(6, 72) = 22.3, p < .001, ηp² = .65) as well as on the component scores for accuracy (F(7, 71) = 6.63, p < .001, ηp² = .45) and speed (F(7, 71) = 14.57, p < .001, ηp² = .59). The LI children had received fewer months of reading instruction than the HI children (F(1, 77) = 5.36, p < .05, ηp² = .07); the groups did not, however, differ in chronological age (F < 1).
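Effect sizes are reported as partial eta squared throughout. For a test reported as F(df_effect, df_error), this can be recovered with the standard identity ηp² = (F · df_effect) / (F · df_effect + df_error). The sketch below is our illustration, not the authors' analysis code; it reproduces two of the values reported above.

```python
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    effect = f_value * df_effect
    return effect / (effect + df_error)


# F values taken from the paragraph above
composite = partial_eta_squared(22.3, 6, 72)     # HI vs. LI, composite pretest scores
instruction = partial_eta_squared(5.36, 1, 77)   # HI vs. LI, months of reading instruction
print(f"{composite:.2f} {instruction:.2f}")      # 0.65 0.07
```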

Table 1 provides an overview of characteristics of the experimental subgroups. Scores on pretests can be found in Tables 5 and 6.

Table 1 Composition of the training groups as to gender, mean age and reading instruction (RI) in months

Procedure and instruments

Computerized flashcard training

Stimuli in the training set

Words with a phonological CVC structure (see Footnote 2) were selected from the Celex Database (Baayen, Piepenbrock, & van Rijn, 1993). The selected words were the lemmas of words that can occur independently in the language (i.e., nouns, verbs, adjectives, adverbs, and numerals). Words considered rather idiosyncratic or shocking, proper names, and words with a foreign orthography or phonology were eliminated (Booij, 1995; Nunn, 1998). Additionally, some words were considered unsuitable for training purposes: low-frequency words (e.g., zijl [drainage watercourse]) having a high-frequency homophonic counterpart (e.g., zeil [sail]). Only the high-frequency homophones (zeil) were used, as otherwise children might associate the low-frequency written word form (zijl) with the high-frequency word meaning. Of the initial set of 1,078 words, 845 thus constituted the training set.
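The exclusion steps above can be sketched as a small filter. This is a simplified illustration, not the authors' Celex query: the `Candidate` fields, the transcription strings and the frequencies are hypothetical, and the sketch keeps only the most frequent spelling of each pronunciation, which approximates the rule of dropping low-frequency words that have a high-frequency homophone.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One CVC lemma; the fields are illustrative, not Celex column names."""
    spelling: str
    pronunciation: str      # placeholder transcription shared by homophones
    frequency: float        # corpus frequency; the values used below are invented
    excluded: bool = False  # idiosyncratic, shocking, proper name, or foreign form


def build_training_set(candidates):
    """Drop excluded words, then keep only the most frequent spelling of each pronunciation."""
    best = {}
    for word in candidates:
        if word.excluded:
            continue
        kept = best.get(word.pronunciation)
        if kept is None or word.frequency > kept.frequency:
            best[word.pronunciation] = word
    return sorted(word.spelling for word in best.values())


candidates = [
    Candidate("zeil", "zEil", 40.0),
    Candidate("zijl", "zEil", 0.1),                 # low-frequency homophone of zeil
    Candidate("kip", "kIp", 25.0),
    Candidate("Piet", "pit", 5.0, excluded=True),   # proper name
]
training_set = build_training_set(candidates)       # ['kip', 'zeil']
```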

General training setup

The training consisted of 10 sessions of approximately 20 min each. Children practiced individually once or twice a week, using the flashcard format of van den Bosch, van Bon and Schreuder (1995). In each training session, 100 CVC words randomly selected from the training set were presented one at a time on a computer screen. Children were instructed to name each word as accurately and as fast as possible. By the end of the 10 sessions, each child had thus read 1,000 word presentations from the training set. Words were presented in black lowercase letters (Arial, size 48) on a white background in the center of the screen. The letters were approximately 1.5 cm high, and words ranged from 2 to 6.5 cm in length. The child was seated in front of the computer screen at a distance of approximately 60–80 cm.

The exposure duration of the words was varied to keep the accuracy rate at an approximately constant level. After each trial, the reading accuracy of the last word and the five preceding words was evaluated. The exposure duration of the next word was increased by 17 ms if four or more of these six words had been read incorrectly, and decreased by 17 ms if five or six of the six words had been read correctly. In the other cases (three or four correct), the exposure duration remained unchanged. In this way, the accuracy rate was maintained at approximately 67%. Due to a software failure, each session started with an exposure duration of 350 ms instead of the exposure duration with which the previous session had ended (see Footnote 3).
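The adaptive rule above is a simple staircase on a six-trial window. The Python sketch below encodes it under stated assumptions: the text does not say what happens during the first five trials of a session, how invalid trials enter the window, or whether there is a minimum duration, so the no-adjustment start and the 17 ms floor are our own choices.

```python
STEP_MS = 17   # change in exposure duration per adjustment
WINDOW = 6     # the last word plus the five preceding words


def next_exposure(current_ms, history):
    """Exposure duration (ms) for the next word under the six-trial rule.

    `history` lists past trials in order, True = read correctly.
    0-2 correct of the last six (four or more errors) -> lengthen by 17 ms;
    5-6 correct -> shorten by 17 ms; 3-4 correct -> unchanged.
    """
    recent = history[-WINDOW:]
    if len(recent) < WINDOW:
        # Assumption: no adjustment until six trials are available.
        return current_ms
    n_correct = sum(recent)
    if n_correct <= 2:
        return current_ms + STEP_MS
    if n_correct >= 5:
        # Assumption: never go below one step, so the duration stays positive.
        return max(STEP_MS, current_ms - STEP_MS)
    return current_ms


# Each session started at 350 ms
duration = 350
history = []
for correct in [True, True, True, True, True, True, False, False, False, False]:
    history.append(correct)
    duration = next_exposure(duration, history)
```

In this toy sequence the duration drops to 333 ms and then 316 ms while accuracy is high, holds while three or four of the last six are correct, and rises again once four of six are misread.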

Each word was preceded by a fixation cross (+) in the center of the screen for 800 ms. After a blank screen of 200 ms, the word was presented with the current exposure duration. The word was followed by hash marks (###) to prevent further visual processing of the letter string. As soon as a voice key registered a verbal response, the hash marks disappeared and were followed by visual feedback (1,000 ms) indicating whether the verbal response was correct (smiley) or incorrect (sad face). At the end of each session, the child was shown a graph that depicted the presentation times of the words read in the current and previous sessions, and the meaning of the graph was explained if necessary. This graph visualized the child’s progress and was intended to motivate the children to perform well (Kluger & DeNisi, 1996). Each session started with a short practice block of six randomly chosen CV and VC words to familiarize the child with the task.
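The trial sequence can be summarized as an event schedule. This Python sketch uses the durations given in the text; measuring response latency from word onset, and giving the mask zero duration when the response comes while the word is still visible, are assumptions the text does not settle.

```python
FIXATION_MS = 800    # fixation cross "+"
BLANK_MS = 200       # blank screen before the word
FEEDBACK_MS = 1000   # smiley or sad face


def trial_schedule(exposure_ms, response_latency_ms):
    """List (event, onset_ms, duration_ms) for one flashcard trial.

    Assumption: `response_latency_ms` is measured from word onset, and the
    hash-mark mask lasts from word offset until the voice key is triggered
    (zero if the response came while the word was still visible).
    """
    events = []
    t = 0
    for name, duration in (("fixation", FIXATION_MS), ("blank", BLANK_MS),
                           ("word", exposure_ms)):
        events.append((name, t, duration))
        t += duration
    mask_ms = max(0, response_latency_ms - exposure_ms)
    events.append(("mask", t, mask_ms))
    t += mask_ms
    events.append(("feedback", t, FEEDBACK_MS))
    return events


schedule = trial_schedule(exposure_ms=350, response_latency_ms=900)
```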

Training conditions

Children received training according to one of the four different conditions (SI, SN, FI and FN). In each session, 100 words, taken randomly without replacement from the training set (initial n = 845), were presented. For training groups focusing on successes (SI and SN), the words read incorrectly during the training session were removed from further training, thus reducing the number of words in the training set. In the next session, again 100 words were randomly taken without replacement from the remaining training set. This procedure was applied to all 10 training sessions. For the groups focusing on failures (FI and FN), words read correctly during the training session were removed. In both training conditions, words used in invalid trials were not removed.
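The difference between the two focus conditions lies entirely in which words leave the training set after each session. The Python sketch below is an illustration under stated assumptions: `read_word` is a hypothetical callback standing in for the child's response as scored by the experimenter, and presenting all remaining words when fewer than 100 are left is our own choice.

```python
import random


def run_session(training_set, read_word, focus, n_words=100, rng=random):
    """Present one session and return the training set for the next session.

    training_set: words still available for training
    read_word:    hypothetical callback; returns True (correct), False
                  (incorrect) or None (invalid trial) for a presented word
    focus:        "successes" drops misread words; "failures" drops words read correctly
    """
    pool = sorted(training_set)                            # sample() needs a sequence
    presented = rng.sample(pool, min(n_words, len(pool)))  # drawn without replacement
    remaining = set(training_set)
    for word in presented:
        outcome = read_word(word)
        if outcome is None:
            continue                                       # invalid trials: word stays
        if focus == "successes" and outcome is False:
            remaining.discard(word)
        elif focus == "failures" and outcome is True:
            remaining.discard(word)
    return remaining


# Toy reader: correct on three-letter words, invalid trial on "vis"
def reader(word):
    if word == "vis":
        return None
    return len(word) == 3


words = {"bal", "boom", "kip", "maan", "vis", "zeil"}
after_successes = run_session(words, reader, "successes", rng=random.Random(0))
after_failures = run_session(words, reader, "failures", rng=random.Random(0))
```

Calling `run_session` ten times, feeding each returned set into the next call, mirrors the ten-session procedure. Words not drawn in a session are never removed, so the set shrinks only through presented, validly scored words.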

Apart from focus, the training conditions differed in the information given about the focus of the training. In the non-informed conditions, children were not told that their training focused on either failures or successes. In contrast, at the beginning of each session, children in the informed conditions were explicitly shown (by means of graphics and a short movie; see Footnote 4) and told that they were practicing on their past successes and new items, or on their past failures and new items.

Laptops with 14-inch screens were used. The experimenter judged the correctness of each verbal response and recorded it in the computer by means of a button box.

Pre- and post-tests

Children were first screened with LDT1 and LDT2; WDT1, WDT2, WDT3 and the NRT were then additionally used for selection. These tests, together with a Sentence Verification Test (SVT), also served as pretests. Parallel versions of the tests were used at posttest.

Lexical Decision Test (LDT)

The students were asked to complete two versions of a standardized paper-and-pencil Lexical Decision Test (van Bon, 2007). Each version consists of a card with words distributed across it in columns. LDT1 is composed of CVCC and CCVC words following the dominant orthographic rules: sixty high-frequency nouns that are likely to be known in their spoken forms by 6-year-olds (Kohnstamm, Schaerlaekens, de Vries, Akkerhuis, & Frooninckxs, 1999), interspersed with 20 pseudowords. LDT2 is composed of bisyllabic words following the dominant orthographic rules: 90 high-frequency nouns likely to be known in their spoken forms by 6-year-olds, interspersed with 30 pseudowords. The test is administered in class; students are asked to read the items silently and cross out every pseudoword. The raw score for each test is the number of words judged within a minute minus the number of errors. Test–retest reliability for children in Grades 1 to 3 is considered sufficient (.81 for LDT1 and .82 for LDT2; van Bon, 2007). The LDT taps word reading skills and semantic knowledge. We included the LDT because it is an adequate and reliable alternative to oral reading tests (van Bon, Hoevenaars, & Jongeneelen, 2004).

Word Decoding Test (WDT)

A standardized word reading test (Verhoeven, 1995) was administered individually to assess the oral reading abilities of the students for words in isolation. This test consists of three cards with words listed in columns (WDT1: simple monosyllabic words; WDT2: monosyllabic words with one or two consonant clusters; WDT3: two-, three-, and four-syllable words). Students are instructed to read the words aloud as quickly and accurately as possible. The composite score for each card is the number of words read correctly in 1 min. The reported reliability of the three cards (Cronbach’s α) ranges from .86 to .94 (Verhoeven & van Leeuwe, 2003) and is judged sufficient.

Nonword Reading Test (NRT)

In order to assess the decoding ability of the students for pseudowords, a standardized Nonword Reading Test was administered (van den Bos, Lutje Spelberg, Scheepstra, & de Vries, 1994). The test consists of pseudowords of increasing length. The students are instructed to read the pseudowords aloud as quickly and accurately as possible. The test score is the number of nonwords read correctly in 2 min. The parallel reliability is good, .93 and above (van den Bos et al., 1994).

Sentence Verification Test (SVT)

In order to determine word reading skill in sentence context and comprehension skill, a computerized sentence verification task derived from van den Bosch et al. (1995) and Wentink (1997) was used. Thirty semantically correct sentences (e.g., Kaas is geel. [Cheese is yellow.]) and fifteen semantically incorrect sentences (e.g., De trein is zuur. [The train is sour.]) were presented one-by-one on a computer screen in random order. Sentences consisted of high-frequency monosyllabic words that follow the dominant orthographic rules, and that are likely to be known in their spoken forms by 6-year-olds (Kohnstamm et al., 1999). Children were asked to silently read the sentences as quickly and accurately as possible, and then to judge the sentences as meaningful or not meaningful by pressing a button. Thereafter, the sentence disappeared and a new sentence appeared on the screen. A blank screen of 2 s appeared in between the presentation of the sentences. Children got acquainted with the test by judging four training sentences. There were no time limitations to this task and no feedback was given. The score is the percentage of correctly judged sentences. SVT responses and latencies were recorded by the laptops.

Results

We first present data on word reading collected during the flashcard training. Because the stimuli changed from session to session for each individual, we examine whether the content of the last training set differed between training groups with respect to the number of words and their characteristics. Changes during the training are then explored by analyzing the accuracy scores and exposure durations from the first to the last training session. Planned comparisons are made with respect to Focus (successes vs. failures), Information (informed vs. non-informed) and Initial Level (high vs. low).

Second, we present the results on transfer measures of word reading skills. We conduct repeated measures MANOVAs with the pre- and posttest component scores for reading speed and accuracy on the LDTs, WDTs, NRT, and SVT as dependent variables. Between-subjects factors are Focus, Information, and Initial Level; Time (pretest vs. posttest) is the within-subjects factor. In the last section, we compare the participants’ pre- and posttest scores on the standardized tests with the respective age norms to assess whether our interventions had additional practice value.

The flashcard training

Stimuli of the last training set

An analysis of variance was conducted with Number of words in the last training set as the dependent variable and Focus, Information, and Initial Level as between-subjects factors. A main effect of Focus (F(1, 71) = 283.67, p < .001, ηp² = .80) was further qualified by an interaction of Focus by Initial Level (F(1, 71) = 15.44, p < .001, ηp² = .18). Post hoc tests with Bonferroni correction showed that children focusing on successes had more words in their last training sets than children focusing on failures (see Fig. 1), for both LI children (p < .001) and HI children (p < .001). Among the children focusing on failures, the LI children had more words in their last training sets than the HI children (p < .05), whereas this effect was marginally reversed for the children focusing on successes (p = .07). Thus, the difference in number of words between the groups focusing on failures and on successes is largest for the HI children and smallest for the LI children.

Fig. 1 Mean number of words in the training set of the last session for children with a low and a high initial reading level in the training focused on failures and on successes

Next, we analyzed the lexical and sublexical characteristics: (1) length, (2) mean log bigram frequency, (3) frequency (i.e., the natural logarithm of a word’s frequency per million, Baayen et al., 1993), (4) number of orthographic neighbors (i.e., words differing in one letter), (5) number of phonological neighbors (i.e., words differing in one phoneme), (6) frequency of the most frequent orthographic neighbor, and (7) frequency of the most frequent phonological neighbor. These characteristics (see Table 2) were reduced to three uncorrelated factors with principal components analysis: pattern frequency (marked by high loadings of bigram frequency and frequency for the most frequent orthographic and phonological neighbors), neighborhood size (marked by high loadings for the number of phonological and orthographical neighbors and by a negative loading for word length), and word frequency (marked by word frequency).
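The text does not state whether the components were rotated. Because the three factors are described as uncorrelated, the sketch below assumes an unrotated principal components analysis of the correlation matrix of the seven characteristics. It is written in plain Python (a cyclic Jacobi eigendecomposition) so it has no dependencies; the data are simulated stand-ins, and in practice a statistics package would be used.

```python
import math
import random


def standardize(columns):
    """z-score each variable (one list per variable) using the sample SD."""
    z = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        z.append([(x - mean) / sd for x in col])
    return z


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def jacobi_eigen(sym, max_sweeps=50, tol=1e-20):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix, largest first."""
    p = len(sym)
    a = [row[:] for row in sym]
    v = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j) < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                # Rotation angle that zeroes a[i][j], kept within [-pi/4, pi/4]
                theta = 0.5 * math.atan2(2 * a[i][j], a[j][j] - a[i][i])
                if theta > math.pi / 4:
                    theta -= math.pi / 2
                elif theta < -math.pi / 4:
                    theta += math.pi / 2
                c, s = math.cos(theta), math.sin(theta)
                rot = [[float(r == q) for q in range(p)] for r in range(p)]
                rot[i][i], rot[j][j], rot[i][j], rot[j][i] = c, c, s, -s
                a = matmul(transpose(rot), matmul(a, rot))
                v = matmul(v, rot)                # accumulate eigenvectors
    order = sorted(range(p), key=lambda k: a[k][k], reverse=True)
    values = [a[k][k] for k in order]
    vectors = [[v[r][k] for k in order] for r in range(p)]
    return values, vectors


def principal_components(columns, n_components):
    """PCA on the correlation matrix: all eigenvalues plus scores on the first components."""
    z = standardize(columns)
    n = len(z[0])
    corr = [[sum(x * y for x, y in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]
    values, vectors = jacobi_eigen(corr)
    scores = [[sum(vectors[var][comp] * z[var][obs] for var in range(len(z)))
               for obs in range(n)]
              for comp in range(n_components)]
    return values, scores


# Simulated stand-ins for the seven word characteristics (not the study's data)
rng = random.Random(1)
latent = [[rng.gauss(0, 1) for _ in range(60)] for _ in range(3)]
columns = [[latent[k % 3][i] + 0.5 * rng.gauss(0, 1) for i in range(60)] for k in range(7)]
eigenvalues, scores = principal_components(columns, 3)
```

The component scores are uncorrelated by construction, which is what allows them to serve as separate dependent variables; factor loadings would be each eigenvector scaled by the square root of its eigenvalue.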

Table 2 Means and standard deviations of word characteristics, grouped within three factors, in the training set of Session 10 for the training focused on failures and on successes, split by children with a low and high initial reading level

These three factor scores were entered as dependent variables into a multivariate analysis of variance, with Focus, Information, and Initial Level as between-subjects variables. Words trained on by children focusing on successes had a higher pattern frequency (F(1, 28271) = 17.95, p < .001, ηp² < .01), a larger neighborhood size (F(1, 28271) = 150.96, p < .001, ηp² = .01), and a higher word frequency (F(1, 28271) = 165.73, p < .001, ηp² = .01) than words trained on by children focusing on failures. Within the groups focusing on failures, words trained on by LI children had a higher word frequency (F(1, 10782) = 6.03, p < .05, ηp² < .01) and a smaller neighborhood size (F(1, 10782) = 6.64, p < .01, ηp² < .01) than words trained on by HI children. It seems that LI children have difficulty not only with low-frequency words but also with words that have a small neighborhood.

Accuracy during training

A repeated measures ANOVA was conducted with log odds of the percentage correct of each session (see Allerup & Elbro, 1998) as a dependent variable (see Table 3). Time (Session 1 to Session 10) was entered as a within-subjects factor and Focus, Information, and Initial Level as between-subjects factors.
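The log-odds transformation can be sketched as follows. The text cites Allerup and Elbro (1998) but does not give the exact formula, so adding 0.5 to both counts (an "empirical logit" that keeps sessions at 0% or 100% correct finite), and counting only valid trials, are assumptions.

```python
import math


def log_odds(n_correct, n_valid, correction=0.5):
    """Log odds of the proportion correct in one training session.

    Assumption: 0.5 is added to both the correct and incorrect counts (an
    "empirical logit") so that sessions at 0% or 100% correct stay finite.
    """
    n_incorrect = n_valid - n_correct
    return math.log((n_correct + correction) / (n_incorrect + correction))


# A session with 67 of 100 valid trials correct, near the staircase target
session_value = log_odds(67, 100)
```

On the logit scale, a change near ceiling (say from 90% to 95% correct) counts for more than the same five-point change around 50%, which is a common motivation for analyzing log odds rather than raw percentages.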

Table 3 Means and standard deviations of reading accuracy (in percentages) in each training session of the training groups

Results show main effects of Time (F(9, 63) = 4.43, p < .001, ηp² = .39) and Initial Level (F(1, 71) = 16.92, p < .001, ηp² = .19), the latter indicating that HI children had higher accuracy scores than LI children (see Fig. 2). The LI children already had lower accuracy scores than the HI children in the first session (F(1, 77) = 14.08, p < .001, ηp² = .16), and this difference apparently did not disappear. An interaction of Time by Initial Level was also found (F(7.35, 521.47) = 2.33, p < .05, ηp² = .03; Huynh-Feldt correction applied, see Keselman et al., 1998). Tests of within-subjects contrasts (polynomial) show an interaction of Time by Initial Level at the cubic level only (F(1, 77) = 4.87, p < .05, ηp² = .06). The HI and LI groups both increase their accuracy scores from Session 1 to Session 10, but with a different time course. LI children show an initial stagnation, after which their accuracy increases; this trend is best described as linear (F(1, 41) = 22.19, p < .001, ηp² = .35) and marginally as cubic (F(1, 41) = 4.01, p = .05, ηp² = .09). HI children show a slight increase over all ten sessions that can only be described as linear (F(1, 36) = 9.98, p < .01, ηp² = .22). No effects of Focus or Information on reading accuracy were found (F < 1).
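The polynomial within-subjects contrasts decompose the ten-session trend into linear, quadratic and cubic components. The sketch below builds orthogonal polynomial contrast weights for ten equally spaced sessions by Gram-Schmidt orthogonalization. Statistical packages typically rescale such weights (for example, to unit length), which changes the values but not the resulting F tests. Applying a contrast to each child's ten session scores yields one score per child, and these scores can then be compared between groups.

```python
def polynomial_contrasts(k, max_degree=3):
    """Orthogonal polynomial contrast weights for k equally spaced levels.

    Built by Gram-Schmidt orthogonalization of 1, t, t^2, ...; the constant
    vector is dropped, leaving the linear, quadratic, cubic, ... contrasts.
    """
    t = [i - (k - 1) / 2 for i in range(k)]   # centred session numbers
    basis = [[1.0] * k]
    for degree in range(1, max_degree + 1):
        vec = [x ** degree for x in t]
        for b in basis:
            proj = sum(v * w for v, w in zip(vec, b)) / sum(w * w for w in b)
            vec = [v - proj * w for v, w in zip(vec, b)]
        basis.append(vec)
    return basis[1:]


def contrast_score(session_scores, weights):
    """Apply one contrast to a child's ten session scores."""
    return sum(s * w for s, w in zip(session_scores, weights))


linear, quadratic, cubic = polynomial_contrasts(10)
```

Up to scaling, `linear` is proportional to -9, -7, ..., 9 and `quadratic` to 6, 2, -1, -3, -4, -4, -3, -1, 2, 6, the textbook weights for ten levels.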

Fig. 2 Mean accuracy score (in percentages) of each training session for children with a low and high initial reading level

Exposure duration during training

A repeated measures ANOVA was conducted with the mean log exposure duration of each session (see Table 4) as a dependent variable, Time (Session 1 to Session 10) as a within-subjects factor, and Focus, Information, and Initial Level as between-subjects factors.

Table 4 Means and standard deviations of exposure duration (in milliseconds) in each training session of the training groups

A main effect of Time (F(9, 63) = 13.24, p < .001, ηp² = .65) indicates that the mean exposure duration decreased from the first to the last session. A main effect of Initial Level (F(1, 71) = 40.92, p < .001, ηp² = .37) indicates that exposure durations were longer for LI children than for HI children. LI children already had longer exposure durations in the first session (F(1, 77) = 32.62, p < .001, ηp² = .30), and this difference was apparent throughout all training sessions. Interestingly, two interactions were found. The interaction of Time by Focus (F(8.54, 606.61) = 1.96, p < .05, ηp² = .03, with Huynh-Feldt correction) is illustrated in Fig. 3. Tests of within-subjects contrasts (polynomial) show an interaction of Time by Focus at the linear level only (F(1, 77) = 6.37, p < .05, ηp² = .08). For children focusing on successes, exposure duration decreased linearly (F(1, 40) = 22.61, p < .001, ηp² = .36) and more rapidly than for children focusing on failures, whose decrease in exposure duration can be described as linear (F(1, 37) = 76.47, p < .001, ηp² = .67) but also as quadratic (F(1, 37) = 6.16, p < .05, ηp² = .14).

Fig. 3 Mean exposure duration (in milliseconds) in each training session for children in the trainings focused on failures and on successes

The interaction of Time by Information (F (8.54, 606.61) = 2.78, p < .01, ηp² = .04, Huynh-Feldt corrected) is further qualified by tests of within-subjects contrasts (polynomial), which show an interaction of Time by Information at the linear level only (F (1,77) = 12.63, p < .001, ηp² = .14). For uninformed children, the decrease in exposure duration can only be described as linear (F (1,38) = 15.95, p < .001, ηp² = .30), whereas for informed children it can be described as linear (F (1,39) = 102.29, p < .001, ηp² = .72) but also as quadratic (F (1,39) = 4.75, p < .05, ηp² = .11). As illustrated in Fig. 4, exposure duration decreased more over sessions for informed children than for uninformed children.
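The linear, quadratic and cubic trend tests reported above rest on orthogonal polynomial contrasts across the ten sessions. The Python sketch below, with names of our own, reconstructs that standard procedure; it is not the authors' code. A within-group trend test is a one-sample test on the children's contrast scores (df = n − 1, in line with F (1,41) and F (1,36) for the 42 LI and 37 HI children), and a group × trend interaction compares two groups' contrast scores with a pooled error term (df = n_a + n_b − 2, in line with F (1,77)). It handles only two groups and ignores the other between-subjects factors.

```python
import math


def poly_contrasts(k: int, degree: int) -> list[list[float]]:
    """Orthonormal polynomial trend contrasts for k equally spaced sessions.

    Gram-Schmidt on the powers 1, x, x^2, ... of the session numbers; element 0
    of the result is the linear contrast, element 1 the quadratic, and so on.
    """
    x = [float(i) for i in range(1, k + 1)]
    basis = [[1.0 / math.sqrt(k)] * k]  # normalized constant term
    for p in range(1, degree + 1):
        v = [xi ** p for xi in x]
        for b in basis:  # remove components along all lower-order terms
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis[1:]


def contrast_scores(scores: list[list[float]], contrast: list[float]) -> list[float]:
    """One trend score per child: weighted sum of that child's session scores."""
    return [sum(s * w for s, w in zip(row, contrast)) for row in scores]


def trend_f(scores, contrast):
    """Within-group trend test: one-sample t on contrast scores, F(1, n - 1) = t^2."""
    c = contrast_scores(scores, contrast)
    n = len(c)
    m = sum(c) / n
    var = sum((ci - m) ** 2 for ci in c) / (n - 1)
    return m * m / (var / n), n - 1


def trend_interaction_f(group_a, group_b, contrast):
    """Group x trend interaction: pooled two-sample t^2 on contrast scores."""
    ca, cb = contrast_scores(group_a, contrast), contrast_scores(group_b, contrast)
    na, nb = len(ca), len(cb)
    ma, mb = sum(ca) / na, sum(cb) / nb
    ss = sum((v - ma) ** 2 for v in ca) + sum((v - mb) ** 2 for v in cb)
    pooled = ss / (na + nb - 2)
    return (ma - mb) ** 2 / (pooled * (1 / na + 1 / nb)), na + nb - 2


lin, quad, cub = poly_contrasts(10, 3)
# Rescaled, the linear contrast matches the textbook integer coefficients.
print([round(w * 9 / lin[-1]) for w in lin])  # [-9, -7, -5, -3, -1, 1, 3, 5, 7, 9]
```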

Fig. 4 Mean exposure duration (in milliseconds) in each training session for children informed about the focus of the training and children not informed

Transfer measures of word reading skills

Reading accuracy

A repeated measures MANOVA was conducted with the accuracy scores on the reading tests as the dependent variables. Accuracy for each test was determined as the percentage of correct responses and is displayed in Table 5.

Table 5 Reading accuracy (as percentage correct) at pretest and posttest of the Lexical Decision Test (LDT1 and LDT2), Word Decoding Test (WDT1, WDT2 and WDT3), Nonword Reading Test (NRT) and Sentence Verification Test (SVT)

An effect of Time was observed (F (7, 65) = 5.84, p < .001, ηp² = .39), indicating that accuracy improved from pre- to posttest. An effect of Initial Level was observed (F (7, 65) = 7.38, p < .001, ηp² = .44), indicating that HI children had higher accuracy scores than LI children; this holds for all tests (p < .001). No other main effects were observed (F < 1). No significant interactions were observed (Time by Information: F (7, 65) = 1.34, p > .05, ηp² = .13; Time by Focus by Information: F (7, 65) = 1.34, p > .05, ηp² = .13; Time by Focus by Initial Level: F (7, 65) = 1.74, p > .05, ηp² = .16; all other interactions F < 1).

The univariate effect of Time was significant for LDT1 (F (1,71) = 4.16, p < .05, ηp² = .055), LDT2 (F (1,71) = 12.98, p < .001, ηp² = .16), and SVT (F (1,71) = 13.35, p < .001, ηp² = .16), indicating that accuracy improved on these tests only. No improvement in accuracy was observed on the other tests (WDT1: F (1,71) = 2.68, p > .05, ηp² = .04; WDT2, WDT3 and NRT: F < 1). Notably, the LDTs and the SVT are the tests in which the child has to judge whether an item (word or sentence) is meaningful.

Thus, children improved their reading accuracy on the two Lexical Decision Tests and the Sentence Verification Test, but no differential effects of training condition on accuracy improvement were found.

Reading speed

A repeated measures MANOVA was conducted with the reading speed scores on the reading tests as the dependent variables. Reading speed was determined as the number of words read within 1 min for the LDT and WDT, and within 2 min for the NRT. For the SVT, reading speed was determined as the log median latency over the semantically correct sentences. Descriptives are displayed in Table 6.
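Unlike the count-based scores, the SVT speed score is a log median latency, so a lower value means faster reading. A small Python sketch of that scoring rule follows; the function name, the natural log, and the reading of "semantically correct sentences" as the meaningful items (regardless of the child's judgment) are our assumptions, and the latencies are invented. The median keeps a few very slow responses from dominating the score.

```python
import math
from statistics import median


def svt_speed(latencies_ms: list[float], meaningful: list[bool]) -> float:
    """Log median latency over the semantically correct (meaningful) sentences.

    Assumptions: natural log; `meaningful` flags the items that are semantically
    correct, independent of whether the child's judgment was right.
    """
    kept = [t for t, ok in zip(latencies_ms, meaningful) if ok]
    if not kept:
        raise ValueError("no semantically correct sentences to score")
    return math.log(median(kept))


# Hypothetical latencies for one child; the nonsensical item (1500 ms) is ignored.
score = svt_speed([900.0, 1500.0, 1100.0, 700.0], [True, False, True, True])
print(round(math.exp(score)))  # 900
```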

Table 6 Reading speed at pretest and posttest, measured as the number of words read per minute on the Lexical Decision Test (LDT1 and LDT2) and Word Decoding Test (WDT1, WDT2 and WDT3), as the number of words read per 2 min on the Nonword Reading Test (NRT), and as latency (in ms) on the Sentence Verification Test (SVT)

A main effect of Time was observed (F (7, 65) = 22.88, p < .001, ηp² = .71), indicating that children improved their reading speed from pre- to posttest. A main effect of Initial Level (F (7, 65) = 13.66, p < .001, ηp² = .60) indicates that, as expected, HI children read faster than LI children. No other main effects were observed (Focus: F (7, 65) = 1.32, p > .05, ηp² = .12; Information: F (7, 65) = 1.82, p > .05, ηp² = .16). Importantly, the interaction of Time by Focus by Initial Level was significant (F (7, 65) = 2.73, p < .05, ηp² = .23). No other interactions were present (Time by Initial Level: F (7, 65) = 1.36, p > .05, ηp² = .13; Time by Information by Initial Level: F (7, 65) = 1.75, p > .05, ηp² = .16; Time by Focus by Information by Initial Level: F (7, 65) = 1.54, p > .05, ηp² = .14; all other interactions F < 1).

The univariate effect of Time was significant for each reading test, indicating that children improved their reading speed on all tests: LDT1 (F (1,71) = 22.23, p < .001, ηp² = .24), LDT2 (F (1,71) = 21.99, p < .001, ηp² = .24), WDT1 (F (1,71) = 73.10, p < .001, ηp² = .51), WDT2 (F (1,71) = 43.61, p < .001, ηp² = .38), WDT3 (F (1,71) = 35.50, p < .001, ηp² = .33), NRT (F (1,71) = 14.54, p < .001, ηp² = .17), and SVT (F (1,71) = 87.00, p < .001, ηp² = .55).

The interaction of Time by Focus by Initial Level was significant for LDT2 (F (1,71) = 4.05, p < .05, ηp² = .05), WDT3 (F (1,71) = 10.13, p < .01, ηp² = .13), and NRT (F (1,71) = 4.77, p < .05, ηp² = .06), but not for the other tests (WDT1: F (1,71) = 1.73, p > .05, ηp² = .02; WDT2: F (1,71) = 2.27, p > .05, ηp² = .03; LDT1 and SVT: both F < 1). The interaction observed in LDT2, WDT3 and NRT reflects an ATI effect. In the groups training on successes, LI children improved their reading speed more than HI children (post hoc tests with Bonferroni correction; LDT2: p < .001, WDT3: p < .001, NRT: p < .01). Conversely, in the groups training on failures, HI children improved more than LI children (LDT2: p < .05, WDT3: p < .001, NRT: p < .05). Thus, a training focused on failures is most beneficial for HI children, and a training focused on successes is most beneficial for LI children. Crucially, as reported above, the training groups did not differ in their effects on reading accuracy. Improved reading speed therefore did not come at the expense of reading accuracy; in other words, the flashcard training, whether focused on successes or on failures, produced no speed-accuracy trade-off.
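With only a pretest and a posttest, the Time by Focus by Initial Level interaction corresponds to a Focus by Initial Level interaction on gain scores, that is, a difference of differences. The Python sketch below illustrates that quantity with names of our own and invented gains (not the study's data). A positive value, with LI children gaining more under the success focus and HI children gaining more under the failure focus, is the crossover pattern described above.

```python
from statistics import mean


def mean_gain(pre: list[float], post: list[float]) -> float:
    """Mean pretest-to-posttest gain for one group of children."""
    return mean([b - a for a, b in zip(pre, post)])


def ati_contrast(gain: dict[tuple[str, str], float]) -> float:
    """Difference of differences in gains, keyed by (focus, initial level).

    (LI - HI) under the success focus minus (LI - HI) under the failure focus.
    """
    return ((gain[("successes", "LI")] - gain[("successes", "HI")])
            - (gain[("failures", "LI")] - gain[("failures", "HI")]))


# Invented mean gains (words per minute), only to show the crossover pattern.
gains = {("successes", "LI"): 12.0, ("successes", "HI"): 5.0,
         ("failures", "LI"): 4.0, ("failures", "HI"): 10.0}
print(ati_contrast(gains))  # 13.0
```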

Comparison with normal reading improvement

The pre- and posttest scores of the LDTs, WDTs and NRT were classified into five norm classes: below the 10th percentile, the next 15%, and each of the three following quartiles. The Wilcoxon signed-rank test indicated that children improved from pre- to posttest by scoring in a higher norm class on LDT1 (Z = −2.14, p < .05) and WDT1 (Z = −2.11, p < .05); on LDT2, WDT2, WDT3 and NRT, progress was not significant. Since a norm class reflects a child's standing relative to peers who receive regular education, moving into a higher class implies more progress than those peers made over the same period. This suggests that the intervention focused on CVC words was beneficial for poor readers: they improved their reading of untrained monosyllabic words more than can be expected from regular education.
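The norm-class comparison can be sketched in Python as follows. Assumptions: the five classes are bounded at the 10th, 25th, 50th and 75th percentiles with inclusive lower bounds (the text names the classes but not the boundary convention), and Z uses the common normal approximation with zero differences dropped, average ranks for ties, and a tie-corrected variance, computed from the smaller rank sum so that it is negative like the values reported. Statistical packages may differ in details such as continuity correction. Function names are our own and the class data are invented.

```python
import math


def norm_class(percentile: float) -> int:
    """Five norm classes: <10, 10-25, 25-50, 50-75, >=75 (boundary convention assumed)."""
    for cls, upper in enumerate((10, 25, 50, 75), start=1):
        if percentile < upper:
            return cls
    return 5


def wilcoxon_z(pre: list[int], post: list[int]) -> float:
    """Normal-approximation Z for the Wilcoxon signed-rank test (no continuity correction)."""
    d = [b - a for a, b in zip(pre, post) if b != a]  # drop zero differences
    n = len(d)
    if n == 0:
        return 0.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:  # assign average ranks to runs of tied |differences|
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j + 2) / 2  # mean of the 1-based ranks i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    return (min(w_plus, w_minus) - mean_w) / math.sqrt(var_w)


pre = [1, 1, 2, 1, 2, 1]   # hypothetical norm classes at pretest
post = [2, 2, 2, 3, 3, 1]  # and at posttest
print(round(wilcoxon_z(pre, post), 2))  # -1.89
```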

Discussion

The present study compares the effects of two training procedures in poor readers. A randomized controlled trial design was used to answer the questions (a) whether a training focused on failures and a training focused on successes differentially improve reading speed and accuracy, (b) whether the effect of training focus interacts with the effect of being informed or not about the training focus, and (c) whether such training effects interact with the child’s initial reading level. The interventions targeted reading speed while keeping reading accuracy at an approximately constant level. The training material consisted of regularly spelled Dutch CVC words, and the transfer outcome measures were a Lexical Decision Test (LDT), a Word Decoding Test (WDT), a Nonword Reading Test (NRT), and a Sentence Verification Test (SVT).

The main findings of the pretest–posttest comparison are as follows. Reading speed improved on all reading tests. This improvement was characterized by an ATI effect (Cronbach & Snow, 1977; Snow, 1991): the HI children among the poor readers improved their reading speed most in the training focused on failures, whereas the LI children improved most in the training focused on successes. It should be noted that the ATI effect was restricted to the reading speed of untrained words with a more complex orthographic structure than that of the trained words (it was observed in LDT2, WDT3, and NRT, which are composed primarily of polysyllabic words). A possible cause for this immediate transfer to polysyllabic words is more efficient syllable processing. Poor readers tend to read longer words letter by letter; possibly, as a result of the CVC training, they progressively shifted towards a more syllable-based decoding strategy. Such a shift after training has been demonstrated for Dutch poor readers by Wentink, van Bon, and Schreuder (1997) (see also Huemer, Aro, Landerl, & Lyytinen, 2010). No ATI effect was found for words with the CVC structure used in the training. Because improved reading competence in a relatively transparent orthography such as Dutch is mostly a matter of increased speed (Verhoeven & van Leeuwe, 2009), and low reading speed is an important characteristic of Dutch poor readers (van der Leij & van Daal, 1989), the transfer of increased reading speed to untrained words in our study is promising.

Reading accuracy, on the other hand, improved equally for all children: on CVC word reading in LDT1, on LDT2, which uses bisyllabic words, and, as intermediate transfer, on the SVT, which uses high-frequency monosyllabic words. Interestingly, the tests showing gains in accuracy are exactly those (lexical decision and sentence verification) that assess semantic processing. This might reflect that the training strengthened the connection between semantic properties and orthographic word features (see Ehri, 1998, 1999, 2005). During training, the LI and HI children showed different growth trajectories in accuracy: the HI children improved slightly over all sessions, whereas the LI children showed an initial stagnation, after which their accuracy increased. During training, children received immediate feedback on whether their response to an item was correct; future research should examine whether children show steeper improvement when scaffolding or corrective feedback is provided. Importantly, the training produced no speed-accuracy trade-off: gains in reading speed did not come at the expense of accuracy.

The ATI effect found for reading speed implies that the two intervention approaches interact with the children’s reading level at the start of the training. It can be concluded, therefore, that neither of the two intervention approaches is superior to the other in general (cf. related studies of Eckert, Dunn, & Ardoin, 2006; and Worsdell, Iwata, Dozier, Johnson, Neidert, & Thomason, 2005), but that the best approach depends on the reading level of the child.

As an alternative way to assess whether our intervention added value beyond regular reading education, we compared the improvement in the training groups with normed scores. The training groups improved more on monosyllabic words, the word type that was practiced (LDT1 and WDT1), than can be expected from regular reading education, which suggests that the intervention was beneficial. Such a comparison could not be made for the SVT, for which no norms are available. However, the effect sizes on this test were large (accuracy: ηp² = .16; speed: ηp² = .55), which might point to a similar improvement in intermediate transfer to sentence comprehension.

To enhance fast and effortless reading in the LI children, training should focus on words of a type (CVC, in this case) that they are able to read with sufficient accuracy. As the analysis of the stimuli in the last training set shows, these words can be considered representative of many other words with the same orthographic structure. Their sublexical units have a high probability of occurrence in the language, they have a high bigram frequency, and they typically have many, high-frequency orthographic and phonological neighbors. Children are also likely to re-encounter these words in later reading, because they are high-frequency words. Mastering these words therefore amounts to mastering a common core of (CVC) reading material, and improved competence in reading this common core apparently transfers to improved reading speed on orthographically more complex words. For the HI children, in contrast, the training focused on failures is the most effective one. Children in this training group practiced words that can be viewed as lying in the periphery of the CVC common core: these words have a low probability of occurrence in the language, and their constituent sublexical units occur less often in other words. When HI children focus on these uncommon and less familiar words, their reading speed improves and transfers to untrained words that are orthographically more complex than the words they practiced. For the HI children, these CVC words may have been more challenging to read and were probably within their zone of proximal development (Vygotsky, 1978), that is, at a level of difficulty that is neither too high nor too low, where optimal learning takes place. Perhaps practicing on successes was too easy for the HI children, whereas practicing on failures was too hard for the LI children.

Surprisingly, the training on failures did not prove most beneficial for the LI children, as would be predicted by Ehri (1998, 1999). Rather, it seems that repeatedly practicing successes increases item-based knowledge (Share, 1995, 1999). The improvement in reading speed of the HI children after practicing on the more demanding words is consistent with the expert learning theories of Gobet (2005) and Ericsson (2004), who show that performance can be optimized by continued practice on failures accompanied by detailed feedback. Our results also partly support Podsakoff and Farh’s (1989) conclusion that goals that are hard to reach lead to better results than goals that are easy to meet.
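The word properties invoked in the discussion of common and peripheral CVC words, bigram frequency and orthographic neighbors, can be computed in several ways, and the text does not state which measures were used. As one illustrative Python sketch: token-weighted letter-bigram counts averaged over a word's bigrams, and Coltheart's N (same-length words differing in exactly one letter position). The function names are our own, and the toy Dutch lexicon and its frequencies are invented; a real analysis would draw on a corpus-based frequency database.

```python
from collections import Counter


def bigram_counts(lexicon: dict[str, int]) -> Counter:
    """Token-weighted letter-bigram counts from a {word: frequency} lexicon."""
    counts = Counter()
    for word, freq in lexicon.items():
        for i in range(len(word) - 1):
            counts[word[i:i + 2]] += freq
    return counts


def mean_bigram_frequency(word: str, counts: Counter) -> float:
    """Average corpus count of the letter bigrams in `word`."""
    bigrams = [word[i:i + 2] for i in range(len(word) - 1)]
    if not bigrams:
        raise ValueError("word needs at least two letters")
    return sum(counts[b] for b in bigrams) / len(bigrams)


def orthographic_neighbors(word: str, lexicon: dict[str, int]) -> list[str]:
    """Coltheart's N: same-length words differing in exactly one letter position."""
    return sorted(w for w in lexicon
                  if len(w) == len(word) and w != word
                  and sum(a != b for a, b in zip(w, word)) == 1)


lexicon = {"kat": 50, "bal": 40, "kan": 30, "mat": 20, "lat": 5, "kip": 10}  # invented frequencies
counts = bigram_counts(lexicon)
print(mean_bigram_frequency("kat", counts))    # 77.5
print(orthographic_neighbors("kat", lexicon))  # ['kan', 'lat', 'mat']
```

In this toy lexicon, "kat" behaves like a common-core word (high bigram frequency, three neighbors), whereas "kip" has a low mean bigram frequency and no neighbors.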

The content of the different trainings was further investigated by exploring the characteristics of the words in the last training set. Analyses of the number of words show that the ATI effect is not explained by the number of different words children practiced: children with the fewest words in their last training set, be it successes or failures, did not consistently show the greatest improvement in reading speed on untrained words. Thus, improvement of general reading speed is not primarily related to word-specific training effects. This differs from the conclusion of Berends and Reitsma (2006) that the practical value of repeated reading with Dutch poor readers lies in its word-specific training effects. Whereas Berends and Reitsma found effects only on trained monosyllabic words with consonant clusters and no transfer to untrained words, our study leads to the opposite conclusion: decoding many different words transfers to untrained words, probably by improving skill in applying GPC rules. A possible explanation for the contrasting outcomes is that what is crucial for improving general reading skills is not so much the number of words in training but rather which words children practice with: their failures or their successes.

The training data suggest that informing students about the focus of the training has a positive effect: exposure durations decreased more for children informed about the training focus than for children who were not informed. This is in line with studies of Swanson et al. (1999), Conte and Hintze (2000), and Kluger and DeNisi (1998), who stress the importance of goal-setting for students in education. Possibly, the informed children learned more not only through positive or negative reinforcement, but also through a clearer metacognitive focus on the goal of the training. However, the effect of informing children was not found in the pretest–posttest comparison; apparently, it was not robust or powerful enough to transfer to general reading performance.

It should be noted that our training improved reading speed at the word level and showed intermediate transfer to the sentence level. Improved reading speed at the sentence level has been reported for poor readers after training targeted at the sentence level (Breznitz, 2006; Karni et al., 2005; Snellings, van der Leij, de Jong, & Blok, 2009). Our study goes beyond these sentence-focused trainings in that a training at the word level also affected reading speed at the sentence level. Future research should verify whether our training brings about not only intermediate transfer to sentence comprehension but also far transfer to passage reading and text comprehension.

Our study focused on increasing reading speed, because low reading speed is a prime characteristic of reading disorders in languages with a transparent orthography, whereas children with reading disorders in opaque orthographies generally suffer more from low accuracy. Moreover, Ehri (2005) indicated that children reading transparent orthographies seem to progress faster through the developmental phases of reading fluency than children reading opaque orthographies. Whether children reading an opaque orthography benefit from differential training in the way we observed is an issue for future research.

Our study suggests that the improvement of general reading speed in a transparent orthography is closely related both to the type of words children practice with (common and familiar vs. uncommon and less familiar words) and to their initial reading level. The training approach that focuses on CVC words representative of many other CVC words is the most effective for poor readers with a low initial reading level. These readers constitute only a small proportion of the population at large: we selected participants with a reading score below the 10th percentile and then split this group in half, so our poor readers with a low initial reading level represent only about 5% of the population. It is these children, however, who deserve the most effort to improve their reading skills.