As a result of globalization and migration, many children receive literacy instruction in their second language (L2). In Germany, the proportion of primary school children with a migration background is 37% on average, and more than 60% in metropolitan areas (Autorengruppe Bildungsberichterstattung 2018; Klatte et al. 2017a). About 20% of these children are learning German as L2 (Koschollek et al. 2019). Main languages spoken at home differ from the language of instruction in 22% and 15% of the students in the US and EU, respectively (Camarota and Zeigler 2015; Wendt and Schwippert 2017). Studies consistently show that L2 learners perform worse in literacy tests when compared to their monolingual peers (Farnia and Geva 2013; Klatte et al. 2017a; Melby-Lervåg and Lervåg 2014; Weis et al. 2019; Wendt and Schwippert 2017). These group differences are at least partly mediated by differences in proficiency in the language of instruction (Wendt and Schwippert 2017). However, lower socioeconomic status and parental education, and less favorable home literacy environments also contribute to L2 learners’ difficulties in literacy acquisition (Autorengruppe Bildungsberichterstattung 2018; Klatte et al. 2017a; Niklas and Schneider 2013; Wendt and Schwippert 2017).

Due to long-lasting and intensive research, a lot is known concerning early literacy instruction and intervention for children instructed in their native language (L1; e.g. Galuschka et al. 2014; Teale et al. 2018). Concerning literacy instruction in L2, research has mainly focused on English language learners. Findings indicate that the development of basic reading skills is highly comparable for L2 and L1 learners (for a recent review, see Schaars et al. 2019), and that evidence-based instruction programs designed for native children are also effective in students instructed in L2 (Goldenberg 2011; Ludwig et al. 2019; Richards-Tutor et al. 2016). However, less evidence is available for more transparent orthographies such as German. In the current study, we investigated whether and to what extent a computerized grapho-phonological training program that has been successfully evaluated in monolingual German children is also effective in immigrant children learning German as L2. Based on the empirical evidence described below, we expected beneficial training effects (1) on early literacy skills, and (2) transfer effects on German vocabulary.

Training components and their role in L1 and L2 literacy acquisition

In the current study, training was administered via the computerized training program Lautarium (Klatte et al. 2017b), which has proved effective in children with dyslexia (Klatte et al. 2014, 2017b), first-grade at-risk children (Klatte et al. 2016), and typically developing readers (Klatte et al. 2018). Lautarium fosters basic literacy skills through a combination of systematic phonics instruction and auditory-phonological training. Phonics instruction means that children are taught to read and spell syllables and words via phonics-based strategies, i.e., by acquisition and application of grapheme-phoneme and phoneme-grapheme correspondences. Several systematic reviews and meta-analyses have consistently demonstrated the effectiveness of phonics-based instruction in fostering literacy acquisition in struggling and typically developing readers (Ehri et al. 2001; Galuschka et al. 2014; McArthur et al. 2012, 2018). Phonics-based interventions help children to grasp the alphabetic principle, i.e., to understand how letters in written words map onto phonemes in spoken words (Hatcher et al. 2004; Snowling and Hulme 2012). According to the “self-teaching hypothesis” proposed by Share (1995), phonological decoding, i.e., the ability to decode unfamiliar words through application of grapheme-phoneme mappings, enables acquisition of word-specific orthographic representations, which in turn provide the foundation of rapid word recognition. The proposed mechanism has been confirmed in simulation studies (Ziegler et al. 2014). Fostering decoding abilities thus sets the ball rolling for acquisition of orthographic knowledge and fluent reading.

Several studies confirmed that phonics instruction is as effective for L2 learners as for children instructed in L1, or even more effective (Ginns et al. 2019; Goldenberg 2011; Ludwig et al. 2019). Importantly, this holds true regardless of oral L2 proficiency, i.e., even students with poor oral language scores profit from phonics-based instruction (Goldenberg 2011; Gunn et al. 2005; Manis et al. 2004).

In Lautarium, phonics instruction is combined with intensive auditory-phonological training, i.e., phoneme perception and phonological awareness. Phonological awareness, i.e., the ability to consciously access and manipulate the sound units of language, has proved fundamental for reading and spelling acquisition in numerous studies since the 1980s (e.g. Bradley and Bryant 1983; Wagner and Torgesen 1987). Phonological awareness in kindergarten predicts later literacy skills (for reviews, see Melby-Lervåg et al. 2012; Pfost 2015), and children with literacy disorders exhibit severe difficulties in phonological awareness tasks when compared to typically developing children matched for chronological age, or reading age (Melby-Lervåg et al. 2012). Furthermore, training phonological awareness in preschoolers fosters reading and spelling acquisition in the early grades (Ehri et al. 2001; Fischer and Pfost 2015). Concerning phonological awareness in L2 learners, mixed results have been reported. Jongejan et al. (2007) found comparable performance levels and developmental trajectories for phonological awareness from Grade 1 to 4 in children with English as a first vs. second language. In a meta-analysis of 43 studies using phonemic awareness tests, a small but significant effect was found in favor of the first-language learners (Melby-Lervåg and Lervåg 2014). For L2 learners of German, some studies reported phonological awareness performance comparable to native German children (Czapka et al. 2019; Duzy et al. 2013a, b; Limbird et al. 2014; Limbird and Stanat 2006), whereas others found group differences in favor of the native children (Blatter et al. 2013; Pröscholdt et al. 2013; Schöppe et al. 2013; Weber et al. 2007). As in native-speaking children, phonological awareness predicts reading and spelling skills in L2 learners of English (Harrison et al. 2016; Jongejan et al. 2007; Quiroga et al. 2002) and German (Czapka et al. 2019; Duzy et al. 2013a; Limbird et al. 2014). Furthermore, phonological awareness training fosters early literacy skills in L2 (Blatter et al. 2013; Schöppe et al. 2013; Weber et al. 2007; Yeung et al. 2013).

In contrast to other phonics-based programs, Lautarium includes intensive training of phoneme perception. Phoneme categories of the native language are established within the first year of life (Kuhl 2004; Werker and Tees 1984), but continue to be refined up until adolescence (Hazan and Barrett 2000). Phoneme perception predicts reading acquisition in typically developing children (Boets et al. 2011), and is deficient in children with reading disorders (Klatte et al. 2013; Manis et al. 1997; McBride-Chang 1995). Several authors argue that speech perception deficits are causally involved in disordered reading acquisition (Hornickel et al. 2009; Noordenbos and Serniclaes 2015; Ziegler et al. 2009). Poor phoneme perception may result in less specified (Vandermosten et al. 2020) or noisy phonological representations in the mental lexicon which, in turn, affect processing, storage, and access to phonological information, and hamper the building-up of phoneme-grapheme-correspondences (Elbro and Jensen 2005; Swan and Goswami 1997). Concerning second-language learners, studies revealed deficits in L2 phoneme perception in migrant preschool children learning Dutch (Janssen et al. 2017) and English (McCarthy et al. 2014) as L2, and in primary school children learning German as L2 (Darcy and Krüger 2012). Deficits in L2 speech perception (indicated by speech recognition in noise) were even found in bilingual adults speaking fluent, accent-free English, indicating less robust L2 phoneme representations despite high L2 proficiency (Rogers et al. 2006; Tabri et al. 2011).

Phoneme perception tasks in Lautarium require discrimination and identification of consonants and vowel lengths. Vowel length perception is included because, in German, vowel length is phonemic and orthographically marked. For example, the spoken German words kann (/kan/, [can]) and Kahn (/ka:n/, [barge]) differ only in vowel length. In orthography, vowel length is marked by the letters following the vowel according to specific rules. Thus, in order to understand and use the respective rules, a child needs to be able to classify a vowel as short or long. Impaired vowel length perception has been reported for German children with literacy disorders (Klatte et al. 2013; Landerl 2003; Steinbrink et al. 2014), and for children learning German as L2 (Brunner 2012).

Altogether, the available evidence indicates that the training components of Lautarium may be effective in fostering early literacy skills in L2 learners. In the following section, we summarize the evidence underlying our second hypothesis, which states that Lautarium training fosters vocabulary knowledge in L2 learners.

Effects of literacy acquisition on spoken language in L1 and L2 learners

Despite huge individual differences, as a group, second-language learners with an immigrant background have lower proficiency in the language of instruction when compared to their monolingual peers (Melby-Lervåg and Lervåg 2014; Schaars et al. 2019; Wendt and Schwippert 2017). This disadvantage is especially strong for children exposed to another language at home (Hoff 2013; Wendt and Schwippert 2017). Proficiency in the language of instruction is closely related to school achievement, especially with respect to literacy skills (Prevoo et al. 2016). Concerning the latter, studies consistently showed that L2 vocabulary knowledge reliably predicts reading (text) comprehension, but has a minor or no effect on word decoding and phonological awareness (Babayiğit, 2015; Knell 2018; Melby-Lervåg and Lervåg 2014; Prevoo et al. 2016). These findings led to the general conclusion that instruction in early literacy skills in L2 should not be postponed until a certain vocabulary size is acquired (Goldenberg 2011; Knell 2018). Here, we go a step further, arguing that L2 vocabulary acquisition may be enhanced by training early reading skills in L2. This assumption is based on recent evidence concerning the effects of reading acquisition on phonological long-term representations, and their role in learning new words. Especially, it has been argued that learning to read in an alphabetic system fosters the building-up of fine-grained, precise phonological representations on the lexical (i.e., words) and sub-lexical levels (syllables and phonemes), which in turn support precise encoding and maintenance of novel phonological forms. Different lines of evidence confirm this view. First, studies on categorical speech perception revealed that literacy acquisition improves precision of phoneme boundaries in adults (Serniclaes et al. 2005) and children (Burnham 2003; Hoonhorst et al. 2011). 
Second, it has been shown that phonological awareness predicts vocabulary acquisition in L1 and L2 learners (Kalia et al. 2018; Marecka et al. 2018), and that training of phonological awareness (i.e., blending, deletion, and substitution of phonemes in words) improves later recall of the target words much more than rhyme training or teaching definitions (Melby-Lervåg and Hulme 2010). In addition, training of phonological awareness and letter-sound mappings in preschoolers fosters long-term learning of new, unfamiliar names associated with cuddle toys (de Jong et al. 2000). Third, studies revealed beneficial effects of reading instruction on verbal short-term memory (for review, see Demoulin and Kolinsky 2016). Especially, it has been shown that word decoding skills in 6-year-olds predict subsequent growth in nonword repetition (Nation and Hulme 2011), which in turn is a valid predictor of vocabulary acquisition in L1 (Baddeley et al. 1998) and L2 (Hummel and French 2016; Service 1992). Fourth, studies with school-age children revealed that exposure to printed words enhances learning of unfamiliar pseudowords (Jubenville et al. 2014; Ricketts et al. 2009; Rosenthal and Ehri 2008) and of novel L2 words (Chambrè et al. 2020; Hu 2008). For example, in the Jubenville et al. (2014) study, children had to learn novel labels for unfamiliar objects presented pictorially. In one learning condition, printed words were presented in addition to the pictures and spoken labels. Print reliably facilitated the acquisition and recall of the words in both monolingual and bilingual children. This “orthographic facilitation effect” is still more pronounced when students have to decode the words during the learning phase (Chambrè et al. 2020), and is attributed to the interaction of orthographic and phonological representations in the mental lexicon. 
Especially, it has been argued that, during reading acquisition, increasing orthographic knowledge permanently refines phonological word representations (Muneaux and Ziegler 2004), and that lexical representations integrating semantic, phonological and orthographic knowledge are especially stable and easily retrieved (Perfetti and Hart 2002).

Based on this evidence, we expect the intensive phonics and phonological training provided by Lautarium to foster both literacy skills and vocabulary knowledge in immigrant children with German as a second language.



Participants were 26 second-grade children (14 male, mean age 7;11, SD 5 months) with an immigrant background learning German as a second language. The children were recruited from four classes of a primary school in Bavaria. Children from two classes (n = 12; 5 male, mean age 7;9, SD 4.2 months) were assigned to the training group. The remaining children (n = 14; 9 male, mean age 8;0, SD 5.4 months) served as controls. One child from the control group did not participate in the follow-up test, so the number of controls was reduced to 13.

Written informed consent was provided by the parents. The study was approved by the responsible Bavarian school authority and data protection officer. However, applicable data protection rules did not allow inquiry of the children’s heritage languages.


Short- and long-term effects of Lautarium training were assessed by means of a pretest–posttest-follow-up design (see Fig. 1). The study started with a pretest in the middle of the school year. Thereafter, the training group worked with Lautarium during school lessons, 5 times per week, for 20–30 min per session, over a period of 8 weeks. The control group continued to receive standard classroom instruction. The training phase was followed by the posttest. The follow-up test was performed after an interval of 9 weeks, in which both groups attended regular classroom lessons.

Fig. 1 Design of the study

Children’s reading, spelling, phonological awareness, and vocabulary scores were assessed at each measurement point (pretest, posttest, follow-up). In addition, a nonverbal reasoning task was included in the pretest in order to rule out group differences in general cognitive abilities. Testing was done in the morning in a separate classroom in the school. Tests of reading comprehension, spelling, phonological awareness and nonverbal intelligence were administered in groups. Decoding skills (reading aloud) and vocabulary (picture naming) were assessed individually.

Diagnostic tests

Word and pseudoword decoding were assessed by means of the “One Minute Test of Reading Fluency”, a subtest from the “Salzburg Reading and Spelling Test” (SLRT-II; Moll and Landerl 2014). Two lists are presented to each child, one containing words and one containing pseudowords constructed according to German phonotactic rules. The task is to read the words and pseudowords aloud as accurately and as quickly as possible. Raw scores represent the number of items correctly read within 1 min.

Reading comprehension on the level of words was assessed by means of a subtest of the “Test of Reading comprehension for Grade 1 to 6” (ELFE 1-6; Lenhard and Schneider 2006). The children have to select, out of four alternatives, the word that matches a target picture. Raw scores represent the number of items completed correctly within 3 min.

For spelling, a standardized German spelling test for second-graders was used, in which the children have to write down 15 words and 3 sentences according to dictation (Hamburg Spelling Test, HSP 2; May 2012b, c). Five raw scores are derived from this test: number of correct words, number of correct graphemes, and success in application of letter-sound mappings (“alphabetic strategy”), spelling rules, and morphemic knowledge. As the norms provided in this test are applicable only at the end of the second school year, a further version of the “Hamburg Spelling Test” (HSP 1+; May 2012a) with norms for the middle of the school year was additionally included in the pretest.

Phonological awareness was assessed by means of three subtests from the “Kaiserslauterer Gruppentest zur Lautbewusstheit” (Klatte et al. manuscript in preparation). The subtests required speeded identification, deletion, and substitution of sounds in words presented pictorially. Concerning identification, the children have to decide whether or not a specific speech sound is present in a number of target words. In case of “yes”-answers, the children also have to indicate the position of the sound in the word (beginning, middle, end). For sound deletion and substitution, the children have to indicate, for each target word, which word emerges when the second phoneme is eliminated or substituted, respectively. For each subtest, raw scores represent the number of items correctly solved within 3 min.

For vocabulary assessment, a subtest from a German test battery for language proficiency in 5- to 10-year-old children was included (SET 5-10; Petermann et al. 2010). The test consists of 40 pictures representing objects and actions. For each picture, the child has to pronounce the appropriate German word. The raw score represents the number of items correctly solved.

For assessment of nonverbal abilities, 16 matrices increasing in difficulty were selected from a subtest of nonverbal intelligence in children. The test is part of a widely used developmental test battery for primary school children in Germany (Esser et al. 2008). For each matrix, the missing element has to be chosen out of 5 alternatives. The raw score represents the number of items correctly solved.


Training was administered via the computer program Lautarium (Klatte et al. 2017a, b). The program combines auditory-phonological training with training of grapheme-phoneme-mappings and reading and spelling of transparent words. In addition, rapid access from written words to meaning is included by means of a word-to-picture matching task with fast presentation times.

Structure and contents of the training program

Lautarium uses two types of building blocks representing phonemes and the corresponding graphemes, respectively. Phonemes are represented by blocks with pictures of easy-to-name objects. The phoneme represented by a specific block is the initial sound of the picture’s verbal label (e.g., the picture of a ball represents the phoneme /b/). Graphemes are represented by blocks with a single letter or a letter combination that is usually used for a specific phoneme in German orthography (Thomé 2000), e.g., the phoneme /aɪ/ is usually represented by the grapheme 〈ei〉.

The speech material comprises about 1400 German nouns and 2500 pseudowords differing in phonological structure (CV, VCV, CVC, CCV, VCCV). Each speech item is implemented through high-resolution recordings produced by two professional speakers (a male and a female). For about 40% of the nouns, pictorial presentation is also available.

Tasks belonging to different training domains are intermixed, and within each domain, children start with simple tasks and proceed to more difficult ones. Due to the combinations of tasks and materials, Lautarium consists of 58 different exercises. Each exercise comprises 10–30 trials, depending on task complexity. Responses are followed by immediate feedback (correct, incorrect, time-out). In case of errors or time-outs, the respective trial is repeated until the correct answer is provided in time. Depending on the percentage of trials correctly solved in the first attempt, the child either has to repeat the respective exercise, or proceeds to the following one.
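The progression rule described above can be illustrated with a minimal sketch. It is not the Lautarium implementation: the mastery criterion (PASS_THRESHOLD), the function names, and the respond callback standing in for the child's answer are hypothetical, since the text does not report the actual threshold or program internals.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical mastery criterion: the source does not state the actual threshold.
PASS_THRESHOLD = 0.8


@dataclass
class ExerciseResult:
    first_attempt_correct: int  # trials solved on the first attempt
    total_trials: int

    @property
    def first_attempt_rate(self) -> float:
        return self.first_attempt_correct / self.total_trials


def run_exercise(trials: List[str], respond: Callable[[str, int], bool]) -> ExerciseResult:
    """Present each trial and repeat it until it is answered correctly and in time.

    `respond(trial, attempt)` is a stand-in for the child's response and
    returns True for a correct, timely answer.
    """
    first_attempt_correct = 0
    for trial in trials:
        attempt = 1
        while not respond(trial, attempt):
            attempt += 1  # error or time-out: the same trial is repeated
        if attempt == 1:
            first_attempt_correct += 1
    return ExerciseResult(first_attempt_correct, len(trials))


def next_exercise_index(current: int, result: ExerciseResult,
                        threshold: float = PASS_THRESHOLD) -> int:
    """Move on only if the first-attempt criterion is met; otherwise repeat the exercise."""
    return current + 1 if result.first_attempt_rate >= threshold else current
```

Under these assumptions, a child who solves 7 of 10 trials on the first attempt would repeat the exercise, whereas a child who solves all 10 would proceed to the next one.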

Before starting a new task, children have to work through an interactive instruction that provides explanation, practice with examples, and informative feedback. In addition, Lautarium is equipped with a token system fostering focused, intensive training.

Phoneme perception

Phoneme perception tasks require discrimination (same/different response to pairs of pseudowords) and identification of plosives and vowel lengths. Plosive perception tasks proceed from simple syllable structures to more complex structures in which the target plosives are embedded in consonant clusters. For the latter, phoneme perception deficits in children with literacy disorders are especially pronounced (Klatte et al. 2013). In vowel length identification tasks, target words are presented auditorily or pictorially, and children have to classify the vowels as short or long.

Phonological awareness

Phonological awareness tasks require sound-to-word matching, segmentation, blending, and matching of initial or final sounds in words and pseudowords. For sound-to-word matching, a phoneme block is presented, followed by two words presented auditorily (each synchronized with a response button) or pictorially. The children have to decide whether or not the target phoneme is present in one of the words. For segmentation, words are presented auditorily or pictorially. The corresponding phoneme blocks have to be selected and arranged in the correct order. In blending tasks, a sequence of phoneme blocks constituting a word is presented, and the children have to select the correct word by clicking on the corresponding picture. Matching of initial or final sounds is trained by means of odd-one-out tasks originally used by Bradley and Bryant (1983). Sequences of three words or pseudowords are presented, each synchronized with a response button. The children have to decide which of the three items differs from the others with respect to the initial or final sound.

Grapheme-phoneme correspondences

In Lautarium, training of grapheme-phoneme-mappings and phonological training are interlinked. For example, when the correct phonemes have been selected in phoneme identification or segmentation tasks, the corresponding graphemes have to be assigned.

Reading and spelling

In Lautarium, newly acquired phonological skills and letter knowledge are immediately applied in reading and spelling tasks. Reading tasks comprise matching printed words to spoken words (and vice versa), and matching pictures to printed words. The latter task focuses on reading speed, aiming to foster direct access from print to meaning and thus enlarge the children’s sight word vocabulary. Spelling tasks require segmentation of target words presented auditorily or pictorially into their constituent graphemes.

Statistical analyses

In the spelling test, raw scores reflecting application of spelling rules and morpheme knowledge were highly correlated (pretest: r = 0.77, posttest: r = 0.71, follow-up: r = 0.85) and thus summed up to a single score termed “orthographic strategy” for the analyses.
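As a brief illustration of this step, the sketch below computes a Pearson correlation between two subscores and then sums them into a composite. The score lists are invented for illustration and are not study data; the function name pearson_r is our own.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long score lists."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Made-up raw scores for six children (illustration only).
spelling_rules = [2, 4, 5, 7, 8, 10]
morphemic_knowledge = [1, 3, 6, 6, 9, 9]

r = pearson_r(spelling_rules, morphemic_knowledge)

# Composite "orthographic strategy" score: sum of the two subscores per child.
orthographic_strategy = [a + b for a, b in zip(spelling_rules, morphemic_knowledge)]
```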

Potential group differences in learning gains between pretest and posttest, and between pretest and follow-up, were assessed by means of analysis of covariance (ANCOVA), with the pretest score from the respective test included as covariate (O’Connell et al. 2017; Rausch et al. 2003). In case of violation of ANCOVA assumptions, repeated-measures analyses of variance (ANOVAs) were performed, including measurement point as within-subject factor and treatment group as between-subject factor. For significant results (p < 0.05), effect sizes corrected for pretest scores (dcorr) were calculated (Klauer 1989). As performing multiple analyses carries the risk of Type I error accumulation, we verified our results by performing repeated-measures multivariate analyses of variance (MANOVAs) for each of the outcome domains (reading, spelling, phonological awareness). MANOVA statistics are provided in the appendix.
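The text does not spell out the formula for dcorr. A common reading of a pretest-corrected effect size is the standardized group difference at posttest minus the standardized group difference at pretest. The sketch below implements that reading; the formula and all function names are assumptions for illustration, not a statement of the procedure actually used.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two independent groups."""
    na, nb = len(a), len(b)
    return sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))


def cohens_d(treatment, control):
    """Standardized mean difference (treatment minus control)."""
    return (mean(treatment) - mean(control)) / pooled_sd(treatment, control)


def d_corr(train_pre, ctrl_pre, train_post, ctrl_post):
    """Assumed form of the corrected effect size: d at posttest minus d at pretest."""
    return cohens_d(train_post, ctrl_post) - cohens_d(train_pre, ctrl_pre)
```

With this definition, a group difference that already existed at pretest is subtracted from the posttest difference, so only the change in the standardized difference counts as a training effect.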


For the sample as a whole, one-sample t-tests revealed that standard scores (T-scores) for reading comprehension, spelling, and vocabulary were significantly below the means of the norm samples (all ps < 0.05), with effect sizes ranging from d = 0.46 for reading comprehension to d = 1.86 for vocabulary (see Table 1). Concerning decoding, mean scores for words and pseudowords corresponded to the percentile rank bands 47–49 and 50–53, respectively (no T-scores were available for this test). Thus, as expected from prior research, deficits in reading comprehension, spelling, and, most pronounced, vocabulary were confirmed in the current sample of L2 learners, whereas decoding skills were in the average range.
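The comparison against the norm can be sketched as follows. The example assumes that d is computed from the sample standard deviation (the text does not say whether the sample SD or the norm SD of 10 was used), and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

NORM_MEAN = 50  # T-scores are scaled to a norm mean of 50 (SD 10)


def one_sample_t_and_d(scores, mu=NORM_MEAN):
    """Return (t, d) for a one-sample comparison against the norm mean mu."""
    n = len(scores)
    diff = mean(scores) - mu
    sd = stdev(scores)         # sample SD (n - 1 in the denominator)
    t = diff / (sd / sqrt(n))  # t statistic with n - 1 degrees of freedom
    d = diff / sd              # Cohen's d based on the sample SD (assumption)
    return t, d
```

Negative values indicate scores below the norm mean; the effect sizes in the text are reported as magnitudes.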

Table 1 Mean standard scores (T-Scores) at pretest and deviations from the norm sample means (T = 50) for vocabulary, spelling, and reading (N = 26)

Training effects

Performance raw scores and statistical results concerning treatment effects at posttest and at follow-up are provided in Tables 2 and 3, respectively. For all variables, pretest scores did not differ significantly between groups (t-tests, all ps > 0.05).

Table 2 Mean performance (raw scores) in training and control group and treatment effects at posttest (N = 26)
Table 3 Mean performance (raw scores) in training and control group and treatment effects at follow-up (N = 25)

Reading No significant treatment effects were found, either for word comprehension or for decoding of words and pseudowords (all ps > 0.21).

Spelling Due to inhomogeneity of regression slopes for number of correct words, number of correct graphemes, and application of letter-sound-mappings (“alphabetic strategy”), training effects at posttest were analyzed by repeated-measures ANOVAs, with time (pretest vs. posttest) as within-subject factor and treatment group as between-subject factor. These analyses revealed non-significant main effects of group (F(1,24) < 1 in all cases), and significant (correct words: F(1,24) = 26.23, p < 0.001; correct graphemes: F(1,24) = 43.95, p < 0.001) and marginally significant (alphabetic strategy: F(1,24) = 4.17, p = 0.05) main effects of time. More importantly, for each of the three variables, significant time × group interactions confirmed larger gains in the training group when compared to the controls (see Table 2). The advantages of the training group endured until follow-up. For application of orthographic rules, a significant training effect of medium effect size emerged at follow-up. Effect sizes were in the medium range (dcorr = 0.35–0.71). The results were confirmed through MANOVAs on the four spelling variables by significant time × group interactions (see Table 4 in the appendix).
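For readers less familiar with the logic of the time × group interaction: in a design with two groups and two measurement points, the interaction F is equivalent to the squared t statistic of an independent-samples t-test on gain scores (posttest minus pretest). The sketch below illustrates this equivalence; the function name is hypothetical and the code is not the analysis script used in the study.

```python
from math import sqrt
from statistics import mean, variance


def interaction_f_from_gains(train_pre, train_post, ctrl_pre, ctrl_post):
    """F for the time x group interaction in a 2 x 2 mixed design.

    Computed as the squared pooled-variance t statistic on gain scores;
    degrees of freedom are (1, n_train + n_ctrl - 2).
    """
    g_train = [post - pre for pre, post in zip(train_pre, train_post)]
    g_ctrl = [post - pre for pre, post in zip(ctrl_pre, ctrl_post)]
    n1, n2 = len(g_train), len(g_ctrl)
    pooled_var = ((n1 - 1) * variance(g_train) + (n2 - 1) * variance(g_ctrl)) / (n1 + n2 - 2)
    t = (mean(g_train) - mean(g_ctrl)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t ** 2, (1, n1 + n2 - 2)
```

With 12 trained children and 14 controls, the denominator degrees of freedom are 24, which matches the F(1,24) values reported for the posttest analyses.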

Phonological awareness At posttest, significant and marginally significant (p = 0.053) effects of strong effect size in favor of the training group were found for sound deletion and sound substitution, respectively. However, these effects were no longer significant at follow-up. MANOVAs on the three awareness measures (see Table 4 in the appendix) revealed non-significant time × group interactions when three measurement times were included, and when pretest and follow-up were included (p > 0.10 in both cases), whereas a MANOVA including pretest and posttest data confirmed a significant time × group interaction (p = 0.025).

Vocabulary Significant beneficial training effects of moderate effect size were found at both posttest (dcorr = 0.72) and follow-up (dcorr = 0.62).


In the current study, the effectiveness of a computerized phonological training program was assessed in children with an immigrant background receiving literacy instruction in German as L2. Training effects were assessed by means of a pretest–posttest-follow-up design. Training was administered via the program Lautarium, which has proved effective in prior studies with native German children with and without literacy disorders. In the current sample of L2 learners, we found significant and enduring training gains of medium effect size in spelling and vocabulary knowledge. Concerning phonological awareness, beneficial training effects of strong effect sizes were found in 2 of 3 subtests at posttest. For reading, no training effects were found, either for decoding (reading aloud) of words and pseudowords or for comprehension of words read silently.

Pretest scores confirmed substantial deficits in German vocabulary knowledge in the current sample of second-graders instructed in German as L2. Average scores were more than 1.5 SD below the mean of the norm sample. Inspection of individual data revealed that vocabulary scores were at least 1 SD below the norm for 23 children (88%), and at least 2 SDs below the norm for 9 children (35%). In contrast, concerning basic literacy skills, the children performed within the average range for word and pseudoword decoding, and only slightly (i.e., less than one half of an SD) below the norm for spelling and for comprehension of words read silently. This pattern of results indicates that acquisition of basic reading skills in L2 does not depend on L2 vocabulary knowledge. In line with this argument, vocabulary scores at pretest were uncorrelated with reading, spelling, and phonological awareness (all ps > 0.24). These findings replicate prior results showing average or near-average decoding skills despite deficient vocabulary knowledge in children instructed in L2 (Babayiğit 2015; Knell 2018; Limbird et al. 2014; Melby-Lervåg and Lervåg 2014; Prevoo et al. 2016). Concerning instruction, the results of the current study add further evidence against the argument that teaching basic literacy skills to L2 learners should be postponed until a basic vocabulary is established (Goldenberg 2011; Knell 2018).

Analyses of the training effects revealed significant and marginally significant (p = 0.053) effects of strong effect size in two of three measures of phonological awareness, i.e., deletion and substitution of speech sounds in words. However, these effects were no longer significant at follow-up.

Sound deletion and substitution are not explicitly trained in Lautarium. The program focuses on blending of phonemes into words, and segmentation of words into the constituent phonemes. These abilities are a precondition for reading and spelling, respectively (Schuele and Boudreau 2008). Deletion and substitution are more complex tasks, as they require not only segmentation, but also manipulation of sounds in words (Pufpaff 2009; Schuele and Boudreau 2008). Inspection of Tables 2 and 3 indicates that the children from the training group made strong gains during the training period (Table 2), but no further progress after termination of the training, i.e., between posttest and follow-up (Table 3). The controls, in contrast, improved continuously between pretest and follow-up, by which time they had caught up with the training group. We may thus conclude that phonological awareness training in Lautarium accelerates the acquisition of complex, non-trained phonological awareness skills. Unfortunately, training gains on segmentation and blending could not be assessed, as respective tasks were not included in the phonological awareness test. However, we may assume that the enduring effects on spelling described below are accompanied by significant gains in segmentation.

Lautarium-training evoked significant and enduring effects of medium size on spelling accuracy (number of words and graphemes correctly spelled; correct application of letter-sound mappings). Thus, the beneficial effects of Lautarium-training on spelling found in monolingual German children (Klatte et al. 2016, 2017a, b) and in classes of second-graders with and without a migration background (Klatte et al. 2018) were replicated in a sample of children instructed in German as L2. This finding is in line with prior studies demonstrating beneficial effects of phonics-based literacy instruction in both L1 and L2 learners (Ginns et al. 2019; Goldenberg 2011; Ludwig et al. 2019).

In addition to overall spelling accuracy, the spelling test used in the current study allowed analyses of orthographic skills (“orthographic strategy”, i.e., correct application of spelling rules and morpheme knowledge when spelling non-transparent words). Even though Lautarium-training does not include rule- or morpheme-based spelling strategies, a significant beneficial effect on these skills was found at follow-up. Different mechanisms might contribute to this transfer effect. First, it might be attributed to the intensive training of vowel-length perception. As described in the introduction, vowel length is phonemic in German, and orthographically marked. In order to understand and successfully apply the respective orthographic rules, children need to be able to classify a vowel as short or long. Therefore, effective training of vowel-length perception eliminates a major barrier to spelling rule acquisition. Second, Lautarium-training includes rapid word recognition (word-to-picture matching) of non-transparent words. This might foster the acquisition of orthographic representations of these words, and also of novel words that share orthographic or morphological features (Pacton et al. 2018; Tucker et al. 2016). Third, the significant improvement in spelling transparent words (“alphabetic strategy”) might enhance the “self-teaching mechanism” for spellings of novel, non-transparent words. It has been shown that, in addition to phonological recoding (Share 1995), spelling acts as a powerful self-teaching tool in orthographic learning (Conrad et al. 2019; Shahar-Yames and Share 2008).

Concerning reading, no effects of Lautarium-training were found. A possible explanation for this is that, for transparent orthographies such as German, phonological skills are more closely related to spelling than to reading. Respective studies in German children revealed that the variance explained by phonological awareness declines rapidly after the first school year for reading, but remains reliable in later grades for spelling (Ennemoser et al. 2012; Landerl and Wimmer 2008; Wimmer and Mayringer 2002). This is because grapheme-phoneme relations are highly regular, whereas phoneme-grapheme relations are not (Wimmer and Mayringer 2002). Due to their regularity, letter-sound mappings are easily learned, and word reading accuracy reaches ceiling at the end of the first school year (Seymour et al. 2003). Therefore, learning to read in German requires only basic phonological awareness skills. As phonological awareness training is a key aspect of Lautarium, and the current study was performed in second-graders, stronger effects on spelling than on reading were to be expected. Furthermore, studies with children instructed in German indicate that the predictive strength of phonological awareness for reading is even lower in L2 than in L1 learners (Duzy et al. 2013a, b).

The most important finding from the current study is that computerized grapho-phonological training evokes enduring beneficial effects on vocabulary acquisition in L2 learners. This finding confirms our assumption that phonics-based instruction in L2 learners effectively fosters both written and oral language skills.

We may attribute the significant gains in vocabulary to the integration of auditory-phonological and orthographic training. As outlined in the introduction, phonological awareness skills predict vocabulary acquisition in L1 and L2 learners (Kalia et al. 2018; Marecka et al. 2018), and phonological awareness training fosters word learning in children (de Jong et al. 2000; Melby-Lervåg and Hulme 2010). Concerning L2 learners, phonological training boosts the acquisition of long-term phonological knowledge of the L2, e.g., sensitivity for the phoneme structure of words, and for phonemic features and phonotactic regularities that might be non-existent in L1. The increasing knowledge of the phonological characteristics of the L2 acts as a scaffold when phonological representations of novel words are built up. Specifically, when encountering a novel word, existing phonological representations are activated and used to create a representation of the new word in short-term memory. This representation can later be encoded in long-term memory. The efficiency of this process depends on the range and quality of phonological representations already available to the learner (Marecka et al. 2018). Studies confirmed that pre-existing phonological knowledge supports vocabulary acquisition in L2 learners (Majerus et al. 2008; Masoura and Gathercole 2005).

Acquisition of phonological representations is further supported by practicing grapheme-phoneme mappings and reading and spelling of words. The beneficial effect of print exposure on novel word learning (orthographic facilitation) described in the introduction has proved reliable in a recent systematic review of 23 studies (Colenbrander et al. 2019). In alphabetic systems, orthographic word forms are well specified and systematically related to pronunciations. Therefore, orthographic representations help learners to specify and clarify phonological forms, leading to distinctive and robust phonological representations. Recent evidence suggests that the orthographic facilitation effect is even more pronounced in L2 than in L1 learners (Qu et al. 2018). A possible explanation is that, in L2 learners, language representations are often acquired through spoken and written language input in parallel. Therefore, the links between phonological and orthographic representations are presumably stronger than in L1 learners. In Lautarium, word meaning is provided by pictures accompanying the target words. The program thus enables the learner to establish high-quality lexical representations consisting of well-integrated semantic, phonological, and orthographic constituents (Perfetti and Hart 2002).

Despite the encouraging findings, the current study has a number of limitations. First, the sample size is small, and, owing to data protection regulations, little is known concerning the language backgrounds of the participants (e.g., characteristics of native language and native language proficiency, main language spoken at home, age at onset of L2 acquisition). Analysis of potential moderating effects of the heritage language was therefore impossible. Second, allocation of participants to treatment conditions (training vs. control) was not randomized, but based on school classes. Even though the groups proved highly comparable at pretest, we cannot rule out differences in regular class instruction during the study period. Third, due to the complexity of the training program, the effectiveness of specific components cannot be evaluated. However, the literature reviewed above lets us assume that the success of the program results from its complexity, as this allows integration of phonological, orthographic, and semantic learning. Finally, we do not know whether and to what extent the current results can be generalized to other forms of phonics-based instruction. It is possible that the effects result from the specific training format (computerized training), schedule (intensive, daily training for a period of 9 weeks), and/or specific features of materials and implementation. For example, in Lautarium, each speech item is represented by 4 high-quality recordings produced by 2 professional speakers (a male and a female). This variability might contribute to the beneficial training effect on L2 learners’ vocabulary (Sommers and Barcroft 2011).

In line with prior studies, the current results suggest that teaching basic literacy skills (i.e., phonological awareness, letter-sound mappings, and word-level reading and spelling) is effective in immigrant children with deficient vocabulary knowledge in the language of instruction. Furthermore, becoming literate in L2 enhances L2 vocabulary acquisition. Consequently, in courses targeting vocabulary knowledge, teachers should include written words in addition to phonology and semantics as soon as basic reading skills are established.