Introduction

Mastering the associations between letters and sounds is a sine qua non for exhaustive reading (Hulme et al., 2012; Paige et al., 2018). Additionally, fluent reading requires the identification of words based on their orthographic properties. Learning these properties is a consequence of reading experience: children encode the letters and their relative positions in a word, and internalize sublexical regularities (sequences of letters, structural redundancies, positional frequencies; Conrad et al., 2013) on the one hand, and lexical whole-word information on the other. This knowledge facilitates automatic access to words' mental representations (Chetail, 2015; Chetail et al., 2015) and has proven to be a good predictor of reading and writing in primary school, beyond phonological knowledge or intelligence (Deacon et al., 2019; Rakhlin et al., 2019; Rothe et al., 2015; Zaric et al., 2020). Specifically, a child's knowledge about combinations of letters and syllables at the sublexical level is related to greater accuracy in reading and writing, while word-lexical knowledge is related to greater fluency (Conrad et al., 2013; Karageorgos et al., 2020).

Despite the evidence about the role of orthographic knowledge in reading and writing, how this knowledge develops remains an open question. One source of debate comes from explanatory theories of development. While stage theories assume that sublexical knowledge is a prior stage to the construction and use of lexical representations (Ehri, 2014), dual and continuity theories argue that both sublexical and lexical knowledge emerge early and work hand in hand for efficient access to words' mental representations (Pritchard et al., 2018; Treiman, 2017). Previous studies exploring the evolution of orthographic knowledge in primary school support the idea of a developmental time window for extracting sublexical and lexical orthographic properties of words. For example, Bahr et al. (2012) observed in a cross-sectional study that, in a free writing task in English, 1st graders made mostly phonological errors and that, once the grapheme-phoneme association was mastered, mostly in 3rd grade, these errors decreased, giving way to morphological and lexical errors. In the same vein, in a longitudinal study with French children, Bosse et al. (2021) observed that lexical writing errors decreased between 12 and 15 years of age. However, even at that age lexical knowledge was not homogeneous across words. More concretely, certain errors were persistent, especially in words with inconsistent letters (the same letter can be associated with different sounds) and in homophones (words with the same phonology but a different spelling structure, like “aux/eau”).

However, writing assessments do not take into account the directionality of the phonology-orthography association. Indeed, another source of debate comes from the distinct processes involved in the Orthography-to-Phonology and Phonology-to-Orthography associations (OP and PO henceforth) required for reading and writing, respectively, which entail fundamental differences in exploring orthographic knowledge. While reading is possible even when the orthographic structure is not completely internalized (it is possible to phonologically realize a partially activated orthographic structure using vocabulary knowledge, for instance), writing requires retrieving the complete orthographic structure, where all the letter identities and positions across the string are well specified and retrieved from the phonological input. This might be the reason why, between 2nd and 5th grade of primary school, children can still misspell words that they are able to read correctly (Binamé et al., 2015). Still, while some authors suggest that spelling performance might be the best indicator that the word is being associated with the proper orthographic structure (Bosse et al., 2014; Pontart et al., 2013; Pritchard et al., 2020), other authors claim that exhaustive naming of words containing specific structures is a good proxy that these structures are being adequately internalized (Compton et al., 2020; Share, 2008; Tucker et al., 2016).

The above-mentioned results suggest that both the OP-PO directionality and the degree of inconsistency in the relationship between certain letters and sounds may play a role in the developmental pathway reported across studies. For instance, opaque orthographies such as English or French show a high degree of inconsistency both from graphemes to phonemes (OP or feedforward: the letter i is associated with a different sound in pint/mint) and from phonemes to graphemes (PO or feedback: sound i can be written i as in mint, ee as in deep, or ea, as in heap). On the other hand, transparent orthographies such as Spanish or Italian show a low degree of inconsistency, which is limited to a small set of letter-sound connections. Cross-sectional studies have shown that the high degree of inconsistency in opaque orthographies not only influences the number of errors but could also boost children’s reliance on lexical strategies (Bosse et al., 2015).

More concretely, studies in opaque orthographies have shown that PO and OP inconsistency effects (more errors in inconsistent than in consistent words) decrease significantly in reading and writing between 4th and 5th grade (Weeks et al., 2006). Schmalz et al. (2020) observed that in English and German, context-dependent letters (OP inconsistency) hindered word reading in 2nd, 3rd, and 4th grades, suggesting that during this period, children relied on sublexical strategies to read and write. Despite this, robust frequency and neighborhood effects have been observed in these orthographies, reflecting the use of lexical strategies at early ages. For example, Leté et al. (2008) observed that although French children from 1st to 5th grade wrote consistent words with fewer errors than inconsistent words in spelling-to-dictation tasks, all grades showed facilitative word frequency effects (fewer writing errors in frequent than in infrequent words). Similarly, Bosse et al. (2003) and Leté et al. (2008) observed in French children in the same age range that the number of neighbors of a word (words with a similar orthography except for the substitution or transposition of a letter) had a facilitative effect, indicating that children used their lexical knowledge to extract information about a word's orthographic structure. These data suggest that in opaque orthographies, lexical strategies help to construct the orthographic code and are in fact combined with sublexical strategies from 1st grade. However, the univocal rules governing letter-sound associations in transparent orthographies lead children to master the alphabetic code as early as first grade (Seymour et al., 2003). For this reason, it has been suggested that children in such orthographies might rely more on sublexical processes, modulating the type of errors observed during development, although the evidence in this regard is scarce and inconclusive.

Studies in transparent orthographies have shown inconsistency effects similar to those observed in opaque orthographies, with scarce evidence of lexical effects at an early age. Regarding reading, Goikoetxea (2006; see also Jiménez & Hernández, 2000) explored the errors made by Spanish children in 1st and 2nd grade, and observed that most errors were not visual (rotations, substitutions) but phonological, and corresponded to context-dependent letters (ca/ce). In these studies some use of lexical knowledge was found (children made fewer reading errors in words than in pseudowords); however, word frequency was not manipulated. Álvarez-Cañizo et al. (2018) provided additional support for the claim that context-dependent errors reveal specific sublexical difficulties even in 3rd grade, although in this case only pseudoword reading was assessed.

Regarding writing, Abchi et al. (2009) observed persistent spelling-to-dictation errors in words containing context-dependent letters, but no hint of frequency effects, in a longitudinal study of children from 1st to 2nd grade (see also Serrano & Defior, 2012). However, in a cross-sectional study, Carrillo et al. (2013) observed lexical frequency effects in word writing from 2nd to 5th grade, even in words with context-dependent letters, suggesting that lexical effects in Spanish begin to emerge in 2nd grade. In another study with children in the same age range, Carrillo and Alegría (2014) observed persistent writing errors in pseudowords with inconsistent letters (b/v). Additionally, the error rate was significantly greater for infrequent syllabic combinations (e.g., the combination ba is less frequent in Spanish than the combination bo). This outcome supports the idea that Spanish children are sensitive to sublexical structures, and is in line with recent findings with children from kindergarten to 3rd grade (Zhang et al., 2021). These authors identified three developmental phases according to specific error profiles. In Phase 1 the most frequent errors were vowel substitutions, additions and omissions; in Phase 2 they were silent h omissions and consonant substitutions, additions and omissions. Finally, in Phase 3, errors involved silent h additions and same-sound consonant substitutions (c/k, and particularly b/v). Most 3rd grade children were in this phase, suggesting that even when the alphabetic code is mastered, sublexical PO inconsistencies might be problematic for writing in Spanish.

Taken together, the reported findings suggest that even in transparent orthographies, specific letter-sound inconsistencies can modulate the strategies used to consolidate orthographic representations, and support the view of a preeminence of sublexical strategies in the first years of reading and writing experience. However, these findings do not rule out the hypothesis of a combination of sublexical and lexical knowledge, since key comparisons that might allow capturing lexical effects were not included in previous works. To date, no study has explored this progress by systematically manipulating sublexical variables (OP and PO inconsistency) and lexical variables (frequency) in reading and writing tasks with the same sample.

The present study

This study has two aims. The first is to explore the extent to which Spanish children use sublexical and lexical strategies when they read and write consistent and inconsistent orthographic structures. Spanish is a highly transparent orthography with a few specific exceptions. Context-dependent letters involving OP inconsistency, such as c and g, are pronounced /k/ and /g/ when followed by the vowels a, o, u, but /θ/ and /x/ when followed by the vowels e, i. Inconsistent letters involving PO inconsistency, such as b and v, share the same pronunciation and are not subject to any rule, so the words containing them need to be learned by rote memory. If sublexical strategies have a preeminent role in Spanish, words and pseudowords with context-dependent letters and with inconsistent letters will generate interference leading to more errors than neutral ones, particularly in the first grades of primary school. If lexical strategies are also employed to retrieve the accurate word representation, frequency effects (better writing and reading of high-frequency words) might emerge. Additionally, errors might be modulated by the task, with OP inconsistencies being mainly problematic for reading and PO inconsistencies being mainly problematic for writing.

The second aim of the study is to identify the most frequent errors at different grades, as well as the characteristics of the words that present these errors. Previous studies have reported several systematic writing and reading errors including letter substitutions, additions, omissions and transpositions in Spanish children until 3rd grade, together with other types of syllabic errors and incorrect use of the silent letter h (Álvarez-Cañizo et al., 2018; Zhang et al., 2021). We set out to identify specific orthographic knowledge difficulties in 3rd to 5th grade children by exploring the common errors committed during this period, in which orthographic representations are consolidating.

To explore these issues, three types of words were employed in the reading and writing tasks: totally consistent neutral words (pato), words with a context-dependent letter (OP inconsistency: the word cena can be incorrectly read with the sound k), and words with an inconsistent letter (PO inconsistency: the word veda can be incorrectly spelled with the letter b). Half of the items in each experimental condition (neutral, context-dependent and inconsistent) were high-frequency items, and the other half were low-frequency items. An important concern is that almost all inconsistent and context-dependent Spanish words have orthographic neighbors and are thus prone to lexical interference. For this reason, two distinct sources of neighborhood influence were explored separately. In the first experiment (Experiment 1), words had at least one neighbor that did not affect the key letter manipulated (the lexicon contains a word with a similar orthographic structure except for a consonant which does not affect the translation of the key letter, e.g., palo, ceja, vela). In the second experiment (Experiment 2), words had at least one neighbor dissimilar in the key letter affecting the PO–OP association (the lexicon contains a word that differs from the one to be written or read in the key letter, affecting its phonological or orthographic resolution; e.g., the tested words are dedo, ceja, vaca, while the lexicon contains dado, caja, boca). Previous evidence with adults has revealed that lexical neighbors facilitate reading (Yates et al., 2008), but that the existence of neighbors in the lexicon that differ in their OP–PO translation at the same position hinders the activation of precise orthographic representations (Pollatsek et al., 2005). If this is so, the pattern of errors and the strategies employed to read and write context-dependent and inconsistent words might differ between experiments.

Method

Participants

A total of 118 primary school children in 3rd grade (N = 43, age range 7.4–8.4, M = 8.1, SD = 0.25, 23 females), 4th grade (N = 45, age range 8.5–9.4, M = 9.1, SD = 0.26, 25 females) and 5th grade (N = 30, age range 9.5–10.6, M = 10.1, SD = 0.26, 15 females) participated in this experiment with the written consent of their parents. Sample size was appropriate for a power estimation of 0.95 and an expected effect size of 0.40. This sample was recruited from a school located in the urban area of Bilbao (Basque Country). Children who participated in the study met the following inclusion criteria: (a) enrollment in 3rd, 4th, or 5th grade; (b) absence of neuropsychiatric (ADHD, Autism Spectrum Disorders, etc.) and sensory problems; (c) no history of special education services or reading and/or language therapy; (d) normal or corrected vision. No grade retention was reported. The method of reading instruction used at the school was the phonic or synthetic method, which implies that all children were explicitly instructed in the alphabetic code from the first year of primary school. All participants were Spanish-Basque bilinguals with Spanish as their L1. They were all enrolled in the Spanish model (in these schools Spanish is the schooling language and Basque is one subject in the curriculum, whereas in the Basque model, Basque is the main schooling language). Level of exposure to the home language was measured with a questionnaire filled out by the families. This questionnaire explored language preference in terms of use at the time of testing (average percentage of use of each language across various situations, i.e., speaking in everyday situations, amount of language heard during the day, and preferred language when playing with friends, watching TV, reading and writing). All children showed a similar level of exposure to Spanish outside school, with a preferential use of that language of about 80% (M = 79.8, SD = 12.8) compared with about 20% for Basque (M = 20.2, SD = 11.3). SES was assessed by asking parents to mark their income range, which could be low (below 1500€), middle or high (above 3000€). In all cases income fell in the middle range (between 1500 and 3000€), revealing a lack of variability in family SES.

Materials and design

Experiment 1

For the first dictation and naming tests, 96 words of medium length were selected (M = 5 letters; range 4–6 letters). These words were divided into three experimental sets: 32 neutral words (salto), 32 words with a context-dependent letter (sucio), and 32 with an inconsistent letter (favor). In order to control repetition effects, items were counterbalanced in two lists. In each experimental set the 16 dictation items from list 1 were used as naming items on list 2, and the 16 naming items from list 1 were used as dictation items on list 2. Thus, each child wrote half of the items and named the other half, and dictation and naming measures were obtained for all items.
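The counterbalancing scheme described above can be illustrated with a minimal sketch. It assumes that each 32-item set is simply split into two halves of 16 and that the two lists swap the task assigned to each half; the function name and the placeholder item labels are hypothetical and are not the materials used in the study.

```python
def counterbalance(items):
    """Split one experimental set in two halves and swap tasks across lists.

    Assumption: the split is by position; the study does not specify how
    items were allocated to each half.
    """
    half = len(items) // 2
    first, second = items[:half], items[half:]
    return {
        "list1": {"dictation": first, "naming": second},
        "list2": {"dictation": second, "naming": first},
    }


# Placeholder labels standing in for the 32 neutral words (not the real items).
neutral_set = [f"neutral_{i:02d}" for i in range(1, 33)]
lists = counterbalance(neutral_set)

# Each child sees one list, so every item is written by one group of
# children and named by the other.
print(len(lists["list1"]["dictation"]), len(lists["list1"]["naming"]))
```

Under this scheme no child both writes and names the same item, yet both measures are available for every item across the sample.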

Within each set, words were divided into two groups based on frequency: low frequency (mean frequency per million = 26.4, range 1–54) and high frequency (mean frequency per million = 517.1, range 96–641.9), respectively, in the ONESC child spelling frequencies and neighbors database (Martín & Pérez, 2008). In half of the items the key letter was located in the first syllable, and in the other half it was located in the second syllable. All these words had at least one congruent neighbor (a spelling change that did not imply a phonological or orthographic inconsistency in the key letter; mean N = 13, HFN = 4, and mean N = 10, HFN = 1.9 for the low and high frequency set, respectively). For example, the word salto has in the lexicon the neighbor saldo, the word sucio has the neighbor socio, and the word favor has the neighbor pavor. Given the length of the test and to avoid ceiling effects, pseudowords were only evaluated in the dictation task. To this aim, an additional set of 48 pseudowords was created that fulfilled the same characteristics as the words used in the dictation test. These were created by changing a single consonant of the target words so that both length and spelling structure were the same (for example, for the word salto, there is the pseudoword malto, for dulce, tulce, and for favor, navor). The list of items is presented in Appendix 1.
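As an illustration of the neighbor notion used here, the following minimal sketch checks whether two letter strings are orthographic neighbors in the sense given in the Introduction (same length, differing by one substituted letter or by a transposition). Treating transpositions as swaps of adjacent letters only is an assumption, the function names are hypothetical, and the toy lexicon merely reuses the examples from the text; the actual neighbor counts came from the ONESC database.

```python
def is_neighbor(a, b):
    """True if b differs from a by one substitution or one adjacent transposition."""
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    if len(diffs) == 1:
        return True  # single-letter substitution (salto / saldo)
    if len(diffs) == 2 and diffs[1] == diffs[0] + 1:
        i, j = diffs
        return a[i] == b[j] and a[j] == b[i]  # adjacent letters swapped
    return False


def neighbors(word, lexicon):
    """Return the entries of a (toy) lexicon that are neighbors of word."""
    return [w for w in lexicon if is_neighbor(word, w)]


toy_lexicon = ["saldo", "socio", "pavor", "pato"]
for target in ["salto", "sucio", "favor"]:
    print(target, neighbors(target, toy_lexicon))
```

The same check also confirms that a pseudoword such as malto, built by changing a single consonant, keeps the length and letter structure of its base word salto.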

Experiment 2

For the second dictation and naming test, 84 words of medium length were selected (M = 4.6 letters; range 4–5 letters). These words were divided into three experimental sets: 28 neutral words (moda), 28 words with a context-dependent letter (cera), and 28 with an inconsistent letter (bola). The key letter was in the first syllable in half of the items, and in the second syllable in the other half. Within each set, the words were divided into two groups based on frequency: low frequency (mean frequency per million = 52, range 1.4–110) and high frequency (mean frequency per million = 435, range 27–3900), respectively, in the ONESC child spelling frequencies and neighbors database (Martín & Pérez, 2008). All these words had at least one incongruent neighbor (which implied a spelling or phonological change in the key letter; mean N = 12, HFN = 4.3, and mean N = 8.7, HFN = 1.6 for the low and high frequency set, respectively). For example, the word moda has the neighbor muda, the word cera has the neighbor cara, and the word bola has the neighbor vela. An additional set of 84 pseudowords was created that fulfilled the same characteristics as the words. These were created by changing a single consonant of the target words so that both length and spelling structure were the same. The list of items is presented in Appendix 2. To control the effect of item repetition, items were also counterbalanced in this experiment across two lists. The 48 items that were used for dictation in list 1 were used as naming items in list 2, and the naming items in list 1 were used as dictation items in list 2. Thus, in both experiments, each child wrote half of the items and named the other half, and dictation and naming measures were obtained for all items. To be part of the same experimental design, items must be paired one by one in all key characteristics. As words with incongruent neighbors in the lexical database were shorter and slightly higher in frequency than words with congruent neighbors, it was necessary to split the study into two separate experiments.

Procedure

Dictation task

For the dictation task, children were evaluated as a group in their own classroom. Each participant was given a blank DIN A4 sheet with grey lines. Children were asked to write each word that the examiner read aloud, one by one. Later, the experimenter collected the sheets and categorized all the mistakes made. Items in each condition were randomly presented following the same order for all participants within Lists 1 and 2 in both tasks.

Naming task

For the naming task, each child was individually assessed by the examiner in a separate classroom. Each child was asked to read aloud the words that were presented individually on a computer screen. The experiment was carried out with the DMDX program (Forster & Forster, 2003). This program allows recording the naming response by using a microphone connected to the computer. It also allows a homogeneous presentation of stimuli. For each item, a fixation point ("+") was presented in the center of the screen for 500 ms. Immediately after fixation, the item appeared on the screen and remained for a maximum of 5 s before the next fixation point was displayed. After each naming response, the item disappeared and the cycle was repeated with the next item. The items were presented in 14-point Courier font. Once the responses were transcribed from the recordings, the errors were counted and categorized.

Results

In both experiments, the percentage of errors was used as the dependent measure. Analyses of variance (ANOVAs) by participants and by items were carried out based on a 2 Task (naming, dictation) × 2 Frequency (high, low) × 3 Type of word (context-dependent, inconsistent, neutral) design, with Grade as a between-subjects variable. List was included as a dummy variable to extract the variance due to the error associated with the list (Pollatsek & Well, 1995). Z scores were calculated for correctly named and written items by condition based on each grade sample. This score places children in a range higher or lower than 0, and allows a more reliable analysis of the effects when working with samples that differ in age (see Faust et al., 1999). This score was used as the measure for the analysis of variance. As usual, separate analyses were conducted for word and non-word targets. All effects reported as significant had p values below 0.05.
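The within-grade standardization can be sketched as follows. This is a minimal illustration under stated assumptions: scores are standardized against the mean and sample standard deviation of each grade, the function name is hypothetical, and the numbers are made up for the example and are not data from the study.

```python
from statistics import mean, stdev


def z_scores_by_grade(scores_by_grade):
    """Standardize each child's score against the mean and SD of their own grade.

    Assumption: the sample SD (n - 1 denominator) is used; a grade in which
    all scores are identical would need separate handling (SD = 0).
    """
    z = {}
    for grade, scores in scores_by_grade.items():
        m, sd = mean(scores), stdev(scores)
        z[grade] = [(x - m) / sd for x in scores]
    return z


# Made-up scores for one condition, keyed by grade.
example = {3: [10.0, 20.0, 30.0], 4: [5.0, 10.0, 15.0], 5: [2.0, 4.0, 6.0]}
z = z_scores_by_grade(example)
print(z)
```

After this transformation every grade has a mean of 0, so a child's value expresses performance relative to same-grade peers rather than in raw units that differ with age.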

Experiment 1

Descriptive data on the percentage of errors by condition and grade for words are presented in Table 1.

Table 1 Mean and standard deviation of error percentages in dictation and naming of words with a congruent neighbor by condition

Word data

The analysis showed an effect of Task, F1 (1, 113) = 5.5, p = 0.021, MSe = 1.96, η2 = 0.046; F2 (1, 168) = 77.01, p = 0.001, MSe = 1.63, η2 = 0.314, indicating that naming implied a lower error rate than spelling (2.6% vs. 19.8%, respectively). A Frequency effect revealed that participants committed fewer errors in frequent than in infrequent words (11.2% vs. 12.3%), F1 (1, 113) = 15.68, p = 0.011, MSe = 0.488, η2 = 0.122; F2 < 1. The Type effect showed that context-dependent and inconsistent words showed more errors than neutral ones (14.2, 12.3 and 7.05%, respectively), F1 (2, 226) = 57.14, p = 0.001, MSe = 1.03, η2 = 0.01; F2 (1, 168) = 41.87, p = 0.001, MSe = 1.635, η2 = 0.053. There was no significant effect of grade.

The Task × Frequency interaction was significant in the analysis by participants, F1 (1, 113) = 13.23, p = 0.001, MSe = 0.509, η2 = 0.105; F2 < 1, and revealed a significant frequency effect only in the naming task (2.1% errors in frequent words and 3.4% in infrequent words), F1 (1, 115) = 30.11, p = 0.001, MSe = 0.171, η2 = 0.202, but not in spelling (20.3% vs. 19.4% in frequent and infrequent words, respectively), p = 0.92.

Additionally, the Task × Type interaction, F1 (2, 226) = 11.59, p = 0.001, MSe = 0.928, η2 = 0.93; F2 (2, 168) = 2.75, p = 0.004, MSe = 0.635, η2 = 0.032, revealed that in the naming task context-dependent words generated significantly more errors than inconsistent ones, and these more than neutral ones, showing a descending linear slope (4.9, 1.9, and 1.02%, respectively), F1 (2, 230) = 75.27, p = 0.001, MSe = 0.467, η2 = 0.452; F2 (2, 95) = 8.56, p = 0.001, MSe = 0.078, η2 = 0.156. However, in the dictation task, both context-dependent and inconsistent words showed a significantly higher error rate than neutral words (23.5 and 22.7% vs. 13%, respectively), F1 (2, 230) = 62.52, p = 0.001, MSe = 0.338, η2 = 0.284; F2 (1, 63) = 4.46, p = 0.039, MSe = 0.841, η2 = 0.074; and F1 (2, 230) = 29.09, p = 0.001, MSe = 0.734, η2 = 0.284; F2 (1, 63) = 4.93, p = 0.003, MSe = 0.945, η2 = 0.076. The difference between context-dependent and inconsistent words was not significant, p = 0.70. This pattern, shown in Fig. 3, reflects that context-dependent words are problematic for writing and naming, while inconsistent words are problematic specifically for writing. This result will be described in detail in the discussion.

Pseudoword data

Descriptive data on percentage of errors by condition and grade for pseudowords are presented in Table 2.

Table 2 Mean and standard deviation of error percentages in pseudoword dictation by condition

The analysis of variance for the pseudowords in the dictation task showed an effect of Frequency by participants, F1 (1, 113) = 5.36, p = 0.022, MSe = 0.725, η2 = 0.45; F2 < 1. Children showed a 2% higher error rate in pseudowords constructed from high-frequency words than in those constructed from low-frequency words. A Type effect was also observed, F1 (2, 226) = 128.18, p = 0.001, MSe = 1.01, η2 = 0.705; F2 (2, 84) = 8.76, p = 0.001, MSe = 0.881, η2 = 0.123, reflecting a quadratic pattern with a higher writing error rate in pseudowords with context-dependent and inconsistent letters, with respect to neutral pseudowords (28, 36 and 13%, respectively).

The Frequency × Word Type interaction was significant in the analysis by participants, F1 (2, 226) = 7.32, p = 0.001, MSe = 0.825, η2 = 0.132; F2 < 1, and reflected a frequency effect in writing pseudowords with context-dependent letters (those constructed from high-frequency words were written with more lexicalization errors than low-frequency ones), F1 (1, 115) = 5.82, p = 0.017, MSe = 0.579, η2 = 0.042; F2 < 1. This effect was not significant in inconsistent or neutral pseudowords.

Discussion

This experiment tested whether Spanish children from 3rd to 5th grade apply lexical or sublexical strategies when reading and writing words and pseudowords with different orthographic structures. Target words and pseudowords had orthographic neighbors that did not interfere orthographically or phonologically with the manipulated structure. Results showed that children committed significantly more errors in writing than in naming, and that consistent neutral words and pseudowords were read and written significantly better than inconsistent or context-dependent words and pseudowords, supporting the claim of a preference for sublexical strategies in Spanish. This claim was further supported by the fact that frequency effects were not strong and were only significant by participants. This is consistent with other studies showing that learning inconsistent structures can be problematic even in transparent orthographies (Álvarez-Cañizo et al., 2018; Zhang et al., 2021). Our study adds to the growing evidence that these errors persist even until 5th grade, and that difficulties with context-dependent structures are particularly prevalent in naming, whilst inconsistent structures tend to be more problematic for writing in all grades. These results suggest that both the type of inconsistency and the directionality of the OP–PO translation influence the development of orthographic knowledge in Spanish, a finding similar to that observed in opaque orthographies (Binamé et al., 2015). However, many inconsistent words in Spanish have orthographic neighbors that are incongruent in the inconsistent position; that is, there are many words in the Spanish lexicon that share the same orthographic structure except for a letter that leads to an alternative spelling or pronunciation. Evidence with adults has revealed a general facilitative effect of neighbors in word reading due to general overall activation, but also specific inhibitory effects of inconsistent structures due to sublexical interference (Pollatsek et al., 2005). We explore the pattern of errors and strategies applied by children in these words in Experiment 2.

Experiment 2

Descriptive data on times and errors by condition and grade for words are presented in Table 3.

Table 3 Mean and standard deviation of error percentages in dictation and naming of words with incongruent neighbor by experimental condition

Word data

The analysis of variance for words showed an effect of Task, F1 (1, 110) = 210.27, p = 0.001, MSe = 1.24, η2 = 0.655; F2 (1, 168) = 51.87, p = 0.001, MSe = 0.625, η2 = 0.265. The naming task showed a significantly lower error rate than the dictation task (2.1% vs. 17.6%, respectively). A Frequency effect revealed that frequent words showed fewer errors than infrequent words (8.9% vs. 10.3%), F1 (1, 110) = 297.85, p = 0.001, MSe = 0.730, η2 = 0.713; F2 (1, 168) = 4.52, p = 0.031, MSe = 0.625, η2 = 0.030. A general effect of Type revealed a quadratic pattern in which inconsistent words showed significantly more errors than context-dependent and neutral words (12.1, 5.6, and 3.4%, respectively), F1 (2, 220) = 184.45, p = 0.001, MSe = 1.01, η2 = 0.627; F2 (2, 168) = 14.79, p = 0.001, MSe = 0.5275, η2 = 0.170. Unlike Experiment 1, a grade effect was observed, F1 (2, 110) = 204.58, p = 0.001, MSe = 2.07, η2 = 0.803; F2 (2, 288) = 47.75, p = 0.001, MSe = 0.230, η2 = 0.2495 (13.7, 9.2, and 6.7% errors in 3rd, 4th and 5th grade, respectively).

In addition, a triple Task × Type × Grade interaction was observed, F1 (2, 220) = 502.96, p = 0.001, MSe = 0.797, η2 = 0.901; F2 (2, 288) = 6.12, p = 0.001, MSe = 0.209, η2 = 0.128. This interaction revealed that in 3rd grade the error rate for context-dependent words was significantly higher in the naming task than in the dictation task (12% vs. 5%), F1 (1, 41) = 14.31, p = 0.001, MSe = 0.409, η2 = 0.259; F2 (1, 48) = 6.96, p = 0.001, MSe = 0.437, η2 = 0.127, while the error rate was significantly higher in the dictation task for inconsistent words (33.2% vs. 2.5%), F1 (1, 41) = 15.59, p = 0.001, MSe = 0.606, η2 = 0.275; F2 (1, 48) = 40.41, p = 0.001, MSe = 0.936, η2 = 0.457. No significant difference between naming and dictation was found for neutral words in this grade (10.2% vs. 2.2%), p = 0.44.

In 4th grade, the higher error rate in the dictation task relative to the naming task remained significant for inconsistent words (22.2% vs. 1.2%), F1 (1, 41) = 16.12, p = 0.001, MSe = 0.501, η2 = 0.1929; F2 (1, 48) = 46.20, p = 0.001, MSe = 0.8717, η2 = 0.490, but there was no task difference for context-dependent words (8.9% vs. 2.8%) or neutral words (5% vs. 1.2%), ps > 0.05. In 5th grade, the higher error rate in the dictation task remained significant for inconsistent words, F1 (1, 28) = 328.56, p < 0.001, MSe = 0.531, η2 = 0.992; F2 (1, 48) = 17.50, p = 0.001, MSe = 1.59, η2 = 0.267, whilst this difference was not significant for context-dependent or neutral words, both ps > 0.05. These results show that the presence of inconsistent letters is particularly harmful in writing, while context-dependent letters exert an inhibitory effect in naming, especially in 3rd grade.

Finally, a three-way Frequency × Type × Grade interaction was found in the analysis by participants, F1 (4, 220) = 486.81, p = 0.001, MSe = 0.904, η2 = 0.713; F2 < 1. This interaction showed that in 3rd grade the frequency effect was significant in neutral words (4% vs. 8.5%, for frequent and infrequent words, respectively), F1 (1, 41) = 21.12, p = 0.001, MSe = 0.358, η2 = 0.301, but not in inconsistent words (17% vs. 17%), p = 0.29, or in context-dependent ones (8% vs. 10%), p = 0.23. In 4th grade, the frequency effect was significant in neutral words (1.7% vs. 3.7%), F1 (1, 44) = 2.98, p = 0.05, MSe = 0.377, η2 = 0.263; and in context-dependent ones (2.9% vs. 7.7%), F1 (1, 44) = 11.60, p = 0.001, MSe = 0.348, η2 = 0.041; but not in inconsistent ones (10.7% vs. 12.5%), p = 0.48. In 5th grade, the frequency effect was significant for all three word types: neutral (0.5% vs. 1.9%), F1 (1, 28) = 15.65, p = 0.001, MSe = 0.400, η2 = 0.629; context-dependent (0.9% vs. 4.5%), F1 (1, 28) = 10.39, p = 0.003, MSe = 0.705, η2 = 0.517; and inconsistent words (4.4% vs. 7.8%), F1 (1, 28) = 3909.1, p < 0.001, MSe = 0.386, η2 = 0.145. These data indicate that context-dependent and especially inconsistent words require more time to be consolidated in the lexicon, and therefore frequency effects arise later. These results are taken up in the discussion.

Pseudoword data

Descriptive data on pseudoword times and errors by condition and grade are presented in Table 4.

Table 4 Mean and standard deviation of error percentages in dictation and naming of pseudowords with incongruent neighbor by condition

The analysis of variance for pseudowords showed an effect of Task, F1 (1, 110) = 48.20, p = 0.001, MSe = 1.824, η2 = 0.261; F2 (1, 144) = 61.98, p = 0.001, MSe = 1.93, η2 = 0.301; the error rate in the naming task was lower than in the dictation task (2.9% vs. 23.4%, respectively). There was also a general Type effect reflecting a quadratic pattern, F1 (2, 220) = 25.815, p = 0.001, MSe = 0.751, η2 = 0.185; F2 (2, 144) = 2.77, p = 0.004, MSe = 1.93, η2 = 0.046. Pseudowords with inconsistent letters showed more errors than those with context-dependent and neutral letters (17.4, 11.8, and 10.2%, respectively). An effect of Grade was found, F1 (2, 110) = 12.35, p = 0.001, MSe = 2.32, η2 = 0.167; F2 (2, 288) = 12.39, p = 0.001, MSe = 0.165, η2 = 0.173, reflecting a decreasing linear slope (16.3, 12, and 9.7% errors in 3rd, 4th and 5th grade, respectively).

Additionally, a Task × Type interaction, F1 (2, 110) = 214.87, p = 0.001, MSe = 1.24; F2 (2, 144) = 3.60, p = 0.030, MSe = 1.93, η2 = 0.048, revealed that the difference in error rate between dictation and naming was not significant for context-dependent pseudowords (19% vs. 4%), p = 0.83, or for neutral pseudowords (17% vs. 3%), p = 0.37, but was significant for inconsistent ones (32% vs. 2%), F1 (1, 115) = 99.89, p = 0.001, MSe = 0.764, η2 = 0.501; F2 (1, 48) = 34.71, p = 0.001, MSe = 0.252, η2 = 0.420.

Discussion

The aim of this experiment was to test whether the presence of incongruent neighbors in the lexicon affected the observed error pattern and the strategies employed by Spanish children when reading and writing different types of words and pseudowords. Results again indicated that writing was more difficult than naming, and that context-dependent and inconsistent structures were especially problematic. The pattern of results is consistent with the claim that Spanish children rely on sublexical strategies to apply their orthographic knowledge, and reinforces the idea that the directionality of the OP-PO translation poses distinct difficulties for distinct orthographic structures. While context-dependent letters were difficult for reading, inconsistent letters were difficult for writing. Nevertheless, in this experiment context-dependent errors in naming were significant only in 3rd grade, whilst inconsistent letter errors in writing persisted until 5th grade. This finding, together with the frequency effects observed in neutral words earlier than in context-dependent and inconsistent words, suggests that children acquire lexical knowledge about consistent and frequent representations quite easily, but that incongruent letter-sound patterns need more time to be finely tuned. Extending the results of Zhang et al. (2021), our data reveal that inconsistent structures seem to be particularly resistant to consolidation. This finding will be further discussed in the general discussion section.

Type of error analysis by items and grade

The second aim of the study was to identify which types of words are the greatest source of error for Spanish children in primary school. In Experiments 1 and 2, data showed that children are prone to phonological errors when reading and writing (writing zereza instead of cereza, or reading kereza instead of cereza), and to spelling errors with inconsistent letters mainly in writing (writing bela instead of vela), errors that diminish with grade. This, however, does not indicate which words generate these errors, or whether there are other types of errors such as omissions, substitutions or transpositions. To answer this second question, the pattern of errors made in both tasks was examined by establishing ten error categories based on previous reading and writing studies in Spanish (Álvarez-Cañizo et al., 2018; Goikoetxea, 2006; Justicia et al., 1999; Zhang et al., 2021):

  1. Inconsistent consonant substitution, when b is substituted for v or vice versa (write bela instead of vela)
  2. Context-dependent consonant substitution, when a c, g, or r is substituted for a letter with the same sound (write zesta or read kesta instead of cesta)
  3. Addition, when a letter that is not part of the word is added (cueva-cuesva)
  4. Substitution, when a consonant that is not context dependent or inconsistent is replaced (suela-sueta)
  5. Omission of h, when the letter without sound is omitted in the dictation task (uerta)
  6. Addition of a complete syllable, when a syllable is added to the word (cueva-cueneva)
  7. Addition of h, when an unnecessary h is added in the dictation task (halbor)
  8. Lexicalization, when one word is substituted for another one, or when a pseudoword is substituted for a word (cita-cinta, pamo-mano)
  9. Omission of a letter (sable-sabe)
  10. Transposition of two adjacent letters (sobre-sorbe)

Only one error per item was counted: no item showed more than one error, nor any error that could not be classified in the aforementioned categories. In order to guarantee reliability, all words were coded by a second evaluator. Inter-rater reliability was calculated with correlation coefficients by error type, the coefficient being greater than 0.95 in all cases.
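As an illustration only, the ten categories can be approximated with a simple rule-based comparison of the target and the child's (transcribed) response. The sketch below is not the coding procedure used in the study, which relied on human evaluators. The function name `classify_error`, the optional `lexicon` argument, the decision to check lexicalization first, and the treatment of any contiguous addition of two or more letters as a "syllable" are all assumptions made for the example.

```python
from typing import Optional

# Letters treated as inconsistent (b/v) and context-dependent (c, g, r) in the scheme above.
INCONSISTENT = {"b", "v"}
CONTEXT_DEPENDENT = {"c", "g", "r"}


def classify_error(target: str, response: str,
                   lexicon: frozenset = frozenset()) -> Optional[int]:
    """Return 0 if correct, a category number 1-10, or None if unclassifiable.

    Hypothetical helper: `lexicon` is an assumed set of real words used to
    detect lexicalizations (category 8); it is checked first by assumption.
    """
    if response == target:
        return 0
    if response in lexicon:
        return 8  # response is a different real word
    extra = len(response) - len(target)
    if extra == 0:
        diffs = [i for i, (t, r) in enumerate(zip(target, response)) if t != r]
        if len(diffs) == 1:
            t, r = target[diffs[0]], response[diffs[0]]
            if {t, r} == INCONSISTENT:
                return 1  # b/v substitution
            if t in CONTEXT_DEPENDENT:
                return 2  # simplification: any replacement of c, g or r
            return 4      # simplification: any other single-letter substitution
        if len(diffs) == 2 and diffs[1] == diffs[0] + 1:
            i, j = diffs
            if target[i] == response[j] and target[j] == response[i]:
                return 10  # two adjacent letters swapped
        return None
    if extra == -1:
        for i in range(len(target)):
            if target[:i] + target[i + 1:] == response:
                return 5 if target[i] == "h" else 9  # omitted h vs. other letter
        return None
    if extra >= 1:
        for i in range(len(target) + 1):
            if response[:i] + response[i + extra:] == target:
                if extra == 1:
                    return 7 if response[i] == "h" else 3  # added h vs. other letter
                return 6  # contiguous block of 2+ letters, treated as a syllable
    return None


# Examples taken from the category definitions above.
for target, response in [("vela", "bela"), ("cesta", "zesta"), ("sobre", "sorbe")]:
    print(target, response, classify_error(target, response))
```

A real coding scheme would need phonological information (for instance, to decide whether a replacement letter shares the target sound) and a proper syllabifier; the sketch only shows how the categories partition simple single-edit errors.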

Data showed that the highest error rate corresponded to categories 1 and 2. Specifically, the error rate in category 1 (Inconsistent consonant substitution) was especially high in the items with inconsistent letters, while the error rate in category 2 (Context-dependent consonant substitution) was high in items with context-dependent letters (see Fig. 1a, b). Figure 1a clearly reflects that the substitution of the inconsistent consonant (b/v) occurred in the dictation task and mainly in words with an incongruent neighbor. Figure 1b shows that the substitution of the context-dependent consonant (c, g, r) occurred both in the dictation and naming tasks.

Fig. 1

a Percentage of type (1) error: inconsistent error by word type and task. b Percentage of type (2) error: Context-dependent error by word type and task

The general distribution of error types showed that there were also other error categories which, although very rare, seem persistent throughout grades. Figure 2a, b show that these errors belong to category 4 (substitutions: 0.8% in words, 1.9% in pseudowords), and to category 8 (lexicalizations: 0.4% in words and 0.7% in pseudowords). Errors were also sporadically observed in categories 3 (addition of letter: 0.2%) and 6 (addition of syllable: 0.3%) in word naming. In general, the total error rate was similar for words and pseudowords (M = 1.9% in dictation and M = 0.2% in naming). Word errors were mainly observed in 3rd grade, while pseudoword errors remained stable in all grades. It is necessary to note that the error rate was low, and only numerical differences that are not statistically significant are described. In fact, if means are considered by grade, data show that accuracy was 77% in dictation and 96% in naming in 3rd grade, 84% in dictation and 98% in naming in 4th grade, and 91% in dictation and 98% in naming in 5th grade.

Fig. 2

a Distribution of error types by grade and task in words. b Distribution of error types by grade and task in pseudowords

Finally, we identified specific words that generated difficulty in each experiment by grade (3rd, 4th and 5th) and task (naming, dictation). These words can be observed in Fig. 3. In all grades, the most frequent errors in both tasks correspond to both low and high frequency items (in a distribution of 70 and 30%, respectively) that contain context-dependent letters (ge, gi, ce, ci) and inconsistent letters to a lesser degree (ve, vi). Post-hoc analysis revealed that the errors did not depend on the position of the key letter within the word (the error-rate difference when the letter was in the first syllable or in the second syllable was not significant).

Fig. 3

Words with errors by task and grade

A potential hypothesis is that errors could be due to syllabic frequency. To explore this question, the mean syllabic frequency of the words employed in each task was calculated, as well as the type frequency of the first and second syllable (number of words in the lexicon that contain that syllable in that position). These data were obtained from the B-Pal Spanish lexical frequency database (Davis & Perea, 2005). Subsequently, the mean difference was calculated between the syllabic frequencies of incorrect items and those of correct items, as a function of the key letter position (first or second syllable). Results showed that words that generated errors were those with the key letter in a syllable with a lower positional frequency than that of the words correctly read or written. This difference between incorrect and correct words was statistically significant for first-syllable frequency (see Table 5). The same pattern appeared in the second syllable, but there the difference was numerical and not statistically significant. In sum, items that were incorrectly written or read had a lower positional syllabic frequency than items that were correctly written or read.
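The comparison described above amounts to a difference of means between incorrect and correct items, computed separately for each key-letter position. The following minimal sketch shows that computation under stated assumptions: the item labels and frequency values are invented for illustration (they are not the study's data or B-Pal values), and the function name `mean_frequency_gap` is hypothetical. No significance test is included, since the text does not specify which test was applied.

```python
from statistics import mean

# Hypothetical items: positional frequency of the syllable containing the key
# letter, the position of that syllable (1 or 2), and whether an error occurred.
ITEMS = [
    {"item": "item_a", "key_syllable": 1, "syllable_freq": 100.0, "error": True},
    {"item": "item_b", "key_syllable": 1, "syllable_freq": 200.0, "error": True},
    {"item": "item_c", "key_syllable": 1, "syllable_freq": 400.0, "error": False},
    {"item": "item_d", "key_syllable": 1, "syllable_freq": 600.0, "error": False},
    {"item": "item_e", "key_syllable": 2, "syllable_freq": 300.0, "error": True},
    {"item": "item_f", "key_syllable": 2, "syllable_freq": 350.0, "error": False},
    {"item": "item_g", "key_syllable": 2, "syllable_freq": 450.0, "error": False},
]


def mean_frequency_gap(items, key_syllable):
    """Mean syllable frequency of incorrect items minus that of correct items.

    A negative value means that items producing errors had, on average, a
    lower positional syllable frequency. Raises statistics.StatisticsError
    if either group is empty for the requested position.
    """
    subset = [it for it in items if it["key_syllable"] == key_syllable]
    incorrect = [it["syllable_freq"] for it in subset if it["error"]]
    correct = [it["syllable_freq"] for it in subset if not it["error"]]
    return mean(incorrect) - mean(correct)


for position in (1, 2):
    print(position, mean_frequency_gap(ITEMS, position))
```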

Table 5 Syllabic frequency values of the items used in both experiments

General discussion

This study investigated the strategies employed and the types of errors committed by Spanish children between 3rd and 5th grade when reading and writing different orthographic structures. To that aim, children were assessed in reading and writing consistent-neutral words, words containing context-dependent letters and inconsistent words of high and low frequency. In the first experiment, words had orthographic neighbors that did not interfere with the processing of the key structure (favor has the neighbor pavor, sucio has the neighbor socio). In the second experiment these neighbors could interfere with the processing of the key structure, either from phonology to orthography (bola can interfere with vela because both letters share the same sound), or from orthography to phonology (cara can interfere with cera, because they sound different despite sharing the same letter). Results showed: (i) a greater difficulty in writing with respect to naming, (ii) a difficulty with context-dependent and inconsistent letter errors across grades, and (iii) a prevalence of such errors in low-frequency syllables.

Our first aim was to explore the degree to which children in a transparent orthography such as Spanish apply their sublexical or lexical orthographic knowledge to reading and writing. According to stage theories, the attainment of sublexical knowledge is necessary for the construction and application of lexical knowledge (Ehri, 2014). Evidence from opaque orthographies does not support this claim: it shows that children combine their knowledge about sublexical structures and whole words when reading and writing even from 1st grade (Leté et al., 2008). However, the developmental pattern in Spanish might differ from that observed in opaque orthographies, because transparency could favour the reliance on sublexical structures for a longer period, delaying the construction and use of lexical knowledge. As in previous experiments, frequency was used as a proxy for the use of lexical strategies. Inconsistent and context-dependent errors, and the persistence of these errors in words as well as in pseudowords, were used as proxies for the use of sublexical strategies. Experiment 1 corroborates the hypothesis of a preferential use of sublexical strategies in all grades, in line with previous studies with similar ages (Carrillo & Alegría, 2014; Carrillo et al., 2013; Weekes et al., 2006). The rate of errors in neutral words was lower than in context-dependent and inconsistent words in all grades, showing that the latter structures were more difficult to internalize. However, it was observed that not only the rate but also the type of error differed depending on the task.

Specifically, in the naming task context-dependent words generated more errors (pronouncing sukio when seeing sucio), while in the dictation task both context-dependent and inconsistent words generated higher error rates compared to neutral words (writing suzio or bela instead of sucio or vela). These data have two important implications. First, the cost of processing inconsistencies differs between tasks. In the case of reading, which implies an OP association, access to the sound seems to have a special cost when there is another sound that competes for the same letter. The proper resolution of this competition involves internalizing the context on which the resolution depends, that is, the complete syllable (ci, ce, vs. ca, co, cu). In the case of writing, which entails a PO association, the cost lies in the selection among the letters potentially associated with the same sound (e.g., letters b, v are associated with the same sound /b/; and letters c, z are associated with the same sound /z/) for correct spelling. Second, the difficulty of internalizing specific sublexical structures leads to a higher and more evident error rate in dictation. In all grades, the naming task showed better execution rates than the dictation task, which suggests that accessing the written representation from the oral word is not possible if such representation is not well consolidated, and thus the role of writing in this consolidation may be fundamental (Graham & Santangelo, 2014; Rothe et al., 2015). This cost was observed in both word and pseudoword writing, which indicates that orthographic structures of words with context-dependent and inconsistent letters are not fully internalized until the end of primary school, as is the case in other orthographies (Bosse et al., 2021; Schmalz et al., 2020).

These data support previous evidence revealing specific difficulty with context-dependent and inconsistent letters, particularly in writing (Álvarez-Cañizo et al., 2018; Zhang et al., 2021), and suggest a pre-eminence of sublexical strategies even after 3rd grade. One potential explanation for the lack of clear lexical effects in this experiment is that these might be diluted in dictation and naming tasks, since these impose the use of sublexical strategies (at least more than automatic word identification tasks do; see Moret-Tatay & Perea, 2011). This could explain the absence of clear frequency effects in previous studies in Spanish using writing or reading errors as a measure of orthographic knowledge (Abchi et al., 2009; Jiménez & Hernández, 2000). However, this rationale could be limited to words that have no conflicting competitors in the lexicon. It could be that when children read or write words with a lexical neighbor that is inconsistent only in the key structure, they use their knowledge about that neighbor to detect and specify the sublexical structure entailing the inconsistency. Results in Experiment 2 support this view.

In Experiment 2, the naming task again showed better execution rates than the dictation task, especially in 4th and 5th grades, when naming errors decreased significantly. In the same way, data showed that words with context-dependent letters generated more errors in naming, especially in 3rd grade (pronouncing kera when seeing cera), while in dictation the inconsistent ones generated a higher error rate in all grades (writing bela instead of vela). Again, this cost was observed both in word and pseudoword writing, which indicates a special difficulty in internalizing inconsistent structures and transcribing them from oral to written form. If the presence of context-dependent and inconsistent letter errors is taken as a proxy for sublexical strategies, the finding that both types of errors remain significant until 4th grade and that inconsistent errors prevail until 5th grade confirms the view of a prevalence of sublexical strategies for certain types of words in Spanish. This view is supported by the lack of frequency effects for words containing these structures until 5th grade.

However, in this experiment, a stable frequency effect was observed in both tasks, which increased with age, suggesting that when a word has a lexical neighbor it can either be processed lexically if the word is not conflicting, leading to stable representations and early facilitative frequency effects; or sublexically if the neighbor implies interference, in order to find the correct transcription. In conflicting neighbors, the lexical competitor can lead to an error if the structure is not well internalized, as occurs in infrequent words (see Chen & Mirman, 2012). This hypothesis is supported in our study by the fact that in 3rd grade the frequency effect was evident only for neutral words. In 4th grade, this effect was observed in neutral and context-dependent but not in inconsistent words. Finally, in 5th grade, the frequency effect was evident in the three types of words, and errors were ascribed mostly to inconsistent words. This indicates that words with inconsistent letters are the ones that receive the greatest interference from their potential neighbors, possibly because they can only be solved by rote memory (e.g., vela, bala, vaso, beso can only be learned by heart and not based on rules, unlike context-dependent letters). This might be the reason why inconsistent words lead to more numerous and more persistent errors in writing, in line with the results previously reported by Zhang et al. (2021) with younger children.

These findings are in line with dual and continuity theories (Pritchard et al., 2018; Treiman, 2017) and suggest that different orthographic structures can follow distinct developmental trajectories. Our data clearly show that the preference for sublexical or lexical strategies depends on several factors such as the type of inconsistency the word entails, the processing demands of the OP-PO translation, or the potential competition in the lexicon of words that share the same structure but generate an alternative spelling for one sound (b/v), or an alternative pronunciation of one letter (c/k).

The second aim of the study was to examine the developmental pattern of types of errors in both tasks. The data showed that in both tasks most errors occurred in inconsistent and context-dependent letters, and that a small percentage of errors consisted of substitutions, additions and omissions of letters, and lexicalizations. This outcome is in line with the errors reported in previous works. Specifically, the systematic errors observed in reading studies with Spanish children up to 3rd grade included letter substitutions, additions, omissions and transpositions, syllable repetitions, lexicalizations and context-dependent letter errors (Álvarez-Cañizo et al., 2018; Goikoetxea, 2006). In a similar vein, writing errors have been classified into vowel and consonant substitutions, additions, omissions and transpositions, syllabic fragmentations, incorrect use of the letter h, and inconsistent same-sound letter errors (Justicia et al., 1999; Zhang et al., 2021). Our results show that although some addition, substitution and lexicalization errors were observed, particularly in 3rd and 4th grade, the error rate was negligible compared to the rates reported in previous studies with younger samples. This finding offers a clear picture of the process of orthographic knowledge acquisition in transparent orthographies, by showing that between 3rd and 5th grade children commit very few errors related to letter coding difficulties, but a substantial number of errors related to the ambiguous transcription of phonology to orthography.

Again, results reinforce the idea of sensitivity towards sublexical units in Spanish. The fact that most errors were made in infrequent syllables, especially when the infrequent syllable was in the first position, corroborates this idea (Carrillo & Alegría, 2014; Maïonchi-Pino et al., 2010). Our finding confirms the facilitative effect of first syllable frequency previously observed in adults' (Carreiras & Perea, 2004) and children's reading (Jiménez & Hernández, 2000), and reveals that the frequency of the sublexical unit, together with the existence of an inconsistency in that unit, might determine the strategy used to access the representation of the word. It has been suggested that the initial syllable activates a cohort of competitors sharing its sound or letter, so that when this syllable is of high frequency, a facilitative effect is observed. In the case of context-dependent and inconsistent words specifically, a low-frequency first syllable might receive activation from the competing syllable, making these units difficult to resolve and internalize.

This result differs from that reported by Álvarez-Cañizo et al. (2018), who found a syllable complexity effect but not a syllable frequency effect in 3rd-grade children's writing measures. One reason for this discrepancy might be that only 16 pseudowords were employed in that study and frequency was orthogonally manipulated. Our analysis comprised over 180 words and compared, post hoc, the positional syllable frequency of correctly and incorrectly transcribed items. This might have provided a clearer picture of the word characteristics that led to error, bringing out the importance of syllable frequency in the process of learning orthographic structures in Spanish.

It is possible that the relevance of syllable units slows down the development of lexical orthographic knowledge in transparent orthographies; but the pre-eminence of sublexical strategies through multiple exposures in time could also favour the lexical quality of the representations (Hsiao & Nation, 2018). This might explain the relatively low error rates from 3rd grade on, far below the 30% threshold established as the minimum level of orthographic knowledge expected to be attained between 1st and 2nd grade of primary school (Karageorgos et al., 2020).

In summary, this study sheds light on the development of orthographic knowledge in Spanish, and on the mechanisms used by primary school children to access that knowledge. It reveals that reading and writing are different indicators of the crystallization of orthographic knowledge, and that sublexical variables such as syllabic frequency and the inconsistency between letter and sound associations determine the emergence of lexical activation processes and the potential reading and writing errors. These results have important educational implications. First, they suggest that accurate assessment of orthographic knowledge should combine reading and writing tasks on the basis of words with particular phonological/orthographic attributes, if children with specific weaknesses in OP-PO translations are to be recognized. Likewise, training should scaffold reading, copying and dictation with corrective feedback, and enhance the comparison of neighbors with inconsistent and context-dependent letters to ensure the detailed specification of such words. Nevertheless, some limitations offer a cautionary note about the generalization of these conclusions. First, our study was cross-sectional and did not include children at early ages in primary school, offering only a partial view of children's developmental pathway. Additionally, this study did not explore the attainment of morphological patterns, another key feature that might influence the acquisition of orthographic knowledge. Future research should examine these issues longitudinally, and explore how increasing knowledge of sublexical inconsistencies influences the speed with which words containing different types of representations are consolidated in the lexicon. This may guarantee better reading and writing outcomes at earlier ages.