Characterization of reading errors in languages with different orthographic regularity: an Italian–English comparison

The study examined whether a classification of errors based on Hendriks and Kolk’s (1997) proposal would effectively characterize the reading profile of children learning two orthographies varying for regularity, such as Italian and English. The study considered both an age-match and a grade-match comparison. Offline analysis of error production was carried out for two lists of stimuli: List 1, including regular words varying for frequency and matched non-words, and List 2, including low-frequency words varying for regularity. In List 1, Italian-reading children made more multiple attempts characterized by a slow and progressive approach to the target (sounding-out behavior) than English-reading children, while the latter made relatively more word substitutions and non-word lexicalizations. As for List 2, Italian-reading children made relatively more multiple attempts and progressive approaches to the target compared to the English-reading children (with more sounding-out behaviors and syllabications), while the opposite occurred for phonological-visual errors, word substitutions, morphological, and semantic errors. Both groups showed a high proportion of phonological-visual and regularization errors (stress assignment in the case of Italian-reading children). Overall, an error coding system specifically tuned to the characteristics of the orthographies investigated allowed a more comprehensive identification of reading difficulties, so that the different strategies used by children reading different languages emerged more clearly (more reliance on sub-lexical routines in Italian readers and on lexical routines in English readers). These results call for more attention to error patterns in the identification of reading difficulties in children of different languages, including those learning a transparent orthography, where error analyses have largely been ignored.


Introduction
It is well established that orthographic regularity has a profound effect on reading performance (Seymour et al., 2003). Up until the '90s, most research was focused on highly irregular languages, such as English (Share, 2008). More recently, there has been a growing interest in examining reading development and the characteristics of reading deficits in languages with regular orthographies, such as Austrian/German, Dutch, Finnish, Greek, or Italian. Wimmer (1993) made the point that children with problems in learning to read German, a highly regular orthography, were slow but did not make a sizeable number of errors. This led several authors to compare reading in different orthographies using time measures such as reaction times (RTs) to examine reading acquisition (Welsh-English comparison: Ellis et al., 2001; Albanian-Greek-Hiragana-Kanji-English comparison: Ellis et al., 2004; Dutch-English comparison: Patel et al., 2004; Italian-English comparison: Marinelli et al., 2014, 2016; Mauti et al., 2023) and/or to compare children with and without a reading deficit (German-English comparison: Ziegler et al., 2003). However, even though reading accuracy may be generally high in languages with regular orthographies, types of errors may also differ and require ad hoc classifications.
An original formalization of reading errors in a regular orthography (Dutch) was proposed by Bakker (1992), who distinguished between "time-consuming" and "word substitution" errors. The former are characterized by slow and labored approaches to the target stimulus, while the latter represent a general category that includes most of the types of errors present in classical cognitive classifications. It should be noted, however, that Bakker's analysis was mainly oriented toward studying dyslexia in relation to neural mechanisms (such as hemispheric lateralization) and developing a classification into P-types (perceptual: displaying accurate but slow and laborious reading) and L-types (linguistic: displaying fast reading but with many errors), and so was not focused on a characterization of the linguistic impairments.
Working on Dutch, Hendriks and Kolk (1997) developed Bakker's (1992) original insight and proposed a more analytical and explicit categorization of reading errors. They identified two main difficulties: sounding-out behavior and word-substitution errors. In sounding-out behavior, the child gradually approaches the correct pronunciation of the word. This may occur after several attempts: for example, "complesso" ("complex") is read "co…com…complesso". Sounding-out is referred to as a "behavior" rather than an error since the progressive approach to the target may lead to the correct reading of the word. In word-substitutions, a word phonologically or visually related to the target is produced in error; for example, "soffitto" ("ceiling") is read "soffio" ("breath"). Importantly, Hendriks and Kolk (1997) posit that the presence of a sounding-out behavior is indicative of the underlying process, i.e., that, by sounding out parts of the word before uttering the whole word, the child demonstrates its reliance on the phonological, sub-lexical routine. Specifically, Hendriks and Kolk state that "children who predominantly sound out the words during reading would have a lexical route that is relatively less efficient than the nonlexical route, whereas the reverse would hold true for children who read at a normal rate but produce substitutional errors" (p. 323, Hendriks & Kolk, 1997). In subsequent work, Trenta et al. (2013) showed that the classification of errors proposed by Hendriks and Kolk (1997) was useful to characterize single word and text reading in children learning Italian, a highly regular orthography. In reading lists of single words, sounding-out behaviors, errors in stress assignment, and the production of form-related nonwords were predictive of the presence of a reading disorder. In reading passages, sounding-out behaviors and form-related errors were the best predictors.
A more systematic and comprehensive coding scheme of errors may allow us to compare types of reading difficulties across different orthographies. Presently, regularization errors are generally not considered in Italian, whereas time-consuming errors or sounding-out behavior are not typically examined in English-reading children. In general, the cognitive classification of reading errors in English does not consider multiple attempts, such as circumlocutions, self-corrections, and multiple responses (Coltheart, 1980). This choice was originally motivated by cases of deep dyslexia, who typically have a defect in monitoring their own performance and, hence, produce very few multiple attempts, but it has been kept in further research. Thus, the classification described by Bakker (1992) and Hendriks and Kolk (1997) has not been applied to English-reading children, although there are indications that it may be useful. Fairbanks (1937), in a study on American college students with and without reading difficulties, carefully examined the students' reading errors together with patterns of eye movements, a procedure quite advanced at the time. Young adults with reading difficulties showed more "hesitations" (i.e., multiple attempts to word pronunciations) than typical readers and these correlated with other types of errors. Thus, Fairbanks concluded that "inclusion of hesitations as errors raised the correlation between errors and [eye movements] regressions, suggesting that even though the determination of hesitations is a qualitative step, more central errors were included" (for a detailed analysis of the contribution of Fairbanks see De Luca et al., 2013). In other words, Fairbanks underscored the idea that a progressive approach to pronunciation in young adults with reading problems reflected a selective difficulty in the decoding of text and not simply an articulatory output problem.

Aims of the study
In the present study, we analyzed the reading performance of Italian and English readers at different times during primary school focusing on accuracy. The work was part of a bigger study in which both reading reaction times (RT) and accuracy were examined. An analysis of reading RTs has already been published together with an in-depth description of the overall study (Marinelli et al., 2016). In the study by Marinelli et al. (2016) accuracy was examined only holistically to justify the use of time measures (as only RTs to correct responses were considered). Here, instead, we carried out an in-depth analysis of reading errors using an error classification system broadly derived from that of Hendriks and Kolk (1997). This classification was expanded by adding the categories of 'fragment' (i.e., only the first part of the stimulus is read) and 'syllabication' (in both cases independent of whether they generate a correct or incorrect response), as additional indications of sublexical reading. Moreover, the category 'regularization' was included to allow comparison with inconsistent orthographies where these errors are common. Finally, due to the presence of pseudowords in the stimulus materials, lexicalization errors have been considered.
Error profiles were examined in both languages considering both reading regular stimuli (as in Marinelli et al., 2016), and low-frequency irregular words. We carried out both an age-match and a grade-match comparison. As to the former, we examined Italian and English readers who were either about 7 years old or about 10 years old at the time of testing. Due to the different entry times in school in the two countries, younger children were in third grade in England and in second grade in Italy; older children were in fifth grade in England and in fourth grade in Italy. As to the latter, we compared children who were in fifth grade both in Italy and in England (i.e., with a similar reading experience but different ages).
Overall, the aim of the study was to examine whether a classification of errors based on Hendriks and Kolk's (1997) proposal was able to effectively characterize the accuracy profiles of children learning two orthographies with very different degrees of regularity, such as Italian and English. RT examination highlighted a greater reliance on the lexical procedure among English readers and on the sublexical procedure among Italian-reading children, as indicated by the reliance on small units of analysis (Marinelli et al., 2016). Accordingly, in the present study, a prevalence of sounding-out behaviors, syllabications and stress errors is expected for Italian-reading children (indicative of sub-lexical processing), while a prevalence of lexical errors (i.e., word substitutions, lexicalizations, and morphological and semantic errors) is expected for English-reading children.

Participants
Research participants came from a social and educational environment that afforded suitable literacy opportunities. All children attended public primary schools in the provinces of Naples and Rome in Italy, and in the Birmingham area in the United Kingdom. Parents gave informed consent for their children's research participation. Non-native speakers and children who scored ≤ 1.5 standard deviations below the control mean in the Raven's CPM (Progressive Colored Matrices; Raven et al., 1998) were excluded from the analyses.
Age-matched subgroups included a total of 177 Italian-reading children (87 females, 90 males; Mean Raven Z score = 0.24, SE = 0.06) and 81 English-reading children (43 females, 38 males; Mean Raven Z score = 0.35, SE = 0.08). There were 90 younger children in Italy (43 females, 47 males; mean age = 7.3 years) and 40 in England (17 females, 23 males; mean age = 7.8 years). There were 87 older children in Italy (44 females, 43 males; mean age = 9.6 years) and 41 in England (26 females, 15 males; mean age = 9.9 years). The youngest children were in third grade in England and second grade in Italy; the oldest children were in fifth grade in England and fourth grade in Italy. Matched groups did not differ by gender (all χ² approximately 1) and CPM Raven z score (F about 1). A quantitatively small but significant difference was present for age (for younger children: t(128) = 11.52, p < 0.0001; for older children: t(126) = 5.91, p < 0.0001); English-reading children were about four months older than Italian-reading children. Note that in England children are admitted to first grade if by August-September they have reached the expected age, unlike in Italy where this limit is set to April of the same scholastic year. Presumably this temporal difference was the cause of the slightly different age composition of the Italian- and English-reading samples. However, for the sake of presentation, in the following we will refer to this as the age-matched comparison.
An additional group of 30 Italian-reading fifth graders (17 females, 13 males; Mean Raven Z score = 0.15, SE = 0.15; mean age = 10.6 years) was tested. The performance of these children was tested against that of the group of English-reading fifth graders to provide for a grade-matched comparison. The two fifth-grade groups did not differ by gender (χ² < 1) or by Raven performance (t < 1), but they did differ by age (t(68) = 7.86, p < 0.0001); the Italian-reading children were about seven months older.
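The two comparison designs described above can be summarized in a short sketch. The grades and mean ages are those reported in this section; the `Group` class, the function names, and in particular the 0.6-year tolerance are illustrative assumptions and not a matching criterion used in the study, where groups were matched by design.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Group:
    language: str    # "Italian" or "English"
    grade: int       # school grade at testing
    mean_age: float  # mean age in years, as reported in Participants


GROUPS = [
    Group("Italian", 2, 7.3),
    Group("English", 3, 7.8),
    Group("Italian", 4, 9.6),
    Group("English", 5, 9.9),
    Group("Italian", 5, 10.6),
]


def cross_language_pairs(groups):
    """All (Italian group, English group) combinations."""
    italian = [g for g in groups if g.language == "Italian"]
    english = [g for g in groups if g.language == "English"]
    return list(product(italian, english))


def age_matched(groups, tolerance_years=0.6):
    """Pairs whose mean ages differ by less than a hypothetical tolerance."""
    return [(it, en) for it, en in cross_language_pairs(groups)
            if abs(it.mean_age - en.mean_age) < tolerance_years]


def grade_matched(groups):
    """Pairs tested in the same school grade."""
    return [(it, en) for it, en in cross_language_pairs(groups)
            if it.grade == en.grade]
```

With these values, `age_matched(GROUPS)` pairs Italian grade 2 with English grade 3 and Italian grade 4 with English grade 5, while `grade_matched(GROUPS)` returns only the two fifth-grade groups.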

Materials
List 1: Regular words varying for frequency and matched non-words
For each language, 120 stimuli evaluated the effects of stimulus type (N = 40 high-frequency words, N = 40 low-frequency words, and N = 40 non-words) and length (4, 5, 6, and 7-9 letters) with 10 stimuli for each length. Only Italian words with regular stress (i.e., on the penultimate syllable) and English words with regular correspondences (no unusual letter-sound correspondences) were included in this list. High-frequency words had a mean frequency of 106.2 (SD = 94.2) in English (CELEX lexical database, Baayen et al., 1993) and 63.7 (SD = 55.6) in Italian (CoLFIS database, Bertinetto et al., 2005); low-frequency words had a mean frequency of 2.9 (SD = 1.4) in English and 3.2 (SD = 3.2) in Italian. Both Italian and English word frequencies were calculated over 1,000,000 occurrences. The Italian and English lists were matched for ortho-syllabic difficulty (i.e., presence of double consonants, consonant groups, and contextual rules; Barca et al., 2007), point of articulation of the first phoneme and word frequency, but not for the number of syllables, which was systematically higher in the Italian sets. Non-words were derived from the corresponding high-frequency words by changing a few letters.

List 2: Words varying for regularity
For each language, 60 low-frequency words assessed the effect of regularity by contrasting 30 regular and 30 irregular nouns (overall mean frequency = 14.3, SD = 18, for English, and 9.5, SD = 10.6, for Italian; average length of seven letters, SD = 1). Regular and irregular words were matched for ortho-syllabic difficulty (presence of double consonants, clusters of consonants and contextual rules), articulation point of the first phoneme, number of letters and word frequency (frequency norms for Italian: from the CoLFIS database, Bertinetto et al., 2005; for English: CELEX database, Baayen et al., 1993).
In the Italian list, regularity was established in terms of frequency and consistency of stress pattern, where consistency considered the proportion of words with the same final orthographic-phonemic sequence sharing stress pattern (e.g., the final sequence -ola is predominantly associated with the antepenultimate stress; Burani et al., 2004, 2014). Regular words were words with stress on the penultimate syllable (a pattern which occurs about 80% of the time in the Italian language) and a consistent stress neighbourhood. Irregular words were words with stress on the antepenultimate syllable (a pattern which occurs about 18% of the time in the Italian language) and an inconsistent stress neighbourhood. For irregular words, accurate lexical retrieval is necessary to produce the right stress.
In the English list, regularity was established by considering the frequency of letter-sound correspondences. Words were considered irregular if they included one or more letter-sound correspondences with a frequency of less than 5% in the database of English words by Hanna et al. (1966). In this case too, lexical retrieval is needed to read irregular words accurately (while regular words can be read correctly through either the lexical or the sub-lexical procedure). Accordingly, a regularity effect (better performance on regular words) is an index of reliance on the sub-lexical procedure, whereas the lexical procedure ensures similar accuracy for regular and irregular words. Note that data on this list were not included in our original report (Marinelli et al., 2016).
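As a minimal illustration of the regularity effect just described, the sketch below expresses it as the difference in accuracy between the 30 regular and the 30 irregular words of List 2. The function names and the example scores are hypothetical; the analyses reported in this paper are based on error rates by category rather than on this single index.

```python
def accuracy(n_correct: int, n_items: int = 30) -> float:
    """Proportion of items read correctly."""
    if not 0 <= n_correct <= n_items:
        raise ValueError("n_correct must lie between 0 and n_items")
    return n_correct / n_items


def regularity_effect(correct_regular: int, correct_irregular: int,
                      n_items: int = 30) -> float:
    """Accuracy on regular words minus accuracy on irregular words.

    A clearly positive value suggests reliance on the sub-lexical
    procedure; a value near zero is what the lexical procedure predicts.
    """
    return accuracy(correct_regular, n_items) - accuracy(correct_irregular, n_items)


# Hypothetical child: 27/30 regular and 21/30 irregular words correct.
effect = regularity_effect(27, 21)
```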

Procedure
Children were tested individually in a quiet room at their school using a laptop computer. They were seated approximately 60 cm from the computer screen. Stimuli were presented using the E-prime2 software. Each trial began with a fixation point that remained on the screen for 500 ms. Subsequently, a word appeared at the same location. The stimulus remained on the screen until the child responded.
Words and non-words were presented in separate blocks. To make the task less tiring, the words were divided into three separate blocks with a short break between them. Since non-words were derived from words in the same list, they were presented blocked before the presentation of the words to avoid an increase in word substitutions (due to priming a wrong response). Six practice stimuli preceded both the word and nonword reading trials and the list of regular and irregular words. In both the word and nonword blocks, the order of presentation of the trials was randomized for each child.
Children were asked to read the stimulus as quickly and accurately as possible. Vocal RTs were recorded using a voice key (S-R Box). The experimenter manually recorded any error. Responses were recorded to allow for offline rechecking of children's production.

Error analysis
Errors were classified according to the following mutually exclusive categories:
1. Sounding-out behavior with correct response: the child progressively approaches the correct response, with multiple attempts to decode the orthographic string and final success. This type of "error" might include either hesitations (mestiere "JOB" → me…mestiere; limousine → limo…limousine) or self-corrections (mestiere "JOB" → mo…mestiere; limousine → isc…limousine).
2. Sounding-out behavior with error: the child progressively approaches the word with multiple attempts to decode the orthographic string, but the final production is incorrect due to the presence of one or more errors (mestiere "JOB" → mes…mestiare; limousine → li-mou…limousife). The errors may be any of those listed below in categories 7-13.
3. Syllabication with no error: the child reads the stimulus correctly but syllable by syllable (mestiere "JOB" → me-stie-re; limousine → li-mou-si-ne), and no part of the stimulus is re-read.
4. Syllabication with error: the stimulus is syllabized and is also incorrect owing to one or more of the errors listed below in categories 7-13 (mestiere "JOB" → me-stia-re; limousine → li-mou-si-fe).
5. Fragment: only the first part of the stimulus is read. The fragment may be correct (the first part of the stimulus, e.g., mestiere "JOB" → m-est…; limousine → lim…) or incorrect (i.e., with phonological-visual errors, e.g., mestiere "JOB" → mesp…; limousine → lima…).
6. Omission-I do not know response: the child omits the stimulus or says he/she does not know what it is.
7. Phonological-visual error: errors are in this category if they cannot be classified according to categories 8-13. The response produced shares more than half of the letters with the target stimulus, but includes one or more of the following: insertion of letter/letters (mestiere "JOB" → mestierte; limousine → limoustine), deletion of letter/letters (mestiere "JOB" → mesiere; limousine → limoune), substitution of letter/letters (mestiere "JOB" → mestiele; limousine → limoudine), transposition of letter/letters (mestiere "JOB" → restieme; limousine → limuosine) and/or contextual errors (bicchiere "GLASS" → bicciere; jumple → jumply; naver → narver).
8. Regularization error: these are phonologically plausible errors arising from the application of sub-lexical routines, and stress errors. Note that phonologically plausible errors are only possible in English, owing to the absence of irregular words in Italian reading, while stress errors are only possible in Italian, at least with our lists (cadavere "CORPSE" → cadavère; apricot → apricot pronounced with the A as in bat).
9. Regularization and phonological-visual error: the child makes both a regularization and a phonological-visual error (cadavere "CORPSE" → cadavèra; apricot → pricof pronounced with the A as in bat).
10. Word substitution: the target word is substituted with another word that is not semantically or morphologically related to the target word (mestiere "JOB" → mestolo "LADLE"; limousine → lemon).
11. Morphological error: the target is substituted with a morphologically related word (mestiere "JOB" → mestierino "SMALL JOB"; mestieri "JOBS"; limousine → limousines).
12. Semantic error: the target is substituted with a semantically related word (mestiere "JOB" → lavoro "WORK"; limousine → car).
13. Non-word lexicalization: the target is read as a real word sharing several letters with the target stimulus (fibestre → finestre; tinection → infection).
Note that errors in categories 7-12 were only possible in reading words, while errors in category 13 were only possible for non-words.
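The coding scheme can be represented compactly for scoring purposes. The following is a minimal sketch, assuming the category numbers given above; the class name, member names and the `is_possible` helper are illustrative and not part of the original coding materials. The helper encodes only the constraint stated in the preceding note.

```python
from enum import IntEnum


class ErrorCategory(IntEnum):
    """The 13 mutually exclusive categories, numbered as in the text."""
    SOUNDING_OUT_CORRECT = 1
    SOUNDING_OUT_ERROR = 2
    SYLLABICATION_NO_ERROR = 3
    SYLLABICATION_ERROR = 4
    FRAGMENT = 5
    OMISSION = 6
    PHONOLOGICAL_VISUAL = 7
    REGULARIZATION = 8
    REGULARIZATION_PLUS_PHONOLOGICAL_VISUAL = 9
    WORD_SUBSTITUTION = 10
    MORPHOLOGICAL = 11
    SEMANTIC = 12
    NONWORD_LEXICALIZATION = 13


def is_possible(category: ErrorCategory, stimulus_is_word: bool) -> bool:
    """Check a code against the stimulus type, following the note above:
    categories 7-12 apply to words only, category 13 to non-words only."""
    if 7 <= category <= 12:
        return stimulus_is_word
    if category == ErrorCategory.NONWORD_LEXICALIZATION:
        return not stimulus_is_word
    return True  # categories 1-6 can occur with any stimulus
```

A check of this kind can flag coding slips, for example a lexicalization assigned to a word stimulus.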
Finally, for each child we counted the number of errors resulting in a real word and the number resulting in a non-word (henceforth wordER and nonwordER, respectively). Note that here we only considered complete attempts and not sounding-out behaviors (with or without correct response), fragment productions, syllabications (with or without errors) and omission-I do not know responses.
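The tally of wordERs and nonwordERs can be sketched as follows. This is an illustration under stated assumptions: each erroneous response is represented as a hypothetical pair (category number, whether the production is a real word), and rates are expressed as a percentage of the items presented, which is an assumption about the denominator.

```python
from collections import Counter

# Categories 1-6 (sounding-out, syllabication, fragment, omission) are
# not complete attempts and are excluded from this tally.
INCOMPLETE_ATTEMPTS = {1, 2, 3, 4, 5, 6}


def tally_word_nonword_errors(responses, n_items):
    """Return wordER and nonwordER as percentages of n_items.

    responses: iterable of (category, produced_real_word) pairs,
    one per erroneous response of a single child.
    """
    counts = Counter()
    for category, produced_real_word in responses:
        if category in INCOMPLETE_ATTEMPTS:
            continue
        counts["wordER" if produced_real_word else "nonwordER"] += 1
    return {key: 100 * counts[key] / n_items for key in ("wordER", "nonwordER")}


# Hypothetical child reading 40 items: five coded responses, of which
# two (categories 1 and 5) are excluded as incomplete attempts.
example = tally_word_nonword_errors(
    [(1, True), (7, False), (10, True), (13, True), (5, False)], n_items=40
)
```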

Results
List 1: Regular words varying for frequency and matched non-words: age-match comparisons
Descriptive statistics (average error rates) for Italian and English readers matched for age (2nd and 4th Italian grades with 3rd and 5th English grades) are presented in "Appendix A". Separate ANOVAs were carried out for different types of reading errors. In these analyses, the type of stimulus was the repeated measure, and age (younger, older) and language (Italian, English) the unrepeated measures. Depending on the type of error, the type of stimulus factor might involve comparing high-frequency (HF) words, low-frequency (LF) words, and non-words, or only words (i.e., HF words and LF words) as in the case of word substitution, morphological, and semantic error categories. In one case (i.e., non-word lexicalizations), the ANOVA was confined to non-word stimuli and the type of stimulus factor was not included. These ANOVAs are presented in detail in "Appendix B".
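The layout of this design (type of stimulus within children; age and language between children) can be illustrated by computing the cell means that such an analysis compares. The sketch below is not the ANOVA itself; the record fields, child identifiers and error percentages are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format data: one record per child and stimulus type.
rows = [
    {"child": "it01", "language": "Italian", "age": "younger", "stimulus": "HF", "error_pct": 2.5},
    {"child": "it01", "language": "Italian", "age": "younger", "stimulus": "LF", "error_pct": 5.0},
    {"child": "it02", "language": "Italian", "age": "younger", "stimulus": "HF", "error_pct": 7.5},
    {"child": "it02", "language": "Italian", "age": "younger", "stimulus": "LF", "error_pct": 10.0},
    {"child": "en01", "language": "English", "age": "younger", "stimulus": "HF", "error_pct": 5.0},
    {"child": "en01", "language": "English", "age": "younger", "stimulus": "LF", "error_pct": 12.5},
]


def cell_means(records):
    """Mean error percentage for each language x age x stimulus cell."""
    cells = defaultdict(list)
    for r in records:
        cells[(r["language"], r["age"], r["stimulus"])].append(r["error_pct"])
    return {cell: mean(values) for cell, values in cells.items()}


means = cell_means(rows)
```

Each child contributes one value per stimulus type, so the stimulus factor is repeated within children while language and age vary only between children.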
A synthesis of these analyses is presented in Table 1. Inspection of the table indicates several major findings:
• Italian-reading children made more sounding-out behaviors (with correct responses; see Fig. 1), syllabications (with no error), syllabications (with errors), and fragments;
• English-reading children made more word substitutions and non-word lexicalizations; the pattern of word substitutions varied as a function of age and language (see Fig. 2);
• There was no language difference in omission-I do not know responses, phonological-visual errors, regularization or stress errors, regularization and phonological-visual errors, morphological and semantic errors;
• In all these types of errors (except for syllabications with no error), there was a significant main effect of type of stimulus, with more errors for non-words than LF words and more errors for LF than HF words;
• Word substitutions and morphological errors were more frequent for LF than HF words for younger children, especially in the English-reading sample;
• Fragments and sounding-out behaviors with correct responses were more frequent for non-words and LF words than for HF words, especially in the Italian-reading sample;
• There were substantial differences in the proportion of error types; thus, phonological-visual errors were most frequent, as expected, while morphological and semantic errors were rare (please refer to "Appendix A"). Also, regularization errors were quite infrequent (but note that List 1 only contained regular words);
• Wherever significant, the effect of age indicated a reduction of errors with older age/greater reading experience, except for sounding-out behavior (with correct responses), which tended to increase with age (particularly in the Italian readers).
A second set of analyses compared English- and Italian-reading children matched for age on types of wordER or nonwordER. These ANOVAs are presented in detail in "Appendix B"; a synthesis is shown in the lower part of Table 1. NonwordERs were more frequent among younger (10.83%) than older (3.31%) children. The effect of the type of stimulus indicated more nonwordERs for non-words (13.36%) than LF words (5.61%) than HF words (2.28%). The type of stimulus × age interaction indicated a decrease with age in nonwordERs, particularly for non-words and LF words. The type of stimulus × language interaction indicated a greater number of nonwordERs for LF words in English- (7.54%) than in Italian-reading (4.66%) children; the number of nonwordERs for HF words and non-words was very similar in the two groups.
WordERs were more frequent in English- (4.87%) than in Italian-reading (1.31%) children and in younger (3.67%) than older (1.26%) children. The main effect of the type of stimulus indicated more wordERs for non-words (2.88%) and LF words (2.81%) than for HF words (1.72%). The type of stimulus × age interaction (F(1,259) = 6.69, p < 0.01) indicated a decrease with age in wordERs, particularly for LF words. The type of stimulus × language interaction was significant (F(1,259) = 28.97, p < 0.001), indicating that Italian-reading children made considerably fewer wordERs than English-reading children in general and particularly in the case of non-words. By contrast, English-reading children made many more wordERs, and these were sensitive to the type of stimulus; thus, they made wordERs particularly in the case of non-words and LF words (see Fig. 3).
Fig. 1 List 1: sounding-out behavior with correct responses as a function of the type of stimulus (high-frequency words, low-frequency words, and non-words) and grade. Error bars indicate standard errors. 2nd and 4th Italian grades are approximately matched to 3rd and 5th English grades for chronological age. HF high frequency, LF low frequency words
Fig. 2 List 1: word substitutions as a function of the type of stimulus (high-frequency words, and low-frequency words) and grade. Error bars indicate standard errors. 2nd and 4th Italian grades are approximately matched to 3rd and 5th English grades for chronological age. HF high frequency, LF low frequency words
List 1: Regular words varying for frequency and matched non-words: grade-match comparisons
Here, we compared children matched for number of years of schooling (i.e., fifth grade Italian and English readers). A synthesis of these analyses is presented in Table 2.
Inspection of the table indicates a few major findings:
• In general, the absolute number of errors was lower than in the previous analyses owing to the older age of children;
• Italian-reading children showed more sounding-out behavior (with correct responses), especially in reading LF words and non-words;
• Fragments were more numerous in the Italian than in the English readers, but only in the case of non-words;
• English-reading children made more non-word lexicalizations; the pattern for word substitutions was in the same direction but only a statistical trend was present;
• For sounding-out behavior (with correct responses) and phonological-visual errors, there was a significant type of stimulus effect, indicating more errors for non-words than LF words and for LF than HF words.
Another set of analyses compared English- and Italian-reading children matched for schooling (i.e., fifth grade Italian- and English-reading children) in the production of wordERs and nonwordERs. These ANOVAs are presented in detail in "Appendix B"; a synthesis is presented in the lower part of Table 2.
NonwordERs were more frequent for non-words (7.57%) than LF words (2.09%) and more frequent for LF than HF words (0.61%). The main effect of language was not significant and did not interact with the type of stimulus. WordERs were more frequent in English- (3.01%) than in Italian-reading (0.43%) children. There were more wordERs for non-words (2.0%) than LF words (1.57%) and for LF than HF words (0.43%). The type of stimulus × language interaction indicated that Italian-reading children made few wordERs with all types of stimuli. The English-reading children made mostly wordERs in reading all stimuli, but especially in reading non-words (see Fig. 4).
List 1: Regular words varying for frequency and matched non-words: comments
Results for List 1 (including only regular stimuli) indicate that the error coding scheme adopted effectively captured reading problems in both Italian- and English-reading children. The Italian readers showed a prevalence of multiple attempts characterized by a slow and progressive approach to the target (which may eventually be correctly read or not). The English-reading children showed a prevalence of errors involving word substitutions or non-word lexicalizations. Again, there was a clear tendency for English-reading children to produce more erroneous words than Italian-reading children.
With increasing age, errors and multiple utterances (including syllabications) generally decreased, except for sounding-out behaviors with correct responses, which increased with age due to increasing self-corrections with older age/more experience. The pattern of findings was similar in the age-matched and in the grade-matched comparisons, although the former set of analyses was more sensitive. Therefore, it appears that the cross-linguistic differences cannot be easily explained in terms of age or reading experience. Further comments will be presented in the Discussion section.
Fig. 3 WordERs as a function of the type of stimulus (high-frequency words, low-frequency words, and non-words) across grades. Error bars indicate standard errors. HF high frequency, LF low frequency words
List 2: Words varying for regularity: age-match comparisons
Descriptive statistics for Italian- and English-reading children matched for age are presented in "Appendix C". Here we compared 2nd- and 4th-grade Italian readers with 3rd- and 5th-grade English readers, who are approximately matched for chronological age. Separate ANOVAs were carried out for the different types of reading errors as described for List 1, with the exception that the type of stimulus factor contrasted irregular vs regular words. The results of these ANOVAs are presented in detail in "Appendix D".
A synthesis of these analyses is presented in Table 3. Inspection of the table indicates several major findings:
• Italian-reading children showed more sounding-out behaviors with and without correct responses, and made more syllabications with and without additional errors;
• English-reading children made more omissions-I do not know responses, phonological-visual errors, word substitutions, morphological and semantic errors;
• There was no language difference for fragments;
• As expected, regularization errors were more frequent in List 2 than in List 1, particularly in the case of stress assignment errors (see "Appendix C");
• Italian-reading children made more regularization errors than English-reading children, but note that these were stress assignment errors for the Italian readers and phonologically plausible errors for the English readers; cross-linguistic differences were larger for younger than older children;
• Wherever significant, the effect of age indicated a reduction of errors with older age/greater reading experience, except for sounding-out behavior (with correct responses), which tended to increase with age;
• Word substitutions, morphological and semantic errors were very rare in Italian readers, while they were frequent in English readers and modulated by age (with higher rates in younger children) and by type of stimulus (with higher rates for irregular words);
• In English-reading children, phonological-visual errors decreased with age and were more frequent for irregular than regular words, especially for younger children; for Italian-reading children, these errors decreased with age but without a significant difference between the two types of words.
A second set of analyses compared English- and Italian-reading children matched for age on different types of wordERs or nonwordERs. These ANOVAs are presented in detail in "Appendix D"; a synthesis of these analyses is presented in the lower part of Table 3. NonwordERs were more frequent in younger (10.83%) than older (3.31%) children. Children produced more nonwordERs with irregular (20.28%) than regular words (8.31%). There was no main effect of language. However, nonwordERs decreased with age/experience more in the case of irregular words, particularly among English-reading children (as indicated by the significant type of stimulus × language × age interaction). WordERs were also more frequent in younger (4.65%) than older (2.13%) children. There were more wordERs in the case of irregular (4.84%) than regular (1.94%) words. English-reading children made many more wordERs (5.53%) than Italian-reading children (1.25%). The type of stimulus × age × language interaction showed that the effect of regularity was present in English-reading children, for whom it decreased with age/experience, while it was small and not significant in Italian readers (see Fig. 5).
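As an illustration only, the tallying behind percentages of this kind can be sketched as follows: each erroneous response is labelled a wordER if it is itself a real word and a nonwordER otherwise, and rates are expressed as a percentage of trials within each language × age × type of stimulus cell. The lexicon, the trials, the orthographic transcription of responses, and the names `classify_error` and `error_rates` are all hypothetical; this is not the scoring procedure or the data of the study.

```python
from collections import defaultdict

# Hypothetical toy lexicon; a real analysis would need a full word list per language.
LEXICON = {"pint", "pink", "mint", "yacht"}


def classify_error(target, response, lexicon=LEXICON):
    """Return None for a correct reading, otherwise 'wordER' or 'nonwordER'."""
    if response == target:
        return None
    return "wordER" if response in lexicon else "nonwordER"


def error_rates(trials, lexicon=LEXICON):
    """Percentage of wordERs and nonwordERs per (language, age, stimulus type) cell."""
    counts = defaultdict(lambda: {"n": 0, "wordER": 0, "nonwordER": 0})
    for language, age_group, stimulus_type, target, response in trials:
        cell = counts[(language, age_group, stimulus_type)]
        cell["n"] += 1
        kind = classify_error(target, response, lexicon)
        if kind is not None:
            cell[kind] += 1
    return {
        key: {k: 100.0 * c[k] / c["n"] for k in ("wordER", "nonwordER")}
        for key, c in counts.items()
    }


# Invented trials (responses transcribed orthographically for simplicity).
TRIALS = [
    ("English", "younger", "irregular", "pint", "pink"),    # real word: wordER
    ("English", "younger", "irregular", "pint", "pinnt"),   # not a word: nonwordER
    ("English", "younger", "irregular", "yacht", "yacht"),  # correct
    ("English", "younger", "irregular", "pint", "pint"),    # correct
]

rates = error_rates(TRIALS)
print(rates[("English", "younger", "irregular")])
```

Keeping the classification step separate from the counting step means the same tally can be reused with a different lexicon for each language.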
List 2: Words varying for regularity: grade-match comparisons
Here, we compared children matched for number of years of schooling (i.e., 5th-grade Italian- and English-reading children). A synthesis of these analyses is presented in Table 4. Inspection of the table indicates a few major findings:
• Italian-reading children showed more sounding-out behavior (with correct responses);
• English-reading children made more phonological-visual errors, regularizations with phonological-visual errors, and word substitutions;
• Semantic errors were absent in Italian-reading children with both regular and irregular words; in English-reading children they were more frequent with irregular than regular words.
Another set of analyses compared English and Italian readers matched for schooling (i.e., 5th-grade Italian- and English-reading children) for wordERs and nonwordERs. These ANOVAs are presented in detail in "Appendix D"; a synthesis is shown in the lower part of Table 4. NonwordERs were more frequent for irregular (9.00%) than regular words (2.98%). There was no difference between the two language groups. WordERs were more frequent in English-reading (3.51%) than in Italian-reading (0.69%) children. There were more wordERs for irregular (3.20%) than regular (1.00%) words. The type of stimulus × language interaction was not significant.
List 2: Words varying for regularity: comments
List 2 included only low-frequency words and examined the effect of regularity. Error rates were generally higher in English- than Italian-reading children; this held in particular for phonological-visual errors and for all errors indexing a reliance on a lexical procedure, such as word substitutions, morphological and semantic errors. These errors decreased with age and were more frequent with irregular than regular words, especially in younger children. As with List 1, Italian-reading children made more multiple attempts at the target, with a higher number of sounding-out behaviors (with and without correct responses) and syllabications (with and without error). NonwordERs decreased with age/experience more for irregular words, particularly for English readers.
WordERs were more frequent in English-reading children (particularly in the case of irregular words) than in Italian readers. The regularity effect decreased with age/experience in English-reading children and was minimal in the Italian readers.
Overall, the difficulty of the orthography made English-reading children more error prone. Results also indicated different reading strategies across orthographies: i.e., a reliance on a more holistic procedure in English readers (possibly based on lexical-semantic retrieval) and an effortful and progressive approach to the target in Italian readers. Thus, the present findings generally confirm the results of List 1, with the presence of only low-frequency words in this list making the cross-linguistic differences generally more evident. The specific interest of this list was to examine regularization errors. As expected, they were more frequent with irregular than regular words. This pattern was evident even in a very regular orthography such as Italian, where irregularity only affects the stress pattern in multisyllabic words. The number of stress assignment errors made by Italian-reading children was actually higher than the number of regularization errors made by English-reading children, especially when younger samples were compared. List 2 irregular words were selected with the aim of generating as many regularization errors as possible, and results are consistent with this aim. However, the intrinsically different nature of the stimuli in the two languages suggests caution in any direct quantitative comparison. Further comments will be presented in the "Discussion" section.
As in the case of List 1, errors and multiple attempts generally decreased with age, except for sounding-out behaviors with correct responses, which increased with age.
In the grade-match comparisons with 5th-grade children, cross-linguistic differences were still evident, showing a higher rate of sounding-out behaviors (with correct responses) in Italian-reading children, and more word substitutions, semantic errors, phonological-visual errors, and regularizations with phonological-visual errors in English-reading children. As in the age-matched comparisons, cross-linguistic differences in nonwordERs were not evident, while wordERs were more frequent in English- than Italian-reading children.

Discussion
The experimental results indicated that the error coding scheme we adopted effectively captured reading difficulties in both Italian- and English-reading children. The Italian-reading children showed a prevalence of multiple attempts characterized by a slow and progressive approach to the target (which may eventually be correctly read or not). The English-reading children, instead, showed a prevalence of word substitutions and non-word lexicalizations. This tendency was confirmed when we examined whether the errors produced resulted in a real word or a non-word. Again, there was a clear tendency to produce more wordERs (but not nonwordERs) in English- than Italian-reading children.
This general pattern of findings was similar in both lists examined, and in the age-matched as well as in the grade-matched comparisons. The former set of analyses was more sensitive, possibly because of the range of ages considered. Therefore, it appears that the cross-linguistic differences cannot be easily explained in terms of age or reading experience, pointing instead to more fundamental differences in the progression towards reading proficiency in different languages. However, a limitation of this study is the absence of a grade-match comparison for children at the early stage of literacy acquisition. Furthermore, as stated above, the age-matching was imprecise, presumably because of the different procedures for school admission in first grade in England and Italy.
The literature on cross-linguistic comparisons has already indicated that the nature of the orthography plays an important role in shaping the reading profile of children with and without a reading disorder (Seymour et al., 2003), as well as the ability to learn new words (lexical learning; see Marinelli et al., 2020). The present study extends previous findings by showing the importance of error analyses. We have extended previous investigations on Dutch (Bakker, 1992; Hendriks & Kolk, 1997) and Italian readers (Trenta et al., 2013) by comparing the errors made by English and Italian readers using a comparable coding system. Results have shown that error profiles are quite different in different orthographies. However, even in a regular orthography, types of errors are informative of reading difficulties.
English-reading children, like Italian-reading ones, showed behaviors such as sounding-out or syllabications; however, they did so much more rarely. This is consistent with the proposal that children learning regular orthographies (such as German) start to read sub-lexically and develop the ability to access the lexicon only at a later stage. In contrast, children learning to read irregular orthographies (such as English) would rely on a direct, lexical access route right away (Landerl et al., 1997; Wimmer & Goswami, 1994), thus becoming skilled in lexical reading earlier on (e.g., Marinelli et al., 2021). The present findings are in keeping with this general proposal and confirm the findings we previously reported on reading RTs and spelling (see Marinelli et al., 2016 and Marinelli et al., 2015, respectively). In agreement with Hendriks and Kolk (1997), our results underscore that the presence of sounding-out behaviors does not merely identify a generic difficulty, but rather specifically points to a reliance on the phonological (sub-lexical) routine. Results of the present study are in keeping with the "central" nature of the sounding-out behavior (as originally noted by Fairbanks, 1937). Thus, sounding-out responses were closely associated with the nature of the stimulus, i.e., they were produced most often in reading non-words and rarely when reading high-frequency words. Children approach learning to read in regular and irregular orthographies with a differential reliance on lexical and sub-lexical routines, and the use of a coding scheme, such as the one used here, is effective in capturing this difference.
Interestingly, we were able to capture regularization errors in both languages. List 2 was specifically devised to maximize the presence of such errors. In English, this typically occurs when a child reads an irregular word based on grapheme-to-phoneme conversion, yielding a so-called "valid" or phonologically plausible error (Temple, 1985). In Italian, the only form of irregularity in reading is given by the assignment of stress in words of three or more syllables, which is not governed by rule. Both the Italian and the English versions of List 2 were effective in producing regularization errors and, in fact, the Italian-reading children made more regularization errors in terms of stress assignment than the English-reading children made in terms of conversion rules. Given the different nature of the stimuli, however, any direct comparison in rate of regularization across the two languages is not possible. More generally, one should keep in mind that, despite best efforts, matching stimuli in two languages will always have limitations. For example, the syllabic structure of English and Italian words is quite different (Burani et al., 2017; Perfetti & Harris, 2017), and this may influence the likelihood of detecting syllabications. Furthermore, neighborhood size is larger in English than in Italian orthography. This creates a greater opportunity to produce word substitutions.
In summary, the pattern of reading errors observed might be influenced by at least two general kinds of factors. On the one hand, it would depend upon the structural characteristics of a language, so that some errors (such as syllabications and word substitutions) will be more likely in Italian or English, respectively. On the other hand, the characteristics of a language and its orthography will shape the modality of reading acquisition. Thus, a language such as English, where a large proportion of words are irregular and there is a large availability of orthographic neighbors, will foster a strong reliance on a lexical-access procedure even at early stages of reading acquisition. An orthography such as Italian, where all words are regular (i.e., obey orthographic rules, apart from stress assignment), there is a large proportion of long and multi-syllabic words, and orthographic neighbors are relatively few (largely confined to short words), will favor a greater reliance on sub-lexical procedures, at least in the early phase of acquisition. Notably, the relative weight of these two factors cannot be easily disentangled based on empirical data such as the ones described here, and further ad hoc research would be needed for this purpose.
It is well known that an effective measure of reading proficiency in regular orthographies is provided by reading speed (Wimmer & Schurz, 2010). However, accuracy measures may also be important provided that a comprehensive error classification scheme is used as shown here (see also Bakker, 1992; Hendriks & Kolk, 1997). With this approach, clear cross-linguistic differences emerge, such that Italian-reading children show a predominance of errors characterized by a slow and progressive approach to the target while English-reading children present a predominance of word substitutions (as well as non-word lexicalizations). From a clinical perspective, error analyses may add important information to the diagnosis of reading difficulties, especially in the early phases of reading acquisition.
The present findings may have important clinical implications. While diagnosis in regular orthographies is predominantly based on speed measures, error analyses may add relevant information to the diagnosis of reading difficulties, especially in the early phases of reading acquisition. Focusing on accuracy will allow the reading deficit to be characterized in children learning a consistent orthography, provided that the coding scheme considers the production of multiple attempts. Future research should examine whether error analyses can increase our ability to capture improvements after the administration of rehabilitation programs in children with reading difficulties. Moreover, the analysis of error profiles might also be informative about cross-linguistic differences in the reading profiles shown by dyslexic children in consistent and inconsistent orthographies.
Funding Open access funding provided by Università di Foggia within the CRUI-CARE Agreement. The work was partially supported by the grant PRIN (Research Project of National Interest) n. 20128YAFKB_004 (prof. Pierluigi Zoccolotti). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Data availability Data archiving is not mandated but data will be made available on reasonable request.
Code availability Not applicable.

Declarations
Conflict of interest On behalf of all authors, the corresponding author states that there is no conflict of interest.
Ethics approval This research complies with the dictates of the Declaration of Helsinki and has been approved by the local committee of Departments and school authorities. It has also received approval from the Research Committee of Aston University and the Ethics Committee of IRCCS Fondazione Santa Lucia Rome (Prot. CE-PROG.480).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix A
Means and SDs for all error types (List 1: age comparison). Data indicate average percentage error rates and are separately presented as a function of grade and language. Legend: LFW = Low frequency words; HFW = High frequency words