Introduction

Mapping letter strings (i.e., oral or written words) to concepts is a major part of young children’s language acquisition. For the mapping of letter strings to concepts, children use perceptual and linguistic information in quantitatively different combinations depending upon their age, the type of word, and the situation. For instance, children presumably acquire the basic meaning of bread in situations where there is both bread and talk about bread. In contrast, children probably acquire the meaning of the word decade via inference from such spoken or written text as “Mother died on July 12, 1998, and father on July 12, 2008, exactly a decade in between!”. According to Piccin and Waxman (2007), the disproportionately larger number of nouns as opposed to verbs in the vocabularies of young children can be explained by the fact that the acquisition of verb meanings requires the ability to cull linguistic (i.e., syntactic) information, which is an ability that very young children have not yet mastered. The meanings of many nouns, in contrast, can be largely acquired via perceptual information.

Children sometimes cannot use perceptual information to discover and attach meaning to unfamiliar letter strings simply because such perceptual information is not available. Unfamiliar letter strings in written text are one such case. To discover and attach meaning to an unfamiliar letter string in written text, the reader must draw upon linguistic and conceptual information from the surrounding text or upon linguistic and conceptual information derived from the unfamiliar letter string itself; the present research explores this process. A letter string can yield linguistic and conceptual information, for instance, through the morphemes it contains.

Studies have shown both advanced and less advanced language users to be sensitive to morphemes in not only written words but also written pseudowords when decoding (Burani, Marcolini, de Luca, & Zoccolotti, 2008; Burani, Marcolini, & Stella, 2002; Clin, Wade-Woolley, & Heggie, 2009; Kavé & Levy, 2005) and when spelling (Deacon & Bryant, 2006a, b; Kemp, 2006; Sénéchal, Basque, & Leclaire, 2006). Pseudowords containing morphemes are read aloud more quickly and accurately, accepted as words more often, and spelled correctly more often than pseudowords containing only pseudomorphemes. Language users also use their knowledge of morphemes to attach meaning to unknown words. Baumann and his colleagues successfully taught children in grades 4 to 8 the meaning of prefixes and suffixes and trained them to use this knowledge to acquire the meaning of unknown words in texts (Baumann, Carr Edwards, Boland, Olejnik, & Kame’enui, 2003; Baumann et al., 2003).

Other studies have shown that the number of orthographic neighbors a word has (i.e., words that are written the same as the target word except for one letter, e.g., cat: cap, mat, cut; Coltheart, Davelaar, Jonasson, & Besner, 1977) plays a role in the processing of words and pseudowords. Words with more neighbors are named more quickly and accepted as words more quickly than words with fewer neighbors, while pseudowords with more neighbors are judged to be words more often than pseudowords with fewer neighbors (Andrews, 1997; Carreiras, Perea, & Grainger, 1997). In addition to the number of neighbors of a target word, the frequency of those neighbors relative to the frequency of the target word itself is relevant (with frequency measured as the number of occurrences in a given corpus). Words with more frequent neighbors are processed more slowly than words with only less frequent neighbors (Carreiras et al., 1997; Grainger & Segui, 1990). Pseudowords derived from high-frequency words are accepted as words more often than pseudowords derived from low-frequency words (Marcolini, Burani, & Colombo, 2009).
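The neighbor definition above can be made concrete with a short sketch. The function below lists the orthographic neighbors of a letter string within a word list, counting only same-length words that differ in exactly one letter position. The function name and the toy lexicon are illustrative assumptions; they do not reproduce the tool or the lexicon used in this study.

```python
def orthographic_neighbors(target, lexicon):
    """Return the words in `lexicon` that have the same length as `target`
    and differ from it in exactly one letter position."""
    target = target.lower()
    neighbors = []
    for word in lexicon:
        candidate = word.lower()
        if len(candidate) != len(target):
            continue
        # Count the positions at which the two strings differ.
        mismatches = sum(a != b for a, b in zip(target, candidate))
        if mismatches == 1:
            neighbors.append(word)
    return neighbors


# Toy lexicon (hypothetical, for illustration only).
toy_lexicon = ["cap", "mat", "cut", "dog", "cart", "cat"]
cat_neighbors = orthographic_neighbors("cat", toy_lexicon)
print(cat_neighbors, len(cat_neighbors))
```

With this toy lexicon, cap, mat, and cut count as neighbors of cat, whereas dog (three letters differ), cart (different length), and cat itself (no letter differs) do not. The same function applies unchanged to pseudowords, since it makes no assumption that the target is itself in the lexicon.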

In many of the studies that have addressed the role of morphemes and orthographic neighbor words in the processing of words and pseudowords, the length of the items is controlled for as this has been found to influence the speed and accuracy of word processing (e.g., Juhasz & Rayner, 2003; Rayner, Sereno, & Raney, 1996). In other studies, however, no such effects of word length have been found (e.g., Bijeljac-Babic, Millogo, Farioli, & Grainger, 2004). In a lexical decision study by New, Ferrand, Pallier, and Brysbaert (2006), moreover, a U-shaped pattern of responding was found for the effects of words that ranged from 3 to 13 letters in length: Word length was facilitatory when the length was 3–5 letters, neither facilitatory nor inhibitory when the length was 5–8 letters, and inhibitory when the length was 8–13 letters.

Few studies have investigated the relative contributions of all three factors discussed above to performance on word tasks or pseudoword tasks. In the first part of the research reported here, the contributions of morphemic structure, orthographic neighborhood size, and word length to the number of errors on pseudowords in a lexical decision task were examined separately for children in grades 2 through 6. Older children were expected to be more sensitive to the morphemic structure of the pseudowords and younger children were expected to be more sensitive to orthographic neighbors. That is, the younger children were expected to be misled by the similarity of the pseudowords to their orthographic neighbors but not by the morphemic information present in the pseudowords since younger children usually have relatively little morphemic knowledge. In contrast, the older children with their more advanced language skills were expected to be able to distinguish a pseudoword from its orthographic neighbor but their relatively greater morphemic knowledge in combination with a still small vocabulary was expected to lead them to treat pseudowords containing morphemes as words. The age at which the children’s responding shifts from one pattern to the next and the possible role of word length in their responding, however, are open to question.

Lexical decision tasks yield only limited knowledge about how children try to construct the meanings of unfamiliar words. Therefore, in the second part of the research reported on here, the exact manner in which the children attributed meaning to the pseudowords that they considered words was explored. This was done by explicitly asking the children to state what the pseudowords that they had not crossed out meant.

Method

Participants

Participants were 216 elementary school children: 40 from grade 2 (18 girls), 40 from grade 3 (20 girls), 46 from grade 4 (26 girls), 47 from grade 5 (22 girls), and 43 from grade 6 (24 girls). The children all came from an elementary school located in a middle-sized city in the south of The Netherlands. In that part of The Netherlands, a strong regional dialect is spoken not only privately but also often publicly. Standard Dutch is nevertheless the language of instruction at school. The school in question is a Montessori school whose pupils come almost exclusively from families of middle to high socio-economic status. Children known to have dyslexia performed the task along with their peers, but their results were removed from the analyses, and these children are not included in the numbers of participants provided above. The task was administered by an education student who conducted the present research as part of her Master’s thesis.

Materials

A lexical decision task that was developed for a 4-year longitudinal study of the vocabulary development of young deaf and hearing school children in The Netherlands (Coppens, Tellings, Schreuder & Verhoeven, submitted) was used in the present research. The task involves 90 words and 90 pseudowords. The 90 words were randomly selected from a list of words with a high frequency and a high dispersion (i.e., spreading over different text sources) in three achievement tests, namely, a national achievement test administered to students during the last year of elementary school in 2004, 2005, and 2006 (Footnote 1).

The pseudowords were constructed via selection of a word from the Celex database (Baayen, Piepenbrock, & Van Rijn, 1993) that was similar with regard to consonant-vowel structure, word class, and—if possible—length and frequency to each of the 90 words, and then changing the selected words by one or two letters. The construction of the pseudowords was done in such a manner that the phonotactic and orthographic constraints of the Dutch language were maintained. Five pseudowords were homophones but not homographs. The words and pseudowords were next mixed randomly and put into four lists with different orders. All participants saw all words.

An overview of the number of morphemes in the pseudowords, number of orthographic neighbors for the pseudowords, and number of letters in the pseudowords is presented in Table 1. The number of neighbors was calculated using N-Watch (Davis, 2005). The pseudowords also differed in the number of syllables, but a strong correlation between the number of syllables and the number of letters (r = .935, p < .001) allowed us to simply adopt the number of letters as the relevant predictor. Since the pseudowords were constructed by altering one or two letters of a word and since, evidently, pseudowords have a frequency of zero, for each pseudoword in the task there existed at least one more frequent word that differed by only one or two letters from that pseudoword. Therefore, neighborhood frequency was not investigated as a predictor in this study.

Table 1 Number of morphemes, neighbors, and letters of the pseudowords

Procedure

The participants performed the lexical decision task individually in a separate room with the experimenter sitting in the same room. There was no time limit placed upon the children’s responding as we were interested in the responses of all participants to all of the letter strings. The experimenter explained to the child that there are many ‘real words’ but that one can also make ‘nonsense words’ and presented a few examples. The child was then asked to cross out each ‘nonsense word’ on the list presented to him/her.

The experimenter next asked the child to read aloud a number of the pseudowords that the child had incorrectly not crossed out and some of the words that the child had correctly not crossed out. The latter served as fillers and prevented the child from noticing that the experimenter was only interested in the pseudowords. Thereafter, the experimenter asked the child “Can you tell me what this word means?” but only when the child had read the word or pseudoword correctly aloud. The child was never told that the pseudowords were not real words, and the experimenter never pronounced any of the items for the child.

Results

Outcomes of the lexical decision task

Only the scores on the pseudowords were analyzed. The distributions of the predictor variables were somewhat skewed, as can be seen from Table 1, and multicollinearity was suspected. Whether or not the predictor variables met the conditions for a multiple regression analysis was therefore examined. Negative correlations were found between the number of morphemes and the number of neighbors of the pseudowords (r = −.571) and between the length of the pseudoword and the number of neighbors (r = −.691), and a positive correlation was found between the length of the pseudoword and the number of morphemes (r = .791) (all p < .001). In other words, the longer pseudowords had more morphemes but fewer neighbors than the shorter pseudowords. The correlations were all smaller than .90, however, which is suggested to be a viable upper limit for collinearity in multiple regression analyses (Belsley, 1991, p. 28; Cohen, Cohen, West, & Aiken, 2003, p. 423). For each of the dependent variables (i.e., grade 2 through 6 responding), we also examined the tolerance, that is, the proportion of variance in a predictor that is not explained by the other predictors, as a collinearity statistic; the tolerance should be higher than .10. The tolerance coefficients for each of the dependent variables are presented in Table 2.
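As a rough check on these collinearity diagnostics, the tolerance of each predictor can be recovered from the three reported correlations alone, because tolerance equals 1 minus the R squared obtained when that predictor is regressed on the other two. The sketch below uses the standard closed-form expression for R squared with two predictors. The function and variable names are illustrative, the result is only as precise as the rounded correlations, and this is not necessarily how the coefficients in Table 2 were computed.

```python
def tolerance(r_a, r_b, r_ab):
    """Tolerance of a predictor X in a three-predictor model.

    r_a, r_b : correlations of X with the other two predictors A and B
    r_ab     : correlation between A and B
    Tolerance = 1 - R^2 of X regressed on A and B.
    """
    r_squared = (r_a**2 + r_b**2 - 2 * r_a * r_b * r_ab) / (1 - r_ab**2)
    return 1 - r_squared


# Correlations between the predictors as reported in the text.
r_morph_neigh = -0.571  # number of morphemes x number of neighbors
r_len_neigh = -0.691    # number of letters x number of neighbors
r_len_morph = 0.791     # number of letters x number of morphemes

tol_neighbors = tolerance(r_morph_neigh, r_len_neigh, r_len_morph)
tol_morphemes = tolerance(r_morph_neigh, r_len_morph, r_len_neigh)
tol_letters = tolerance(r_len_neigh, r_len_morph, r_morph_neigh)

for name, value in [("neighbors", tol_neighbors),
                    ("morphemes", tol_morphemes),
                    ("letters", tol_letters)]:
    print(f"{name}: tolerance = {value:.3f}")
```

With the rounded correlations, the tolerance for the number of neighbors comes out at about .52, in line with the value of .521 reported for that predictor in the grade two and grade three analyses, and all three values lie well above the .10 criterion. Because tolerance depends only on the relations among the predictors, it is the same for every grade in which the same three predictors are entered.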

Table 2 Results of multiple regression analyses for grades 4, 5, and 6

Separate multiple regression analyses were conducted for each grade with the number of errors as the dependent variable. We had clear expectations regarding the roles of the number of morphemes and the number of neighbors for the younger and older children, but the exact age at which a shift from one pattern of responding to the other would be apparent and the influence of word length (i.e., the number of letters in the word) were open to question. For this reason, the number of morphemes, the number of neighbors, and the number of letters were entered simultaneously as predictors into the regression equation. Only for the grade six results were all three predictors found to be significant. For the grade five results, the number of morphemes and the number of neighbors were found to be significant predictors. For the grade four results, only the number of neighbors proved significant. For the grade two and three results, no valid multiple regression models could be computed, although the coefficients for the number of neighbors were significant in both cases (Beta = .311, t = 2.162, p = .033, tolerance = .521 for grade two; Beta = .335, t = 2.329, p = .022, tolerance = .521 for grade three).

Two additional regression analyses were next conducted. In the first, the three predictor variables for the grade six data were entered stepwise as opposed to simultaneously. In the second, the number of morphemes and the number of neighbors were entered in a stepwise manner as predictor variables for the grade five data. These regression results, i.e. for the fourth, fifth, and sixth grade data, are presented in Table 2.

Histograms showed the residuals to be normally distributed in all cases. The residuals had a constant variance in all cases. The Beta values were all positive, which shows more errors to occur for those pseudowords with more morphemes, more neighbors, and more letters (i.e., a longer length). The R square values show that, for the grade four data, the number of neighbors explains 10.5% of the total variance in the children’s responding. For the grade five data, the largest part of the variance is explained by the number of morphemes (20.9%); inclusion of the number of neighbors explains an additional 6.2% of the total variance in the children’s responding. For the grade six data, the largest part of the variance in the children’s responding is explained by the number of morphemes (31.1%); inclusion of the number of neighbors explains an additional 3.8% of the total variance; and subsequent inclusion of the number of letters explains an additional 3.6% of the total variance.
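The stepwise increments reported above can be accumulated to show how the explained variance builds up as each predictor enters the model. The following is a minimal sketch of that arithmetic, using only the percentages given in the text; the dictionary layout and names are illustrative.

```python
# R-square increments (in %) per predictor, in order of entry, as reported.
r2_increments = {
    "grade 4": [("neighbors", 10.5)],
    "grade 5": [("morphemes", 20.9), ("neighbors", 6.2)],
    "grade 6": [("morphemes", 31.1), ("neighbors", 3.8), ("letters", 3.6)],
}

cumulative_r2 = {}
for grade, steps in r2_increments.items():
    running_total = 0.0
    for predictor, increment in steps:
        running_total += increment
        print(f"{grade}: after adding {predictor}, R^2 = {running_total:.1f}%")
    # Round to one decimal to avoid floating-point noise in the sums.
    cumulative_r2[grade] = round(running_total, 1)
```

The increments sum to 27.1% for grade five and 38.5% for grade six. These are unadjusted totals; figures adjusted for the number of predictors in the model would be somewhat lower.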

Children’s attribution of meaning to pseudowords

Due to the design of the task, different numbers of errors per child, different items missed per child, and no discussion of all errors per child, quantitative analyses could not be undertaken for the children’s attribution of meaning to pseudowords perceived as real words. In the following presentation of the descriptive results, Dutch pseudowords are presented in uppercase, Dutch words are presented in italics, and the English translations for the items are presented in brackets.

The number of erroneously accepted pseudowords that were read aloud correctly and then queried for their meaning was 276 for grade two, 213 for grade three, 372 for grade four, 315 for grade five, and 337 for grade six. Different types of answers were given by the children. First, there were some answers based upon the homophonic character of a pseudoword and a word (such as BANT and band; some children gave the meaning of band, apparently thinking that band should be spelled b-a-n-t; three pseudowords were of this nature). Second, there were self-corrections followed, in some cases, by an indication that the child had misread the letter string as a word during the lexical decision task (“Oh… this isn’t a word after all… I thought it was…”). Third, there were “do not know” responses: “I know it is a word, but I don’t know what it means.” After subtraction of these cases from the number of errors, 50 attributions of meaning to a pseudoword remained for the grade two children, 37 for the grade three children, 99 for the grade four children, 138 for the grade five children, and 90 for the grade six children. The children’s attributions of meaning in such cases could then be divided into two broad categories.

Many of the children’s answers showed them to look into their mental lexicons for a verbal label that resembled the pseudoword and then map the concept attached to that label onto the pseudoword. This does not come as a surprise as all of the pseudowords in this study closely resembled an existing word. The pseudoword AANTASSEN [“to onbag”], for example, has the orthographic neighbors aanpassen [adapt] and aantasten [affect, harm], and both meanings were indeed provided for this pseudoword at times. It should be noted, however, that only the meanings of those pseudowords that the children read correctly aloud were inquired about, which means that the children were clearly providing the meaning of the pseudoword AANTASSEN in the above case and not the meanings of the words aanpassen or aantasten. It thus appears that some children have not as yet mapped the right written or spoken labels to the right concepts in all cases. In some cases, moreover, the children appeared to find a label in their mental lexicons that differed by more than one or two letters from the relevant pseudoword. An example of this is the pseudoword FRAS, which was given the meaning of verrassing [surprise] by a number of the children. Another example is the pseudoword AANTASSEN, which was given the meaning of toetasten [help oneself, dive in] or betasten [feel] by a number of the children. In these cases, the children appear to confuse the pseudoword with a real word that has a similar stem. This first category of meaning attribution for pseudowords is therefore referred to as “concept label errors.”

The second broad category of meaning attributions produced by the children in the present research could be entitled “new concept label constructions.” These responses showed the children to map the pseudoword onto a concept that was newly created by themselves. While the children could be seen to do this in a number of different manners, they mostly drew upon their knowledge of the morphemes occurring in the pseudoword. One example is again AANTASSEN [“to onbag”], which was said to mean “that you hang your bags up,” “that you wear bags,” “that you put on a bag,” or “what you can put onto your bag.” A second example is LEEFHEBBEN [“to have live”, ‘live’ in the first person singular, present tense], which was taken to mean “that you want to have everything” or “that you want to have something in your life.” Some other meaning attributions reflected a combination of morphological subdivision and non-morphological subdivision of the pseudoword. To the pseudoword BESCHEKKEN, the meaning “protective fences” was attributed. Apparently, the child first split the pseudoword into BESCHE|KKEN and then into BESC|HEKKEN. BESCHE is the first part of beschermen [to protect] but not a Dutch morpheme; hekken is the plural of the word “fence” in Dutch. Some of the new concept label constructions also involved homophone errors or concept label errors. For instance, the meaning “you are very sure that you hate someone” was attributed to ZEKERHAAD [“certain hate”] while the Dutch word for hate is haat and not HAAD. Haat and HAAD are homophones in Dutch.

Both the concept label errors and new concept label constructions occurred for pseudowords with constituent morphemes and pseudowords without constituent morphemes. For all of the pseudowords with constituent morphemes considered across all the grades, however, 82 concept label errors and 224 new concept label constructions occurred. For all of the pseudowords without constituent morphemes considered across all the grades, 39 concept label errors and 69 new concept label constructions occurred. Examples of the new concept label constructions created for pseudowords without constituent morphemes are HAST, which was given the meaning of aarde [world, earth, soil], and ILS, which was given the meanings of boos [mad], mist [fog], orgaan van een paard [organ of a horse], eten voor vogels [bird food], and lichaamsdeel [body part]. There are no apparent Dutch words and no words in the regional dialect that resemble the pseudowords HAST and ILS and carry the meanings attributed to them. An example of a concept label error for a pseudoword without constituent morphemes is KECHT, which was given the meaning of knecht [servant].
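The counts above can be tallied to show how the two categories are distributed over the two types of pseudoword. The sketch below is purely descriptive and makes no inferential claim; it uses only the counts reported in the text, and the dictionary layout and names are illustrative.

```python
# Counts reported in the text, summed over all grades.
counts = {
    "with constituent morphemes": {"concept label errors": 82,
                                   "new constructions": 224},
    "without constituent morphemes": {"concept label errors": 39,
                                      "new constructions": 69},
}

grand_total = 0
share_new = {}
for item_type, by_category in counts.items():
    total = sum(by_category.values())
    grand_total += total
    # Percentage of attributions that are new concept label constructions.
    share_new[item_type] = 100 * by_category["new constructions"] / total
    print(f"{item_type}: {total} attributions, "
          f"{share_new[item_type]:.1f}% new constructions")

print("all attributions:", grand_total)
```

New concept label constructions make up roughly 73% of the attributions for pseudowords with constituent morphemes and roughly 64% for pseudowords without them. The grand total of 414 equals the sum of the per-grade attribution counts reported earlier (50, 37, 99, 138, and 90).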

Both concept label errors and new concept label constructions occurred in all of the grades studied.

Table 3 gives the division of the two types of attributions for each grade, both in absolute numbers and in percentages of the total number of attributions.

Table 3 Division of two types of attributions for each grade in absolute numbers and in percentages

Discussion

In the present research, grade two through six elementary school children were asked to perform a lexical decision task that included 90 pseudowords, constructed by changing one or two letters in a Dutch word. Multiple regression analyses showed the number of morphemes, the number of neighbors, and word length to jointly explain about 36% of the variance in the number of errors (i.e., the number of pseudowords not crossed out and thus considered words) for the children in grade six. The number of morphemes accounted for the largest part of the variance explained by far. For the children in grade five, the number of morphemes and the number of neighbors jointly explained about 20% of the variance in the children’s responding. The number of morphemes again accounted for the largest part of the explained variance, but the ratio of the two predictors for grades five and six varied: a relatively larger part of the variance was explained by the number of neighbors for the fifth grade students than for the sixth grade students. For the students in grade four, only the number of neighbors was found to be a significant predictor of their lexical decision performance on the pseudowords; the number of neighbors accounted for about 7% of the variance in the children’s responding.

The aforementioned pattern of responding confirms our expectation with regard to the lexical decision performance of the older versus younger children. The older children indeed appeared to be more hindered by the morphemic structure of a pseudoword than by its orthographic neighbors. This shows that they have morphological knowledge of the language. The younger children, in contrast, were less hindered by the morphemic structure of a pseudoword and more hindered by its orthographic neighbors. Although a valid regression model could not be computed for the grade two and three results, regression coefficients suggest that the children in these grades are not hindered by the morphemic structure of pseudowords but, rather, by the orthographic neighbors of pseudowords. The finding that a valid regression model could not be computed for grades two and three and that less variance was explained in the lower elementary grades than in the higher elementary grades (see Table 2) is probably due to the influence of other factors on the performance of particularly the younger children, including decoding errors, fatigue, and a lack of vocabulary knowledge, which can all lead to more guessing.

Word length only explained a significant amount of the variance in the responding of the grade six children and then only a very small amount (about 3%). This is probably due to the fact that the number of morphemes in a word correlated highly with the number of letters in the word although the correlation of .791 was within the acceptable limits for collinearity.

In the second part of the research reported on here, the elementary school children were asked about some of the pseudowords that they had mistakenly not crossed out and thus taken to be words. With this questioning, we explicitly asked for semantic knowledge: did the children actually know the meaning of (some of) the morphemes in the pseudowords? Apart from the homophone errors, self-corrections, and “do not know” responses, the meaning attributions provided by the children confirmed the lexical decision regression results. That is, the children’s answers were influenced by morphemes and orthographic neighbors. The answers categorized as concept label errors clearly reflect the influence of orthographic neighbors: The children mentioned the meaning of an orthographic neighbor when asked about the meaning of the pseudoword even though they had just read the pseudoword correctly aloud. The answers categorized as new concept label constructions show the children to actively use their knowledge of morphemes to attribute meaning to otherwise unknown words. This was not only the case for the older children but also for the younger children (i.e., those in grades two and three).

Table 3 shows that children in grades 2 to 4 make relatively more concept-label errors than new concept-label constructions, whereas for children in grades 5 and 6 the reverse holds. Furthermore, in the highest two grades we see a sharp increase in the relative percentage of new concept-label constructions, from less than 10% in grades 2–4 to around 20% in grades 5 and 6. This is probably due to growing morphemic knowledge and growing language experience, which result in more linguistic creativity in the older participants. The developmental pattern in concept-label errors is more diffuse. Note that the design of the meaning attribution task did not allow statistical analyses of the children’s answers. In future studies, a different research design should therefore be adopted to examine the occurrence of concept label errors and new concept label constructions for pseudowords as a function of the number of constituent morphemes, the number of orthographic neighbors, and the age of the participants.

The pseudoword items used in the present research came from a lexical decision task in which a domain-referenced criterion was used to select the items. That is, performance was measured against a well-defined body of knowledge: Words that children are supposed to know by the end of elementary school. These were selected for use in the lexical decision task. Furthermore, the unfamiliar letter strings—in this case, pseudowords—looked much like familiar words, which is often the case when children encounter unfamiliar letter strings in text. However, it cannot be assumed that children use exactly the same strategies for the attribution of meaning to unfamiliar words in texts as used to answer the queries of the experimenter in the present research. A one-on-one laboratory situation is obviously different from a regular class or rich home situation. Similarly, discovering the meaning of a written letter string while silently reading a text is presumably different from the construction of a meaning when asked to do so by an experimenter for a written letter string presented in isolation. Nonetheless, the results of the present research still shed light on just how elementary school children attribute meaning to unfamiliar words.

Above we referred to how the participants were ‘hindered’ by linguistic neighbors or by known morphemes of words. Of course, knowledge of many words (i.e., neighbors of other words) and of morphemes usually is helpful when reading text in daily life. This study shows that children in elementary school, at least from grade 2 onwards, can actively use their knowledge of morphemes to look for meanings of unknown words. The study thus corroborates bootstrapping theories of vocabulary development: children use their knowledge of the meaning of words in order to learn the meaning of new words (Gleitman, Cassidy, Nappa, Papafragou, & Trueswell, 2005). This could be actively stimulated by teachers as a means of expanding vocabulary knowledge. Furthermore, in all grades children tended to sometimes attribute the meaning of a linguistic neighbor to a pseudoword, or even the meaning of a word that differed (much) more than one letter from the pseudoword, and this was not caused by decoding mistakes. Apparently, children have rather loose connections between concepts and the corresponding linguistic labels, at least in some cases. Strengthening these connections is another way to expand vocabulary knowledge. This is a subject that requires more research; many studies investigate how children connect concepts with linguistic labels and vice versa, but not how robust these connections are.