A body of research suggests that words acquired earlier in life are processed faster than words learned later. This effect, called the age-of-acquisition (henceforth, AoA) effect, has been observed in various lexical tasks over the last 40 years (Juhasz, 2005) in both children and adults. The AoA effect plays a significant role in word processing and should be used as a control factor in experiments in which different word stimuli are used. The goal of this article is to provide fully comparable subjective ratings of AoA obtained with the very same procedure for the same set of words, both nouns and verbs, across 25 languages from five different language families. To the best of our knowledge, this is the very first study comprising such a number of diverse languages. Previous studies were typically conducted in one language only or in a pair of languages. Opportunities for cross-linguistic comparisons of previous studies’ results were diminished by the fact that these studies also differed in terms of the list of words used and in other significant details of their procedures. In the present study, we also considered the potential effects of the participants’ age, education, number of languages known, and parental status on the AoA ratings.

AoA effect

A large number of studies have examined AoA, and most of the representative studies show an effect of AoA on different tasks performed by children and adults. These are summarized by type of task and language in Table 1. To date, the tasks in which the AoA effect has been evidenced for common words have been picture naming, word naming, object recognition, word category decision, semantic classification, associations, lexical decision, orthographic decision, and sentence reading. It is notable that most of the available studies to date have focused on AoA in a single language.

Table 1 Age-of-acquisition effects in different types of tasks in adults and children

Most of the studies were performed with adults, although three studies report child data (from 3 to 10 years of age) and two studies had teenagers as participants (from 11 to 17 years of age). In the majority of the studies with adults, only students were participants (e.g., Baumeister, 1984; Bonin, Fayol, & Chalard, 2001; Colombo & Burani, 2002; Holmes & Ellis, 2006; Juhasz & Rayner, 2006; Meschyan & Hernandez, 2002; Monaghan & Ellis, 2002; Navarrete, Scaltritti, Mulatti, & Peressotti, 2013; Pérez, 2007; Turner, Valentine, & Ellis, 1998). However, some studies have contrasted either younger adults with older adults (Barry, Johnston, & Wood, 2006; De Deyne & Storms, 2007; Morrison, Hirsh, & Duggan, 2003; Sirois, Kremin, & Cohen, 2006) or adults suffering from impairments with control groups (Alzheimer’s disease: Lambon Ralph & Ehsan, 2006; Lymperopoulou, Barry, & Sakka, 2006; cognitive impairments: Morrison et al., 2003; aphasia: Catling, South, & Dent, 2013).

Subjective and objective AoA

Subjective AoA

In the majority of AoA studies, subjective AoA ratings were obtained by asking adult native speakers to estimate when they had learned given words, by indicating either the exact age (in years) or an age range on a scale. This procedure has been used widely for both English and other languages, such as Chinese, Dutch, French, German, Greek, Icelandic, Italian, Japanese, Persian, Portuguese, Russian, Spanish, and Turkish (see Table 2 for studies on each language). Although there are concerns regarding the validity of such subjective ratings, in terms of adults’ inability to remember the exact age of word learning (e.g., Morrison, Chappell, & Ellis, 1997), many studies have found these estimates to be predictive of various processing variables in the different types of tasks listed above (a list of references is presented in Table 1).

Table 2 Existing subjective and objective age-of-acquisition norms in different languages

Objective AoA

Objective measurement of AoA has been based on spontaneous speech samples of children of various ages. Once the samples are transcribed and the words occurring in the transcriptions are counted by age groups, it is possible to estimate the AoA of the words present in the samples. The age at which a given word appears in the speech of the majority of children or reaches an arbitrarily set criterion of cumulative frequency is identified as its AoA. For instance, Piñeiro and Manzano (2000) defined the AoA of a word as the age range in which the word’s cumulative frequency reaches 10% of its total frequency (in a given sample). They analyzed transcriptions of spontaneous speech of 200 children 11 to 49 months of age (divided into 11 age intervals of 2–4 months), and for each word they calculated its overall token frequency in the sample (total frequency). AoA was calculated only for words whose total frequency was at least 10 (298 word types). They assessed cumulative frequency by age intervals, and the lowest age interval in which a criterion of 10% of the total frequency for a given word was reached was assumed to be this word’s AoA. They differentiated AoA from the first time uttered (FTU), explaining that the FTU indicates the age interval within which a specific word may appear for the first time, whereas AoA shows approximately the age at which the same word begins to receive a determined meaning in the active vocabulary of the child (Piñeiro & Manzano, 2000). However, AoA norms estimated on the basis of the spontaneous speech production of children may (1) not include all of the vocabulary utilized by children, (2) depend strongly on the context of data collection, and (3) be limited in that they do not include words comprehended but not yet produced by children.
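
The cumulative-frequency criterion described above can be illustrated with a minimal Python sketch. It is not the code used by Piñeiro and Manzano (2000); the function name `objective_aoa`, the input format (one `(word, age_interval)` pair per token), and the use of integer interval indices are assumptions made only for illustration. The 10% criterion and the minimum total frequency of 10 are taken from the description above.

```python
from collections import defaultdict

def objective_aoa(tokens, criterion_percent=10, min_total=10):
    """Estimate AoA per word from (word, age_interval) token pairs.

    The AoA of a word is the lowest age interval at which its cumulative
    frequency reaches `criterion_percent` of its total frequency.
    Words with a total frequency below `min_total` are skipped.
    Hypothetical helper: names and input format are illustrative assumptions.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for word, interval in tokens:
        counts[word][interval] += 1

    aoa = {}
    for word, by_interval in counts.items():
        total = sum(by_interval.values())
        if total < min_total:
            continue  # too rare in the sample to estimate AoA
        cumulative = 0
        for interval in sorted(by_interval):
            cumulative += by_interval[interval]
            # Integer comparison avoids floating-point edge cases.
            if cumulative * 100 >= criterion_percent * total:
                aoa[word] = interval
                break
    return aoa
```

In this sketch, a word that is first uttered in one interval but reaches the criterion only in a later one receives the later interval as its AoA, which mirrors the distinction between FTU and AoA.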

Norms for the MacArthur–Bates Communicative Development Inventories (henceforth, MB-CDI; Fenson et al., 1993; Fenson et al., 2007) also act as a source of information on the age at which children learn words. In the MB-CDI studies, parents of young children (from 8 up to 36 months of age, depending on the language) assess which of the words listed their children have comprehended and/or produced. On the basis of parental reports, it is possible to determine how many children in a given age range know the particular words. These indices allow one to establish the age at which the majority of children understand or say the items. The AoA ratings obtained by this procedure should be treated as quasi-objective, since they rely heavily on an indirect measurement of vocabulary knowledge: the parental report. Yet, MB-CDI in itself has been validated by independent direct testing of child vocabulary and was found to be highly reliable (e.g., Dale, 1991; Dromi, Maital, Sagi, & Bornstein, 2000; Heilmann, Ellis Weismer, Evans, & Hollar, 2005; Thal, O’Hanlon, Clemmons, & Fralin, 1999; Thordardottir & Ellis Weismer, 1996).

Another method to assess objective AoA is elicitation of children’s verbal production using picture naming (Morrison et al., 1997). In this procedure, participants are shown a set of pictures of common objects or activities that they have to name. To obtain the AoA, participants are classified by age, and the AoA of a given word is considered to be the mean age of the group in which the picture is correctly named with relatively high frequency (usually, equal to or greater than 75%). This method has been used in several studies focusing on a total of seven languages (see Table 2 for the detailed references): Chinese, English, French, Icelandic, Italian, Russian, and Spanish. Researchers examined different age ranges from 2 to 15 years, usually 2 to 11 years. Objective AoA ratings have also been calculated on the basis of word definitions provided by participants 5 to 21 years of age (Gilhooly & Gilhooly, 1980).
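
The picture-naming criterion can be sketched in the same way. The following Python fragment is an illustration only. It assumes that naming results are available per age group as counts of correct responses, and that the AoA is taken from the youngest group reaching the criterion; individual studies may apply additional conditions. The function name `naming_aoa` and the data layout are hypothetical.

```python
def naming_aoa(results, threshold=0.75):
    """Return the mean age of the youngest group meeting the naming criterion.

    `results` maps the mean age of each group to a pair
    (number of children naming the picture correctly, number tested).
    Returns None if no group reaches the threshold.
    Hypothetical helper: the data layout is an illustrative assumption.
    """
    for mean_age in sorted(results):
        n_correct, n_tested = results[mean_age]
        if n_tested > 0 and n_correct / n_tested >= threshold:
            return mean_age
    return None
```

For example, with 4 of 20 children naming a picture correctly in a group with a mean age of 3.5 years and 15 of 20 in a group with a mean age of 4.5 years, the function returns 4.5.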

Although some researchers prefer to use objective ratings (e.g., Morrison et al., 1997), results obtained by the two methods have proven to be highly correlated, at least for some languages. Carroll and White (1973b) correlated subjective AoA ratings collected from 62 adult speakers of English with objective measures of AoA (ratings of how often different age groups use some words in reading and writing) and obtained a coefficient of .85. Gilhooly and Gilhooly (1980) found a correlation of .93 between the ratings of AoA provided by 70 psychology students and the standardized Crichton/Mill Hill vocabulary norms for children 5 to 11 years of age. Additionally, they reported a correlation of .84 between ratings and accuracy in a word-defining task in which children 5 to 13 years of age were asked to describe the meanings of words. Similarly, a correlation (r = .76) between subjective AoA and objective AoA (defined as the age at which 75% of children in a given age group knew the name for an object in a picture-naming task) was found by Morrison, Chappell, and Ellis (1997). Other studies (De Moor, Ghyselinck, & Brysbaert, 2000; Jorm, 1991; Lyons, Teer, & Rubenstein, 1978) have also provided evidence for the validity of subjective AoA ratings as a psycholinguistic variable.

Methodological aspects of AoA studies

Scales used in AoA studies

In the majority of subjective AoA studies, one of four types of scales has been used: an 11-point scale based on equivalent age, a 9-point scale utilized for the first time by Carroll and White (1973a), a 7-point scale introduced by Gilhooly and Logie (1980), or a 5-point scale. These scales were mostly used as variants of Likert-type scales (see the descriptions in Table 3) in studies in which norms for other psycholinguistic variables, such as familiarity, imageability, concreteness, meaningfulness, visual complexity, name and image agreement, and subjective frequency were collected in addition to AoA (e.g., Akinina et al., 2014; Alario & Ferrand, 1999; Bakhtiar, Nilipour, & Weekes, 2013; Barca, Burani, & Arduino, 2002; Bird, Franklin, & Howard, 2001; Bonin, Peereman, Malardier, Méot, & Chalard, 2003; Cuetos, Ellis, & Alvarez, 1999; Della Rosa, Catricalà, Vigliocco, & Cappa, 2010; Dimitropoulou, Duñabeitia, Blitsas, & Carreiras, 2009; Ferrand et al., 2008; Gilhooly & Logie, 1980; Liu, Hao, Li, & Shu, 2011; Liu, Shu, & Li, 2007; Manoiloff, Artstein, Canavoso, Fernández, & Segui, 2010; Moreno-Martínez, Montoro, & Rodríguez-Rojo, 2014; Nishimoto, Miyawaki, Ueda, Une, & Takahashi, 2005; Pind, Jónsdóttir, Gissurardóttir, & Jónsson, 2000; Raman, Raman, & Mertan, 2014; Salmon, McMullen, & Filliter, 2010; Shao, Roelofs, & Meyer, 2014; Sirois et al., 2006; Snodgrass & Yuditsky, 1996; Stratton, Jacobus, & Brinley, 1975; Tsaparina, Bonin, & Méot, 2011; Vinson, Cormier, Denmark, Schembri, & Vigliocco, 2008). Other scales have sometimes been modified according to the objectives of the specific study. For example, Auer and Bernstein (2008) used an 11-point scale with the last point set at age 21, because they assumed that many of their stimuli would be assessed as being acquired after the age of 13 years.

Table 3 Most popular scales used in the studies on subjective age of acquisition

Other studies (e.g., Cuetos, Samartino, & Ellis, 2012; De Deyne & Storms, 2007; Della Rosa et al., 2010; Ferrand et al., 2008; Kuperman, Stadthagen-Gonzalez, & Brysbaert, 2012; Stadthagen-Gonzalez & Davis, 2006) have not used an explicit scale; rather, participants were asked to provide their subjective AoA directly in years, for example, by typing the number “3” if they thought they had learned a given word at the age of 3 years, and “N” or “X” if they did not know the word at the time of data collection (Ferrand et al., 2008; Kuperman et al., 2012). Ferrand et al. (2008) argued that participants find the scaleless instructions easier to follow. Moreover, this kind of measure returns more precise information about the AoA of particular words.

Target/experimental question in subjective AoA studies

Most AoA studies discuss the exact form of the target question used to elicit the AoA ratings in far less detail than they discuss the scale used. A review of 54 publications revealed that the majority of the subjective AoA studies did not state the exact form of the question at all (Akinina et al., 2014; Alario & Ferrand, 1999; Alonso, Fernandez, & Díez, 2015; Bakhtiar et al., 2013; Barry et al., 2006; Bird et al., 2001; Bonin, Boyer, Méot, Fayol, & Droit, 2004b; Bonin et al., 2003; Bonin, Perret, Méot, Ferrand, & Mermillod, 2008; Cameirão & Vicente, 2010; Colombo & Burani, 2002; Cuetos et al., 1999; Cuetos et al., 2012; De Deyne & Storms, 2007; Della Rosa et al., 2010; Dimitropoulou et al., 2009; Johnston, Dent, Humphreys, & Barry, 2010; Lyons et al., 1978; Manoiloff et al., 2010; Marques, Fonseca, Morais, & Pinto, 2007; Moors et al., 2013; Moreno-Martínez et al., 2014; Nishimoto et al., 2005; Nishimoto, Ueda, Miyawaki, Une, & Takahashi, 2012; Raman et al., 2014; Schock, Cortese, Khanna, & Toppi, 2012; Schröder, Gemballa, Ruppin, & Wartenburger, 2011; Sirois et al., 2006; Stratton et al., 1975; Tsaparina et al., 2011; Vinson et al., 2008; Walley & Metsala, 1992; Winters, Winter, & Burger, 1978). In the remaining articles, the wording “When do you think you learned this word?” is most frequently used (e.g., Auer & Bernstein, 2008; Barca et al., 2002). Some authors have reported the definition of word learning used in their studies (Kuperman et al., 2012; Moors et al., 2013; Shao et al., 2014; Stadthagen-Gonzalez & Davis, 2006), explaining that the AoA of a word is the age at which participants would have understood that word if somebody had used it in front of them, even if they did not themselves use, read, or write it at the time.

All studies have so far focused on participants’ own experience of word learning. This method may return ratings that overestimate the AoA of some relatively new words (e.g., a computer). So far, no study has used a question concerning adult participants’ opinions on the word learning of today’s children: “When do children learn this word?” To avoid task discrepancy in the ways that estimations were elicited, we followed the most frequent pattern of target questions (“When did you learn this word?”) in the present study. However, because we expected that the exact form of the target question might reveal differences in the estimations, we conducted a one-language control study in which a question on current children’s experience was used.

Word classes in AoA studies

The vast majority of both objective and subjective AoA ratings have been gathered for nouns only (e.g., Alario & Ferrand, 1999; Álvarez & Cuetos, 2007; Bakhtiar et al., 2013; Barbarotto, Laiacona, & Capitani, 2005; Barca, Burani, & Arduino, 2002; Barry, Morrison, & Ellis, 1997; Bonin et al., 2003; Cannard & Kandel, 2008; Carroll & White, 1973a, 1973b; Chalard, Bonin, Méot, Boyer, & Fayol, 2003; Cortese & Khanna, 2007, 2008; Cuetos et al., 1999; Cuetos et al., 2012; De Deyne & Storms, 2007; Della Rosa et al., 2010; Dimitropoulou et al., 2009; Ghyselinck, De Moor, & Brysbaert, 2000; Grigoriev & Oshhepkov, 2013; Iyer, Saccuman, Bates, & Wulfeck, 2001; Johnston et al., 2010; Liu et al., 2011; Lotto, Surian, & Job, 2010; Lyons et al., 1978; Manoiloff et al., 2010; Marques et al., 2007; Moreno-Martínez et al., 2014; Morrison et al., 1997; Nishimoto et al., 2005; Nishimoto et al., 2012; Pérez & Navalon, 2005; Pind et al., 2000; Raman et al., 2014; Salmon et al., 2010; Schröder et al., 2011; Sirois et al., 2006; Snodgrass & Yuditsky, 1996; Stratton et al., 1975; Tsaparina et al., 2011; Winters et al., 1978). Other word classes have been included in only 17 studies (Akinina et al., 2014; Alonso et al., 2015; Bird et al., 2001; Bonin, Boyer, et al., 2004; Brysbaert, Stevens, De Deyne, Voorspoels, & Storms, 2014; Cameirão & Vicente, 2010; Colombo & Burani, 2002; Ferrand et al., 2008; Ghyselinck, Custers, & Brysbaert, 2003; Gilhooly & Hay, 1977; Gilhooly & Logie, 1980; Kuperman et al., 2012; Moors et al., 2013; Piñeiro & Manzano, 2000; Schock et al., 2012; Shao et al., 2014; Stadthagen-Gonzalez & Davis, 2006). However, in most of these studies, even if verbs or other word classes were included, nouns were still the dominating category (in terms of the number of items). Only two megastudies have included all possible word classes, comprising as many as 30,000 words: one for English (Kuperman et al., 2012) and one for Dutch (Brysbaert et al., 2014).
The present study is the first that has aimed to make available AoA ratings for a balanced number of nouns and verbs in a wide range of languages, thereby making it possible to compare AoAs within both word classes cross-linguistically.

Word set size in AoA studies

The sizes of the word sets for which AoA ratings were collected have also differed between studies, from 80 (Barbarotto et al., 2005) to as many as 30,000 (Brysbaert et al., 2014; Kuperman et al., 2012), but mostly between 100 and 850 words (for 72% of the 64 studies reviewed). In some cases, the size of the data set depended on the number of pictures accompanying the study (e.g., the 260 pictures of the Snodgrass & Vanderwart, 1980, picture set have been used in Barry et al., 1997; Dimitropoulou et al., 2009; Pind et al., 2000; Raman et al., 2014; Snodgrass & Yuditsky, 1996; Tsaparina et al., 2011). In the present study, we used a limited set of 299 words, which had previously been used in a cross-linguistic naming study and had been shown to have the same meanings in 34 languages (Haman, Łuniewska, & Pomiechowska, 2015; Haman, SzewczykMieszkowska, et al., in preparation).

AoA across languages

In the studies mentioned above, subjective AoA was estimated in 14 different languages, mostly Indo-European. For Germanic languages, data have been gathered for Dutch, English, German, Icelandic, and Norwegian. For Romance languages, data are available for French, Italian, Portuguese, and Spanish. Other Indo-European languages studied are Greek, Persian, and Russian. The only languages outside the Indo-European family so far that have AoA ratings are Chinese, Turkish, and Japanese (see Table 2).

However, no fully comparable ratings of objective or subjective AoA have been obtained with the very same procedure across languages. Some of the AoA studies are based on the same set of words linked to the Snodgrass and Vanderwart (1980) object pictures (e.g., Barry et al., 1997 [English]; Pind et al., 2000 [Icelandic]; Snodgrass & Yuditsky, 1996 [English]; Tsaparina et al., 2011 [Russian]). However, although the same set of words was rated in these studies, the data collection procedure varied. In the studies by Snodgrass and Yuditsky (1996) and Pind et al. (2000), participants were asked to rate when they thought they had learned the words that they saw accompanied by the Snodgrass and Vanderwart pictures (black-and-white version); in the study by Tsaparina et al. (2011), participants instead saw a colorized version of the pictures (Rossion & Pourtois, 2004), whereas in the Barry et al. (1997) study, participants saw only written words. Also, different measurement scales were used in the studies: Tsaparina et al. used a 5-point scale, whereas a 7-point scale was used by Barry et al. (1997) and Pind et al., and a 9-point scale was used in the study by Snodgrass and Yuditsky. Different procedures and measurement scales make the results obtained in these studies hard to compare cross-linguistically, since the ratings may depend on both the exact stimulus form and the type of scale used.

The present study

The motivations for our study were both practical and theoretical. First, because of the existence of the AoA effect (viz. the observation that words acquired earlier in life are processed faster than words learned later, as described above), we planned to use AoA ratings as a factor for the construction of cross-linguistic lexical tasks (Haman, Łuniewska, & Pomiechowska, 2015). Second, by performing the AoA study in a uniform way across such a wide range of languages, we aimed to obtain new evidence for the classic claim of a universal pattern in early meaning acquisition among languages (Clark, 1979, 1995, 2001). Clark argued that children’s early words in various languages fall into a small number of the same semantic categories, such as people, food, body parts, clothing, animals, vehicles, toys, household objects, routines, and activities or states (Clark, 2009, p. 76). This argument was based on a cross-linguistic speech diary analysis and comparison of its results with the MB-CDI’s list of the first 50 words in American English (Fenson, Reznick, Bates, Thal, & Pethick, 1994). Clark further argued that in the course of lexical development over the second and third years of life, children elaborate the semantic domains by adding new words to the domains and subdividing them (Clark, 1995). Although the present study is not limited to children’s early words, about 95% of the words used in the study fall into the categories indicated by Clark. Thus, we assumed that the universality of early semantic categories and the process of their elaboration in child language might also be reflected in the AoA order of similar words across languages.

Therefore, we collected data on subjective AoA ratings in 25 languages to assess how stable the ratings can be cross-linguistically and to check their validity by comparing them between language pairs and against previous AoA scores. We expected the ratings to be correlated between language pairs, and we predicted that the more similar two languages or cultures are, the higher the correlation coefficients would be.

Additionally, we analyzed how the demographic characteristics of participants (their gender, age, education, being a parent or not, and language status) influenced their AoA estimations. We expected that the AoA of the majority of the words would not depend on participants’ age. Some words might have been acquired earlier by younger and later by older participants, according to the availability of the objects or actions depicted by the words when the participants were growing up. Specifically, we predicted that several words labeling new artifacts (e.g., a computer) and more recently introduced activities (e.g., to surf) would be rated as being acquired relatively earlier in life by the younger group and later by the older group. We did not expect the AoA ratings to depend on participants’ education level or gender. However, we did assume that being a parent (having or recently having had small children who were acquiring language) might influence adults’ ability to assess when they themselves had learned the words; that is, their ratings might be affected by fresh experience with their own children.

Because bilingual children typically have smaller vocabulary sizes than their monolingual peers (if measured in one language only), they might acquire some words later than monolinguals (Bialystok, Luk, Peets, & Yang, 2010). We predicted that adults who reported that they spoke more than one language at a level similar to that of native speakers and who began their second language learning in childhood would estimate that they learned words later than monolinguals.

In the present study, we also assessed whether two different target questions, “When did you learn this word?” versus “When do children learn this word?,” would affect ratings for words. As was stated above, children nowadays might learn words for recently introduced objects and activities at a young age, whereas older participants might have been more advanced in age at the time of introduction of the said objects and activities.

Besides comparisons with previous AoA data, we adopted another method of validity estimation, following the study by Lind, Simonsen, Hansen, Holm, and Mevik (2015). We compared our data to the available norms for MB-CDIs in nine languages: American English (Dale & Fenson, 1996), Croatian (Kuvac et al., 2009), Danish (Bleses et al., 2008), German (Szagun, Stumper, & Schramm, 2009), Italian (Camaioni, Caselli, Longobardi, & Volterra, 1991), Mexican Spanish (Dale & Fenson, 1996), Russian (Eliseeva & Vershinina, 2009), Swedish (Eriksson & Berglund, 1999), and Turkish (Aksu-Koç et al., 2009).

For a given pair of data (MB-CDI vs. AoA), the percentage of children who know a given word at a certain age (obtained from the MB-CDI norms) was contrasted with the mean AoA of the same word (obtained in the present AoA study). The higher the proportion of children who were reported to know the word, the lower we expected the AoA for a given word to be. Thus, we expected negative correlations between the MB-CDI norms and the AoA ratings.
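
The expected direction of this relation can be shown with a short Python sketch. It assumes a Pearson product-moment coefficient, which is an illustrative choice here and not a statement about the exact statistic used in the analyses. The function name `pearson_r` and all numeric values are made up for the example and are not data from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative values, one pair per word (not data from the study):
# proportion of children reported to know the word, and mean rated AoA in years.
proportion_known = [0.90, 0.60, 0.30, 0.10]
mean_aoa_ratings = [2.0, 3.5, 5.0, 7.0]

r = pearson_r(proportion_known, mean_aoa_ratings)
```

With these toy values, words known by more children have lower mean AoA ratings, so `r` is negative, which is the pattern predicted above.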

Although MB-CDIs are now available in 61 languages (Dale & Penfold, 2011), normative data for single words have so far only been published for six out of the 25 languages included in our sample (Jørgensen, Dale, Bleses, & Fenson, 2009). Thus, in the case of these six languages (Danish, German, Italian, Russian, Swedish, and Turkish), we were able to compare our AoA ratings with the MB-CDI norms in exactly the same language. MB-CDI norms were also available for another three languages that are very close to the ones from our sample. Thus, we compared the AoA ratings in Serbian, Spanish, and both British and South African English to the available MB-CDI norms for Croatian, Mexican Spanish, and American English, respectively. The available MB-CDI norms were either downloaded from the Wordbank (http://wordbank.stanford.edu/; in the case of all Turkish data and the Croatian Words & Sentences part) or the CLEX website (www.cdi-clex.org/; in the case of the remaining data).

There are two versions of the MB-CDI: Words & Gestures (adapted mostly for toddlers 8–18 months of age and assessing both word production and comprehension) and Words & Sentences (designed for the assessment of word production only in older children, mostly 16 to 36 months of age). We used both MB-CDI versions for Danish, Russian, Turkish, American English, Serbian, and Mexican Spanish. Thus, for these languages we analyzed norms obtained from children 8 to 36 months of age. Swedish norms were available only for the Words & Gestures part, and hence only for children 8 to 16 months of age, whereas the German and Italian norms were available only for children 18 to 36 months of age in the Words & Sentences part.

For seven of the nine languages used in the comparisons, the MB-CDI norms included ratings for both receptive and expressive vocabulary. Although in our AoA study participants were asked to estimate when they could understand the word, which explicitly taps receptive vocabulary knowledge, we contrasted our results with both receptive and expressive norms from the MB-CDIs. However, it was expected that the receptive MB-CDI norms would have a stronger relation to our AoA results than would the expressive MB-CDI norms.

Method

Participants

The participants were 827 adults, a minimum of 20 per language (total range: 20 to 124, M = 31, SD = 21; see Table 4). The data from 31 participants were excluded from the analyses for reasons described in detail in the Data Processing section below. The participants whose data were included in subsequent analyses were 622 females (78%) and 174 males, 18 to 80 years of age (M = 30.8, SD = 12.3). Participants were recruited in a variety of ways: mostly via academic communication (lecturers informing students about the study) or by social media (e.g., Facebook), but also through neighborhood networks and chain-referral sampling. Participants received certificates of participation on request, and those for some languages also received course credits. All participants reported their education level, occupation, country of residence, native language, numbers of spoken and used languages, and the number and age of their children.

Table 4 Characteristics of the participants included in the analysis, per language

Twenty-three participants recruited in the ways described above took part in the control study, in which the target question was replaced with the one concerning word knowledge in children. They were all Polish native speakers (17 female, six male; age: M = 38.6, SD = 10.7). None of these participants participated in the study where the main question (“When did you learn the word?”) was used.

Stimuli

The same sets of 158 nouns and 141 verbs (total of 299 words) were used in each language. The words had been selected in a previous online picture-naming study (Haman, Łuniewska, & Pomiechowska, 2015; Haman, SzewczykMieszkowska, et al., in preparation) conducted in 34 languages, including each of the languages considered in the present study. Since the words were selected on the basis of the picture-naming study, they labeled imageable objects and actions.

In the naming study, 93 competent raters (native speakers of 34 different languages) named 1,024 pictures (507 object and 517 action pictures). Each rater first assessed whether the pictures easily evoked a single word in his or her native language. The rater then provided words in his or her native language for the objects and actions presented in the pictures, and then typed the English equivalents of these words. Additionally, for purposes not linked to the present study, the raters provided ratings of the picture style. All pictures in the naming study had previously been used in various psycholinguistic studies (with both children of various ages and with adults) in a total of 15 languages. They were gathered from eight sources, representing different picture styles (line drawings, photos, color drawings, etc.).

The data from 76 raters who completed more than 25% of the procedure were used to select the most widely shared meanings. Haman and colleagues (Haman, Łuniewska, & Pomiechowska, 2015; Haman, SzewczykMieszkowska, et al., in preparation) selected words on the basis of the highest agreement of naming (computed on the English translations). The pictures illustrating the selected words had thus been assessed by the majority of the judges across languages as easily evoking one word or several words similar in meaning. The words for objects and actions were selected separately. This procedure, together with the AoA ratings, was initially designed as a basis for the construction of the LITMUS Cross-Linguistic Lexical Tasks for the assessment of word knowledge in bilingual and multilingual children (see Haman, Łuniewska, & Pomiechowska, 2015).

25 language versions of the online procedure

Lists of target words for each language were obtained as described above. In each language, the list of target words consisted of the labels provided by native speakers of this language during the naming study (Haman, Łuniewska, & Pomiechowska, 2015; Haman, SzewczykMieszkowska, et al., in preparation).

Instructions for the present study and all other information were first prepared in English. However, in order to avoid inconsistencies, collaborators speaking all languages involved were consulted at the stage of preparing the English version, and again while the target language versions were being prepared. Thus, adaptations of the procedure and the instructions for languages other than English were not mere translations of the English version; rather, they were pre-prepared during the first stage of study design. After preparing the model English version, all materials (the website, instructions, examples, etc.) were translated into each of the languages involved by native speakers who were also researchers (linguists or psycholinguists, mostly coauthors of the present article).

Procedure

The procedure was available online via a website designed exclusively for the purposes of the study (www.words-psych.org). The website was made available in all 25 languages, so participants could use their native language exclusively while using the website. After entering the website, participants were instructed to download a file and open it in Microsoft Excel (or Open Office). The file contained four sheets. The first sheet presented basic information about the study and the instructions, and the second sheet contained questions on the demographics of the participants. The lists of nouns and verbs were presented on the third and fourth sheets, respectively. All of the instructions, questions, and words were presented in the mother tongue of the participants.

Participants were asked to decide at what age they had learned the words presented in the two sheets. The instruction was: “For each word please estimate the age (in years) at which you think you learned this word; that is, the age at which you would have understood that word if somebody had used it in front of you, even if you did not use, read or write it at the time.” The exact form of the question was: “When did you learn the word?” Participants were asked to type a number from 1 (if they thought they had learned the word when they were one year old) to 18 (if they thought they had learned the word when they were 18 or older). They were encouraged to guess the age if they were not sure and not to spend too much time on any single word. If they did not know the word, they were asked to enter “X” in the box. Both the instruction and the target question used in the present study closely matched those used in Kuperman, Stadthagen-Gonzalez, and Brysbaert (2012), who in turn followed the instructions proposed by Stadthagen-Gonzalez and Davis (2006). Although many studies have used Likert scales rather than a continuous scale (from 1 to 18 or up to the participants’ current age), we decided to use the latter one, following the remark of Kuperman et al. (2012) that the “[Likert-like scale] artificially restricts the response range and is also more difficult for participants to use” (p. 980). Also, Ghyselinck et al. (2003) stated that using a continuous scale makes the instructions given to participants as simple as possible.

To ensure that the participants understood the instructions, we provided four examples of both nouns and verbs acquired early and later in life. The examples were presented in a table that looked similar to the one filled out by the participants. Explanatory comments were added to the table (e.g., “Someone estimates that s/he learned the word ‘to ask’ at the age of 3 years.”).

The words on both the noun and the verb list were presented in a random order, generated individually for each participant during the file downloading. On the Nouns and Verbs sheets, below the list of words, a short thank-you note was presented, together with a reminder of the other sheet (“Thank you for filling in the table for nouns. Have you filled in the table for verbs as well?”). Each participant was given the full list of all 299 words. Task duration was about half an hour. After filling in the file, participants were asked to upload it via the website or to send it as an e-mail attachment to the address reserved for the purposes of the study.

For two out of the 25 languages, Hebrew and Luxembourgish, a paper-and-pencil version of the procedure was applied. In these two languages, the files were downloaded from the website by an experimenter, then printed and distributed among the participants. The instructions and organization of the sheets were identical to those aspects in the online procedure. The only reason for running the study offline for these two languages was difficulty with recruitment for online participation.

In the control study that addressed whether the question form affected the ratings, the procedure was the same as that described above. The only modified factors were the target question form (“When do children learn this word?” instead of “When did you learn the word?”) and the descriptions of the examples (“Someone estimates that children learn the word ‘to ask’ at the age of 3 years.”). The control study was run only in Polish in an across-subjects design. Participants of the control study did not participate in the main study, because this could have affected the Polish ratings in both designs.

Data processing

In the first step of data processing, we excluded data from any respondent who did not follow the procedure of ratings collection. Data from 16 respondents were excluded because the participants reported that they were not native speakers of the language in which they completed the survey. Additionally, we removed the data from nine respondents who did not provide demographic information, and from six who assessed fewer than 50% of the 299 words. Altogether, the data of 31 respondents (3.8%) were removed from the database. Most of the remaining participants (84%) assessed more than 95% of the words. Only 2% of the participants provided estimations for fewer than 75% of the words. Participants who did not provide data for all items skipped some of the words in the file by leaving those lines blank. The blank lines were located in various parts of the files and were equally distributed across the items.
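The three exclusion rules described above can be illustrated with a minimal sketch. The record layout, the field names, and the order in which the rules are checked are assumptions made for illustration; only the rules themselves (native speaker, demographic information present, at least 50% of the 299 words rated) come from the text.

```python
from dataclasses import dataclass
from typing import Optional

TOTAL_WORDS = 299       # size of the word list reported in the text
MIN_COMPLETION = 0.50   # respondents rating a smaller share were removed


@dataclass
class Respondent:
    # Hypothetical record layout; the field names are illustrative only.
    respondent_id: str
    is_native_speaker: bool
    has_demographics: bool
    ratings: dict  # word -> rating (1-18), or None for a line left blank


def exclusion_reason(r: Respondent) -> Optional[str]:
    """Return why a respondent is excluded, or None if the data are kept."""
    # The order of the checks is an assumption; the text does not specify it.
    if not r.is_native_speaker:
        return "not a native speaker"
    if not r.has_demographics:
        return "no demographic information"
    rated = sum(1 for value in r.ratings.values() if value is not None)
    if rated / TOTAL_WORDS < MIN_COMPLETION:
        return "rated fewer than 50% of the words"
    return None
```

Returning a reason rather than a Boolean makes it easy to tabulate how many respondents were removed under each rule.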

The second step aimed at removing all outliers from further analyses. We defined outliers as disproportionally high or low values for both the word and the participant in a given language. We excluded ratings meeting both of the following two criteria: (1) being three SDs higher (or lower) than the mean for that word in a given language, and (2) being three SDs higher (or lower) than the average estimation provided by a given participant inside a word class. Thus, to be an outlier, a single estimation of AoA of a particular word had to be both very late (or very early) in comparison to other words learned by that participant and very late (or very early) in comparison with the average AoA of that word in the same language. In this step, we removed 137 of the 125,879 ratings for nouns, and 110 of the 113,174 ratings for verbs (both about 0.1%).
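The two-criterion outlier rule can be sketched as follows. Several details are assumptions, because the text does not specify them: the sample standard deviation is used, the rating under test is included when computing each mean and SD, and both deviations must point in the same direction. The function name and the tuple layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def find_outliers(ratings, n_sd=3.0):
    """ratings: list of (participant, word, word_class, value) for ONE language.

    Returns the set of (participant, word) pairs that meet both criteria.
    """
    by_word = defaultdict(list)
    by_participant_class = defaultdict(list)
    for participant, word, word_class, value in ratings:
        by_word[word].append(value)
        by_participant_class[(participant, word_class)].append(value)

    def z(value, values):
        # Signed distance from the mean in SD units; 0 if the SD is undefined or 0.
        if len(values) < 2:
            return 0.0
        sd = stdev(values)
        return (value - mean(values)) / sd if sd > 0 else 0.0

    flagged = set()
    for participant, word, word_class, value in ratings:
        z_word = z(value, by_word[word])
        z_part = z(value, by_participant_class[(participant, word_class)])
        too_high = z_word > n_sd and z_part > n_sd
        too_low = z_word < -n_sd and z_part < -n_sd
        if too_high or too_low:
            flagged.add((participant, word))
    return flagged
```

Requiring both criteria keeps a rating from a participant who rates every word late, or a rating of a word that everyone rates late, from being treated as an outlier on that basis alone.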

Although the instruction allowed participants to type “X” if they did not know a given word, there were no “X” answers. Thus, we did not include this type of response in the analysis.

Results

Descriptive results

The ratings obtained for each of the 25 languages are presented in the supplemental materials. All of the words in the set were reported to be acquired between 1 and 12 years of age, and 98% of the words were assessed as being known to children younger than 7 years.

Cross-linguistic comparison

The AoA ratings in all languages were significantly correlated (Spearman’s rho, adjusted for split-half reliabilities, ranged from .60 to .96; Table 5). The highest correlations were obtained for Polish and Slovak (adjusted r S = .96), Maltese and Greek (adjusted r S = .93), and British and South African English (adjusted r S = .91). The adjusted coefficients were the lowest for Hungarian correlated with Italian (adjusted r S = .62), Irish (adjusted r S = .64), and Hebrew (adjusted r S = .65); see Fig. 1.
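The text does not give the formula used to adjust the correlations for split-half reliabilities. A minimal sketch, assuming the standard correction for attenuation (the observed correlation divided by the square root of the product of the two reliabilities), is shown below. The function name and the example values are hypothetical.

```python
from math import sqrt


def adjust_for_reliability(r_observed, reliability_a, reliability_b):
    """Correction for attenuation (assumed here, not confirmed by the text)."""
    if not (0 < reliability_a <= 1 and 0 < reliability_b <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_observed / sqrt(reliability_a * reliability_b)


# Hypothetical values: an observed correlation of .72 between two languages
# whose split-half reliabilities are both .90.
adjusted = adjust_for_reliability(0.72, 0.90, 0.90)
```

Under this formula the adjusted coefficient is never smaller in absolute value than the observed one, and it can exceed 1 when the reliabilities are low relative to the observed correlation.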

Table 5 Matrix of adjusted correlations of all languages with split-half reliabilities per language
Fig. 1 Highest (upper row) and lowest (lower row) correlations in language pairs

Although the orders of word acquisition were similar across all of the languages studied, we found significant differences in the raw ratings of words between languages (see Fig. 2). Most of the words from our list were acquired between 2 and 8 years old, and the vast majority of them are reported to have been learned between 3 and 5 years. However, there are three evident exceptions among the languages: (1) Finnish, in which words were reported to be acquired earlier than in the other languages, and the majority of the words were acquired by the age of 4 years, and (2) Maltese and isiXhosa, in which words were reported to be acquired relatively later.

Fig. 2 Means for age-of-acquisition ratings across 25 languages. The dots represent words that are outliers. The horizontal line shows the overall mean for all languages. AF = Afrikaans, CA = Catalan, DA = Danish, EL = Greek, EN = British English, ES = Spanish, FI = Finnish, GA = Irish, HE = Hebrew, HU = Hungarian, IS = Icelandic, IT = Italian, LB = Luxembourgish, LT = Lithuanian, MT = Maltese, NL = Dutch, PL = Polish, RU = Russian, SAE = South African English, SK = Slovak, SR = Serbian, SV = Swedish, TR = Turkish, XH = isiXhosa

Target questions

To account for possible differences in the results due to the forms of the target questions, we conducted a control study in which 23 Polish participants answered the modified target question (i.e., (1) “When did you learn this word?” was replaced with (2) “When do children learn this word?”). Their AoA ratings were compared to those of the 32 Polish speakers who answered the original question. The groups differed in age (M 1 = 38.61, SD 1 = 10.65; M 2 = 24.94, SD 2 = 7.28; t = 6.10, p < .001) and years of education (M 1 = 17.09, SD 1 = 2.09; M 2 = 13.91, SD 2 = 2.33; t = 5.21, p < .001), but not in gender [χ 2(1, N = 55) = 0.09, p = .77], parenting [χ 2(1, N = 55) = 0.26, p = .61], or number of known languages [χ 2(1, N = 55) = 0.01, p = .93].

The results showed that although the two sets of ratings are strongly correlated (r S = .93, p < .001), they differ significantly in terms of absolute numbers (see Fig. 3). It appears that participants reporting their own experience in word learning provided significantly higher AoA ratings than did those assessing when children acquire the words (M 1 = 3.84, SD 1 = 1.0; M 2 = 3.34, SD 2 = 0.95; t = 6.09, p < .001). This trend was observed for 92% of the words (see Fig. 3).

Fig. 3 Relation between two different target questions (Polish control study)

Reliability of the data

To check the reliability of participants’ ratings, we randomly divided participants into two groups. The correlations in the AoA ratings between the groups were very high and were significant for both nouns [r S(156) = .99, p < .001] and verbs [r S(139) = .99, p < .001].

This procedure was repeated to calculate the split-half reliability coefficients per language. The coefficients were, in general, very high (Table 5). For 22 out of the 25 languages, the coefficients were higher than .90. The only coefficients lower than .85 were obtained for isiXhosa [r S(297) = .68, p < .001], Maltese [r S(295) = .75, p < .001], and Irish [r S(295) = .78, p < .001].
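The split-half procedure can be sketched as follows: participants are randomly divided into two groups, the mean rating of each word is computed within each group, and the two sets of word means are correlated with Spearman's rho. The data layout, the function names, and the fixed random seed are assumptions made for illustration. No Spearman-Brown correction is applied here, because the text does not mention one.

```python
import random
from statistics import mean


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    # Spearman's rho is Pearson's r computed on the ranks.
    return pearson(average_ranks(x), average_ranks(y))


def split_half_reliability(ratings, seed=0):
    """ratings: dict participant -> dict word -> rating (blank items omitted)."""
    participants = sorted(ratings)
    random.Random(seed).shuffle(participants)
    half = len(participants) // 2
    groups = participants[:half], participants[half:]

    def word_means(group):
        per_word = {}
        for p in group:
            for word, value in ratings[p].items():
                per_word.setdefault(word, []).append(value)
        return {w: mean(v) for w, v in per_word.items()}

    means_a, means_b = word_means(groups[0]), word_means(groups[1])
    shared = sorted(set(means_a) & set(means_b))
    return spearman([means_a[w] for w in shared], [means_b[w] for w in shared])
```

Calling `split_half_reliability` separately on the ratings of each language would give one coefficient per language, in the spirit of the per-language coefficients reported in Table 5.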

AoA ratings versus demographic variables: Gender

We compared the estimations provided by all male participants (N = 168) to those provided by female participants matched to them by age (M 1 = 30.64, SD 1 = 12.43; M 2 = 31.17, SD 2 = 12.12; t = 0.49, p = .69), education level (M 1 = 15.30, SD 1 = 4.64; M 2 = 15.24, SD 2 = 4.78; t = 0.35, p = .94), and first language. We found no significant difference in the mean ratings provided by men and women (M 1 = 4.18, SD 1 = 1.13; M 2 = 3.96, SD 2 = 1.06; t = 0.95, p = .06).

AoA ratings versus demographic variables: Age

As we assumed, there was no significant correlation between participants’ ages and the average AoA ratings for words [r(771) = –.07, p = .07]. To validate our prediction about differences in AoA for particular words, we compared the estimations given by the youngest (18–20 years old, M = 19.3, SD = 0.7, N = 180, 151 females) to those given by the oldest participants (40–80 years old, M = 52.2, SD = 8.5, N = 140, 102 females). The results (Table 6, Fig. 4) validated our hypothesis, although the orders of word acquisition were similar in the two groups [r S(297) = .89, p < .001].

Table 6 List of 19 words with significantly different age-of-acquisition ratings between the youngest and oldest participants
Fig. 4 Age-of-acquisition estimations in different age groups

AoA ratings versus demographic variables: Education

No relationship was found between the estimated AoA of words and participants’ education measured in years [r(771) = –.05, p = .16].

AoA ratings versus demographic variables: Parenting

To check whether being a parent affects AoA ratings, we selected 119 participants who reported that they had at least one child younger than 10 years of age (i.e., their youngest child had to be at most 10 years old). We chose this criterion to include only participants who had relatively recent memories of their children acquiring vocabulary. This group of parents was compared to a control group of participants speaking the same language who were matched in age (M 1 = 36.11, SD 1 = 6.83; M 2 = 36.36, SD 2 = 10.36; t = –0.22, p = .82), education (M 1 = 16.29, SD 1 = 4.53; M 2 = 16.16, SD 2 = 4.63; t = 0.21, p = .83), and gender [χ 2(1, N = 238) = 1.68, p = .38]. In the control group, 32 participants reported that they had children between 11 and 32 years of age, and the remaining 87 participants did not have children.

It emerged that the parents of children in preschool and in the early school years judged that they had learned the target words earlier than did the control group. They reported acquiring 294 out of the 299 words (98%) earlier than the control group, and the mean rating provided by parents was significantly lower than that provided by nonparents (M 1 = 3.41, SD 1 = 1.21; M 2 = 3.94, SD 2 = 1.15; t = –3.44, p < .001). However, the orders of word acquisition were almost exactly the same in both groups [r S(297) = .98, p < .001, see Fig. 5].

Fig. 5 Age-of-acquisition estimations from people with and without children younger than 10 years of age

AoA ratings versus demographic variables: Participants’ languages

When asked about their language skills, 376 participants (49%) reported that they could speak one language at native-like level, 293 (38%) two languages, and 90 (12%) three languages. Nine people reported that they spoke four or more languages at a native level, and five did not answer this question. To check whether the number of languages spoken affected the estimations of AoA in the first language, we divided the participants into groups: those speaking one language and those speaking two or three languages.

The groups of monolinguals and bi- or trilinguals did not differ in terms of age (M 1 = 29.0, SD 1 = 11.7; M 2 = 30.6, SD 2 = 12.9; t = –1.85, p = .06) or education (M 1 = 15.4, SD 1 = 3.9; M 2 = 15.2, SD 2 = 4.0; t = –0.76, p = .45). However, multilingual participants systematically reported that they had acquired words later than monolinguals: They estimated a higher AoA for 288 words (96%). The difference in mean ratings by the two groups was significant (M 1 = 3.72, SD 1 = 0.97; M 2 = 4.05, SD 2 = 0.98; t = –4.19, p < .001). Again, the results of the two groups were highly correlated [r S(297) = .98, p < .001].

Correlations with previous AoA data

In order to assess their validity, the AoA ratings were compared with previous AoA norms. From all of the AoA norms available that were mentioned in the introduction, we selected the ones that contained at least 30 words from our sample collected in the same languages. Thus, we correlated our data with previous norms for Dutch, English, German, Greek, Icelandic, Italian, Russian, Spanish, and Turkish (Table 7).

Table 7 Correlation coefficients (Pearson’s r) between our age-of-acquisition (AoA) ratings and previous data

The coefficients were calculated separately for nouns and for verbs. Our ratings were significantly correlated with previous data in the same and in very closely related languages (American and British English, European and Mexican Spanish). We obtained significant and high correlations with existing AoA norms that included both subjective and objective AoA estimations. Correlations with objective AoA (eight studies, range = .44–.63, M = .56) were slightly lower than those with the subjective ratings (33 studies, range = .29–.92, M = .75). We found no single study with AoA norms available for which the correlation with our AoA results was not significant.

Correlations with MB-CDI data

For a given pair of data (MB-CDI vs. AoA), a percentage of children who knew a given word at a certain age (obtained from the MB-CDI norms for that language) was contrasted with the mean AoA of the same word (obtained in the present AoA study). As predicted, a consistent pattern of significant (negative) correlations was found for all data pairs, although in two languages the correlations were significant in some age groups only. Table 8 presents exact values of the coefficients. All correlations for receptive vocabulary ratings were significant, and they were mostly moderate correlations (r: range = –.18 to –.59, M = –.43). For expressive vocabulary, correlations were in general slightly weaker (r: range = .10 to –.68, M = –.39). The only nonsignificant correlations were obtained for the expressive scores of the youngest age groups (children younger than 10 months) and of some older age groups of Spanish and Turkish speakers (Spanish: 8 to 15 months, Turkish: 8 to 13 months).
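The comparison described above pairs, for each word, the percentage of children reported to know it with its mean AoA rating, and then computes Pearson's r over the words. The sketch below illustrates why the expected correlation is negative. All numbers are hypothetical and are not taken from the MB-CDI norms or from the present ratings.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product-moment correlation for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical data for five words in one language and one age group.
percent_known = [90.0, 75.0, 60.0, 30.0, 10.0]  # % of children who know the word
mean_aoa = [2.1, 2.8, 3.5, 4.6, 5.9]            # mean rated AoA in years

r = pearson_r(percent_known, mean_aoa)  # negative: widely known words are rated as learned earlier
```

Words that many children already know at a given age should receive low AoA ratings, so higher percentages go with lower ratings and the coefficient is negative.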

Table 8 Correlations (Pearson’s r) between AoA ratings and MB-CDI norms for receptive and expressive word knowledge

Discussion

In the present study, we presented a new set of subjective AoA ratings for 299 words in 25 languages from five different language families. The ratings are highly reliable in terms of internal consistency, and their validity was confirmed in comparisons with data from previous studies. The presented ratings suggest that, although the languages differ in terms of the absolute AoA of words (as reported by adults), the orders of word learning are very similar across all languages studied in the age range from 0 to 6 years. The latter finding may indirectly support the statement about a universal pattern of early meaning acquisition among languages (Clark, 1979, 1995, 2009). The former effect (differences in the absolute numbers obtained for AoA in different languages; see Fig. 2) may be due to various factors not controlled for in the present study (e.g., cultural biases related to different cultural views of language development). However, such post-hoc explanations are of a speculative nature, and more cross-linguistic studies assessing objective AoA would be needed to confirm the universality of word order acquisition and/or the cross-linguistic differences in the exact ages when particular words are acquired.

The present article describes the first study in which AoA ratings were obtained for such a wide range of languages with the use of identical procedures. The obtained ratings suggest that the words included in the study are all acquired early (mostly in the first 7 years of life) in all languages considered. Thus, the ratings obtained in the present study constitute close to a fully comparable database of words across languages, because of both the standardization of the procedures across the languages and the similarity of the results. Consequently, the ratings may be used as a measure of “word difficulty” in cross-linguistic studies on word learning or processing by preschool children. The ratings may also be applied in the adaptation of experiments from one language to another, because this process often needs to control for word AoA across languages.

Our analysis also has methodological implications for future AoA studies. It reveals that the target question used widely for obtaining subjective AoA ratings (“When did you learn the word?”) may in fact lead participants to an overestimation of AoA. Both the change of the question to one concerning word acquisition in children (“When do children learn this word?”) and the analysis of the responses of parent participants indicate that existing AoA ratings may yield overly late AoA estimates: Parents answering the traditional AoA question and participants answering the question about children learning words both provided significantly lower AoA estimations.

In contrast to Kuperman et al. (2012), who reported women giving slightly higher estimations of AoA, we found no gender difference in AoA ratings. Comparison of the answers of polarized age groups showed that, in general, AoA estimations are independent of age. This does not support the results reported by Kuperman and colleagues, who found a marginal but significant (r = .07) correlation between participants’ age and the AoA ratings that they provided. However, this incongruence may have been affected by the specificity of the word list we applied. The reason for the difference between Kuperman et al.’s and our findings may lie in the type of stimuli used: We used a set of relatively simple words labeling imageable objects or actions, which were acquired early in life. Thus, Kuperman et al.’s explanation of the age differences—that older participants gave higher estimations because they had a broader age range to choose from—is not directly applicable to our data set.

Although, in general, the presented AoA ratings do not depend on participants’ ages, the exact AoAs of some words may differ between younger and older adults. In particular, the labels of the most modern objects and activities (e.g., new-tech tools) were estimated to be acquired by older people at later stages in their lives, which replicates the results of Bird et al. (2001). Thus, similarly to Cuetos et al. (2012), we suggest that for studies of AoA effects in older participants, appropriate norms should be used rather than those based on estimations obtained from young adults.

As was the case in the results of Kuperman et al. (2012), we did not find any correlation between the education level of the participants and the ratings that they provided. However, in contrast to the study by Kuperman et al., in the present study this result was expected, because the stimuli consisted of simple words typically acquired by toddlers or preschoolers.

Particularly noteworthy was the finding that AoA estimations depended on the number of languages spoken by the participants: The more languages the participants spoke at a native-like level, the higher the AoA they provided. This result is in line with known patterns of lexical development in bilinguals, who may learn some words later than their monolingual peers (Bialystok et al., 2010).

Finally, the correlations with previous subjective and objective AoA ratings, as well as with the MB-CDI norms, validate the present norms in the cases of all languages for which any previous AoA norms or MB-CDI norms are available.

Study limitations

In the present study, we aimed to collect AoA ratings for a wide range of languages. Because we based our AoA ratings on a set of words selected according to the criterion of sharing meanings across the languages (Haman, SzewczykMieszkowska, et al., in preparation), nontranslatable words were not included in our word lists. This criterion significantly reduced the number of possible items to only 158 nouns and 141 verbs out of more than 1,000 words. Thus, the number of words used in the present study was limited, especially in comparison to the four most extensive word sets used by Kuperman et al. (2012) and Brysbaert et al. (2014) (30,000), Alonso et al. (2015: 7,149), and Moors et al. (2013: 4,300). However, most AoA studies have used smaller numbers of words, with the average number of items being around 450, and the median number of items being about 220 (estimated for 60 publications that included ratings for AoA). Given that the words were selected to be translatable across languages, our data set does not contain any items specific to some of the languages and cultures, even those that were included in the naming study by Haman, SzewczykMieszkowska, et al. (in preparation).

The AoA ratings presented in the present article suggest that all of the words included in our set are typically acquired by the age of 7 years. This makes them all “early words,” from the point of view of mature speakers, and limits the usability of the present data set in studies of AoA effects in adults. However, the ratings are still appropriate for experiments concerning AoA effects in children in different languages.

Conclusions

The present study has provided AoA ratings for 158 nouns and 141 verbs in 25 languages. All 299 words were judged as being acquired early in life, mostly at preschool age. This, together with the high validity of the ratings, leads to the conclusion that this article presents a fully comparable database of subjective AoA of 299 words in 25 languages. The database may be useful for a wide range of studies, with both single-language and cross-linguistic designs, in which controlling for stimulus word parameters is required.