Age of acquisition (AoA) refers to the age at which a word was learned. AoA can be assessed by objective or subjective measures. Objective measures refer to methods in which the word knowledge of children of different ages is objectively determined (e.g., through analysis of natural conversations or tasks such as picture naming; Gilhooly & Gilhooly, 1980; Pérez & Navalón, 2005). Subjective methods mostly rely on adults’ retrospective judgments about when different words were acquired (e.g., Alonso, Fernandez, & Díez, 2015; Cortese & Khanna, 2008; Marques, Fonseca, Morais, & Pinto, 2007). Substantial evidence from studies in different languages has shown that objective and subjective measures of AoA are strongly correlated (e.g., Morrison, Chappell, & Ellis, 1997; Pérez & Navalón, 2005). The present study concerns itself with subjective AoA ratings.

Research has suggested that AoA impacts performance on a variety of lexical-processing tasks, such as word reading and picture naming. Overall, the findings have shown that performance is more accurate or faster for words acquired early in life than for those acquired later (e.g., Barry, Morrison, & Ellis, 1997; Bonin, Chalard, Méot, & Fayol, 2002; Bonin, Fayol, & Chalard, 2001; Cuetos & Barbón, 2006; Gilhooly & Logie, 1980; Morrison & Ellis, 1995; Raman, 2006). Reliable AoA effects have been reported with older adults in object naming (e.g., Hodgson & Ellis, 1998; Morrison, Hirsh, & Duggan, 2003) and word reading (e.g., Morrison et al., 2003). AoA has also been shown to be a reliable predictor of lexical processing in patient groups such as aphasics (e.g., Cuetos, Aguado, Izura, & Ellis, 2002; Nickels & Howard, 1995) and those with Alzheimer’s disease (e.g., Kremin et al., 2001). Recent work has shown that AoA may even affect the retrieval of autobiographical memories; Morrison and Conway (2010) found that the AoA of the cue words used to elicit autobiographical memories predicted the ages of the participants at the time of the memories.

Although research evidence has tended to show that AoA plays a role in these cognitive tasks (e.g., Carroll & White, 1973b; Cuetos, Barbón, Urrutia, & Domínguez, 2009; Morrison & Ellis, 2000; Morrison, Hirsh, Chappell, & Ellis, 2002), it has also been argued that it is difficult to untangle the effects of AoA on skilled performance from those of other lexical variables, such as concreteness, imageability, word length, familiarity, and especially word frequency (e.g., Zevin & Seidenberg, 2002). Johnston and Barry (2006) noted that identifying the relative effects of AoA and word frequency on naming and reading performance has been one of the fundamental conceptual issues. In line with Carroll and White’s (1973a) initial finding that AoA rather than word frequency is the variable that affects lexical processing, a large number of studies have reported similar findings (e.g., Gerhand & Barry, 1998; Izura et al., 2011; Morrison & Ellis, 1995). In addition, some studies have reported independent effects of word frequency and AoA (e.g., Cortese & Khanna, 2007; A. W. Ellis & Morrison, 1998), and others have reported an interaction between the two in which AoA had a stronger effect on lexical-decision tasks for low-frequency than for high-frequency words (e.g., Bonin et al., 2002).

In contrast, Zevin and Seidenberg (2002, 2004) argued that the existing static frequency norms, such as those of Kučera and Francis (1967), are not adequate to address frequency effects. They suggested that the proper measure of word frequency is cumulative word frequency, which refers to the total number of times an individual is exposed to a given word across his/her lifespan. Zevin and Seidenberg (2002) found in their simulation work that AoA effects on reading disappeared when cumulative frequency was used as the measure of word frequency. One prediction based on this cumulative-frequency hypothesis is that AoA effects on processing should diminish or disappear for older participants. This prediction, however, was not supported in tasks including picture naming (Morrison et al., 2003), word reading (Morrison et al., 2002), object naming (Barry, Johnston, & Wood, 2006), and brand-name recognition (A. W. Ellis, Holmes, & Wright, 2010).

Although there is agreement on the validity of Zevin and Seidenberg’s criticism regarding word frequency norms (De Deyne & Storms, 2007; Johnston & Barry, 2006), empirical studies have indicated that AoA continues to exert an effect on lexical processing even when word frequency is properly measured and controlled (e.g., Brysbaert & Cortese, 2011; De Deyne & Storms, 2007; Pérez, 2007). It should be noted that similar issues exist for the contributions of other lexical variables, such as imageability, in relation to AoA in tasks such as reading aloud and lexical decision tasks (e.g., Schock, Cortese, & Khanna, 2012; Yap & Balota, 2009).

As is illustrated by the AoA–word frequency controversy, well-constructed, comprehensive, and up-to-date norms, possibly for multiple variables that can be used in conjunction, are essential for studies that aim to address the variables underlying various cognitive processes. Given the well-established role that AoA plays in a variety of cognitive tasks (e.g., Carroll & White, 1973a; Johnston & Barry, 2006; Juhasz, 2005; Morrison & Conway, 2010; Schock et al., 2012), as well as its relationships with other lexical variables, the main aim of the present study was to provide subjective AoA norms in Turkish, a language for which limited normative data exist.

AoA norms in Turkish

Turkish is spoken by approximately 200 million people worldwide (Sarı, 2011), and a substantial amount of work using materials in Turkish has addressed cognitive processes ranging from picture and word naming to the lifespan retrieval of autobiographical memories. However, normative studies on verbal materials have been sparse (e.g., Aksan et al., 2012; Göz, 2003; Raman, Raman, & Mertan, 2014; Tekcan & Göz, 2005). Göz (2003) constructed the first large-scale word frequency database of written Turkish, based on one million words. The Tekcan and Göz norms provide concreteness and imageability ratings as well as association sets for 600 of the words selected from Göz (2003). These norms have been widely used by researchers using Turkish materials (e.g., Booth, 2014; Booth, Mackintosh, Mobini, Oztop, & Nunn, 2014; Ilgaz, Altun, & Aşkar, 2014; Tekcan et al., 2015). In addition to these norms, a few databases are more frequently used in linguistics research, such as the Turkish National Corpus (Aksan et al., 2012) and Turkish Electronic Living Lexicon (Inkelas, Küntay, Orgun, & Sprouse, 2000).

In Turkish, Raman (2002, 2006, 2011) carried out substantial work investigating the effects of AoA on lexical and picture processing, and the existing published sources of subjective AoA ratings from adults in Turkish have come from her studies. Raman (2006) collected subjective AoA and other lexical-variable (e.g., frequency) ratings for 433 Turkish words from 50 individuals; in the published study, however, she reported AoA ratings for only 25 early-acquired and 25 late-acquired words. More recently, Raman, Raman, and Mertan (2014) provided AoA ratings (along with other lexical variables) for color versions of the 260 Snodgrass and Vanderwart (1980) pictures (Rossion & Pourtois, 2004).

The present study

This is the first large-scale AoA norming study in Turkish, in terms of the numbers of both words and participants. It reports subjective AoA norms for 600 Turkish words obtained from four age groups: children, young adults, middle-aged adults, and older adults. It also provides concreteness ratings, imageability ratings, and numbers of syllables and letters, as well as the frequencies for these words. As such, the potential contributions of this study can be summarized as follows: First, we report AoA norms based on four age groups, which would allow for the use of age-appropriate AoA ratings in the selection of verbal materials. Research has clearly shown that AoA plays a role in cognitive processing across the lifespan. Several studies have shown AoA effects in children’s reading (e.g., Brysbaert, 1996), word recognition (e.g., Garlock, Walley, & Metsala, 2001), and word perception performance (e.g., Nazir, Decoppet, & Aghababian, 2003). AoA has also been shown to affect picture naming (Hodgson & Ellis, 1998), object naming, and word reading (Morrison et al., 2002) in older adults. In addition, AoA contributes to performance on tasks that differ between young and older adults, such as lexical decision and semantic categorization (De Deyne & Storms, 2007). Therefore, although AoA effects have been studied with different age groups, most of the existing AoA norms in different languages have come from student/young adult samples (e.g., Carroll & White, 1973a; Marques, Fonseca, Morais, & Pinto, 2007; Salmon, McMullen, & Filliter, 2010). Even when the samples have large age ranges (e.g., 18 to 50 years in Carroll & White, 1973a, or 19–56 in Tsaparina, Bonin, & Méot, 2011), it has not been possible to report and compare the AoAs for different age groups due to the small numbers of participants in different age groups. 
The few studies addressing age effects on AoA have shown relatively small age differences across adult age groups in subjective AoA ratings in English, German, Dutch, and Canadian French. For instance, both Birchenough, Davies, and Connelly (in press) and Kuperman, Stadthagen-Gonzalez, and Brysbaert (2012) found very small but statistically significant positive correlations between participant age and AoA ratings (rs = .16 and .07, respectively). De Deyne and Storms (2007) found that adults 51–56 years of age reported higher AoA ratings than did young adults of 17–20. In a study comparing three age groups, Sirois, Kremin, and Cohen (2006) found that young adults reported lower AoA ratings (3.08) than did middle-aged (3.38) or older (3.66) adults. Similarly, Gathercole and Adams (1995; cited in Morrison, Chappell, & Ellis, 1997) found that parents’ AoA ratings were 16 months later than those of college students. On the other hand, Schröder, Gemballa, Ruppin, and Wartenburger (2012) reported no age effect in AoA between young (20–40 years) and older (41–70 years) participants. Furthermore, Walley and Metsala (1992) found that 5-year-old children’s AoA ratings were not different from those of adults. Therefore, there are very limited data with regard to child–adult comparisons, and the few comparisons across adult groups seem to report small age differences, with older adults reporting higher AoA ratings than young adults. Research has shown that such age differences in AoA emerge not only for recent words, but for common words as well, suggesting that they are a more general phenomenon (De Deyne & Storms, 2007). Such findings clearly underscore the need for age-appropriate AoA norms (De Deyne & Storms, 2007; Hodgson & Ellis, 1998). We note that, with the exception of Sirois et al., studies addressing age differences in AoA have been limited to two-group (typically young vs. older adult) comparisons, and therefore were not suitable for drawing clear conclusions regarding changes across the lifespan. In the present study, we aimed to provide AoA ratings from four different age groups, which would better represent the lifespan than have earlier studies.

Second, we report ratings/norms for a number of other important lexical variables associated with AoA. This is important, because studies on AoA effects have been criticized for not incorporating relevant lexical variables that correlate with AoA (Johnston & Barry, 2006; Juhasz, 2005; Zevin & Seidenberg, 2002). In the present study, we provide data regarding word frequency, concreteness, imageability, and word length as measured by numbers of syllables and letters. Thus, the present norms provide an opportunity to incorporate a number of lexical variables in studies employing regression-based or factorial designs.

Third, Turkish is not just another language with regard to AoA effects; it is a transparent one. As was noted by Raman (2006), although other languages, such as Dutch and Spanish, are transparent to some degree, they also contain inconsistent sound-to-print mappings. Turkish, on the other hand, is completely transparent. This makes Turkish a valuable testing ground for the arbitrary-mapping hypothesis (e.g., A. W. Ellis & Lambon Ralph, 2000; Lambon Ralph & Ehsan, 2006; Zevin & Seidenberg, 2002), which argues that AoA effects in word naming should appear in (opaque) languages with inconsistent print-to-sound mapping, but should diminish in transparent languages. Although some findings have supported this hypothesis (e.g., Izura, 2003; Monaghan & Ellis, 2002), substantial empirical evidence has shown that robust AoA effects emerge in transparent languages such as Italian (Bates, Burani, D’Amico, & Barca, 2001) and Turkish (Raman, 2006, 2011). Given the findings showing an interaction between AoA, word frequency, and print-to-sound consistency (Monaghan & Ellis, 2002), the present norms may provide an important tool for using other lexical variables, along with AoA norms, in Turkish as a transparent language.

An additional characteristic of the present study is that written word frequency data are reported alongside AoA ratings. As we will discuss in more detail later, the word frequency data we report here come from the first written word frequency database in Turkish (Göz, 2003), which is based on relatively recent materials, published between 1995 and 2000. As was reported by Balota, Cortese, Sergent-Marshall, Spieler, and Yap (2004), more recent frequency norms have shown stronger word frequency effects, resulting in a better estimate of any AoA effect. This represents another strength of the present norms.

Thus, in the present study we report subjective AoA norms for 600 Turkish words, based on four age groups. We also provide word frequency, concreteness and imageability ratings, and word length (in numbers of syllables and letters).

Method

Participants

Our participants (n = 457) comprised 115 12-year-old children (48 % female), 100 young adults (72 % female) 20–26 years of age (M = 21.8, SD = 1.12), 115 middle-aged adults (53 % female) 40–45 years of age (M = 42.61, SD = 1.71), and 127 older adults (43 % female) 60–87 years of age (M = 66.76, SD = 6.64). The children were students in the same grade of primary education. The young adults were university students and participated in the study in exchange for extra course credit. The adults in the two remaining groups were recruited through convenience sampling and were not given any financial compensation. Seventy percent of the middle-aged adults and 45 % of the older adults had at least a high school education. All older adults were living in their own homes.

Materials

The 600 words we used in the present study were the 600 words used in Turkish Word Norms (Tekcan & Göz, 2005), which reported concreteness and imageability ratings along with word frequency in written Turkish. In that study, each of the 600 words was rated by 100 university students on a scale from 1 to 7, separately for imagery and concreteness, where higher ratings indicated stronger imagery and concreteness. These words were originally selected from the Word Frequency Dictionary of Written Turkish (Göz, 2003), which is a frequency dictionary of written Turkish based on 1,016,000 words obtained from a variety of written sources, published between 1995 and 2000. The general methodology and types of written sources were based on Kučera and Francis (1967). Words were selected so that there were equal numbers (n = 200) of low-, medium-, and high-frequency words (frequencies of ≤20, 50–99, and ≥100 per million, respectively). Fifty-one percent of the words were concrete, and 49 % were abstract. The majority of the words were nouns (88 %), and the remainder were adjectives (12 %).

Three different booklets of word lists, each containing 200 randomly selected words, were formed. The order of the words in each list was fixed. Participants were asked to select the age at which they believed they had learned the word by circling one of the following seven options: 2–3, 3–4, 4–5, 5–6, 7–9, 10–11, and 12+. The first six categories are the same as those of Carroll and White (1973b). Because our sample of children were 12 years old, our last category was 12+ (as opposed to Carroll & White’s, 1973b, 14+). Participants were asked to choose an “I don’t know” option for words they did not know.
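The response format described above can be sketched in code. This is a minimal illustration, assuming that the seven options are coded 1 to 7 in the order listed (the Validity section treats a rating of 7 as "12+") and that "I don't know" responses are treated as missing and left out of a word's mean. The missing-data treatment and all names below are assumptions for illustration, not a description of the scoring procedure actually used.

```python
from statistics import mean, stdev
from typing import Optional, Sequence

# Response options in the order listed in the text (ASCII hyphens used here);
# the position in the list plus one is the scale value, so "12+" is coded 7.
OPTIONS = ["2-3", "3-4", "4-5", "5-6", "7-9", "10-11", "12+"]
DONT_KNOW = "I don't know"

def code_response(response: str) -> Optional[int]:
    """Map a circled option to 1..7; return None for 'I don't know'."""
    if response == DONT_KNOW:
        return None
    return OPTIONS.index(response) + 1

def word_summary(responses: Sequence[str]):
    """Return (mean, sd, n_rated) for one word, ignoring 'I don't know'."""
    scores = [s for s in map(code_response, responses) if s is not None]
    m = mean(scores) if scores else None
    sd = stdev(scores) if len(scores) > 1 else None
    return m, sd, len(scores)
```

Keeping the count of rated responses alongside the mean makes it easy to see which words were unfamiliar to many raters in a given group.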

Procedure

The data from children and young adults were collected in groups of five to ten, whereas the data from the two remaining adult groups were collected individually. Each participant filled out all three booklets with at least one day between sessions (e.g., Brysbaert & Cortese, 2011; Gilhooly & Logie, 1980; Pérez, 2007). A Latin-square design was used for the administration of the three booklets.
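As an illustration of Latin-square counterbalancing for three booklets, the sketch below builds a cyclic 3 × 3 square and cycles participants through its rows. The text does not specify which square was used or how participants were assigned to rows, so the booklet labels, the cyclic construction, and the assignment by participant index are all assumptions.

```python
from typing import List

BOOKLETS = ["A", "B", "C"]  # hypothetical labels for the three booklets

def latin_square(items: List[str]) -> List[List[str]]:
    """Cyclic Latin square: row i is the item list rotated left by i positions."""
    n = len(items)
    return [[items[(i + j) % n] for j in range(n)] for i in range(n)]

def booklet_order(participant_index: int) -> List[str]:
    """Assign a session order by cycling through the rows of the square."""
    square = latin_square(BOOKLETS)
    return square[participant_index % len(square)]
```

In such a square each booklet appears once in each session position, so that position effects (e.g., fatigue or practice) are balanced across booklets when the rows are used equally often.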

Participants were told that this was a study on the characteristics of words and that there were no correct or incorrect answers, and they were given an instruction sheet. The instructions stated that the participants’ task was to estimate the age at which they had learned each word, to the best of their ability. Each session took approximately 30 min.

Results

The means and standard deviations of the AoA estimates for the 600 words for the four age groups are presented in the supplemental materials, along with word frequency, concreteness and imageability ratings, and word length (numbers of syllables and letters). The words are given in alphabetical order in Turkish, along with their English translations.

In this Results section, we present analyses regarding the reliability and validity of the AoA ratings, their relationships with other linguistic variables, and the relationship between age and the AoA ratings.

Reliability

As a measure of reliability commonly used in norming studies (e.g., Pérez & Navalón, 2005), we calculated the internal consistency (Cronbach’s α), measuring the degree to which participants agreed on the ratings for different items. As we mentioned in the Method section, each participant filled out three separate forms, each containing 200 words. We calculated intraclass correlations for each form separately for three of the four age groups. We excluded children from this analysis, because very few children provided ratings for all of the words, making the resulting α estimates unreliable. The Cronbach’s α values for the remaining three adult groups were very high, ranging from .983 to .989. These values represent very high internal consistency.
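For readers who wish to compute a comparable index on their own rating data, the following is a minimal sketch of Cronbach's α with raters treated as "items" and words as cases. It assumes a complete words × raters matrix, which mirrors the reason children were excluded above. The function name and data layout are illustrative assumptions, and the software actually used for the reported values is not stated in the text. Computed this way, α corresponds to a consistency-type, average-measures intraclass correlation.

```python
from statistics import variance
from typing import Sequence

def cronbach_alpha(ratings: Sequence[Sequence[float]]) -> float:
    """ratings[w][r] is rater r's rating of word w; no missing values allowed."""
    k = len(ratings[0])  # number of raters
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete words x raters matrix with k >= 2")
    # Variance of each rater's ratings across words
    rater_vars = [variance([row[r] for row in ratings]) for r in range(k)]
    # Variance across words of the summed ratings
    total_var = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(rater_vars) / total_var)
```

With many raters per form, α approaches 1 even when agreement between any two individual raters is only moderate, which is worth keeping in mind when interpreting very high values.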

Validity

We carried out two analyses to address the validity of the present AoA norms. One was to look at the words that children reported not knowing (e.g., Bonin, Barry, Méot, & Chalard, 2004). If children’s estimates of not knowing were accurate, then the adults should have given these words a rating of 7, indicating that they had learned these words later than 12 years of age. We categorized a word as “unknown by the children” if at least 75 % of the children reported they did not know it. We identified 47 such words. Seventy percent of the young adults, 59 % of the middle-aged adults, and 56 % of the older adults gave a rating of 7 to these words. This match between the children’s judgments and the adults’ AoA estimates can be considered an indication of the validity of this measure, especially because it represents a correspondence between an objective measure (of not knowing) and a subjective measure (AoA ratings).
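The logic of this first validity check can be sketched as follows. The sketch assumes that ratings are stored per word as lists with `None` marking an "I don't know" response, and that the adult percentage is obtained by pooling all adult ratings over the flagged words. The text does not state how the percentages were aggregated, so the pooling, like the function and variable names, is an assumption.

```python
from typing import Dict, List, Optional

Ratings = Dict[str, List[Optional[int]]]  # word -> ratings (None = "I don't know")

def unknown_words(child_ratings: Ratings, threshold: float = 0.75) -> List[str]:
    """Words that at least `threshold` of the children marked as not known."""
    result = []
    for word, ratings in child_ratings.items():
        share_unknown = sum(r is None for r in ratings) / len(ratings)
        if share_unknown >= threshold:
            result.append(word)
    return result

def share_rated_latest(adult_ratings: Ratings, words: List[str], latest: int = 7) -> float:
    """Proportion of adult ratings equal to `latest` (12+) across the given words."""
    pooled = [r for w in words for r in adult_ratings[w] if r is not None]
    return sum(r == latest for r in pooled) / len(pooled)
```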

The second validity measure was to look at the correlations between the AoA ratings in the present study and the other existing AoA norms in Turkish. To our knowledge, three sets of published AoA ratings have been based on varying numbers of items (Raman, 2006, 2011; Raman et al., 2014). Raman (2006) collected AoA estimates from young adults for 433 words as part of a study testing the effects of AoA on word naming. We identified 79 words common to her study and ours. We calculated the correlation between the AoA ratings given to these common words in the two studies. Because Raman’s (2006) data were based on young adults, we used only the young adult group’s ratings from our study. There was a strong correlation between the two sets of ratings, r(77) = .900, p < .01. In another study, Raman (2011) reported AoA ratings for 60 picture names (30 early AoA, 30 late AoA) based on Snodgrass and Vanderwart (1980), 16 of which were also in our word list. Again the two sets of item ratings were correlated, r(14) = .684, p < .01. Finally, there were 47 common items between our list and the 260 picture names used in Raman et al. (2014), who reported Turkish norms for color versions of 260 of the Snodgrass and Vanderwart pictures. The two sets of items were also correlated, r(45) = .610, p < .05. These figures are in line with correlations found between other studies in the same language, supporting the validity of the present AoA ratings (Pérez & Navalón, 2005; Sirois et al., 2006). Overall, these analyses suggest that the present AoA norms represent a reliable and valid measure.
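The cross-study comparisons above amount to a Pearson correlation computed over the words shared by two norm sets, with degrees of freedom equal to the number of shared words minus two (79 words give r(77), 16 give r(14), and 47 give r(45)). A minimal sketch, assuming each norm set is available as a word-to-mean-rating mapping (the function name and data layout are illustrative only):

```python
from math import sqrt
from typing import Dict, Tuple

def pearson_on_shared(a: Dict[str, float], b: Dict[str, float]) -> Tuple[float, int]:
    """Return (r, df) for the words present in both norm sets; df = n - 2."""
    shared = sorted(set(a) & set(b))
    n = len(shared)
    if n < 3:
        raise ValueError("need at least three shared words")
    xs = [a[w] for w in shared]
    ys = [b[w] for w in shared]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy), n - 2
```

Because Pearson's r is unaffected by linear rescaling, norm sets collected on different rating scales can be compared directly in this way.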

Relationship with other psycholinguistic variables

As we noted earlier, important questions with regard to AoA and other lexical variables are their shared variance and relative contributions to cognitive-performance variables (e.g., Clark & Paivio, 2004; Johnston & Barry, 2006). In this section, we analyzed the relationships between AoA estimates and word frequency, concreteness, imageability, and word length, for which recent norms exist in Turkish (Göz, 2003; Tekcan & Göz, 2005). Previous research has shown that early-acquired words were more likely to be high-frequency (Bird, Franklin, & Howard, 2001), more concrete, and higher in imagery (Bird et al., 2001; Gilhooly & Logie, 1980), and shorter (Bird et al., 2001).

Word frequency

We calculated the correlations between the word frequency and AoA estimates separately for all groups. Word frequency was negatively correlated with the AoA estimates from each age group (–.566, –.615, –.601, and –.518 for children, young adults, middle-aged adults, and older adults, respectively, ps < .001). These findings are consistent with previous findings in Turkish (–.33; Raman et al., 2014) as well as in other languages, such as English (–.294; Schock et al., 2012), Russian (–.47; Tsaparina et al., 2011), Portuguese (–.39; Marques et al., 2007), Italian (–.37; Barca, Burani, & Arduino, 2002), and German (–.53; Birchenough et al., in press).

Concreteness and imageability

Because concreteness and imageability ratings were based on young adults (Tekcan & Göz, 2005), we limited this analysis only to the young adults in our sample. Table 1 shows the Pearson correlation coefficients between AoA, concreteness, and imageability (it also contains the AoA–word frequency correlations for the young adults discussed above). In line with earlier work, AoA showed moderate negative correlations with both concreteness and imageability. The correlation we found for AoA and concreteness (–.37) parallels those found in English (–.51; Schock et al., 2012), Portuguese (–.53; Marques et al., 2007), and Italian (–.38; Barca et al., 2002). With regard to imageability, the correlation between AoA and imageability in the present study was also negative (–.32), but smaller than those found in other languages, such as Russian (–.60; Tsaparina et al., 2011), Portuguese (–.70; Marques et al., 2007), and Italian (–.70; Barca et al., 2002).

Table 1 Correlations between AoA ratings (young adult group only), word frequency, concreteness, imageability, number of syllables, and number of letters

Numbers of syllables and letters

The correlations we obtained between AoA and numbers of syllables and letters were positive (.11 and .18, respectively). Small to moderate correlations between AoA and word length measures have been reported in languages such as Russian (.29 and .19; Tsaparina et al., 2011), English (.30 and .35; Stadthagen-Gonzalez & Davis, 2006), and Portuguese (.28 and .25; Marques et al., 2007). Importantly, although our correlations tended to be smaller than those reported in other languages, they are similar to those found in another study (.20 for syllables and .19 for letters) in Turkish by Raman et al. (2014).

Age effects

In Table 2 we present summary statistics for the AoA ratings of the four age groups and for the other lexical variables we report in the supplemental materials. We report the statistical analyses for group differences below, but we note two aspects with regard to AoA here. First, children tended to use a more restricted range in their AoA estimates, as indicated by the standard deviation as well as by the minimum and maximum values. Second, the AoA distributions for the three adult groups were positively skewed, as indicated by Kolmogorov–Smirnov tests, Ds ≥ .98, ps < .01.

Table 2 Descriptives for AoA, word frequency, imageability, concreteness, number of syllables, and number of letters

We addressed two questions regarding the relationship between age and the AoA estimates. First, we were interested in the consistency of the AoA ratings across age groups, and therefore looked at the correlations between the AoA estimates of the four age groups. As can be seen in Table 3, they were strongly correlated (ps < .01). Given the evidence regarding a strong relationship between word frequency and AoA, we looked at whether word frequency affected these correlations, and calculated the same correlations separately for high-frequency (frequency of 100 or more per million) and low-frequency (frequency of 20 or fewer per million) words. Again, the correlations were very high, ranging between .777 and .950 for low-frequency words, and .854 and .951 for high-frequency words.

Table 3 Correlations between the AoA estimates of the four age groups

Second, we looked at whether the subjective AoA estimates changed across age groups. Again, given that frequency was correlated with AoA, we included word frequency as a variable in the analyses (see Fig. 1). A main effect of word frequency on the AoA estimates was apparent: Low-frequency words were assigned later AoAs (M = 3.54, SD = 1.14) than high-frequency words (M = 2.61, SD = 0.73), F(1, 453) = 2,930.75, MSE = 0.069, η² = .87.

Fig. 1 Age-of-acquisition (AoA) estimates of the four age groups for low-frequency and high-frequency words

Age also had an effect on the AoA estimates, F(3, 453) = 105.84, MSE = 0.973, η² = .41. Children gave earlier AoA estimates (M = 2.09, SD = 0.37) than the remaining three groups (M = 3.35, SD = 0.66; M = 3.58, SD = 0.68; and M = 3.28, SD = 0.84, for the young, middle-aged, and older adults, respectively). In addition, the older adult group reported lower AoA estimates than did the middle-aged adult group. We also found an interaction, F(3, 453) = 259.67, MSE = 0.067, η² = .63, such that high-frequency words were given earlier AoA estimates than low-frequency words by young, middle-aged, and older adults, but there was no difference between high- and low-frequency words for children.

One possible reason for earlier AoA estimates by children than by adults may be that there were several words that children did not know and that adults learned late (after 12 years of age) and therefore assigned a rating of 7. Indeed, this might have artificially increased the AoA estimates of the adults. To test this possibility, we repeated the analysis of variance only for the words that were known by the children in our sample (i.e., words that were given a rating by all children). The overall pattern was largely the same. We found a main effect of frequency, F(1, 453) = 574.21, MSE = 0.037, p < .01, η² = .56; low-frequency words were given a later AoA (M = 2.61, SD = 0.81) than high-frequency words (M = 2.31, SD = 0.65). We also found an age effect, F(3, 453) = 31.61, MSE = 0.86, p < .01, η² = .17. Tukey’s HSD tests showed that children gave earlier AoA estimates (M = 1.99) than did the three older groups (Ms = 2.50, 2.82, and 2.52, for young, middle-aged, and older adults, respectively; ps < .05). There was also an interaction, F(3, 453) = 16.40, MSE = 0.037, p < .01, η² = .10. Comparisons of the mean ratings in the two analyses supported our prediction that the words acquired after 12 years of age inflated the AoA estimates of the three adult groups. However, the pattern remained unchanged; children reported lower AoA ratings than did the adult groups. In addition, the interaction pattern was the same: Although low-frequency words were given later AoA estimates than high-frequency words, the differences were smaller for the children and older adults than for the other two groups.
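As a consistency check on the effect sizes reported for the two analyses of variance, the sketch below recomputes them from the F ratios and degrees of freedom. It assumes that the reported values are partial eta squared, obtained as (F × df_effect) / (F × df_effect + df_error). The text does not name the variant, so this is an inference from the numbers rather than a statement of the procedure used. The function name is illustrative.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)

# F statistics as reported in the text: (F, df_effect, df_error, reported value)
reported = [
    (2930.75, 1, 453, 0.87),  # frequency, all words
    (105.84, 3, 453, 0.41),   # age, all words
    (259.67, 3, 453, 0.63),   # interaction, all words
    (574.21, 1, 453, 0.56),   # frequency, words known by the children
    (31.61, 3, 453, 0.17),    # age, words known by the children
    (16.40, 3, 453, 0.10),    # interaction, words known by the children
]

# Recompute each effect size and round to two decimals, as in the text
recomputed = [round(partial_eta_squared(f, d1, d2), 2) for f, d1, d2, _ in reported]
```

Under this assumption, all six recomputed values agree with the reported ones to two decimal places.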

Discussion and conclusions

We have presented subjective AoA ratings for 600 Turkish words from four age groups: children, young adults, middle-aged adults, and older adults. In addition, we provided information about related lexical variables obtained in an earlier norming study for the same set of words (word frequency, as well as concreteness and imageability ratings; Tekcan & Göz, 2005) and numbers of syllables and letters. High internal consistency measures and high correlations with existing Turkish AoA data (Raman, 2006) showed that the present norms constitute a reliable and valid tool for researchers working with Turkish material. The similarity of the correlation patterns among the lexical variables in the present study and those in other languages (e.g., Álvarez & Cuetos, 2007; Bird, Franklin, & Howard, 2001; Schock, Cortese, & Khanna, 2012; Stadthagen-Gonzalez & Davis, 2006; Tsaparina et al., 2011) bolsters confidence in the present AoA norms as a valid tool.

In terms of age effects, there were two main findings. First, the AoA estimates of the four age groups were strongly correlated with each other for both high- and low-frequency words. Second, the mean AoA ratings of the three adult groups were higher than those of the children. Both of these results fit with the few studies that have reported age effects in AoA. De Deyne and Storms (2007), Cuetos, Samartino, and Ellis (2012), and Schröder et al. (2012) found strong correlations between the AoA ratings of younger and older participants (.75, .77, and .85, respectively, in the three studies). Sirois et al. (2006) reported correlations ranging from .89 to .95 for three adult age groups. Although Cuetos et al. (2012) found a smaller correlation (.32), in that study children’s objective rather than subjective AoA ratings were correlated with adult AoA ratings. The present study clearly showed that subjective AoA ratings from children also correlate with those of adults of different ages. Considered alongside the significant age differences we found in the AoA estimates, this strong agreement across age groups is in line with the claim that AoA ratings are more likely to reflect the order rather than the actual age of learning.

The main finding regarding age differences was that the children reported lower AoA than did all of the adult groups. There might be a few potential reasons for this pattern. One is that changes in vocabulary across the generations might have contributed to this difference. De Deyne and Storms (2007) suggested that age differences in AoA estimates might arise from recent words entering the lexicon (e.g., mango, modem; see also Birchenough et al., in press; Kuperman et al., 2012). People who are older would report higher AoA than would children for such words. To test whether this process affected our data, we identified words that are commonly used today but that were not in the Turkish language in the 1970s by consulting Turkish dictionaries of the time (Türk Dil Kurumu, 1974). We identified 20 such words, and looked at the age effects on AoA only for these words. The mean AoA ratings for these words were 2.80, 4.33, 5.15, and 5.18 for children and for young, middle-aged, and older adults, respectively. Comparing these values with the mean AoA ratings of the remaining words (2.15, 2.98, 3.25, and 2.98) clearly shows the effects of “new” words; the difference between the two sets of means steadily increases for each age group. This is consistent with the idea that the participants in the two older groups were at least in their teenage or young adult years when they were first exposed to these words.
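The increase across age groups can be made explicit by subtracting the two sets of means reported above. The short sketch below does only that arithmetic. The variable names are illustrative, and the values are copied from the text.

```python
groups = ["children", "young adults", "middle-aged adults", "older adults"]
new_words = [2.80, 4.33, 5.15, 5.18]  # mean AoA ratings for the 20 "new" words
remaining = [2.15, 2.98, 3.25, 2.98]  # mean AoA ratings for the remaining words

# Difference (new minus remaining) for each age group, rounded to two decimals
differences = {g: round(n - r, 2) for g, n, r in zip(groups, new_words, remaining)}
```

The resulting differences are 0.65, 1.35, 1.90, and 2.20 scale points for the children and the young, middle-aged, and older adults, respectively.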

Although these findings support the idea that changes in the lexicon contribute to age effects in AoA norms (e.g., De Deyne & Storms, 2007), they do not fully explain the difference observed in the data, since there were only 20 such words among 600. A second possibility is that educational and social changes (schooling, smaller families, etc.) across generations might have contributed to this pattern. One concrete example is the preschool experience. It is well established that preschool attendance increases children’s verbal abilities (e.g., van Druten-Frietman, Denessen, Gijsel, & Verhoeven, 2015), and, at least for Turkey, preschool attendance saw a significant shift in the last 20 years: In 2003, when the children in our sample were born, the preschool attendance rate was 12.5%, whereas by 2014 it had increased to 46% (TUİK, 2015). To speculate, one might expect changes in AoA estimates over the generations that would be similar to what Flynn (2012) demonstrated for IQ scores. A third possibility is metacognitive. As has been noted by several researchers (e.g., Brysbaert & Ghyselinck, 2006; Johnston & Barry, 2006), people do not directly remember the age at which they learned words when making AoA judgments; instead, this type of judgment is based on people’s metacognitive ability. Research has shown that metacognitive abilities develop continually from childhood until late adolescence and remain stable until old age (e.g., Weil et al., 2013). Children’s still-developing metacognitive abilities may therefore have contributed to the difference in their estimates. Finally, following up on Kuperman et al.’s (2012) reasoning with regard to older adults’ high AoA ratings, it may be argued that children’s lower AoA estimates were influenced by the fact that they were making judgments within a much smaller time frame (their lifespan) than the adults.

Another interesting difference was that, whereas all of the adult groups gave higher AoA ratings to low-frequency than to high-frequency words, we observed no such difference for children. This is an unexpected and, to our knowledge, novel finding. Although it is difficult to pinpoint why this occurred, the result might be related to children’s developing ability to integrate frequencies and base rates into their decision making. Although there is evidence that children can accurately estimate the frequencies of stimuli encountered in lab settings (e.g., N. R. Ellis, Palmer, & Reese, 1988), they are less able to make use of frequencies for naturally occurring events (e.g., Jacobs, Greenwald, & Osgood, 1995). Klaczynski (2001) suggested that estimation and general statistical reasoning abilities continue to develop steadily through adolescence. It is, therefore, possible that children’s inadequate estimation abilities played a role in this finding. An additional possibility is related to the time frame argument mentioned above (Kuperman et al., 2012): Adults have a much longer time frame within which they were exposed to both high- and low-frequency words than do children, who have been exposed to words for a much shorter period. Therefore, on the basis of the differences in years of exposure, it may be easier for adults than for children to make that distinction.

As we noted above, the correlations among the variables we used were generally in line with existing data from different languages, as well as with theoretical conceptualizations. AoA ratings showed moderate to strong correlations with word frequency, concreteness, and imageability, and small but significant positive correlations with both measures of word length. The strong association between word frequency and AoA was expected, given that, by definition, people are more likely to be exposed to high-frequency than to low-frequency words. Similarly, words learned earlier tend to be more concrete, and concrete words are also likely to be easier to imagine (Bonin et al., 2004; Sirois et al., 2006). One interesting finding was that the correlations between the word length measures (especially number of syllables) and AoA were smaller than those reported for other languages (e.g., Sirois et al., 2006). One potential reason for this could be that a large portion of the words had two or three syllables, leading to a restricted range for that variable.
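The restricted-range explanation can be illustrated with a small simulation. The Python sketch below uses made-up data, not the present norms: it assumes a hypothetical linear relation between syllable count and an AoA-like score (the slope of 0.5 and the noise level are arbitrary choices), and compares the Pearson correlation over the full range of syllable counts with the correlation obtained when only two- and three-syllable words are retained.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical data: syllable counts from 1 to 6, and an AoA-like score
# that rises with length plus Gaussian noise (all values are assumptions).
syllables = [rng.randint(1, 6) for _ in range(5000)]
aoa = [0.5 * s + rng.gauss(0, 1.0) for s in syllables]

# Correlation over the full range of syllable counts.
r_full = pearson(syllables, aoa)

# Correlation after restricting the sample to two- and three-syllable words.
kept = [(s, a) for s, a in zip(syllables, aoa) if s in (2, 3)]
r_restricted = pearson([s for s, _ in kept], [a for _, a in kept])

print(f"full range: r = {r_full:.2f}")
print(f"restricted: r = {r_restricted:.2f}")
```

Under these assumptions, the same underlying relation yields a smaller correlation in the restricted sample, which is the pattern that a narrow range of syllable counts would be expected to produce.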

One limitation of the present study was that, although we had three different AoA forms, each containing 200 words and administered using a Latin-square design, we did not randomize the order of the words in each booklet. It might be argued that this could have led to order effects. However, on the basis of the similarity of our findings to those reported in other languages, we believe that order effects were unlikely to have influenced the AoA ratings.

In summary, we have provided reliable and valid AoA norms in Turkish that can be useful for empirical investigations in learning, memory, and psycholinguistics, especially when age differences might be of interest. Moreover, although the most obvious use of these norms would be the selection of materials in studies in which well-controlled procedures are essential (e.g., picture naming), they are also likely to prove useful for research in which such control is not possible (e.g., earliest autobiographical memories). It should be mentioned that not all cognitive tasks are equally influenced by AoA (Johnston & Barry, 2006). Although the exact locus of the AoA effect is a matter of debate, substantial data support a semantic basis for AoA effects (Brysbaert, Van Wijnendaele, & De Deyne, 2000; Juhasz, 2005). Thus, norms such as those we have presented here might be especially useful for tasks that emphasize access to meaning, such as picture naming, object recognition, and autobiographical memory. In addition, the present AoA norms and lexical variables could be usefully employed in research for which the fully consistent print-to-sound mapping of Turkish is relevant.