The knowledge we acquire about the world throughout our lives constitutes our semantic memory. This memory includes the accumulation of information we have about objects, such as texture, color, common behavior, superordinate category, etc. This memory has both a static element, which is more stable, and a dynamic element, which can vary over one’s lifetime as well as across individuals. Many models of semantic memory posit that conceptual representation is dynamic and its activation is context-dependent (Barsalou, 1982; White et al., 2018; Yee & Thompson-Schill, 2016). It follows, therefore, that there is an amount of knowledge that is partially idiosyncratic because it is closely tied to our daily experience and the culture in which we live (White et al., 2018). However, the words that we use to communicate with other speakers obviously refer to some common conceptual features that allow us to understand each other. These features are essential to establishing an interaction and understanding between speakers, and considerable stability of meaning has been observed for concrete concepts in common use, both within the same language and among different languages, and even among different age groups (García Coni et al., 2019; Kremer & Baroni, 2011; Vivas, Kogan, et al., 2020a; Vivas, Martínez, et al., 2020b; Vivas, Montefinese, et al., 2020c).

Consequently, there should be stable and essential aspects of a concept’s meaning that are shared by speakers of a language and allow for effective communication and understanding (White et al., 2018). Vivas, Kogan, et al. (2020a) suggested that there are three levels in the structure of a concept’s meaning: core features, partially shared features and idiosyncratic features. The first level (core features) comprises the essential features of the concept at issue, which are probably present in every person’s semantic representation of that object. The second level (partially shared features) comprises features that can be present in many people’s representations but are not indispensable to defining the concept. Lastly, idiosyncratic features are the parts of a person’s mental representation of the object that are tied to personal experience and not shared with other members of the community. Core features are also shared across languages. Despite their commonality among people, core features can show certain variability across cultures and, most importantly for the focus of the current paper, across one’s life span.

Semantic feature production norms have been developed to identify the shared aspects of meaning. Norms are currently available for various languages, including Dutch, English, Spanish, Italian, German and Chinese (Buchanan, Holmes, Teasley, & Hutchison, 2013; Buchanan et al., 2019; De Deyne et al., 2008; Deng et al., n.d.; Lenci et al., 2013; McRae, Cree, Seidenberg, & McNorgan, 2005; Moldovan, Ferré, Demestre, & Sánchez-Casas, 2015; Montefinese, Ambrosini, Fairfield, & Mammarella 2013; Ruts et al., 2004; Vinson & Vigliocco, 2008; Vivas et al., 2017; Zannino et al., 2006).

Most of these norms include only concrete concepts, but there are also some derived from abstract concepts, verbs and adjectives (Buchanan et al., 2019; Lenci et al., 2013; Vinson & Vigliocco, 2008). These norms provide valuable information about the mental representation of concepts and form the basis for theoretical models on semantic memory. The information extracted sheds light on the elements that compose concepts (i.e., the features) and their weight and relation. Additionally, semantic feature production norms provide resources that can be useful for the selection of experimental stimuli, such as the concept and feature variables. As mentioned in the introductory paragraph, these norms mainly account for those common aspects of meaning, but intersubjective variability obviously exists and can also be analyzed, as has been done in depth by Chaigneau and colleagues (Chaigneau et al., 2018). These authors have developed a formula to measure conceptual variability within Conceptual Property Norms and have shown how concepts can have different feature variability. We will focus on the commonalities in semantic representations, but the interested reader can perform some of these analyses with our database, which is available in the Open Science Framework (OSF).

Semantic feature production norms have generally been developed on the basis of young, healthy adults. However, it has been observed that certain particularities can emerge in semantic representation during one’s life span. Evidence of the differential characteristics of the lexico-semantic system of older adults comes from diverse sources. A considerable amount of data has been obtained from semantic network analysis. For example, Dubossarsky et al. (2017) performed a study using a word association task with a wide sample comprising children, middle-aged adults and older adults. They observed increasing average path length, smaller in- and out-degree and increasing entropy in late life. The authors conclude that the semantic networks of older adults are less connected, less organized and less efficient. In the same vein, Wulff, Hills and Mata (2018) performed a similar analysis using a verbal fluency task. They observed sparser networks in older adults. Further, Wulff, De Deyne, Jones, Mata and the Aging Lexicon Consortium (Wulff et al., 2019) assert that “there is now converging evidence that although network size appears to grow continuously across the life span, degree and shortest path length show mirrored nonlinear trends, with degree increasing across childhood and decreasing across adulthood and shortest path length decreasing across childhood and increasing across adulthood” (p. 7). They suggest a four-dimensional model to account for these differences: environment, learning, representations and retrieval. The first dimension, environment, refers to the cumulative exposure to the environment that contributes to the diversity of people over their lifetimes. The learning component refers to sensory constraints and attention failures in encoding. The dimension of representations includes two processes: decay and consolidation. Those representations that are not activated tend to show a gradual decay, but there is also an ongoing gradual and systemic consolidation process that can last months and systematically reorganizes memories by removing unused memories, allowing more efficient memory representations (as proposed by Hardt et al., 2013). By contrast, system consolidation is a much slower process which can last for weeks to months, or even up to several decades, depending on the species. Consolidation in this case refers to a gradual process of reorganization and a differential involvement of the brain regions that support memory processing. The last dimension of the model, retrieval, refers to the changes that occur in cognitive control or interferences and search strategies. All these factors help to explain changes in semantic networks over one’s life span.

With respect to vocabulary and use of language, there is plenty of evidence that they tend to be maintained during aging (Kausler, 1991; Harada et al., 2013; Salthouse, 1993) and even that older adults tend to perform better than younger participants on vocabulary tests (Krieger-Redwood et al., 2019), synonym judgement tasks (Hoffman, 2019) and general knowledge (Coane & Umanath, 2021). However, it is very common to observe retrieval deficits for specific words. This leads to the phenomenon known as tip-of-the-tongue, in which there is a failure to activate the complete phonological information about the word (Burke & Shafto, 2004). Verbal fluency also tends to decline with age (Harada et al., 2013).

From the neurofunctional point of view, a recent meta-analysis of neuroimaging studies in semantic tasks comparing younger and older people indicated that the latter exhibit domain-general neural resources and a reduction in prefrontal lateralization (Hoffman & Morcom, 2018). This result is interpreted as a compensatory mechanism for the decline in the effectiveness of executive functions in order to face task demands. Moreover, a recent study by Krieger-Redwood et al. (2019) analyzed intrinsic brain connectivity during a task that demanded semantic control in both young and old participants. They observed that older adults performed worse in the task, and they attributed this phenomenon to an observed reduction in intrinsic brain connectivity between anterior temporal lobe and medial prefrontal cortex within the default mode network. However, it is important to be cautious when generalizing these results to other aspects of semantic cognition. A variety of processes within semantic cognition must be differentiated. As stated by Hoffman (2018), there are “semantic representations that accumulate throughout the life span, processes for controlled retrieval of less salient semantic information, which appear age-invariant, and mechanisms for selecting task-relevant aspects of semantic knowledge, which decline with age and may relate more closely to domain-general executive control” (p. 2). Grieder et al. (2012) also provide behavioral and electrophysiological evidence that automatic semantic retrieval remains stable in global signal strength and topographic distribution during healthy aging.

In addition, some authors analyzed the types of lexical relations produced by a group of older adults with typical aging (Minto-García et al., 2020). They selected a word association task and asked participants to produce the first word that came to mind when presented with a target word. Only nouns were used. Responses were classified according to two coding schemes: paradigmatic vs. syntagmatic, and another classification proposed by the authors that included the following categories: (i) semantic association (categorical, e.g., dog-animal, and non-categorical, e.g., tree-branch); (ii) wide association (thematic-contextual co-occurrence plus semantic relation, e.g., bee-sting, and thematic-contextual co-occurrence, e.g., elephant-luck) and (iii) association by signifiers (phonological and morphological) (see Minto-García et al., 2020, p. 5). They observed that a greater number of responses were paradigmatic and fell under thematic-contextual co-occurrence plus semantic relations (e.g., carrot-rabbit). In this regard, one of the most frequently addressed issues when comparing the semantic organization of younger and older adults is the alleged preference for thematic/situational categorization in the latter, and for taxonomic/categorical categorization in the former. Using a picture-matching task, Maintenant et al. (2011) found that older people were more likely to make thematic choices because they found it more difficult to inhibit them when the task required it. That pattern of performance has often been attributed to cognitive decline, where situational categories, which are tied to context and superficial characteristics, are benefited by their salience and low cost (García Coni, Comesaña, Piccolo, & Vivas, 2020; García Coni et al., 2019; Maintenant et al., 2013).

More closely related to the topic of semantic feature composition, some studies compared younger and older adults who spoke different languages in order to analyze the differences between and within cultures in semantic tasks. White et al. (2018) examined how the meaning of concepts evolves over time. They tested younger and older adult speakers of French and Dutch on a category judgment task with household containers. They found, regarding the concept "bottle", that older and younger adults weighted classic materials such as glass differently from relatively new materials such as plastic. This would imply that our changing environment generates changes in the meaning of words, at least in terms of the features we consider relevant to define objects. However, there will always be a considerable amount of overlap in the life experiences of different age groups within the same culture, which will cause mental representations to overlap as well. According to this same study, the component of necessary stability of representations could explain why older adults who grew up with glass bottles maintain this bottle prototype without adapting it to current experiences (which incorporate plastic). This is also supported by the fact that the ability to learn statistically from the regularities of the environment becomes less effective with age (McNealy et al., 2010).

For their part, Yoon et al. (2004) collected category norms for young and old adults from two vastly different cultures: Chinese and American. They asked participants to produce items belonging to a shown category and calculated affinity scores for each category. The results indicated that there is a high level of similarity between young and old adults. This study found that differences between cultures are much greater than differences within cultures. However, some categories, such as disease and herbal medicines, showed differences between age groups, too.

More recently, Castro et al. (2020) presented category norms for different age groups including young (18–39 years), middle-aged (40–59 years) and old (60 years and older) adults by selecting 70 categories, including those selected by Yoon et al. (2004). Interestingly, they observed that young and middle-aged adults showed lower between-subjects variability in their responses than old adults.

Verheyen et al. (2019) also analyzed age-related differences in categorization. They asked participants whether or not an item belonged to a certain category. They used eight categories with 24 items each. They observed that older adults used a lower threshold for category membership. Furthermore, although they observed very similar categorization proportions between groups, some items from the categories insects, fruits, fish, furniture, tools and sports were found to be differently functioning items according to the model analysis. It is worth noting that at the end of their paper, they state that “these findings indicate that studies on age-related semantic processing should recognize the age-specific nature of semantic representations” (p. 15).

All the evidence suggests that the elderly lexical-semantic system has some particularities that deserve to be considered when studying semantic cognition. Consequently, older adults should have their own normative data in order to be more faithful to their own characteristics and to avoid attributing characteristics that belong to other age groups and that are probably not a good reflection of their semantic organization. Additionally, the study of semantic organization in the aging stage is of vital importance, since it is one of the cognitive domains that allows for differentiation between normal and pathological aging.

Thus, for the purpose of analyzing the semantic organization of older adults and providing researchers with new normative data, we developed the first semantic feature production norms for this population. Additionally, a comparison of younger and older adults will be performed for relevant concept and feature variables.

Description of the norms

Sample

Our sample consisted of 810 healthy older adults (63% women) from the city of Mar del Plata, Argentina. Participants were recruited through university programs for older adults. Their ages ranged from 60 to 95 years (mean = 70.53 years, SD = 7.34 years). Every participant was a native speaker of Argentine Spanish. All of them gave their informed consent to participate in this study. The study was performed according to the principles of the Declaration of Helsinki (World Medical Association (WMA), 2013).

Materials

The concepts were extracted from the database built by Cycowicz et al. (1997). The same set was used to build the young adult Spanish feature norms (J. Vivas et al., 2017) as well as other Argentine psycholinguistic norms (Manoiloff et al., 2010; Martínez-Cuitiño et al., 2015). Each concept corresponds to a single noun in Argentine Spanish. In the case of polysemous terms, a key was added to clarify the target meaning.

The norms include 400 concepts belonging to 22 semantic categories, from the domains of living (129 concepts) and non-living things (227) and other categories that cannot be categorized in that way (44). The categories from the living domain include animals (93), vegetables (12), fruits (15) and plants (9). Non-living categories include accessories (19), weapons (4), tools (33), constructions (17), house parts (10), clothing (17), utensils (29), furniture (14), vehicles (17), devices (13), objects (27), containers (16) and toys (11). The categories that cannot be included in either domain are food (6), musical instruments (14), body parts (19) and nature (5). Food, musical instruments and body parts are exceptional cases because, according to the category-specific deficits literature, they behave neither as non-living nor as living things (see Barbarotto et al., 2001; Mahon & Caramazza, 2009; Rumiati et al., 2016). We also added the category “nature” as a salient exception because it includes concepts such as cloud and moon that are non-living but also not manmade. The concepts were also chosen to span a wide range of familiarity values, although a minimum value of familiarity was obviously required in order for participants to provide useful information.

Data collection

The norms were collected over a period of three years in the city of Mar del Plata (Argentina). Concepts were distributed in groups of 15 in different spreadsheets in such a way that categories were homogeneously represented. Each participant listed features for only one set of concepts. The same spreadsheets were used for the young adult norms (for a detailed description see Vivas et al., 2017).

Participants were given a set of concepts and had to produce a list of features that described them. They were provided 15 blank lines per concept and instructed to list different types of features, such as those related to internal parts and physical properties (their appearance, sound, smell or touch). They were also encouraged to think about where, when and how they use the object at issue, and to consider the category to which it belongs. Two examples were provided, one for each domain. The instructions that participants received are presented in Appendix 1. In every case, 30 subjects listed features for each concept. Participants were not given a time limit; they took approximately 20–30 minutes to complete the task.

Recording Process

The construction of norms requires the contribution of a very large number of participants. Participants provide spontaneous answers that vary considerably in how the same feature is expressed. For example, to describe the concept chair, some participants produced “has legs” and others “has many legs” or just “legs”. In Spanish, the variety of expressions is wider than in English because adjectives must agree in gender and number with the noun they modify. To cope with this variability, an arduous and extensive task was undertaken to ensure that features conveying the same meaning were coded identically, within a concept and among different concepts. Similarly, features with dissimilar meanings were carefully coded with different labels. This procedure is referred to in McRae and colleagues’ (2005) paper as the recording process. This process entails the adjustment of most of the features produced by participants without altering the original meaning of those features. There are at least two main reasons that justify this process. First, as the norms aim to capture the regularities underlying the production of semantic features, the wide variety of spontaneous expressions given for those features must be reduced. Otherwise, the vast information provided by the norms would be useless, and its analysis would be impossible. Second, the unification of features is essential in order to accurately compute many feature variables. For example, variables such as production frequency (i.e., the number of participants who produced a certain feature within a specific concept) and distinctiveness (which depends on the number of concepts in which a certain feature is listed) would not be correctly calculated if features were not coded equally.
In order to perform this process rigorously, we followed McRae and colleagues’ (2005) criteria and added further criteria of our own. Below, we quote the most important criteria, previously published in Vivas et al. (2017):Footnote 1

“All features consisting of adjectives were written as singular and masculine independent of the number and gender of the corresponding concept.

Quantifiers (for instance: “generally” or “usually”) were eliminated, because the information provided by these words is expressed by the production frequency of the feature.

To identify the features that referred to a subtype of a concept, we used the expression <can be> (for example, for the concept apple, <can be red> and <can be green> were used)Footnote 2.

The features constituted by a quantifier adjective preceding a noun such as “has four legs”, were divided into two separate features: <has four legs> and <has legs>. This decision was made because two bits of information are contained in features like these and we intended to preserve both.

Disjunctive features (such as “is red or black” in the case of an ant) were also divided (in this example, into <is red> and <is black>). However, if a feature conveyed a conjunction (such as “is black and yellow” in the case of the concept bee), it was not divided.

In some cases, some words were added to the features. For example, an indefinite article (“a” or “an”) was added to the features that referred to superordinate categories (for instance, “animal” was transformed into <an animal>), and the expression “used for” was incorporated into the features that referred to a function (for example, the feature “to carry things” was transformed into <used for carrying things>).

Every feature that consisted in a verb was conjugated in the indicative of the present tense (e.g., “roar” was transformed into <roars> in the case of the concept lion).

The word “has” was added to every feature that referenced the possession of a certain part or object, and that word replaced any synonym for it, such as “possesses”, and any other word that conveyed a similar meaning, such as “with” (e.g., in the case of the concept lion, the features “possesses a mane”, “with a mane” and “mane” were all replaced by <has a mane>).” (p. 1099)
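Three of the criteria quoted above (removal of hedging quantifiers, division of numeral features and division of disjunctive features) can be sketched in code. The following is a minimal illustration only: it operates on English strings, whereas the actual coding was carried out on Spanish responses, and the function name `recode` and the word lists `HEDGES` and `NUMERALS` are hypothetical choices made for the example rather than part of the published procedure.

```python
import re

# Hypothetical word lists for illustration; the real coding was done on Spanish responses.
HEDGES = {"generally", "usually"}
NUMERALS = {"two", "three", "four", "six", "eight"}


def recode(raw: str) -> list[str]:
    """Apply three of the quoted criteria to a single English response."""
    # Hedging quantifiers are dropped: production frequency already carries that information.
    words = [w for w in raw.lower().split() if w not in HEDGES]
    text = " ".join(words)

    # Disjunctive features are divided; conjunctions ("and") are left whole.
    match = re.fullmatch(r"(is|has) (.+) or (.+)", text)
    if match:
        verb, left, right = match.groups()
        return [f"{verb} {left}", f"{verb} {right}"]

    # "has <numeral> <noun>" keeps the quantified feature and adds the bare one.
    match = re.fullmatch(r"has (\w+) (.+)", text)
    if match and match.group(1) in NUMERALS:
        return [text, f"has {match.group(2)}"]

    return [text]


for response in ["usually has four legs", "is red or black", "is black and yellow"]:
    print(response, "->", recode(response))
```

Rules of this kind only cover the mechanical part of the process; deciding whether two expressions convey the same meaning still requires human judgment.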

Measures and statistics

In this section, we detail the variables corresponding to feature and concept measures and describe the calculations that were performed. The values obtained for the younger adult (YA) population are shown in parentheses.

Participants produced a total of 24,654 features (YA: 21,630). At this point it is important to clarify that, out of these features, only those produced by at least five participants were included in the norms, as per McRae and colleagues’ (2005) criteria. The reason for this decision is that, as Zannino et al. (2006) suggest, values lower than 5 are not considered representative of the knowledge that the community has about the concepts at issue. As a result, only 2734 (YA: 3064) features were used to calculate most of the variables. The mean number of features produced by each participant was 4.59 (SD 2.39; Max. 20; Min. 1) (YA: 5.82; SD 2.25; Max. 17; Min. 1).

Four files were prepared: a concept-feature file, a concept-concept matrix, a feature-feature matrix and a phonetic transcription. The datasets generated in the current study are available in the OSF repository. Concepts and/or features in these files are written both in Spanish and English. Note that the English version has been published simply to facilitate communication with English-speaking readers and is not intended to be used as English-language normative data. Each of these files is described below.

File 1. Concept-feature

This file includes the features produced by the participants for the 400 nouns. The variables shown in this file are equivalent to the ones found in the previously published norms for young adults, although they are ordered differently. For clarity, variables were organized in four blocks.

First block - Spanish Concept, English Concept, Spanish Category, English Category, Spanish Feature, English Feature, WB-Label, N_WB_Tax, N_WB_Sit, N_WB_Ent, N_WB_Intr

This first block includes variables referring to the description and classification of concepts and features. The first and second columns of this file correspond to the concept name in Spanish and English, respectively. The next two columns refer to the semantic category in both languages, followed by the feature name in both languages. The following column (WB-label) shows the feature type, according to the coding scheme proposed by Wu and Barsalou (2009).

The authors propose five major categories: taxonomic categories (C), situation properties (S), entity properties (E), introspective properties (I) and miscellaneous (M). Subcategories are also proposed. A more detailed feature type classification is expressed in lowercase after the hyphen (see Appendix 2 for the complete coding scheme). Although some features were related to more than one feature type, in general we referred to just one of these categories, as per McRae and colleagues’ (2005) criterion. However, some exceptions were observed. For instance, features that alluded to quantities (such as <has two wings>) were codified as E-quant + the corresponding feature category (in this example: E-excomp), and those that included negations (like <cannot fly>) were codified as I-neg + the corresponding feature category (in this example: E-beh). The final columns of this block (N_WB_Tax; N_WB_Sit; N_WB_Ent; N_WB_Intr) include the number of different instances of each of the major categories present for each concept.

Because the process of assigning a specific code to each feature is performed by different coders and can be ambiguous in some cases, we followed the procedure recommended by Bolognesi et al. (2017) in order to attain the maximum reliability in annotations for each feature. The steps taken were as follows: (1) We chose Wu and Barsalou’s coding scheme for three reasons: (i) it is one of the most well-known and complete taxonomies; (ii) it is hierarchically structured and therefore allows for the possibility of analyzing broader categories as well; and (iii) it was previously used in the young adult semantic feature norms (Vivas et al., 2017), and consequently we already had trained coders. (2) Initially, training sessions were held to explain and exemplify the taxonomy to new coders, and a detailed, written description with examples was provided to every participant.Footnote 3 (3) A subset of 114 concepts (28% of the full set) with their respective 800 features were selected for the initial annotation, on which a reliability analysis would be performed. (4) One trained and four novice coders codified the features considering the subcategories of Wu and Barsalou’s coding scheme. (5) Krippendorff's alpha analysis was performed with the codes provided. First, an analysis was performed only with the novice coders, which obtained a value of .78. Then another analysis was performed including the trained coder, which obtained a value of .863, indicating a high level of agreement and, consequently, a very acceptable reliability for the coding process. Afterwards, a proportion of concepts were given to each coder to work on independently.
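For readers who wish to run a comparable reliability check on their own annotations, Krippendorff's alpha for nominal codes can be computed from a coincidence matrix. The sketch below is an illustration under stated assumptions, not the procedure used here: the text does not specify the software or the metric variant, so the nominal version is assumed, and the function name and toy codes are hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal codes.

    `units` holds one entry per coded feature; each entry lists the codes
    assigned by the coders who annotated it (missing codes are omitted).
    """
    coincidences = Counter()  # o[(c, k)]: weighted count of code pairs within units
    for codes in units:
        m = len(codes)
        if m < 2:  # a unit coded by fewer than two coders says nothing about agreement
            continue
        for c, k in permutations(codes, 2):
            coincidences[(c, k)] += 1 / (m - 1)

    totals = Counter()  # n_c: marginal total of each code
    for (c, _), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())

    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k] for c in totals for k in totals if c != k)
    if expected == 0:  # only one code was ever used, so alpha is undefined
        return float("nan")
    return 1 - (n - 1) * observed / expected


# Toy example (made-up annotations): two coders, four features.
toy = [
    ["E-excomp", "E-excomp"],
    ["E-excomp", "E-excomp"],
    ["E-beh", "E-beh"],
    ["E-excomp", "E-beh"],
]
print(round(krippendorff_alpha_nominal(toy), 3))  # 0.533
```

Alpha equals 1 under perfect agreement and approaches 0 when agreement is no better than chance, which is the scale on which the values of .78 and .863 reported above should be read.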

It can be observed that for the older adult (OA) norms, 440 features corresponded to the taxonomic category (while 520 corresponded to this category for YA), 1069 to situational properties (YA: 1064), 1026 to entity properties (YA: 1383) and 203 to introspective properties (YA: 97).

Second Block - Prod_Freq, Ranked_Prod_Freq, Sum_PF, CPF, Disting, Distinctiveness, CV, Rel, Intercor_Strength

This second block comprises nine columns that describe properties of the features. Production frequency (Prod_Freq) is the number of participants who produced that feature for the target concept. Ranked production frequency (Rank_PF) is the position of the feature relative to the other features of the same concept, ordered by descending production frequency. Total production frequency (Sum_PF) expresses the sum of the production frequencies of that feature across all concepts in which it appears. The CPF column lists the number of concepts in which that feature occurs. The following two variables are derived from CPF: a binary variable (Disting) that indicates whether the feature is distinguishing (“D”, if it appears in only one or two concepts) or not (“ND”, if it appears in more than two concepts), and a quantitative measure (Distinct), which is the inverse of CPF (i.e., 1/CPF) (Devlin et al., 1998; Garrard et al., 2001). This variable was calculated using all the concepts included in the norms. Cue validity (CV) represents the production frequency of the feature divided by the sum of the production frequencies of that feature in all the concepts in which it appears. Following the analysis performed for young adults, taxonomic features were not removed to calculate cue validity, as McRae and colleagues had done. As cue validity is calculated on a per-feature basis, there is no interaction between taxonomic and other kinds of features. Relevance (Rel) is related to distinctiveness and cue validity, as all three convey feature informativeness (Marques et al., 2011; Sartori et al., 2005; Sartori & Lombardi, 2004). Relevance connects a local component, operationalized by production frequency, and a global component, which expresses the contribution of the feature to the meaning of the other concepts. The values of this variable were calculated by using the equation employed by Sartori et al. (2007)Footnote 4. Intercorrelational strength (Intercor_Strenght_No_Tax) occupies the final column of the feature properties block. This measure encompasses the strength of the correlation between a feature and the other features for the same concept. It is calculated by summing the shared variance (i.e., r²) between the feature and each of the other features of the same concept. The shared variances between features can be found in File 3, feature-feature. This calculation was performed based only on features found in three or more concepts, as was done for the young adult norms; therefore, features that do not have values for intercorrelational strength in the file are those that are associated with fewer than three concepts. We considered a significance level of p ≤ .05, which corresponds to |r| > .164 (i.e., 2.7% of shared variance) (Sheskin, 2007; Vivas et al., 2017).
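The count-based feature variables defined in this block can be derived from a concept-by-feature table of production frequencies. The following sketch uses a toy table with made-up values; the function name `feature_measures` and the data layout are assumptions made for illustration, and ties in production frequency are resolved by listing order, which the text does not specify. Relevance and intercorrelational strength are left out because they require additional information.

```python
from collections import defaultdict

# Toy production frequencies (hypothetical values, not taken from the norms).
norms = {
    "lion": {"an animal": 25, "has a mane": 20, "roars": 12},
    "tiger": {"an animal": 24, "has stripes": 18, "roars": 6},
    "dog": {"an animal": 27, "barks": 22},
}


def feature_measures(norms):
    """Return one row per concept-feature pair with the count-based variables."""
    sum_pf = defaultdict(int)  # Sum_PF: production frequency summed over concepts
    cpf = defaultdict(int)     # CPF: number of concepts in which the feature occurs
    for features in norms.values():
        for feature, pf in features.items():
            sum_pf[feature] += pf
            cpf[feature] += 1

    rows = []
    for concept, features in norms.items():
        # Assumption: ties keep their listing order (the sort is stable).
        ranked = sorted(features.items(), key=lambda item: -item[1])
        for rank, (feature, pf) in enumerate(ranked, start=1):
            rows.append({
                "concept": concept,
                "feature": feature,
                "Prod_Freq": pf,
                "Rank_PF": rank,
                "Sum_PF": sum_pf[feature],
                "CPF": cpf[feature],
                "Disting": "D" if cpf[feature] <= 2 else "ND",
                "Distinct": 1 / cpf[feature],
                "CV": pf / sum_pf[feature],
            })
    return rows


for row in feature_measures(norms):
    if row["concept"] == "lion":
        print(row)
```

In this toy table, <roars> is listed for two concepts and therefore counts as distinguishing, whereas <an animal> is listed for all three and does not.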

Third Block - Familiarity, Total_feat, 5_feat_tax, 5_feat_no_tax, Density_No_Tax, Num_Correl_Pairs_No_Tax, Perc_Correl_Pairs_No_Tax, Num_Disting_Feats_No_Tax, Disting_Feats_Perc_No_Tax, Mean_Distinct_No_Tax, Mean_CV_No_Tax, Mean_Corr.

This block includes properties related to the concept. The first column shows the concept’s familiarity (Familiarity), which was extracted from the Argentine psycholinguistic norms (Manoiloff et al., 2010). This measure ranges from 1 to 5. Then there are three columns devoted to the number of features: the first one includes the sum of all features for that concept, including those produced by just one person (Total_Feat). The second is the sum of all features that were produced by five or more people (5_Feat_Tax). The third sum excludes both the features with fewer than five mentions and the taxonomic features (5_Feat_No_Tax).
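The three feature counts can be illustrated with a short sketch. The rows below are made-up values for a single concept, the function name `feature_counts` is hypothetical, and taxonomic features are identified by a label beginning with "C", following the major-category letter of the coding scheme described above.

```python
# (feature, production frequency, WB label) for one concept; all values are made up.
lion = [
    ("an animal", 25, "C-super"),
    ("has a mane", 20, "E-excomp"),
    ("roars", 12, "E-beh"),
    ("lives in Africa", 4, "S-loc"),
    ("is a king", 1, "I-eval"),
]

MIN_PF = 5  # inclusion threshold stated in the text


def feature_counts(rows, min_pf=MIN_PF):
    """Count all features, those above threshold, and those above threshold that are non-taxonomic."""
    kept = [row for row in rows if row[1] >= min_pf]
    return {
        "Total_Feat": len(rows),
        "5_Feat_Tax": len(kept),
        "5_Feat_No_Tax": sum(1 for row in kept if not row[2].startswith("C")),
    }


print(feature_counts(lion))
```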

Next comes intercorrelational density (Density_No_Tax), which is the sum of shared variances (i.e., r²) across the concept’s significantly correlated features. Whereas intercorrelational strength is a feature variable, intercorrelational density is a concept variable. Both were calculated for non-taxonomic features only (see Vivas et al., 2017). The shared variances between features can be found in File 3, feature-feature. This file only shows the values of the significantly correlated feature pairs per concept.
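The relation between intercorrelational strength (a feature variable) and intercorrelational density (a concept variable) can be sketched in Python as follows. This is a minimal illustration under several assumptions: each feature is represented by its vector of production frequencies across concepts, correlations are Pearson correlations, the input already excludes taxonomic features, and the significance cutoff is passed in as a parameter (the value |r| > .164 applies to the full 400-concept set, not to a toy example). The function names and the toy counts are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector has no defined correlation; treat as 0
    return sxy / sqrt(sxx * syy)


def shared_variances(norms, r_cutoff, min_concepts=3):
    """Map each significantly correlated feature pair to its shared variance (r squared)."""
    concepts = list(norms)
    counts = {}
    for feats in norms.values():
        for feat in feats:
            counts[feat] = counts.get(feat, 0) + 1
    # Keep only features that occur in at least `min_concepts` concepts.
    features = sorted(f for f, c in counts.items() if c >= min_concepts)
    vectors = {f: [norms[c].get(f, 0) for c in concepts] for f in features}
    r2 = {}
    for f, g in combinations(features, 2):
        r = pearson_r(vectors[f], vectors[g])
        if abs(r) > r_cutoff:
            r2[frozenset((f, g))] = r * r
    return r2


def strength_and_density(concept_feats, r2):
    """Per-feature intercorrelational strength and concept-level density."""
    strength = {f: 0.0 for f in concept_feats}
    density = 0.0
    for f, g in combinations(list(concept_feats), 2):
        v = r2.get(frozenset((f, g)), 0.0)
        strength[f] += v  # the pair contributes to both of its features
        strength[g] += v
        density += v      # and once to the concept as a whole
    return strength, density


# Hypothetical counts, invented for illustration only.
toy_norms = {
    "dog":   {"has_legs": 20, "has_fur": 18, "is_alive": 10},
    "cat":   {"has_legs": 15, "has_fur": 20, "is_alive": 12},
    "horse": {"has_legs": 18, "has_fur": 9, "is_alive": 8},
    "table": {"has_legs": 12},
    "tree":  {"is_alive": 14},
}
toy_r2 = shared_variances(toy_norms, r_cutoff=0.164)
toy_strength, toy_density = strength_and_density(toy_norms["dog"], toy_r2)
print(toy_strength, toy_density)
```

Because each correlated pair adds its shared variance to both of its features but only once to the concept, the strengths of a concept's features always sum to twice its density in this sketch.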

The last two variables related to significantly correlated feature pairs are the number of pairs for a given concept (Num_Correl_Pairs_No_Tax) and the percentage of correlated features over those with a production frequency of 5 or more (Perc_Correl_Pairs_No_Tax).

The next two concept variables in the file are the number of distinguishing features (Num_Disting_Feats_No_Tax) and their percentage per concept, excluding taxonomic features (Disting_Feats_perc_No_Tax).

The concept-feature file also includes three variables that express mean values: mean distinctiveness (Mean_Distinct_No_Tax), mean cue validity (Mean_CV) and mean correlation (Mean_Corr). This third variable is derived from the concept-concept matrix and is calculated as the mean of the cosine similarity obtained between the concept and the other 399 concepts of the set.

Fourth Block - Feat_Length_Including_Spaces, Length_Syllables, Length_Letters, Length_Phonemes

Finally, linguistic properties for both concepts and features were included: feature length (Feat_Length_Including_Spaces), concept syllables (Length_Syllables), concept length in letters (Length_Letters) and phonemes (Length_Phonemes).

File 2. Concept-concept

This matrix is made up of the 400 concepts and reflects the similarity between each concept pair. Each concept is considered a vector defined by its feature composition and frequency. Only those features produced by at least five participants were included in this analysis. The semantic similarities were calculated using the geometric technique of comparing two vectors in the n-dimensional Euclidean space by the smallest angle between them. Parallelism (that is, a cosine equal to 1 or −1; see Footnote 5) represents the maximum possible similarity, while orthogonality (a cosine equal to 0) represents the maximum possible difference. The cosine was computed in the usual way, as the ratio between the "component-wise" inner product and the product of the respective Euclidean norms. It is worth mentioning that the idea of measuring semantic similarity through the construction of two vectors from a set of features that defines a concept was originally proposed by Kintsch (2001).
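The cosine computation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used to build the matrix: the function name `cosine_similarity`, the `min_pf` parameter and the feature counts are hypothetical, and each concept is represented as a dictionary from feature to production frequency.

```python
from math import sqrt


def cosine_similarity(feats_a, feats_b, min_pf=5):
    """Cosine between two concepts given as {feature: production frequency}.

    Features produced by fewer than `min_pf` participants are dropped first.
    """
    a = {f: pf for f, pf in feats_a.items() if pf >= min_pf}
    b = {f: pf for f, pf in feats_b.items() if pf >= min_pf}
    # Inner product over the features the two concepts share.
    dot = sum(pf * b[f] for f, pf in a.items() if f in b)
    norm_a = sqrt(sum(pf * pf for pf in a.values()))
    norm_b = sqrt(sum(pf * pf for pf in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a concept with no retained features is treated as dissimilar
    return dot / (norm_a * norm_b)


# Hypothetical counts, invented for illustration only.
dog = {"has_four_legs": 20, "barks": 25, "is_a_pet": 15}
cat = {"has_four_legs": 18, "meows": 22, "is_a_pet": 12}
print(cosine_similarity(dog, cat))
```

Because production frequencies are never negative, the values returned by this sketch fall between 0 (no shared features) and 1 (identical feature profiles up to scale). Averaging a concept's cosine with every other concept gives a mean value per concept.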

File 3. Feature-feature

This matrix comprises the coefficient of determination (r²) for each of the 15,576 pairs of features. Only those features that appeared in three or more concepts were included, as per McRae et al.’s (2005) criterion, to avoid spurious correlations between features. Thus, the matrix is composed of all pairwise combinations of 177 features. This file was necessary to calculate some of the variables found in the Concept-Feature file, namely: features’ intercorrelational strength (Intercorr_Str_No_Tax), concepts’ density (Density_No_Tax), the number of significantly correlated feature pairs (Num_Corred_Pairs_No_Tax) and the percentage of significantly correlated feature pairs (%_Corred_Pair_No_Tax).
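The number of pairs follows directly from the number of retained features, since each unordered pair of distinct features is counted once. A short Python check of the arithmetic (the variable names are illustrative):

```python
from math import comb

n_features = 177
# Unordered pairs of distinct features: n * (n - 1) / 2
n_pairs = comb(n_features, 2)
print(n_pairs)  # 15576
```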

File 4. Phonetic transcription

This file includes the broad phonetic transcription of the 400 concepts. We used the International Phonetic Alphabet (IPA) symbols for the transcription, though some adaptations were implemented following Hualde (2014) and Martínez-Celdrán, Fernández-Planas and Carrera-Sabaté (2003). The language variety described is formal Argentinean Spanish spoken in Buenos Aires by educated middle-aged speakers. Interested readers may want to download the self-extracting file containing the IPA fonts used here (Ipa-samd Uclphon1 SILDoulosL): https://www.phon.ucl.ac.uk/home/wells/fonts.htm

Comparison between younger and older adults

In this section, we compare the data between older and younger adults. Tables 1, 2 and 3 show the comparative values of the most relevant variables. The first three rows refer to the total number of features produced. As can be seen, older adults produced more unique features, while younger adults produced more common features (total number of features produced by at least five participants), indicating a wider variety of responses in the older group. The total sum of production frequency was higher for the younger group, indicating greater fluency in feature production. The rows that follow correspond to the mean values per concept for some variables related to the number of features produced (Total-feat, 5-feat-tax, 5-feat-no-tax). Independent-samples t-tests were performed comparing the values of each concept variable between age groups. As highlighted in the table, older adults produced a significantly higher total number of unique features per concept, while younger adults obtained higher values on the quantity of features per concept produced by at least five participants (5-feat-tax and 5-feat-no-tax). These results indicate that the feature production of younger adults is more homogeneous.

Table 1 Comparative values of number of features produced between younger and older adults
Table 2 Comparative mean values for some concept variables
Table 3 Comparative proportion values for each feature type

Mean correlation (mean-corr) was another variable that showed statistically significant differences between groups, indicating that younger participants have stronger correlations between pairs of concepts.

The variables referring to distinctiveness (mean-distinct-no-tax), distinguishing features (num-disting-feats-no-tax) and CV (mean-CV-no-tax) as well as the number of correlated pairs did not show significant differences, suggesting that the properties of the features themselves remain constant across the age groups.

Table 3 shows the percentage of each feature type per group. A chi-square analysis was performed considering feature type and group that indicated significant differences with the expected proportions (χ²(3, N = 5798) = 80.082; p < .001). The standardized residuals analysis indicated which values were significantly different from the expected value. We considered a cutoff of 2 as suggested by the specialized literature (Beasley & Schumacher, 1995). As can be seen, younger adults produced more entity properties, while older adults produced more situational and introspective properties.
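The logic of this analysis can be sketched in Python for a feature type by group contingency table. The sketch is illustrative only: the counts are invented and do not correspond to Table 3, the function name `chi_square_residuals` is hypothetical, and the residual computed here is the Pearson form, (observed − expected) / √expected, which is an assumption because the exact residual variant is not specified in this section (adjusted residuals are a common alternative).

```python
from math import sqrt


def chi_square_residuals(table):
    """Chi-square statistic, degrees of freedom and Pearson residuals.

    `table` is a list of rows of observed counts (rows: feature types,
    columns: age groups).
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n  # expected count under independence
            res = (obs - exp) / sqrt(exp)      # Pearson residual
            chi2 += res * res                  # squared residuals sum to chi-square
            res_row.append(res)
        residuals.append(res_row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, residuals


# Hypothetical counts (younger, older) for four feature types; invented for illustration.
observed = [
    [120, 90],  # entity
    [60, 85],   # situational
    [15, 35],   # introspective
    [40, 42],   # taxonomic
]
chi2, df, residuals = chi_square_residuals(observed)
# Flag cells whose residual exceeds the cutoff of 2 in absolute value.
flagged = [[abs(r) > 2 for r in row] for row in residuals]
print(chi2, df, flagged)
```

A four-by-two table such as this one has 3 degrees of freedom, which matches the degrees of freedom reported for the analysis above.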

ns: no significant differences

Furthermore, we performed a vector comparison similar to the one we did for File 2, but comparing concepts between age groups (i.e., dog for YA vs. dog for OA). In this case, we included features produced by at least two participants because this cutoff provides a better reflection of the similarities and differences between groups (see Footnote 6). The final matrix can be seen in OSF. As mentioned above, the semantic similarity values in this case range from 0 to 1. As can be seen in Fig. 1, the values tend to be higher than .50. Only 11 concepts (2.75% of the full set) obtained values lower than .30, which indicates a low level of similarity in the feature production between the two groups. The statistical mode of the similarity values for the full set was .691 (min = .025; max = .928).

Fig. 1 Histogram of semantic similarity values for the 400 concepts

Additionally, we were interested in knowing which concepts showed more similar feature representation between age groups, so we decided to perform a statistical comparison between semantic domains (living and non-living things). Living things include animals, plants, fruits and vegetables, while non-living things include categories such as tools, devices, furniture, household, weapons, etc. The categories musical instruments, body parts, nature and food were not included in either domain because, as explained previously, they have been shown to behave differently. An independent-samples t-test comparing the similarity values between domains indicated statistically significant differences between them (t = 5.899; p < .001; Hedges’ g = 0.576), showing living things (M = .753; SD = .112) to be more similar between age groups than non-living things (M = .663; SD = .177). Results organized by domain (living, non-living and an atypical category labeled “others”) are plotted in Fig. 2.

Fig. 2 3D image of semantic similarity values between concepts and groups organized by domain

Discussion

The main contribution of the current paper is to present the semantic feature production norms for Spanish-speaking older adults. Values for many relevant feature and concept variables were provided, as well as concept-concept and feature-feature comparisons. As mentioned in the introduction, it has been shown that semantic cognition undergoes changes across one’s life span. That is why it is necessary to have normative data for each age population.

Additionally, comparisons were performed between young and old adult normative databases. One finding of interest is that older adults tended to produce more diverse responses than younger adults. Although the latter produced a higher number of features in total, they were more frequently shared across participants. Therefore, it can be said that older participants tend to produce more idiosyncratic features (for a detailed definition see Vivas et al., 2021). This finding is in line with the data extracted from semantic network analysis (Wulff et al., 2018), which indicates that older adults show less inter-individual agreement. These authors propose that semantic networks are a product of experience; hence, as experience becomes greater and more diverse with age, it impacts the network’s composition. Castro et al. (2020) also observed greater variability between subjects in their category norms.

Additionally, differences in mean correlation were observed, indicating that concepts are more closely related within younger semantic networks. The mean correlation can be considered an analogue measure of the clustering coefficient, which is generally described in semantic network analysis as indicating how closely related concepts are within the semantic space. In this sense, Dubossarsky et al. (2017) observed a decrease in the clustering coefficient over the course of a life span, just as we did. However, it is worth mentioning that other authors obtained the opposite result (Zortea et al., 2014).

Interestingly, variables referring to feature measures, such as distinctiveness and CV, seem to remain stable across both age groups. This suggests that the core meaning (see Footnote 7) of concepts is maintained during a life span; that is, those features that are more central to the concept’s meaning tend to remain stable. In the same vein, as detailed in the introduction, White et al. (2018) observed relatively stable representations between age groups that make communication possible among speakers, but also subtle differences due to varying exposure to materials related to age.

Regarding the type of features produced by participants, it was observed that younger adults produced more entity properties, while older adults produced more situational and introspective properties. This is in line with the literature, which suggests that older adults tend to use thematic criteria to organize concepts (e.g., Minto-García et al., 2020). Thematic relations refer to concepts that belong to a similar scenario or tend to appear in the same spatial-temporal context (Estes et al., 2011), and therefore can be included in the Situational Responses category. It is worth mentioning that the phenomenon that is often considered concomitant to this finding—a lower presence of taxonomic or categorical criterion compared to younger adult production—was not found. According to that line of analysis, while the preference for thematic features and relations would stand out among older adults because of its prominence, the taxonomic criterion would be more inaccessible because of its greater complexity and abstraction (Maintenant et al., 2013; Mudar & Chiang, 2017). Our results do not reflect this, contributing to the idea that semantic memory is not particularly impaired in old age, nor does it reflect early stages of semantic organization.

Another interesting result is that older adults tend to produce more introspective responses, that is, responses related to their feelings and subjective beliefs regarding the concept (e.g., I like it, it is nice, etc.). Apparently, they tend to relate the object’s definition to their own experience with the concept. We propose two possible explanations for this finding. First, older adults may tend to highlight the feeling the object produces; that is, the valence of the object. Alternatively, the presence of introspective responses may be indicative of a failure of controlled retrieval of the relevant features for the task at hand. This would be in line with Hoffman and Morcom (2018) and Krieger-Redwood et al. (2019), both of whom observed executive control deficits in semantic tasks. The intervention of cognitive control in information retrieval consists precisely in reducing interference and sharpening the focus on task-relevant representations (Wulff et al., 2019). As the instructions given to participants were to produce features that define the concept, the examiner would expect to receive more essential and defining characteristics of the object, which are those that make it unique and distinguishable from others. Introspective responses are not specific to an object’s definition, but are rather subjective parts of one’s mental representation, which is why they can be regarded as less accurate responses.

With respect to the fact that there were more unique attributes in older adults, but a higher total production in younger adults, it is worth noting that experience endows greater vocabulary and knowledge that could lead to a more diverse production, but, in turn, this greater vocabulary and knowledge could make it more costly and difficult to retrieve the right information (Wulff et al., 2018).

Finally, the vector comparison allowed us to compare the list of features produced by age group. This analysis showed mainly high levels of similarities between groups, as observed by other authors (White et al., 2018; Yoon et al., 2004). Interestingly, those concepts that obtained values lower than .300 were all non-living things. Furthermore, when analyzing the differences between domains in both groups, it was observed that more divergences existed within non-living concepts. Those concepts, as they are manmade, tend to change over time, as evidenced by White et al. (2018), but living things (animals, fruits, plants and vegetables) do not. This is why greater differences in the feature production between younger and older adults are observed in the non-living domain.

A limitation of these norms that must be acknowledged is that they were extracted from a highly educated sample. We decided to do so in order to make the data comparable to the younger adult norms, which were extracted from university students. As a consequence, however, the data do not allow direct and immediate generalization to other populations.

In future studies, we plan to continue analyzing the differences between both age groups with other techniques, such as semantic network analysis, based on the suggestion of the Aging Lexicon Group (Wulff et al., 2018) for a Free Association Task, and the methodological strategy proposed by Dubossarsky et al. (2017). We are presently working on a comparison of the structural properties of semantic networks for younger and older adults, generated on the basis of the data obtained from a feature production task using the same 400 concepts of the Spanish semantic norms for younger and older adults.

In addition, it would be relevant to have normative data for older adults for other kinds of concepts, such as abstract or emotional, and therefore future research in this area would be welcomed.