Background

Children are sensitive to semantic meaning, in both taxonomic and associative links, by 24 months of age (Arias-Trejo & Plunkett, 2009). This has been measured by comparing whether children attend longer to the second word of a pair of words related in meaning (e.g. cat–dog) compared to unrelated word combinations (e.g. cat–plate). However, one striking observation from this research is the absence of a readily available resource of these related words, also referred to as word associations (WAs), in young children to validate the exact relationships between words in early childhood and to inform stimuli selection in research exploring the emergent lexical-semantic system. The lack of such a resource has resulted in a reliance on WAs from the adult literature. To our knowledge, these WAs have not been validated as existing in the lexicons of young children, yet are nonetheless used as stimuli when exploring early semantic development. It could be argued that the demonstration of semantic priming in early childhood (Arias-Trejo & Plunkett, 2009) validates the use of WAs taken from adult norms. However, these effects might not be distributed evenly across all stimuli pairs, and more importantly, the failure to demonstrate priming earlier than 24 months using intermodal preferential looking (Arias-Trejo & Plunkett, 2013) could be attributable to the selection of stimuli, which may have relied on partially immature word associations. Thus, it would be of empirical interest to first determine whether WAs are comparable in the adult and the emergent child lexicon. Furthermore, documenting child-specific WAs may highlight some of the first associations that children form and can verbalise, suggesting the primacy of these relationships. Consequently, these early relationships would be more likely to be captured in studies that explore the development of semantic meaning as young as 18 months old (e.g. Delle Luche et al., 2014; Plunkett et al., 2022).

A number of early studies (e.g. Jenkins & Russell, 1960; Koff, 1965; Woodrow & Lowell, 1916) did explore differences between WAs in adults and children, but findings were inconclusive, older children (> 8 years) were tested, and the exact word pairings that children used were not documented and made accessible as a stimulus resource.

Word association tasks have been employed in various areas of psychological research for over a century (Fitzpatrick et al., 2013). In a typical WA task, a participant names or writes the first word they think of in response to a cue word. Exploring WAs can provide insight into the organisation of the mental lexicon and how this organisation affects performance in certain tasks involving memory and verbal response (Comesaña et al., 2014). Through our experience of the world, associative structures form, linking word representations together in the mental lexicon. The shared lexical experience of many people is represented by this associative structure, and the way in which words are associated provides information about the organisation of the mental lexicon (Nelson et al., 2000). When one word readily cues another, the links between the two are believed to have a strong connection in memory (Nelson et al., 2000). This makes the study of WAs a useful tool for investigating meaning and internal representations related to language (De Deyne et al., 2019).

In network models of semantic memory (Collins & Loftus, 1975), concepts are represented in an interconnected network of nodes. Spreading activation occurs between related concepts in such a system so that when one concept is activated, like the cue in a WA task, this activates other nodes related to the concept, such as the responses generated to the cue word. A common opinion is that these WAs represent the links in the network (de Groot, 1989), and that knowing the types of responses (e.g. paradigmatic or syntagmatic) can reveal the types of links between concepts in semantic memory (Moss & Older, 1996).

However, conceptual links are not the only factor affecting associative strength in words. The frequent co-occurrence of words such as cat–dog is thought to contribute to their associative strength in addition to their category membership. This means that co-ordinates such as cat–horse would have a lower associative strength than cat–dog: the words might belong to the same semantic category, but they do not occur frequently together in everyday language (Moss & Older, 1996).

To date, there have been a large number of adult studies looking at WAs, including studies which document the exact word–word pairs produced by more than one participant (these are discussed further in the next section: Word Association Studies in Adults). In contrast, there have been far fewer child studies looking at WAs and, to the best of our knowledge, none to date have tested British English-speaking children under 4 years old, nor have these studies included a resource of the word–word pairs produced by children that are suitable for use in infant studies. The absence of the exact word–word pairs produced by children in child WA studies to date has resulted in a reliance on adult WA studies that do include word–word pairs, to inform stimuli selection in child studies exploring the development of semantic meaning. For this reason, the next section presents the commonly cited adult WA resources in the child literature, and other large-scale adult WA resources to act as a model for how child-equivalent resources could look.

Word association studies in adults

Studies investigating infant semantic development often draw stimuli from, and reference the work of, three key adult associative norms studies: the Edinburgh Associative Thesaurus (Kiss et al., 1973), the Birkbeck Word Association Norms (Moss & Older, 1996), and the University of South Florida free association norms (Nelson et al., 2004).

Kiss (1975) and Kiss et al. (1973) collected WAs between 1968 and 1971 from 100 British, 17–22-year-olds for the Edinburgh Associative Thesaurus. There are 8400 cues (taken from Kent & Rosanoff, 1910) with 100 responses per cue. Although this resource is no longer readily available, it has more recently been transformed into an RDF dataset (Resource Description Framework—a model for data interchange on the Web) (Hees et al., 2016). Child studies using this resource to inform stimuli selection include Arias-Trejo and Plunkett (2009, 2013), Chow et al. (2017, 2018), and Mani and Plunkett (2010).

Moss and Older (1996) compiled the Birkbeck Word Association Norms from the associative responses to 2464 words, organised into 14 tests, over 7 years. Participants were between 17 and 45, living in the UK. Each cue word was allocated to 41–50 British English participants, and each participant responded to 50–387 cue words, with some participants completing more than one test session. Child studies using this resource to inform stimuli selection include Arias-Trejo and Plunkett (2009, 2013), Jardak and Byers-Heinlein (2019), Mani and Plunkett (2010), and Styles and Plunkett (2009).

In the University of South Florida free association norms, Nelson et al. (2004) reported the WAs of more than 6000 American adult participants to 5019 cues. A total of 149 participants responded to 100–200 words on average, which generated 72,000 word pairs. The research has been cited 1900+ times and is the most commonly used resource in English (De Deyne et al., 2019), despite data collection starting 40 years before its publication. Child studies using this resource to inform stimuli selection include Chow et al. (2017), Delle Luche et al. (2014), and Jardak and Byers-Heinlein (2019).

A more recent adult study is the English Small World of Words project (SWOW-EN) (De Deyne et al., 2019) which compiled a new English WA dataset, collected between 2011 and 2018. The study tested 12,000 cue words on over 90,000 participants. The sample included over-16-year-olds who were predominantly American English and British English speakers.

Due to inconsistencies found in the methodologies used in a number of influential adult WA studies, Fitzpatrick et al. (2013) devised a WA task to explore differences in WAs, modulated by age. Twin 16-year-olds and twins over 65 years old (N = 48 twins per group) were tested. Age-related differences were reported, which the authors suggest might stem from the vocabulary preferences of the two age groups or from changes related to ageing. Consequently, Fitzpatrick et al. (2013) caution against using normed lists such as the South Florida Association Norms (1998) to compare responses of a target population, as doing so fails to acknowledge the characteristics of a cohort, such as generational differences, which might influence how a group responds. A population-specific list will reflect the characteristics of those tested, and this will enable better identification of differences within and across populations.

Thus, adult studies are sub-optimal for informing stimuli selection for child studies, and so we now turn to the literature on WAs in children to explore how the methodology commonly used in adult WA studies can be adapted for children, particularly to make the WA task accessible for young children who are not old enough to read or write.

Word association studies in children

There have been far fewer WA studies conducted on children compared to adult studies. Of the more recent child studies (e.g. Comesaña et al., 2014; de La Haye et al., 2003; Macizo, 2000; Zortea & de Salles, 2012), few have tested children under 7 years of age, and few have used an oral methodology. Since the aim of this paper is to develop a resource of imageable, associated word–word pairs that can be used to explore the primacy of semantic meaning, we focus on studies with a well-documented WA methodology that can be accessed by very young children. Many of these studies, however, considerably predate the recent work cited above.

The youngest age group tested to date in a WA study seems to be 48–66 months in the WA studies conducted by Newman (1970). Using a ‘continued sentence associations’ methodology, which encourages multiword responses, and a standard WA methodology, Newman found that the former was more successful when testing children at a young age. Unlike adult, single-word WA responses, a common tendency in children of 4–5 years engaged in associative word tasks is to respond with more than one word (see Entwisle, 1964). This offers insight into how to adapt a WA methodology for even younger participants.

An area of particular interest in WA research in children is investigating the occurrence of a developmental shift referred to as the ‘syntagmatic–paradigmatic’ shift (White, 1985). As per definitions used in previous WA studies with children (Sheng et al., 2006; Wojcik & Kandhadai, 2020), a paradigmatic response in a WA task might be defined as a superordinate (e.g. cat–animal), a subordinate (e.g. train–carriage), a synonym (e.g. brush–comb), an antonym (e.g. night–day), or a category coordinate (e.g. elephant–dog). A syntagmatic response can be defined as a word which is able to syntactically follow or precede the cue (e.g. train–track), or which is thematically close (e.g. bed–story).

Until 6 years of age, children’s responses to a WA task are mostly based on syntagmatic links (Brown & Berko, 1960; Entwisle et al., 1964; Ervin, 1961), but after this age, up until 11 years, children’s responses become more paradigmatic in nature (Newman, 1970).

Paradigmatic responses (e.g. insect after bee) to a WA task indicate a more developed semantic system, thus are more common in adult associated responses. It is believed that a higher level of cognitive processing is behind this type of response, which involves processes such as conceptual and lexical reorganisation (Nelson, 1977). Thus, as children develop cognitively and linguistically, it is thought that the types of WAs they produce will become more adult-like, and paradigmatic in nature. Paradigmatic knowledge helps structure semantic networks and the retrieval of semantic knowledge, which develops as a child increases their vocabulary (Sheng et al., 2006). However, according to Wojcik and Kandhadai (2020), the assumption that young children only produce syntagmatic responses (e.g. honey after bee) in a WA task is inaccurate, because taxonomic responses (e.g. horse after dog) are produced by children, but there is simply a lack of data in the WA literature testing children. In fact, in experiments testing comprehension, sensitivity to syntagmatic and paradigmatic relationships between words has been observed at 24 months (Arias-Trejo & Plunkett, 2013), with some evidence suggesting the existence of paradigmatic relations as young as 6 months (Bergelson & Aslin, 2017).

To explore the developmental trajectory of paradigmatic relations in children, Wojcik and Kandhadai (2020) conducted a WA task on 60 English-speaking 3–8-year-olds (M = 4.85, SD = 1.27). They also tested a group of adults for comparison (N = 60). A total of 65 cue words were used (nouns = 25), and eight order lists were created, 32–33 words in length. Children were grouped as ‘old’ at 6–8 years (N = 17) and ‘young’ at 3–5 years (N = 43). The authors found clear evidence of paradigmatic responses in ‘young’ children, with a higher proportion of this response type in ‘old’ children, and a higher proportion still in adults.

Much like other recent WA studies testing children, a limitation of this study is the relatively small sample tested (e.g. Cronin, 2002: N = 59; Sheng et al., 2006: N = 24; Wojcik & Kandhadai, 2020: N = 60). While much larger-scale English WA studies exist in children, many of these were conducted over 50 years ago (Entwisle, 1966). One such study was conducted in 1963 by Koff (1965), who tested 8- to 12-year-olds (N = 147) on a list of 51 words to compare children’s associative responses with responses collected in one of the first child studies on WAs (Woodrow & Lowell, 1916, testing children aged 9–12, N = 1000). Koff found a significant difference in primary responses in children from 1916 to 1963, but when compared to adult responses given in 1954 (Jenkins & Russell, 1960), there was not a large difference between responses given by children and adults. This differs from Woodrow and Lowell’s (1916) finding of a large discrepancy between children and adults. Koff (1965) concluded that a cumulative effect on WAs can be attributed to changes in culture.

Taken together, it is clear that only a few studies directly elicit free associations from children under the age of 4, and that large-scale WA studies conducted on English-speaking children are already very old. Whether the associated responses of English-speaking adults and children are similar (Koff, 1965) or very different (Woodrow & Lowell, 1916) remains inconclusive.

Proposed research and rationale

The WA literature reviewed indicates that caution must be taken not to generalise findings from normative studies across different populations, as these will have their own associative norms (Nelson et al., 2004). Word associations are likely to be modulated by age (Fitzpatrick et al., 2013), and if associations stem from our experience of the world and our exposure to linguistic input, this will inevitably differ according to the stage of a child’s linguistic development. Common relationships between words in young children might be missed if relying on predetermined relations (Wojcik & Kandhadai, 2020) which do not derive from the population of interest. Due to a lack of studies documenting very young English-speaking children’s WAs, and no studies to our knowledge testing under the age of 4, it remains to be seen what some of these early word–word relationships are, and whether they mirror adult associative norms (Arias-Trejo & Plunkett, 2009), which are the source of stimulus selection in many infant studies exploring early word–word relationships.

To date, many child studies have relied on adult associative norms for their stimulus selection, yet these norms do not prioritise highly imageable word pairs, which is imperative when testing young children. Therefore, the aim of this research is to develop a task whose focus it is to document common noun–noun WAs in the lexicon at as young an age as possible (Experiment 1). Then the aim is to replicate these word–word connections through a second study (Experiment 2) and to determine whether these connections are equally strong in a receptive, semantic priming study (Experiment 3). Together this will provide evidence that these words are connected in the lexicons of young children receptively as well as productively and can therefore be reliably consulted as a stimulus resource for future studies investigating the development of lexical-semantic networks in English-speaking infants.

Experiment 1

Since few studies have collected WAs in very young children, and no study to our knowledge has tested children under the age of 4, we based our method on Newman’s (1970) WA methodology which encourages more than one attempt to respond to a cue word (see Newman, 1970, Experiment 2), acknowledges multi-word responses, uses a reduced number of cue words compared to other experiments, and has an oral mode of delivery. All of these elements likely make it a more accessible WA method when testing young children under 4, who are not yet able to read or write, and who have not yet been reliably tested on such a task to know how we might optimise the process for young children with limited language. We hypothesised that using a methodology as outlined above, particularly one that allows for more than one response, may allow this young age group to use repetition or rhyme as a tactic for processing the cue (Palermo & Jenkins, 1964) while learning how to respond correctly to the task; in addition, it better frames the task as a ‘word game’ (see Palermo & Jenkins, 1964, 1966; Palermo, 1971), which might help with engagement, a concern when testing young children.

The WA task was administered quite differently to previous studies: the at-home format (Experiment 1) saw the parent act as experimenter, whereas the online format (Experiment 2) used puppets to model the task and take on the role of experimenter. These decisions were taken to accommodate the young age of participants and to allow testing to continue during the UK national lockdown at the start of the COVID-19 global pandemic.

Method

Participants

A total of 150 participants (female = 84, male = 66; see Footnote 1) completed the study. Of those, 140 were recruited from the BabyLab database and its corresponding Facebook page, and the remainder were recruited from other Baby Labs. Participants were divided into seven 2-month age bins, i.e. 34–35 (N = 23), 36–37 (N = 22), 38–39 (N = 21), 40–41 (N = 20), 42–43 (N = 21), 44–45 (N = 23), and 46–47 (N = 20), to explore WA production across a child’s third year of life (see Footnote 2). Participants were considered ineligible for the study if known to speak more than one language, or if diagnosed with a developmental or language delay. These eligibility criteria apply to Experiments 1–3.

Materials

One hundred highly imageable, concrete nouns were selected from nine categories (e.g. animals, toys, clothes) that are known by at least 60% of 18-month-olds according to the Oxford Communicative Development Inventory (CDI; Hamilton et al., 2000) and UK CDI (UK-CDI Database, 2016). The full list of words can be found in Appendix Tables 3. Ten lists of 10 words were created, ensuring each category was represented in each list. Two pseudo-randomised orders were created for each of the 10 wordlists to avoid effects of cue order. Care was taken to avoid consecutive words being associatively related or appearing from the same category. Words sharing initial word onset were not presented consecutively.
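The ordering constraints described above can be illustrated with a small rejection-sampling sketch. This is not the authors' procedure, which is not specified in detail; the word list, the category labels, and the use of the first letter as a stand-in for word onset are all assumptions made for illustration. The constraint on associatively related neighbours is left out because it would require a set of association norms.

```python
import random

# Hypothetical 10-word list as (word, category) pairs. The words and category
# labels are illustrative only and are not taken from the study's appendix.
WORDLIST = [
    ("cat", "animals"), ("ball", "toys"), ("sock", "clothes"),
    ("apple", "food"), ("bus", "vehicles"), ("spoon", "household"),
    ("tree", "outside"), ("bed", "furniture"), ("nose", "body"),
    ("horse", "animals"),
]

def is_valid_order(order):
    """True if no two neighbours share a category or a first letter.

    The first letter is a crude proxy for word onset (an assumption).
    """
    for (w1, c1), (w2, c2) in zip(order, order[1:]):
        if c1 == c2 or w1[0] == w2[0]:
            return False
    return True

def pseudo_randomise(words, rng, max_tries=10_000):
    """Reshuffle until the adjacency constraints hold (rejection sampling)."""
    for _ in range(max_tries):
        order = words[:]
        rng.shuffle(order)
        if is_valid_order(order):
            return order
    raise RuntimeError("no valid order found; relax the constraints")

rng = random.Random(1)  # fixed seed so the sketch is reproducible
order_a = pseudo_randomise(WORDLIST, rng)
order_b = pseudo_randomise(WORDLIST, rng)
```

Rejection sampling is workable here because the lists are short and most shuffles already satisfy the constraints; longer lists with tighter constraints would call for a constructive approach.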

Procedure

After we received ethical approval from the university’s ethics committee, participants meeting the inclusion criteria were contacted via the BabyLab database or Facebook page. An email invitation including a participant information sheet outlining the procedure, data handling, and a consent form were sent. Written consent was obtained from the parents. At the end of the process, a final debriefing email was sent out thanking the family for their participation in the study with a digital certificate and £5 voucher code attached. Experiments 1–3 all followed this procedure.

Next, interested families were sent an email with the task instructions and one of the 10 wordlists. On receipt of this, parents were asked to request replacement words if the words were unfamiliar to their child. We used parental report to determine a child’s comprehension of each word, in line with the procedure for administering the MacArthur-Bates CDI-III (CDI, Fenson et al., 2007, lexical component only).

Parents were instructed to follow the script (see Appendix Tables 4) as closely as possible and to elicit three responses per cue where possible. Parents were asked to use the cue word when encouraging each of the child’s three responses to a word. It was emphasised that the task should be enjoyable and that the parent should move on to the next word if their child had difficulty responding. Parents were instructed to record their child’s responses in the order they were given, in a table provided (see Appendix Tables 4). The full utterance of a response was requested, with instruction to indicate whether the child was naming objects in the immediate environment.

Parents returned the completed task by email to the experimenter. The responses were checked, and parents were contacted to provide further information about ambiguous responses, especially if seemingly random responses might have related to something in the immediate environment. Previous research on free associations in children (Palermo, 1971, 1964) has shown this to be common when young participants are unable to produce a response.

Pilot study

A pilot study was run on children between 24 and 60 months (N = 14), but 24–30-month-olds were not always successful in understanding the task, with some unable to complete it at all. This prompted a change in the minimum age from 24 months to 34 months. Due to availability of resources and a refocusing of the research aims, the upper age limit was set to 47 months to focus on WAs in the third year of life.

Results

Data processing

Data were pre-processed as follows: spelling errors were corrected; nouns were prioritised when a word belonged to multiple word classes; contextual information provided by parents was noted in brackets to assist coding; and missing responses were marked as ‘NO RESPONSE’.

Coding for response type

Different response types were identified by analysing the data collected in the pilot study, leading to a set of 10 categories: Category 0 = no response; 1 = related; 2 = unique relationship to child; 3 = connected to a previous response; 4 = related in a wider sense; 5 = repetition of cue; 6 = naming something in immediate environment; 7 = unrelated; 8 = rhyme (including clang responses); 9 = sounding out (e.g. APPLE – ‘a’ for apple); 10 = sound or action (see Appendix Tables 5 for a more detailed description with examples). Related responses were tagged as paradigmatic, syntagmatic, or both. Definitions used in previous WA studies with children (Sheng et al., 2006; Wojcik & Kandhadai, 2020) and as mentioned previously (see Footnote 3) were adopted.

Participant responses were coded by the lead researcher, with a junior researcher coding a subset (10%) of the data. Rater agreement of category coding was 91% with a Cohen’s κ of 0.62 which demonstrates substantial agreement (Landis & Koch, 1977). Paradigmatic/syntagmatic coding agreement was 93%, with a Cohen’s κ of 0.92, demonstrating near perfect agreement.
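For readers less familiar with the statistic, the following minimal sketch shows how Cohen's κ corrects raw percentage agreement for the agreement expected by chance. The two code lists are invented for illustration and are not the study's data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category to each item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions,
    # summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:  # both raters used a single identical category
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical response-category codes (0-10) from two raters for 10 items.
lead_rater = [1, 1, 1, 7, 0, 1, 5, 1, 7, 1]
second_rater = [1, 1, 1, 7, 0, 1, 1, 1, 1, 1]
kappa = cohens_kappa(lead_rater, second_rater)  # 80% raw agreement, kappa ~ 0.59
```

When one category dominates, as related responses do in this kind of data, chance agreement is high, so κ can sit well below the raw percentage agreement.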

Associative strength analysis

The likelihood of a cue word producing a particular response in a WA task (e.g. cat -> dog) can be indexed using a measure of forward strength (FSG, Nelson et al., 2000). This is calculated by dividing the number of participants producing a particular response to a cue (P) by the total number in the group responding to a given cue (G): FSG = P / G.

To calculate P, the data were first grouped. For example, responses were grouped for a repeated entry, and for the plural and singular forms of a noun (see Entwisle, 1966). In multi-word utterances containing a noun, the noun was the focus (in line with the aim of this study).

The FSG was calculated for every response produced by two or more participants following the procedure used by Nelson et al. (2000). This was done to generate a proportion which could be compared to other datasets looking at FSG in WAs (Moss & Older, 1996; Nelson et al., 1998).
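The calculation can be sketched as follows. The response rows and the plural mapping are hypothetical, and the sketch assumes that G counts every participant who was given the cue, including those who gave no response; if G should count only participants who produced a response, the group-size line needs to move inside the `if` block.

```python
from collections import Counter, defaultdict

# Hypothetical first responses as (participant_id, cue, response) rows,
# one row per participant per cue.
RESPONSES = [
    ("p01", "cat", "dog"), ("p02", "cat", "dogs"), ("p03", "cat", "milk"),
    ("p04", "cat", "dog"), ("p05", "cat", "NO RESPONSE"),
    ("p01", "bed", "sleep"), ("p02", "bed", "sleep"), ("p03", "bed", "pillow"),
]

# Explicit plural-to-singular mapping used to group forms of the same noun
# (illustrative; a real analysis would need a fuller mapping).
SINGULAR = {"dogs": "dog"}

def normalise(response):
    """Lower-case the response and group plural with singular forms."""
    word = response.strip().lower()
    return SINGULAR.get(word, word)

def forward_strength(rows, min_producers=2):
    """Return {(cue, response): P / G} for responses given by >= min_producers."""
    group_size = Counter()            # G: participants given each cue
    producers = defaultdict(Counter)  # P: participants per cue-response pair
    for _participant, cue, response in rows:
        group_size[cue] += 1
        if response != "NO RESPONSE":
            producers[cue][normalise(response)] += 1
    return {
        (cue, response): count / group_size[cue]
        for cue, counts in producers.items()
        for response, count in counts.items()
        if count >= min_producers
    }

fsg = forward_strength(RESPONSES)
# ("cat", "dog") -> 3 / 5 and ("bed", "sleep") -> 2 / 3; responses given by
# a single participant ("milk", "pillow") are excluded.
```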

Descriptive statistics

A total of 4512 responses were collected from 150 3-year-olds completing the WA task. After subtracting responses categorised as ‘no response’ (i.e. Category 0, N = 908), a total of 3603 responses remained. This produced an average of 24 responses out of a possible 30 (three attempts for each of the 10 cue words, SD = 6.14). Considering first responses only, out of a possible 1500 responses (150 participants, each with 10 cue words), 1454 responses remained after subtracting ‘no responses’ (N = 46). The mean response rate was 9.69 (SD = 0.91).

We then calculated the percentage of all responses per category type (see Fig. 1). Most 3-year-olds’ responses were related (i.e. Category 1), rather than any other type of response.

Fig. 1 Experiment 1. The percentage of WA responses (all responses) by response category: Category 0 = no response, 1 = related, 2 = unique relationship to child, 3 = connected to a previous response, 4 = related in a wider sense, 5 = repetition of cue, 6 = naming something in immediate environment, 7 = unrelated, 8 = rhyme, 9 = sounding out (e.g. APPLE – ‘a’ for apple), 10 = sound or action

By organising responses into no response (Category 0) and collapsing categories representing a related response (Categories 1, 2, 4, 10) and responses which are not related (Categories 3, 5, 6, 7, 8, 9), Table 1 illustrates the distribution of all responses, as a percentage and as a raw value.

Table 1 Experiment 1. Percentage of all responses by relatedness of response type
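The collapsing of response categories into three relatedness groups can be sketched as below. The category groupings follow the text; the list of codes is invented for illustration and does not reproduce the values in Table 1.

```python
from collections import Counter

# Groupings as described in the text.
RELATED = {1, 2, 4, 10}
NOT_RELATED = {3, 5, 6, 7, 8, 9}

def relatedness(category):
    """Map a response category (0-10) onto one of three relatedness groups."""
    if category == 0:
        return "no response"
    if category in RELATED:
        return "related"
    if category in NOT_RELATED:
        return "not related"
    raise ValueError(f"unknown response category: {category}")

def summarise(codes):
    """Return {group: (raw count, percentage of all responses)}."""
    counts = Counter(relatedness(code) for code in codes)
    total = len(codes)
    return {group: (n, round(100 * n / total, 1)) for group, n in counts.items()}

# Hypothetical category codes for ten responses.
codes = [1, 1, 0, 7, 4, 5, 1, 10, 0, 2]
summary = summarise(codes)
```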

Next, we calculated the percentage of responses per category type for first responses only (see Fig. 2).

Fig. 2 Experiment 1. The percentage of WA responses (first responses only) by response category

By splitting the data in this way, we see a higher percentage of related responses (Category 1 = 68%) and a lower instance of no responses (Category 0 = 3%). Due to this observation and since not all children gave three responses to every cue word, we focus henceforth on first responses only for inferential analysis, but we have retained related responses from second and third responses in the Appendices to document exact cue–target word combinations.

Given that some participants did not provide a response for each of the 10 cue words, a proportional score of related responses was calculated for each participant. This was the number of related responses divided by the total number of responses (minus no responses). The overall mean proportion of related responses was 0.85 (SD = 0.21). We ran a type III ANOVA on participants’ proportion of related responses with gender and age bin as fixed factors. The proportion of related responses did not differ significantly by gender or age bin, and there was no interaction between the two variables (ps > 0.1).

Related first responses were categorised as paradigmatic, syntagmatic, or both. Following Wojcik and Kandhadai’s (2020) method of calculation, responses classified as paradigmatic or both were combined. A total of 25.2% of responses were paradigmatic (or both), and 74.8% of responses were syntagmatic. We ran a type III ANOVA on participants’ proportion of paradigmatic responses with gender and age bin as fixed factors. The proportion of paradigmatic responses did not differ significantly by gender or age bin, and there was no interaction between the two variables (ps > 0.1).

Associative strength

Related responses given by two or more children to each of the 100 cue words were processed to calculate their forward word association strength (FSG) (Nelson et al., 2000). Focussing on first responses only (see Footnote 4), a total of 188 responses had two or more participants producing the same response for a cue word, with 96 of the cue words represented in these responses. The full list of cue words (organised alphabetically) with two or more of the same response and their associative strengths (M = 0.20, range = 0.11 to 0.69) can be found in Appendix Tables 7.

Since one aim of this research was to look at the most common imageable noun–noun associated word pairs in 3-year-olds, we extracted noun–noun word pairs to create a stimulus resource bank (see Appendix Tables 8). Of the 188 responses shared by two or more children, 115 of these were noun–noun word pairs.

To determine whether the most common WAs in our sample of 3-year-olds are unique to this age group, we then compared the FSG from the adult literature for the same word combinations. Of the 188 related word–word combinations produced as first responses by two or more of the 150 3-year-olds in this study, 30 were not characterised in either the Birkbeck or the South Florida norms (though the cue was used); 13 were not used as a cue in the Birkbeck norms, nor documented as an associated response in the South Florida norms; two were not documented as an associated response in the Birkbeck norms, nor used as a cue in the South Florida norms; and four were not used as a cue in either study, resulting in a total of 49 word pairs found in children’s responses without a value of associative strength in adults. These missing data correspond to 26% of the associated responses found in 3-year-olds that are not reflected in adult associative norms (see Footnote 5).

The resulting 139 word pairs which are represented in the adult data were analysed. Where there was an associative strength available in the two adult studies used for comparison (Moss & Older, 1996; Nelson et al., 1998), the mean was taken, but where only one value was available, this was taken to represent FSG in adults. The 139 word pairs can be seen in Appendix Tables 9.
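The rule for deriving a single adult FSG per word pair, together with the counts reported above, can be sketched as follows. The counts come from the text; the function name and the three example values are hypothetical.

```python
def adult_fsg(birkbeck, south_florida):
    """Combine two adult norm values for one word pair.

    Returns the mean when both values exist, the single available value when
    only one exists, and None when the pair appears in neither norm set.
    """
    available = [v for v in (birkbeck, south_florida) if v is not None]
    if not available:
        return None
    return sum(available) / len(available)

# Hypothetical norm values, for illustration only.
both_norms = adult_fsg(0.50, 0.30)  # mean of the two values
one_norm = adult_fsg(None, 0.40)    # only one value is available
no_norm = adult_fsg(None, None)     # pair is absent from the adult data

# Counts reported in the text: pairs with no adult value, by reason.
missing = 30 + 13 + 2 + 4                    # 49 pairs
compared = 188 - missing                     # 139 pairs entered into the analysis
percent_missing = round(100 * missing / 188) # 26
```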

A paired t-test was run to determine any difference between the associative strength between word pairs in children and adults. There was a significant difference in the FSG between age groups, t(138) = 4.58, p < .001, 95% CI [0.04, 0.10], indicating stronger associative strength between word pairs in children (M = 0.21, range = 0.11–0.69) compared to adults (M = 0.14, range = 0.01–0.76). There was a significant, weak positive correlation between the two groups, r(137) = .22, p = .01, 95% CI [0.05, 0.37]. This shows a tendency for strongly associated word pairs in adults to be strongly associated in children too.
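A minimal sketch of the two statistics, using only the Python standard library, is given below. The FSG values are invented for illustration, and the sketch stops at the test statistic and degrees of freedom; obtaining p-values and confidence intervals would require a t-distribution routine from a statistics package.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom (n - 1)."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

def pearson_r(x, y):
    """Pearson correlation coefficient; its test uses n - 2 degrees of freedom."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical FSG values for the same five word pairs in each group.
child_fsg = [0.40, 0.20, 0.13, 0.27, 0.33]
adult_fsg_values = [0.30, 0.05, 0.10, 0.20, 0.10]

t_stat, df = paired_t(child_fsg, adult_fsg_values)
r = pearson_r(child_fsg, adult_fsg_values)
```

With 139 word pairs, the same functions give 138 degrees of freedom for the t-test and 137 for the correlation, matching the values reported above.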

Some of the WAs with the highest FSG in the data are not replicated in the adult literature, so while no comparison can be made statistically, these may represent novel WAs in 3-year-olds that warrant further testing. These word combinations are displayed in Appendix Tables 10.

Discussion of Experiment 1

Experiment 1 tested whether children as young as 3 years old could successfully complete a WA task and sought to compare any recurring responses in children to those found in adult norms using forward associative strength as the metric of comparison. There was strong evidence that children between 34 and 47 months can produce associated responses in a repeated free association task. In fact, 3-year-olds produced related responses for the majority of their responses (62%). This establishes that 3-year-olds can successfully complete a WA task and produce some of the same responses as their peers, rather than just idiosyncratic responses.

A large number of associated first responses were produced by two or more 3-year-olds; however, only 139 of these associatively related pairs could be found in adult associative norms. In other words, 26% of related responses given by two or more children are not found in adult norms, and this includes some of the word pairs with the strongest associative strength found in the child data. This might provide a glimpse into the shared experiences of 3-year-old children, which is represented in their lexical-semantic structure at this age. However, these findings would need to be replicated to draw any inference about the probability that a particular cue will elicit an expected associated response in a 3-year-old. This will be addressed in Experiment 2.

Most (74.8%) of the related responses given by 34–47-month-olds were syntagmatic, and there was no effect of age on the rate of paradigmatic responses in the third year of life. The tendency for 3-year-olds to produce syntagmatic responses in a language production task is in line with the idea that a shift to paradigmatic responses in a WA task occurs later, at 6 years of age.

The findings from this study suggest that adults and children converge in the likelihood that certain cue words will elicit the same associative responses; however, this is only true for some word pairs. A direct comparison is difficult to make between the associative strengths found in children and adults, as not much is known about the variables affecting WA behaviour (Fitzpatrick et al., 2013).

A potential explanation for why the associative strength between word pairs might be higher in children compared to adults is due to 3-year-olds having smaller vocabularies, and therefore, the connections that exist between words in their mental lexicons could be stronger, as they are fewer in number.

Experiment 2

Findings from Experiment 1 validated the use of a free association task on 3-year-olds when the task is administered by a parent. However, having the parent act as ‘experimenter’ inevitably calls into question the validity of the task’s administration, and indeed informal correspondence with participants indicated that there were some deviations from the delivery of the task when performed by different families in their unique home contexts. While this may not directly influence the types of responses a child gives, it warrants a replication study to confirm that when a parent administers the task at home, the types of WAs that a 3-year-old produces in this context are the same types of responses that would be given in a more controlled setting. This potential confound has prompted an adaptation of the original methodology into an online format.

The online WA task did not require the parent to act as the experimenter, but instead used pre-recorded videos of puppets to describe and demonstrate the task. A participant’s responses were recorded for off-line coding, and the more engaging format sought to retain the child’s focus. A further impetus to test online was the inability to test face-to-face due to the global pandemic.

In Experiment 2, we asked whether the WAs produced by 3-year-olds in the parent-administered version of the task could be replicated in another modality, that is, in an online format. To what extent the modality influenced the responses was addressed, as well as examining whether word pairs found in Experiment 1 re-occurred in this online modality, and whether their associative strength was replicated.

The task remained very similar in its design through its remote administration, for instance, by using the same cue words, and with 10 cue words and three responses encouraged for each cue word. However, a homogeneous delivery of the task was better achieved by controlling how the task was explained and how responses were recorded.

We adjusted the age range for Experiment 2 to 36–39 months due to restrictions on time and resources. This specific age range was chosen to maintain a focus on very young children (i.e. at the younger end of a child’s third year of life). From the 10 lists of cue words in Experiment 1, cue words eliciting the WAs with high FSG were selected to create two new lists with 10 words per list for Experiment 2.

We predicted that overall, there would be a replication of the WAs with strongest associative strength in 3-year-olds in the modified online modality. However, due to a high idiosyncratic response rate in young children (Wojcik & Kandhadai, 2020), the strength of the WAs and specific word pairings may differ for Experiment 2. If the parent acting as the ‘experimenter’ was a confounding factor in Experiment 1, then we expected a marked difference in the types of the responses produced by participants (e.g. fewer related responses). Equally, if the online modality made the task more engaging, we expected to see a reduction in the naming of objects in the immediate environment and potentially a greater proportion of related responses.

Method

Participants

Monolingual English-speaking toddlers were recruited from the BabyLab database and its social media platform pages (N = 24: 13 female, 11 male). The mean age of participants was 37.64 months. Participants were divided into two age bins, 36–37 months and 38–39 months (±15 days), with 12 children in each age bin. CDI III scores (Fenson et al., 2007, lexical component only) were collected from participants, but only approximately a third of parents completed this part of the task (N = 7, M = 79.43/99, SD = 13.62).

Materials

Stimuli

Twenty of the cue words from Experiment 1 which generated a WA with high FSG in Experiment 1 were selected and organised into two new lists for Experiment 2. List 1 comprised chair, bed, tooth, finger, key, sock, bowl, head, park, and bath. List 2 comprised table, teddy, brush, hand, door, foot, cereal, hair, swing, and towel.

Audio and video recordings

The script used by parents in Experiment 1 was adapted for use online. The task explanation and examples were delivered by two puppets, with greater exemplification (i.e. more than one example to demonstrate the task) to aid conceptual understanding of the task. Video recordings were made of the puppets explaining and demonstrating the task by two female, junior researchers, all directed and overseen by the author. Great effort was taken to make the instructional delivery engaging by using child-directed speech. In addition to the main explanatory video, short motivational clips were recorded of the puppets encouraging participation and praising a participant’s effort. Cue words were recorded auditorily by the same junior researchers and presented without the puppets on screen to minimise distractions.

Procedure

Parents indicated the day and time they would complete the online experiment, and a unique link was generated for the Gorilla Experiment Builder platform (www.gorilla.sc, Anwyl-Irvine et al., 2019), with further instructions on the procedure. Clicking on the link took the participants through a series of tasks, in the following order: study overview screen; participant eligibility questionnaire; consent form; audio and video test screen with equipment eligibility questionnaire; participant and parent/carer demographic questionnaire; word checklist; CDI III (lexis component only); debrief (see https://app.gorilla.sc/openmaterials/764752 for the full procedure). An experimenter was available for questions and troubleshooting during the time the participant attempted the task.

For the WA task, a video was played of a demonstration of the task by two puppets. The puppets gave examples of WAs (using words not in the stimulus list) with an emphasis on the need to say the first thing that came to mind as quickly as possible.

Following the puppets’ instructions, a cue word was played while an abstract, visual attention getter appeared on screen to maintain the child’s attention to the task/on screen. The cue word was presented once with on-screen instructions for the parent to support the child in producing three responses per cue word. An audio recording of the child and parent was made through the participant’s device. Due to the remote nature of testing, this procedure could not be fully controlled, and there is a chance that the parent did not use the cue word to encourage second and third responses. The result of this is the chance of chained responses. However, we included a category code to capture any instance of this (3 = connected to a previous response).

When clicking on ‘Next’ for a subsequent cue word, a video of the puppets praised the child’s attempt, and three text fields appeared for the parent to type the child’s responses in, in the order given. This feature was added in case of an error with the audio recording, or a difficulty understanding the child’s speech, and to analyse how parents record their child’s responses. Refer to Appendix Fig. 8 to see how the experiment looked for the parent and child.

On every trial, the parent was able to determine when the child was ready to progress to the next word in the list by clicking on a ‘Next’ button. This allowed for individual differences in the time needed to produce up to three related words. It was made clear to parents to move on if a child could not think of three responses or if a child became disengaged. Additionally, an ‘Exit’ button was present on every screen to end the task if the child did not want to continue. After five words had been presented in this vein, a video of the puppets demonstrated the task again with a non-cue word. The final five words were then tested. Finally, the parent completed a digitalised version of the CDI III (lexis component only; Footnote 6) before a final debrief questionnaire asking for any questions or comments relating to their experience of the task.

Piloting

Various iterations of the Gorilla experiment were trialled on junior researchers and children to ensure that the sequence of tasks was optimal and that the instructions for the parent were straightforward and unambiguous. Piloting resulted in the following modifications to the procedure: a hardware eligibility check; optimisation of audio and video for varying bandwidths; restriction of the task for use with the Google Chrome browser; and various modifications to task instructions.

Data processing and analysis

Audio responses were transcribed and compared to parental reports of their child’s responses. The rate of agreement between the audio transcription and parental report was 92%, providing sufficient evidence to use parental responses for further analysis. The 8% discrepancy in recorded responses was likely due to the audio recording not capturing all responses (i.e. a child continued talking when the recording stopped), parents not accurately recording/not remembering to record all words uttered, or parents not acknowledging all responses as valid.

Responses were grouped and categorised (0–10, see Appendix Table 5) by two independent coders, as previously outlined in Experiment 1. The agreement between raters was ‘perfect’ with 100% agreement (Cohen’s kappa). This high level of agreement indicates that the categories were being applied consistently when different coders categorised responses.

Rater agreement for paradigmatic/syntagmatic coding was ‘almost perfect’ at 96%, κ = 0.82.

Results

Descriptive statistics

A total of 593 responses were recorded as related or unrelated out of a possible 720 responses. Remaining responses were ‘no responses’ (N = 127). Based on a participant producing up to three responses for each of the 10 cue words, an individual participant produced an average of 24.71 responses (SD = 5.80).

Considering first responses only, out of a possible 240 responses (24 participants, each with 10 cue words), 218 responses remained after subtracting ‘no responses’ (N = 22). Mean response rate was 9.01 (SD = 1.61). Figure 3 shows the percentage of first responses by response type.

Fig. 3 Experiment 2. Percentage of first responses by response category in the online WA task

Category 1 (Related) responses were most prominent (59%), followed by Category 7 (random responses, 12%), then Category 0 (no responses, 9%).

Organising responses into ‘no responses’ (Category 0), a related response (Categories 1, 2, 4, 10) and an unrelated response (Categories 3, 5, 6, 7, 8, 9), Table 2 illustrates the distribution between the three main response types as a percentage and as raw values.

Table 2 Experiment 2. First responses by relatedness of response in the online WA task

As per Experiment 1, we calculated a proportional score of related responses (first responses only) for each participant. The overall mean proportion of related responses was 0.82 (SD = 0.19). We ran a type III ANOVA on participants’ proportion of related responses with gender and age bin as fixed factors. There were no significant differences between the proportion of related responses by gender and age, and no interactions between variables (ps > .05).

A total of 72% of first related responses (Footnote 7) were syntagmatic, and 28% were paradigmatic (or both).

We ran a type III ANOVA on participants’ proportion of paradigmatic responses for first responses with gender and age bin as fixed factors. There were no significant differences between the proportion of paradigmatic responses by gender and age, and no interactions between variables (ps > 0.1).

Taking age as a continuous variable, there was a weak negative correlation between age and the proportion of paradigmatic responses in first responses, though this was not significant, r(22) = −.20, p = .36, 95% CI [−0.56, 0.22]. Together this indicates that 3-year-olds predominantly produce related responses that are syntagmatic, and this is not modulated by age (between 36 and 39 months) or gender.

Associative strength

Responses were pre-processed and organised as per Experiment 1. When the same response to a cue word was generated by two or more participants, its associative strength was calculated (Nelson et al., 2000). Considering first responses only (Footnote 8), 25 responses were given by two or more participants, with 18 of the 20 cue words represented in these word combinations. The list of first response word combinations shared by 2+ children can be found with their corresponding associative strengths (M = 0.22, range = 0.17 to 0.42) in Appendix Table 12.
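As an illustration of how forward associative strength is obtained from first responses, the sketch below computes, for one cue, the proportion of children who produced each response and keeps only responses shared by two or more children. The function name, the placeholder responses, and the choice to keep non-responders in the denominator are assumptions made for this example rather than details confirmed in the text.

```python
from collections import Counter


def forward_strength(first_responses, min_producers=2):
    """Map each shared response to its forward associative strength (FSG).

    first_responses: one entry per child who was given the cue, with None
    marking a child who gave no response (assumed here to stay in the
    denominator).
    FSG = children producing the response / children given the cue.
    """
    n_children = len(first_responses)
    counts = Counter(r for r in first_responses if r is not None)
    return {resp: k / n_children for resp, k in counts.items() if k >= min_producers}


# Hypothetical first responses from 12 children to a single cue word.
responses = ["resp_a"] * 5 + ["resp_b"] * 2 + ["resp_c", None, "resp_d", "resp_e", "resp_f"]
fsg = forward_strength(responses)
for response, strength in sorted(fsg.items()):
    print(response, round(strength, 2))
```

If each list was heard by 12 children (an assumption, since the allocation of children to lists is not stated in this section), two children sharing a response would give 2/12 ≈ 0.17 and five would give 5/12 ≈ 0.42, which matches the reported range.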

The corresponding associative strength for the related responses given as first responses was then extracted from adult associative norms (Moss & Older, 1996; Nelson et al., 1998) and compared to the child data (see Appendix Table 12). Associative strength was averaged across the two adult studies where possible; otherwise, an available value from one of the studies was taken to represent the associative strength in adults overall.

Seventeen of the 25 associative pairs found in the online free association task were present in the adult associative norms. Eight of the 25 related word pairs found in children’s responses did not have a value of associative strength in adults: three associated word pairs were not characterised in either the Birkbeck or the South Florida norms (though the cue was used); four were not used as a cue in the Birkbeck norms nor documented as an associated response in the South Florida norms; and one was not used as a cue in either study. This corresponds to 32% (Footnote 9) of associated responses found in 3-year-olds that are not reflected in adult associative norms.

The associative strengths of related responses in children from the eight cue–response pairs not present in adult norms (M = 0.21, range = 0.17–0.33) were compared to the associative strengths of the 17 cue–response pairs present in children and in adult norms (M = 0.23, range = 0.17–0.42). There was no significant difference in associative strengths, t(23) = −0.72, p = .48, 95% CI [−0.04, 0.08], between cue–response word pairs in children only and for pairs found in children and in adult associative norms.

The 17 word pairs which were represented in both the child and adult data were analysed further. A t-test was run to determine any difference in associative strength between children and adults. There was no significant difference in associative strength between the two groups, t(32) = 0.87, p = .39, 95% CI [−0.05, 0.14], though the associative strength was slightly higher in children (M = 0.22, range = 0.17–0.42) than in adults (M = 0.19, range = 0.041–0.638). There was no significant correlation between the two groups, r(15) = .23, p = .38, 95% CI [−0.28, 0.64], despite a weak positive tendency. Associative strength therefore seems to be comparable in adults and children, and the direction of the non-significant correlation is at least consistent with word pairs of high associative strength in adults also being strong in children.

As with Experiment 1, imageable noun–noun combinations with the highest forward associative strength were identified (N = 34) and are displayed in Appendix Table 13. These represent the strongest, imageable associated word pairs from the online WA task in 36–39-month-olds (first responses in bold, N = 9).

Comparing experimental modalities: Parental vs. online

In the following section, we compare the two experimental modalities: at home with a parent/carer as the experimenter (Experiment 1) and online, at home with a puppet as the experimenter (Experiment 2), whilst acknowledging that Experiment 2 only tests a subset of the stimulus words (N = 20) compared to the stimuli used in Experiment 1 (N = 100).

Descriptive statistics

There was no difference in response rate between the two experimental modalities, t(172) = .44, p = .66, 95% CI [0.53, 0.83], which indicates that 3-year-olds approached and responded to the WA task equally when it was performed by a parent in the home, and when demonstrated by a puppet online.

With regard to response type, the pattern of findings in the online WA task closely mirrors the findings in the parentally administered version of the task. The online experiment replicates the finding of a large proportion of related responses to a cue word, as found when the WA task was administered in the home. This is especially true for the percentage of Category 1 first responses (online: 59%; at-home: 68%), and the overall proportion of related first responses (Experiment 1: M = 0.85, SD = 0.21; Experiment 2: M = 0.82, SD = 0.19). Category 0 first responses (online: 9%; at-home: 3%) were also proportionally comparable.

No effect of gender or age on relatedness of response was found in either modality. In both modalities, syntagmatic responses occurred more frequently than paradigmatic responses. The rate of paradigmatic responses was not modulated by age or gender.

Associative strength

Considering all related responses in Experiments 1 and 2, 38 word pairs were represented in both experimental modalities as responses given by 2+ 3-year-olds for the same cue words. Ten of the word pairs, or 26%, are not represented in adult associative norms. The full list of word pairs found in all responses of both versions of the task can be found in Appendix Table 14.

For first responses, 13 word pairs were represented in both experiments (see Appendix Table 15). One of these word pairs was not represented in adult associative norms (7.69%).

The associative strength for related word pairs (in first responses) did not differ between Experiments 1 and 2, t(11) = 0.02, p = .98, 95% CI [−0.07, 0.07], with the average associative strength in the online version (M = 0.24, range = 0.17–0.42) equal to that in Experiment 1 (M = 0.24, range = 0.12–0.40). Word pairs are thus associated to a similar degree whether the task is administered by a parent at home or completed online.

Discussion of Experiment 2

Experiment 2 clearly demonstrates that conducting a WA task online with 3-year-olds is a feasible and valid way to deliver this task, with evidence that it generates the same proportion and type of responses as when administered by a parent, in a home setting. There was no effect of age, which is likely because the age range is too narrow to observe a solid effect, as in Experiment 1.

Rate of response was comparable in Experiments 1 and 2, but also the type of response, with syntagmatic responses favoured in both versions of the task. Parental report of the WAs produced by their children was accurate 92% of the time, suggesting that it is an objective and reliable way to record the responses to a free association task in children, making it a comparable modality to the at-home version of the task.

In terms of the exact associated responses generated to the cue words by two or more children, we saw a replication of 38 word pairs from Experiment 1 (total = 432 pairs) and Experiment 2 (total = 72 pairs), when counting all responses given. For first responses only, 13 word pairs appeared in both experiments. There was no difference in the associative strength of these 13 word pairs when the experiment was done with a parent or when done online. The fact that so many word pairs were found in both experiments suggests that these might be particularly robust and thus more reliable for use in experiments investigating development of the lexical-semantic system. To investigate this claim, Experiment 3 will test these WAs in a priming experiment with a new sample of children.

Experiment 3

To test the strength of association in the unique child WAs found in Experiments 1 and 2, Experiment 3 employs a receptive task. An online adaptation of the primed intermodal preferential looking (IPL; see Arias-Trejo & Plunkett, 2009; Jardak & Byers-Heinlein, 2019; Styles & Plunkett, 2009) paradigm was developed for this purpose, after first validating an online word recognition IPL task (Nguyen, Fitzpatrick, & Floccia, 2024). Experiment 3 compared the magnitude of a semantic priming effect between child-specific associations, adult-specific associations, and associations found in both adults and children. Based on the findings in Experiments 1 and 2, it was hypothesised that adult WAs not represented in the child WA data may not show any semantic priming effect, or the effect may be smaller in magnitude compared to the word pairs found in children’s associations. In contrast, child-specific associations and those represented in both child and adult WA data were expected to show a consistent priming effect.

A stronger effect of priming in child-specific word pairs might indicate stronger receptive knowledge of these than productive knowledge (as measured in the WA task) or simply that a child’s attention will be maintained for longer for the unique child WAs since their experience of the world at the age of three is represented in these word pairings.

Method

Power analysis and sample size calculations

A power analysis was performed using an effect size extrapolated from Jardak and Byers-Heinlein (2019). The analysis indicated that a sample size of 39 participants would be sufficient to achieve 80% power (Footnote 10).

Participants

Forty 3-year-old healthy, English monolinguals were tested (19 girls, 21 boys). The average age of participants was 37 months 3 days (range = 35 months 3 days to 39 months 6 days). Productive vocabulary size was measured using the word list component of the MacArthur-Bates CDI III (Dale et al., 1998). The mean vocabulary score was 85/100. A further four participants were tested but excluded due to technical issues during testing.

Materials

Forty-eight common, highly imageable nouns were selected which are in the productive vocabularies of 3-year-olds (as demonstrated in Experiments 1 and 2). Nouns were selected either from the noun–noun WAs produced by 3-year-olds in Experiment 1 and/or Experiment 2 which had high FSG, or from the noun–noun WAs documented as having a high FSG in adults (Moss & Older, 1996; Nelson et al., 2004) and which have been selected for use in infant studies exploring semantic development (see Appendix Table 16 for the specific studies consulted). This resulted in three prime-target conditions: (i) unique child associations documented in the WAs of 3-year-olds (Experiments 1 and 2), (ii) validated adult associations (i.e. word pairs documented in both the adults’ WAs and the WAs of 3-year-olds), (iii) unvalidated adult associations (i.e. only found in the adult data, not in 3-year-olds’ associated responses). There were four trials per condition and 12 control/unrelated trials. Word pairs in unrelated trials had no attested associative or taxonomic relation, nor did distractor/target pairings in all trial types. Word pairs did not share phonological onset/rhyme. The full list of stimuli can be found in Appendix Table 16.

Twenty-four photographs of real objects were chosen to act as visual stimuli. Each visual stimulus was cut out of its background and presented centrally on a 50% grey background. The 24 images were seen twice by each participant: once as the target, and once as a distractor, appearing in different blocks to avoid an effect of repetition. The presentation side of the target was counterbalanced across participants. Each prime/target word was individually recorded as an auditory stimulus by a female speaker with a neutral British south-west accent, in a child-directed manner. Three neutral carrier phrases, i.e. ‘I want a/an…’, ‘I have a/an…’, ‘I saw a/an…’, were recorded in the same manner. The carrier phrase and prime word were concatenated into a single audio file for each trial. The target words were presented in isolation. Auditory and visual stimuli were presented using the experimental platform, Gorilla Experiment Builder. Four list orders were created to counterbalance presentation side of the target image. Block order was also counterbalanced. No 3-year-old saw more than two consecutive trials from the same relatedness condition.
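The ordering constraint in the final sentence above is easy to verify programmatically when building list orders. The sketch below is a hypothetical check, not part of the original materials: it takes the sequence of condition labels for a block and confirms that no condition occurs more than twice in a row. The condition labels are placeholders.

```python
from itertools import groupby


def longest_run(conditions):
    """Length of the longest run of identical consecutive condition labels."""
    return max((sum(1 for _ in group) for _, group in groupby(conditions)), default=0)


def order_is_valid(conditions, max_consecutive=2):
    """True when no condition appears more than max_consecutive times in a row."""
    return longest_run(conditions) <= max_consecutive


# Hypothetical block orders using placeholder condition labels.
good_block = ["unrelated", "unrelated", "child", "unrelated", "adult", "adult", "both"]
bad_block = ["unrelated", "unrelated", "unrelated", "child", "adult", "both"]

print(order_is_valid(good_block), order_is_valid(bad_block))
```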

Procedure

An information sheet about the study was emailed along with instructions for the study and a unique link to the Gorilla Experiment Builder website. A time was arranged for the parent to access the link when a researcher was available by email for questions or assistance.

The procedure replicated a previous asynchronous online experimental design (see Experiment 1, Nguyen et al., 2024: https://app.gorilla.sc/openmaterials/626885) in terms of pre-testing components, which included: eligibility checks, consent, collection of participant and demographic information, and instructions on how to position the child and how to run the experiment. The testing itself was procedurally different and is explained below.

Each trial began with a smiley fixation point in the centre of the screen for 1000 ms to focus the child’s attention to the middle of the screen. This was replaced by a blank screen and the carrier phrase embedded with a prime word (e.g. ‘I saw a… cat’) played auditorily. An inter-stimulus interval (ISI) of 200 ms was then followed by the target word (e.g. ‘dog’) and a stimulus onset asynchrony (SOA) of 400 ms (see Jardak & Byers-Heinlein, 2019) at which point two images appeared: one on the left-hand side of the screen, and one on the right. One of the images was a referent to the target word, and one was a distractor image. Both images remained on screen for a further 2600 ms. After 12 trials, a short animation was played to maintain the child’s interest. The second block of 12 trials then followed automatically. The experiment ended with a short animation. The parent could exit the task at any point by clicking on the ‘Exit’ button.

Parents completed a word checklist for the experimental words to test that the child was familiar with them, as well as completing the vocabulary component of the CDI III at the end of the procedure.

Results

Data processing and analysis

Using university-developed bespoke software, webcam recordings of individual calibration and experimental trials were uploaded and automatically split into 50 ms frames. Calibration recordings were checked first to understand the looking behaviour of an individual (e.g. subtle/obvious saccades, the orientation of the screen in relation to the child’s position), and to validate that looks were being made to the side of target image presentation.

Each video of a trial was played in full, with audio, before analysis began. Since there was no recording of the visual stimuli in the video, hearing the audio did not influence manual coding of the eye gaze as the target location was unknown. This pre-analysis step served two purposes. First, it enabled us to check that the target word had been presented, with no significant delay in the Gorilla command to begin webcam recording. A second reason was to understand a participant’s looking pattern and head movement, to help when coding for left/right looks.

For experimental trials, the primary coder manually marked for each 50 ms frame if a child was looking left, right, on-screen but at an indeterminate location (which also accounts for saccades across the screen), or off-screen, using four keys on the keyboard. The coding was automatically saved in a .csv file which was later imported into R for analysis. A second coder coded a 10% subset of the data to test for rater reliability. Inter-rater reliability agreement between coders was 91% with a Cohen’s kappa κ of 0.80, indicating substantial agreement.
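To make the reliability statistic concrete, the sketch below computes raw agreement and Cohen's kappa for two coders who each assigned one of the four gaze codes to the same frames. The label names and the ten example frames are hypothetical. Kappa corrects observed agreement for the agreement expected by chance given each coder's label frequencies.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equal-length sequences of categorical codes."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of frames")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of the two coders' marginal proportions per label.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both coders use a single identical label")
    return (observed - expected) / (1 - expected)


# Hypothetical codes for ten 50 ms frames: L = left, R = right,
# C = on-screen but indeterminate, O = off-screen.
primary = list("LLLLRRRRCO")
second = list("LLLRRRRRCO")

agreement = sum(a == b for a, b in zip(primary, second)) / len(primary)
kappa = cohens_kappa(primary, second)
print(f"agreement = {agreement:.0%}, kappa = {kappa:.2f}")
```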

Trials were excluded if (i) a participant failed to look at the screen for a minimum of 750 ms (15 frames of 50 ms each) on a given trial, as per Jardak and Byers-Heinlein (2019); (ii) the length of a given trial was under 2500 ms, as this signified that a technical error must have occurred; or (iii) a parent had marked either the prime word or the target word as unknown to the child. Trials with webcam recordings without audio were excluded if the parent could not verify that sound had been played during the experiment. A participant was excluded if fewer than 50% of related and unrelated trials were available for analysis after excluding individual trials based on the above criteria. Analyses were completed in RStudio (version 1.4.1717; R Core Team, 2021), using the tidyverse (Wickham et al., 2019) and dplyr (Wickham et al., 2023) packages.
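The exclusion criteria can be expressed as a small filter over frame-coded trials. The sketch below is illustrative only: the analyses were run in R, and the `Trial` structure, field names, and gaze labels here are hypothetical. It also assumes that the 50% threshold is applied separately to related and to unrelated trials, which is one reading of the participant-level criterion.

```python
from dataclasses import dataclass

FRAME_MS = 50            # duration of one coded frame
MIN_ON_SCREEN_MS = 750   # minimum looking time at the screen per trial
MIN_TRIAL_MS = 2500      # shorter recordings indicate a technical error


@dataclass
class Trial:
    frames: list                 # one gaze code per frame: "left", "right", "centre", "off"
    prime_known: bool            # parent marked the prime word as known
    target_known: bool           # parent marked the target word as known
    related: bool                # related vs. unrelated prime-target pair
    audio_verified: bool = True  # sound confirmed as played


def exclusion_reason(trial):
    """Return the reason a trial is excluded, or None if it is kept."""
    if len(trial.frames) * FRAME_MS < MIN_TRIAL_MS:
        return "trial too short"
    on_screen_ms = sum(code != "off" for code in trial.frames) * FRAME_MS
    if on_screen_ms < MIN_ON_SCREEN_MS:
        return "inattentive"
    if not (trial.prime_known and trial.target_known):
        return "word unknown"
    if not trial.audio_verified:
        return "audio not verified"
    return None


def participant_included(trials):
    """True when at least 50% of related and of unrelated trials survive."""
    for related in (True, False):
        subset = [t for t in trials if t.related == related]
        kept = [t for t in subset if exclusion_reason(t) is None]
        if subset and len(kept) / len(subset) < 0.5:
            return False
    return True
```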

Descriptive statistics

Out of a possible 960 trials (a maximum of 24 trials for each of the 40 participants), a total of 920 trials were included for analysis. Trials were excluded because of insufficient trial length (11 trials or 1% of trials); inattentiveness (<750 ms spent looking at the screen per trial: 11 trials or 1% of trials); the prime or target word being unknown to the child (8 trials or 1% of trials); or technical error (10 trials or 1% of trials). No participants had to be replaced for failing to meet the minimum number of trials per condition.

The average number of valid trials per participant was 23 (SD = 1.99). This high number indicates children were very engaged in an online looking task when administered in the home. There was no effect of gender on response rate, t(38) = .96, p = .35, 95% CI [−0.67, 1.88]. Out of the four trial types, participants completed an average of 3.85/4 (SD = 0.59) trials for unique child word pairs, 3.8/4 (SD = 0.69) trials for validated adult word pairs, 3.75/4 (SD = 0.59) trials for unvalidated adult associations, and 11.65/12 (SD = 0.86) trials for unrelated word pairs.

Proportion of looking time to the target

The window of analysis was set at 200–2000 ms after visual stimulus onset, comprising an allowance of 200 ms for an initial saccade and a free-looking period of 1800 ms (Footnote 11). The proportion of looking time (PLT) towards the target visual stimulus, relative to the distractor stimulus, was calculated as the dependent variable for each trial as: PLT = looking time to target/(looking time to target + looking time to distractor).
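As a worked illustration of this measure, the sketch below computes PLT for a single trial from frame-by-frame gaze codes. It assumes that frame 0 begins at image onset and that each frame lasts 50 ms, so the 200–2000 ms window corresponds to frames 4 to 39. The function name and gaze labels are hypothetical, and the original analysis was carried out in R.

```python
FRAME_MS = 50
WINDOW_START_MS, WINDOW_END_MS = 200, 2000


def plt_to_target(frames, target_side):
    """Proportion of looking time to the target within the analysis window.

    frames: gaze code per 50 ms frame from image onset
            ("left", "right", "centre", or "off").
    Returns None when the child looked at neither image in the window.
    """
    if target_side not in ("left", "right"):
        raise ValueError("target_side must be 'left' or 'right'")
    distractor_side = "right" if target_side == "left" else "left"
    start, end = WINDOW_START_MS // FRAME_MS, WINDOW_END_MS // FRAME_MS
    window = frames[start:end]
    target = window.count(target_side)
    distractor = window.count(distractor_side)
    if target + distractor == 0:
        return None
    # Frames coded as indeterminate or off-screen do not enter the ratio.
    return target / (target + distractor)


# Hypothetical trial: 4 frames before the window opens, then 20 frames on the
# left image, 10 on the right image, and 6 off-screen within the window.
trial_frames = ["centre"] * 4 + ["left"] * 20 + ["right"] * 10 + ["off"] * 6 + ["left"] * 12
print(plt_to_target(trial_frames, target_side="left"))
```

Because only looks to the two images enter the ratio, a value of 0.5 corresponds to equal looking at target and distractor, which is why 0.5 serves as the chance level.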

A two-tailed, paired t-test was run on related and unrelated trials, showing that 3-year-olds looked significantly longer at the target on related trials (M = 0.51, SD = 0.07) than on unrelated trials (M = 0.48, SD = 0.07), t(39) = 2.39, p = .02, d = .38, 95% CI [0.01, 0.06] (see Fig. 4).

Fig. 4

Experiment 3. Proportion of looking to a target visual stimulus on semantically related (red) and unrelated (blue) trials in an online semantic priming study on 3-year-olds (white square = mean in each condition)

Follow-up one-sample t-tests were performed to investigate whether looking was above chance (0.5) on related and unrelated trials. These comparisons indicated that 3-year-olds did not look significantly above chance on related trials, t(39) = 1.28, p = .10, 95% CI [0.50, Inf], or on unrelated trials, t(39) = −1.76, p = .96, 95% CI [0.46, Inf].
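The comparison to chance, and the one-sided interval with an infinite upper bound, can be sketched in Python from summary statistics. This is an illustration only: the critical value is passed in by the caller (about 1.685 for df = 39 at a one-tailed alpha of .05), and because the means and SDs reported above are rounded to two decimals, plugging them in reproduces the reported t values only approximately.

```python
import math


def one_sample_t(sample_mean, sample_sd, n, chance=0.5):
    """t statistic for H0: mean PLT equals chance."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - chance) / standard_error


def one_sided_lower_bound(sample_mean, sample_sd, n, t_crit):
    """Lower limit of a one-sided confidence interval [lower, Inf]."""
    return sample_mean - t_crit * sample_sd / math.sqrt(n)
```

With the rounded unrelated-trial values (M = 0.48, SD = 0.07, n = 40), the sketch gives a t of about −1.81 and a lower bound of about 0.46, close to the reported t(39) = −1.76 and CI [0.46, Inf].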

In sum, the mean looking patterns of 3-year-olds indicated some sensitivity to the different relationship between words, demonstrated by a target preference when trials were related. However, there was no evidence of target recognition which is usually indexed by above-chance looking. The target not being recognised in unrelated trials replicated previous lab-based studies (e.g. Arias-Trejo & Plunkett, 2009; Styles & Plunkett, 2009), but the lack of target recognition on related trials was unexpected.

Association type

To examine the effect of association type (unique child, unique adult, adult and child, and unrelated), a one-way, repeated measures ANOVA was run on PLT with association type as a fixed factor. The PLT was statistically different for association type, F(2.57, 100.1) = 13.13, p < .0001, generalized η2 = .19.
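The fractional degrees of freedom in this F test suggest that a sphericity correction was applied. The source does not name the correction, so the following Python sketch simply assumes an epsilon-type adjustment (such as Greenhouse-Geisser), in which both degrees of freedom are multiplied by a factor epsilon between 0 and 1.

```python
def corrected_df(levels, participants, epsilon):
    """Sphericity-corrected degrees of freedom for a one-way repeated-measures ANOVA."""
    df_effect = (levels - 1) * epsilon
    df_error = (levels - 1) * (participants - 1) * epsilon
    return df_effect, df_error


def implied_epsilon(df_effect, levels):
    """Epsilon implied by a reported (corrected) effect df."""
    return df_effect / (levels - 1)
```

With four association types and 40 children, the uncorrected degrees of freedom would be 3 and 117. The reported effect df of 2.57 implies an epsilon of about 0.86, which gives an error df of about 100, in line with the reported F(2.57, 100.1) up to rounding.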

Planned pairwise comparisons were performed with a Bonferroni adjustment to identify the locus of the difference. These comparisons revealed that the PLT to the target for child-specific associations (M = 0.59, SD = 0.12) differed significantly from adult-specific associations (M = 0.45, SD = 0.12; p < .0001), 95% CI [−0.21, −0.08]; from adult-child associations (M = 0.50, SD = 0.13; p = .003), 95% CI [−0.16, −0.02]; and from control trials (M = 0.48, SD = 0.07; p < .0001), 95% CI [0.04, 0.17]. Other pairwise comparisons were not statistically significant. These data are visualised in Fig. 5.
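The Bonferroni adjustment used for these comparisons can be sketched in a few lines of Python. The sketch assumes that all six pairwise comparisons among the four association types form the family being corrected; the condition labels are taken from the text, and the function name is illustrative.

```python
from itertools import combinations

CONDITIONS = ["child-specific", "adult-specific", "adult-child", "unrelated"]

# All unordered pairs of conditions: 4 choose 2 = 6 comparisons.
PAIRS = list(combinations(CONDITIONS, 2))


def bonferroni(p_values):
    """Multiply each raw p value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}
```

Under this scheme a raw p value must fall below .05/6 (about .008) for the adjusted value to stay below .05.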

Fig. 5

Experiment 3. Proportion of looking time to the target by word association type in 36–39-month-olds doing an online semantic priming task (white square = mean of each condition)

Comparisons to chance (0.5) with PLT indicated that 3-year-olds looked significantly above chance in trials with child-specific associations (t(39) = 4.82, p < .0001), but not in trials with adult–child associations (t(39) = −0.01, p = .5), adult-specific associations (t(39) = −2.96, p = .1), or unrelated trials (t(39) = −1.76, p = .96). Together this shows that children looked longer at the target when the prime-target word pair had been generated in the WA task (see Experiments 1 and 2), compared to other WA types tested here. The lack of above-chance looking for adult or adult–child associations and unrelated word pairs suggests that no target recognition was indexed.

A correlation between CDI scores and priming difference scores, which were calculated by subtracting the PLT on unrelated trials from the PLT on related trials per child, as per Jardak and Byers-Heinlein (2019), showed no relation between productive vocabulary size and priming, r(37) = .06, p = .7.
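The difference score and the correlation described here can be illustrated with a short Python sketch. Variable names are hypothetical, and the Pearson coefficient is written out by hand so that the example stays within the standard library. Since the degrees of freedom of a Pearson correlation are n − 2, the reported r(37) implies that 39 children contributed to this analysis.

```python
import math


def priming_difference(plt_related, plt_unrelated):
    """Per-child priming score: PLT on related minus PLT on unrelated trials."""
    return plt_related - plt_unrelated


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

In use, one list would hold each child's CDI productive vocabulary score and the other the corresponding priming difference score.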

Paradigmatic/syntagmatic analysis

We re-coded related word pairs as paradigmatic/both or syntagmatic (according to the definitions used in Experiment 1; see the Coding for Response Type section), rather than using our original unique child, unique adult, adult and child, related response types. Re-coding was done from a child’s perspective (i.e. whether the association is documented in the responses to Experiments 1 and 2 in this paper) rather than from an adult’s perspective and based on adult norms. For example, boots–puddle was coded as syntagmatic, whereas coding guided by adult norms would not have classed this pair as associatively related. The mean PLT per paradigmatic/syntagmatic association type is visualised in Fig. 6.

Fig. 6

Experiment 3. Proportion of looking time to the target by paradigmatic, syntagmatic, or unrelated association type in 36–39-month-olds doing an online semantic priming task

To examine the effect of paradigmatic/syntagmatic association type, a one-way, repeated-measures ANOVA was run on PLT with paradigmatic/syntagmatic association type as a fixed factor. The PLT was statistically different for paradigmatic/syntagmatic association type, F(1.72, 67.16) = 18.03, p < .0001, generalized η2 = .23.

Planned pairwise comparisons were performed to identify the locus of the difference. These comparisons revealed that the PLT to the target for syntagmatic associations (M = 0.57, SD = 0.10) differed significantly from paradigmatic associations (M = 0.46, SD = 0.10; p < .0001), 95% CI [−0.17, −0.07], and from unrelated word pairs (M = 0.48, SD = 0.06; p < .0001), 95% CI [0.04, 0.14]. The difference between paradigmatic and unrelated trials was not significant (p = .60), 95% CI [−0.08, 0.03].

Time-course analysis

Looking behaviour over time was interrogated using a time-course analysis to understand where 3-year-olds looked throughout the 1800 ms looking period. The PLT to the target for related and unrelated trials was averaged across participants for each 50 ms time bin and plotted using the R package eyetrackingR (Forbes et al., 2021; see Fig. 7). Visual inspection suggests that the curves start to diverge at approximately 125 ms.

Fig. 7

Experiment 3. Time-course of looking behaviour in 36–39-month-olds for semantically related and unrelated trials with the significant divergence in behaviour indicated by a boxed area

To determine where any difference in looking behaviour occurred on related and unrelated trials during the time-course of word recognition, a non-parametric statistical cluster analysis was performed (see Maris & Oostenveld, 2007), which has been successfully employed by various studies investigating preferential looking (Floccia et al., 2020; Von Holzen et al., 2019; Von Holzen & Mani, 2012). Paired t tests were run for each time bin, followed by identifying clusters with significant t values and comparing these to a Monte Carlo distribution. Comparisons using the time-course analysis revealed a significant difference in looking behaviour between 450 and 850 ms post visual stimulus onset (cluster t statistic = 27.99, Monte Carlo p < .001) between related and unrelated trials, with the unrelated condition showing reduced looking in this period compared to related trials. This area is marked by a box in Fig. 7. This analysis suggests that the priming effect, as indexed by the difference in PLT in the related and unrelated conditions, occurs at around 450 ms after target onset.
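The logic of the cluster analysis can be illustrated with a simplified Python sketch in the spirit of Maris and Oostenveld (2007). It is not the authors' implementation and may differ from what eyetrackingR does internally. It assumes one PLT value per child per 50 ms bin in each condition, a caller-supplied t threshold, and a Monte Carlo null distribution built by randomly swapping the related and unrelated labels within each child.

```python
import math
import random
from statistics import mean, stdev


def paired_t_per_bin(related, unrelated):
    """One paired t value per time bin; inputs hold one list of bins per child."""
    t_values = []
    for b in range(len(related[0])):
        diffs = [r[b] - u[b] for r, u in zip(related, unrelated)]
        sd = stdev(diffs)
        t_values.append(mean(diffs) / (sd / math.sqrt(len(diffs))) if sd > 0 else 0.0)
    return t_values


def largest_cluster(t_values, threshold):
    """Largest absolute summed t over runs of adjacent same-sign supra-threshold bins."""
    best = current = 0.0
    previous_sign = 0
    for t in t_values:
        sign = (t > threshold) - (t < -threshold)
        if sign != 0 and sign == previous_sign:
            current += t          # extend the running cluster
        elif sign != 0:
            current = t           # start a new cluster
        else:
            current = 0.0         # bin below threshold ends any cluster
        previous_sign = sign
        best = max(best, abs(current))
    return best


def cluster_permutation_p(related, unrelated, threshold, n_permutations=1000, seed=0):
    """Return (observed cluster statistic, Monte Carlo p value)."""
    rng = random.Random(seed)
    observed = largest_cluster(paired_t_per_bin(related, unrelated), threshold)
    exceed = 0
    for _ in range(n_permutations):
        perm_rel, perm_unrel = [], []
        for r, u in zip(related, unrelated):
            if rng.random() < 0.5:   # swap condition labels for this child
                r, u = u, r
            perm_rel.append(r)
            perm_unrel.append(u)
        permuted = largest_cluster(paired_t_per_bin(perm_rel, perm_unrel), threshold)
        if permuted >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_permutations + 1)
```

Because the test statistic is the summed t of a whole cluster, the procedure yields a single p value for the cluster and controls for the many bin-wise comparisons without a separate correction per time bin.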

Discussion of Experiment 3

The main aim of Experiment 3 was to ascertain whether the unique child WAs found in Experiments 1 and 2 would demonstrate a measurable difference in a receptive semantic priming task. To explore this, we compared PLT for each WA type: unique child, unique adult, child and adult, and unrelated. The results clearly demonstrated that the priming effect was modulated by WA type. Related word pairs with the highest PLT were those taken from the productive vocabularies of 3-year-olds, tested in a WA task (Experiments 1 and 2). This WA type was the only of the four types tested with an above-chance probability of looks towards the target image. The finding that PLT for child-specific WAs differed significantly to the two other WA types (adult-specific, child and adult) suggests that an effect of semantic priming only occurred in the combined related data due to the associative boost provided by the child-specific WAs. However, the absence of above-chance looking when all three related WA types were combined might suggest that an online modality is not sensitive enough to capture general priming effects, particularly for WAs not robust in a child’s lexical-semantic system (i.e. some of those stemming from adult associative norms).

After performing a time course analysis on looking behaviour, we found a significant difference between PLT on related and unrelated trials. This indicates that children spent longer looking at the target image on related trials. Visual inspection reveals that looking times to the target rise to above 60% in the related trials, while they remain at 50% for the unrelated ones. We observed that children made saccades to the target stimulus before 200 ms. On average, first looks were slightly above chance for related trials but not for unrelated trials. We found a significant difference between 450 and 850 ms, where 3-year-olds looked above chance at the target more on related trials than on unrelated trials. Thus, while an effect of above-chance looking was absent when analysing the averaged PLT per trial type, the pattern of findings from this time course analysis suggests that children did recognize the target on related trials.

As hypothesised, WAs not found in the productive vocabularies of 3-year-olds, but prominent in the associated responses of adults performing a WA task, did not show a strong effect of priming in this experiment. This deserves attention, as many studies exploring the primacy of connections in the lexical-semantic system of infants have relied on associative norms from the adult literature to drive decisions regarding experimental stimuli for their studies. Where studies have not found a priming effect, this could be a result of the stimuli selected, on the assumption that a WA in the adult lexical system is equivalently robust in the infant system. In experiments that did find a priming effect, further analysis of the stimuli selected could help inform other researchers on the best word pairs to select for infant studies.

A finding that we did not expect to see was the lack of a priming effect in child–adult associations, that is, word pairs documented in our own findings of Experiments 1 and 2 (for 3-year-olds) and in adult associative norms (Moss & Older, 1996; Nelson et al., 2004). One explanation for no semantic priming in child and adult WAs may be the syntagmatic nature of the child-specific WAs compared to the more paradigmatic child and adult WAs. The most reliable effect of semantic priming has been found in words both taxonomically and associatively related (e.g. chair–table) due to the associative relatedness providing a ‘priming boost’ (infants: Arias-Trejo & Plunkett, 2009; adults: McRae & Boisvert, 1998; Perea & Rosa, 2002). While evidence exists to show that pure taxonomic relationships can evidence a priming effect in young children (Arias-Trejo & Plunkett, 2013), this was in an in-lab testing context, while our Experiment 3 was online.

We interrogated the potential syntagmatic/paradigmatic explanation by re-coding prime-target pairs as paradigmatic or syntagmatic and re-analysing the data. We found a significant difference between the PLT for syntagmatically associated word pairs compared to paradigmatic or unrelated pairs. This presents a confound between child-specific WAs and a syntagmatic advantage. It could be that the child-specific WAs showed better priming because they are syntagmatic, but the fact that they have the strongest FSG in the data is also certainly because they are syntagmatic. This confound can potentially never be solved since most child-specific associations are syntagmatic.

Taken together, Experiment 3 replicates in-lab findings in as far as a semantic priming effect was measured, but the lack of above-chance looking on (combined) related trials requires further investigation to determine whether the finding was unique to this experiment, or whether it more broadly represents an issue with the sensitivity of an online priming procedure.

General discussion

In three experiments, we tested the strength and types of word–word relationships in English-speaking children as young as 3 years old. Experiment 1 used a WA methodology administered by the parent in the home setting, and Experiment 2 replicated the method in an online format. Responses given by 3-year-olds were compared to the responses found in adult associative norms. Experiment 3 tested how the WA responses given by children, adults, or both groups indexed a semantic priming effect to determine whether some word–word relationships are more consolidated in a 3-year-old’s lexical semantic system.

Conducting a WA task with 3-year-olds at home and online generated the same proportion and type of responses for a subset of stimulus words. This is in line with our hypothesis and indicates that the parent administering the task did not confound the findings. In fact, our attempt to increase the engagement of the task by using puppets to demonstrate the task rather than a parent did not result in improved performance either. We think that this might be due to the parent’s continued involvement even when the task was done online. The parent dictated the pace of the task, was responsible for recording the child’s responses, and was also instructed to encourage second and third attempts at the task for each of the 10 cue words. Thus, the parent was an intrinsic part of the process in both modalities and perhaps was the key contributor to engagement levels and supporting associated response types.

The fact that many WAs in children (see Footnote 12) were not found in adult norms might be indicative of the transitory nature of the immature lexical-semantic system. Some adult associations might not form in infancy; instead, these findings suggest that there are unique WAs at 3 years of age which may be replaced by other, more adult-like associations, with increased age and life experience. This could occur in parallel to a subset of word pairs, shown to exist in both children and adults, though the strength of these associations differs.

For example, in a semantic priming study on children, Arias-Trejo & Plunkett (2009) demonstrated that associative relatedness can provide a ‘priming boost’ for word pairs which are taxonomically related. The authors defined associative word pairs as those taken from adult word association norms (Kiss, 1975; Moss & Older, 1996) without categorical relatedness. Taxonomically related word pairs were defined as objects with the same superordinate term (e.g. clothes, sock–pants) without associative relatedness. Thus, when considering the primacy of word–word relationships in the emerging lexical-semantic system, associative links might support the structuring of more complex, taxonomic connections and explain why they are more prevalent in the associated responses of 3-year-olds. Associative links that exist in memory may arise due to a child’s early experience of a conjunction of events: experience of the real world (e.g. playing with toys in the bath) and their exposure to recurring words that are uttered during those moments. Therefore, the links between toys and bath, for example, might be of two kinds: links between visual representation and lexical forms. In contrast, taxonomic links may emerge from a re-representation of meaning within an existing lexicon, based solely on abstract knowledge. This might suggest why some WA studies on children note a ‘syntagmatic-paradigmatic’ shift (White, 1985), evidencing a change in children’s responses to a WA task as they age. Our findings clearly indicate that WAs produced by 3-year-olds were more syntagmatic in nature, and when these were tested in a priming task, the word pairs with a syntagmatic relationship indexed a larger priming effect than words with a paradigmatic relationship.

According to Fitzpatrick et al. (2013), referencing WAs that have not been taken from the target population might not acknowledge the unique characteristics of the population of interest. This might be true of the WAs found in the children of this study and missing from the adult literature. Therefore, one must be cautious when interpreting the WAs found in adult norms, as the absence of a WA in adult associative norms is not necessarily a reliable indicator of its absence in the developing lexical-semantic system.

The absence of some of the strongest child WAs in the associative norms of adults is of relevance to the wider research field. Studies designed to investigate semantic development in infants rely on the WAs documented in adult norms when selecting appropriate stimuli (i.e. prime and target word pairs). For example, the word pair ‘teddy–bed’ from the WAs found in 3-year-olds is not present in adult norms. This word pair intuitively constitutes a strong association in the mind of a child, though relying on adult norms would not capture it as a suitable pair for use in an experiment. This example serves to highlight the importance of considering the most child-appropriate word pairs for use in experiments investigating the emergence of semantic meaning in infancy.

Limitations

One limitation of this research is the fact that we did not do a direct comparison of syntagmatic adult associations and syntagmatic child associations. This is something we hope to explore in future work. Due to the difficulty in directly comparing syntagmatic and paradigmatic WAs in children, because the children in this study did not produce many of the latter, we might look to the adult data or studies on older children to explore this further.

Conclusion

The sample of 3-year-olds tested in this study clearly share some of the WAs found in adult associative norms, but have their own, more child-specific associations, which can be stronger than word pairs in the adult literature. These child-specific word pairs are predominantly syntagmatic, and they index a larger semantic priming effect compared to paradigmatic word pairs.

This suggests that a more reliable source of WAs for use in semantic priming studies needs to come from the WAs documented in children rather than adults, and ideally in children as close in age as possible to the population being tested. The Appendices attached to this paper provide a resource of associatively related word pairs which reflect the associated responses to cue words produced by two or more 3-year-olds engaged in a free association task. Many of these word pairs comprise imageable noun–noun combinations which can be consulted for stimuli selection when designing studies investigating semantic development in young children. These word pairs reflect language production, and since production follows language comprehension, which is what studies investigating semantic development typically test, it is the closest we might get to knowing the precise WAs children form as their lexical-semantic system undergoes development.

Appendices

See Tables 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 and Fig. 8

Table 3 Experiment 1. The percentage of 18-month-olds knowing the words used as cues in the word association task
Table 4 Experiment 1. Word association task instructions and script for parents
Table 5 Experiment 1. Categories for coding participant responses
Table 6 Experiment 1. All related responses (first, second, and third attempts) produced by 2+ children in the parentally administered WA task
Table 7 Experiment 1. First responses produced by 2+ children in the parentally administered WA task (ordered alphabetically by cue word)
Table 8 Experiment 1. First responses (nouns) produced by 2+ children in the parentally administered WA task
Table 9 Experiment 1. Related responses given by 2+ children (as first responses) in the parentally administered WA task and represented in adult associative norms
Table 10 Experiment 1. Related responses given by 2+ children (as first responses) in the parentally-administered WA task, and not represented in adult associative norms (n/d = not documented, n/c = not used as a cue)
Table 11 Experiment 2. All related responses (first, second and third attempts) produced by 2+ children in the online WA task
Table 12 Experiment 2. Associative strength for word pairs from online child data (first responses) and represented in adult norms
Table 13 Experiment 2. Imageable noun-noun, cue-response word pairs in the online WA task (first responses in bold)
Table 14 Word associations replicated in Experiments 1 and 2 from all responses
Table 15 Word associations replicated in Experiments 1 and 2 as first responses
Table 16 Experiment 3. Stimuli list for the online semantic priming study on 3-year-olds
Fig. 8

Experiment 2. Screenshot of the online WA task