The ability to create temporary binding representations of information from different sources in working memory has recently been found to relate to the development of monolingual word recognition in children. The current study explored this possible relationship in an adult word-learning context. We assessed whether the relationship between cross-modal working memory binding and lexical development would be observed in the learning of associations between unfamiliar spoken words and their semantic referents, and whether it would vary across experimental conditions in first- and second-language word learning. A group of English monolinguals was recruited to learn 24 spoken disyllabic Mandarin Chinese words in association with either familiar or novel objects as semantic referents. They also completed a working memory task in which their ability to temporarily bind auditory-verbal and visual information was measured. Participants’ performance on this task was uniquely linked to their learning and retention of words for both novel and familiar objects. This suggests that, at least for spoken language, cross-modal working memory binding might play a similar role in second language-like (i.e., learning new words for familiar objects) and in more native-like situations (i.e., learning new words for novel objects). Our findings provide new evidence for the role of cross-modal working memory binding in L1 word learning and further indicate that early stages of picture-based word learning in L2 might rely on cognitive processes similar to those in L1.
Component processes central to the early stages of word learning include the mapping of pronunciations (phonological labels) to their printed words (orthographic forms) and semantic referents (e.g., visual objects for concrete nouns). This may initially involve the processing and temporary storage of phonological and visuospatial information, both individually and in combination, within working memory. Previous work has already examined in detail the relationship between word learning and temporary storage and processing of information within isolated domains (i.e., visuospatial or verbal; e.g., Baddeley, Gathercole, & Papagno, 1998; Gathercole, Willis, Emslie, & Baddeley, 1992; Majerus, Poncelet, Greffe, & van der Linden, 2006; Wang & Gathercole, 2013). However, few studies have investigated how the integration within working memory of information drawn across modalities might relate to word learning. A recent study based on written language attempted to address this issue, finding that children’s capacity to temporarily bind novel auditory-verbal and visual information in working memory is linked to the development of their ability to form a long-term orthography–phonology association in their native language (Wang, Allen, Lee, & Hsieh, 2015). What is as yet unknown is whether the mapping of spoken words to their semantic referents (e.g., objects for concrete nouns) during initial stages of word learning is also reliant on the ability to temporarily bind information across modalities in working memory in first (L1) language learning contexts, and, if so, whether it is also true in second (L2) language learning conditions. The aim of the current study is hence to extend the previous finding based on L1 written language to the domain of spoken language in an experimental simulation of L1 versus L2 new word learning in a young adult sample.
Working memory refers to a system that provides temporary maintenance of information necessary to support complex cognitive processing. A number of theoretical approaches have been developed in an attempt to capture working memory structure and function, and how it might relate to wider cognitive processes (e.g., Barrouillet, Bernardin, & Camos, 2004; Cowan, 1999; Unsworth & Engle, 2007). One of the most influential approaches is that advanced by Baddeley and colleagues (Baddeley, 1986, 1996, 2012; Baddeley & Hitch, 1974), with a multicomponent model of working memory that includes separable phonological and visuospatial short-term stores and a central executive control process, along with the more recent addition of the episodic buffer, a modality-general store capable of integrating information drawn from different sources in the environment and from long-term memory (Baddeley, 2000; Baddeley, Allen, & Hitch, 2011). A common assumption across different approaches to working memory is that capacity is extremely limited, with recently suggested estimates of around three to five items or chunks of information (e.g., Cowan, 2010). Basic temporary storage of verbal or visuospatial information is typically measured by simple span tasks that primarily require information retention, whereas complex span tasks are designed to capture the simultaneous storage and processing of information. Using such measures, close links between working memory capacity and word acquisition have been identified (e.g., Baddeley et al., 1998; Gathercole et al., 1992; Majerus et al., 2006; Wang & Gathercole, 2013).
This earlier work has typically focused on temporary storage and processing of information within isolated domains (i.e., visuospatial or verbal), with little exploration of how the integration and binding within working memory of information drawn from different modalities might relate to word acquisition. Such exploration is potentially useful, given that learning mapping among phonology (i.e., pronunciations), orthography (i.e., printed words), and semantics (e.g., visual objects for concrete nouns) is an important process of early stage word learning (Seidenberg & McClelland, 1989). Indeed, research shows that the ability to learn arbitrary pairings between visual stimuli and phonological labels is a strong correlate of single word reading ability (e.g., Blomert, 2011; Hulme, Goetz, Gooch, Adams, & Snowling, 2007). However, it remains to be understood what might determine successful association formation between information from different modalities in long-term memory. Recent developments exploring temporary binding in working memory may provide insights into how temporary binding ability involved in initial stages of word learning might be linked to successful word acquisition (Jones, Branigan, Parra, & Logie, 2013; Wang et al., 2015). In the working memory literature, binding refers to the integration of individual features to form a bound representation, typically following a single exposure (e.g., Allen, 2015; Brockmole & Franconeri, 2009).
One potentially useful theoretical framework to address this issue is that proposed by Baddeley and colleagues (Baddeley, 1986, 1996, 2012; Baddeley & Hitch, 1974), which specifies distinctions between visuospatial and phonological modalities and the means to integrate such information within a modality-general episodic buffer (see also Barrouillet & Camos, 2014). This latter component is assumed to comprise a storage capacity based on a multidimensional code, which can be used to integrate information from specialized phonological and visuospatial subsystems, and to interface with long-term memory. This buffer may serve as a storage and modeling space that is informed by but separable from the specialized subsystems and long-term memory (Allen, Havelka, Falcon, Evans, & Darling, 2015; Langerock, Vergauwe, & Barrouillet, 2014), and may form an important stage in long-term episodic learning (Baddeley, 2003). The possible functioning of this proposed component has been intensively investigated by researchers using a range of tasks requiring memory judgments concerning combinations of features within domains (Allen, Baddeley, & Hitch, 2006, 2014; Brown & Brockmole, 2010; Hu, Hitch, Baddeley, Zhang, & Allen, 2014), between verbal and spatial domains (Elsley & Parmentier, 2009; Langerock et al., 2014; Morey, 2009), and across modalities (Allen, Hitch, & Baddeley, 2009). Outcomes to date indicate that conjunctive or intrinsic binding (e.g., of shape and color within an object) may be relatively low level and perceptual in nature, possibly accomplished by specialized visuospatial processing before being consciously retained within the episodic buffer. 
In contrast, relational or extrinsic binding of materials from different domains or modalities (as examined in the current study) may particularly require the episodic buffer for their formation and retention (see Allen, 2015, for a review; also, see Ecker, Maybery, & Zimmer, 2013; Parra et al., 2013), as implied by Baddeley’s (2000) original proposal.
It would therefore be tempting to suggest that the episodic buffer concept may serve a useful purpose in understanding the possible links between temporary binding and word acquisition through its proposed position at the interface of phonological processing, visuospatial processing, and long-term memory. Specifically, within this framework, verbal and visual information is assumed to be initially processed within specialized phonological and visuospatial subcomponents. However, memory for associations between information from these different subcomponents requires binding within the episodic buffer to form a unified representation. This proposed subcomponent within working memory may act as an intermediate store between initial maintenance and long-term storage, thus forming an important stage in the establishment of long-term representation of cross-modal associations. This perspective therefore suggests that the episodic buffer, or an equivalent modality-general aspect described in other theoretical approaches to working memory—for example, Cowan’s (1995, 2005) focus of attention—may play a role in the formation of short-term and longer term phonological–semantic associations.
In order to start addressing this possible link, Wang et al. (2015) recently examined the performance of children ages 8 and 9 years on a task designed to measure their ability to bind information across auditory-verbal and visual modalities in working memory. This task assessed immediate memory for pairs of visually presented abstract shapes and auditorily presented nonwords, with children required to identify the original shape–nonword combinations they had encountered. Results showed that children’s performance on this task significantly correlated with their single word recognition abilities even when chronological age, nonverbal ability, memory for individual features that constitute the binding task, and other reading-related factors were taken into account. These findings therefore suggest a unique link between children’s working memory binding skills and their capacity to form long-term orthography–phonology associations.
The present study attempted to add to our understanding within this area by assessing the link between working memory binding ability and the mapping of phonological labels to their semantic referents (i.e., visual objects for concrete nouns) in an experimental simulation of L1 new word learning in young adults. In order to tap into the fundamental mental processes involved in L1 word learning, we used a task consisting of pairings of an auditorily presented novel word form (i.e., Mandarin Chinese) and a novel visual object depicting its referent. The use of novel objects, that is, objects that participants have no experience with in daily life, could ensure that the to-be-learned words would not be linked to already existing semantic referents in one’s native language (Gupta, 2003). The use of Mandarin words that are phonologically unfamiliar to our participants also helps eliminate possible effects of long-term phonological knowledge on word-learning results.
The second aim of the current study was to investigate whether the possible role of cross-modal working memory binding in the formation of short-term and longer term phonological–semantic associations would similarly hold in second-language learning. To investigate this issue, the present study also included a condition requiring learning of associations between novel words and familiar objects. In contrast to the novel-word–novel-object condition, learning novel phonological words for familiar objects would likely involve linking these novel labels to preexisting word–semantic relations (Gupta, 2003). This resembles typical situations within second-language learning, in which new labels (phonological word forms) are mapped onto existing concepts or percepts of objects and actions. Such distinctions between familiar versus novel objects have significant implications for the learning of a second language, as demonstrated in a study by Barcroft and Sunderman (2008) that compared the learning of associations between phonological forms and either familiar objects or novel objects. Their study suggested that novel words for familiar objects were more likely to be learned through L1 translation equivalents rather than through a direct mapping of spoken words onto pictures that depict their referents (i.e., objects), in line with the theoretical accounts of L1 and L2 word-learning processes proposed by Kroll and Stewart (1994) and Hernandez, Li, and MacWhinney (2005). According to these models, second-language words are often mapped to labels of the first language (i.e., word-to-word associations) during early stages of learning, and direct word-to-concept links are only possible as the individual becomes more proficient in the second language.
Contrary to this type of theoretical proposal, recent evidence has increasingly indicated that direct mappings between L2 words and their corresponding concepts are possible even for L2 learners who are still at an early point in their L2 learning (e.g., Poarch, van Hell, & Kroll, 2015). Based on results from this line of research, it might be expected that the capacity to bind auditory-verbal information to visual materials in working memory would also play a part in early stages of word learning for familiar objects (i.e., a second language-like situation). On the other hand, if L2 learners would rely exclusively on the lexical connections between L2 and L1 in their early stage of L2 new word learning, we would not expect to observe a link between cross-modal working memory binding performance and word learning outcomes in the learning condition of familiar objects.
In sum, the aims of this study are (1) to identify the role of cross-modal working memory binding in the mapping of spoken word forms to objects, and (2) to determine whether the relationship between cross-modal working memory binding and word learning is modulated by the type of objects in the mapping process (familiar vs. novel objects). To achieve these aims, English monolinguals who had no knowledge of Mandarin Chinese were recruited to participate in a 2-day training study. They learned 24 phonologically unfamiliar novel words in Mandarin in association with either familiar or novel objects as semantic referents via novel-word-learning tasks (e.g., Hulme et al., 2007; H. Li, Shu, McBride-Chang, Liu, & Xue, 2009; Warmington & Hulme, 2012). Learning retention was subsequently assessed by delayed associative recognition tests administered 1 hour (T1) and then 1 day after initial learning (T2). They also completed a cross-modal working memory binding task (Wang et al., 2015), assessing immediate memory for auditory nonwords, for abstract shapes, and, crucially, for the bindings between these features. Findings indicating a relationship between the ability to create and temporarily retain (for a few seconds) associations between auditory-verbal and visual stimuli in working memory and learning and retention (over 24 hours) of spoken words for novel objects would extend evidence for the role of working memory binding in long-term word learning based on written language (Wang et al., 2015) to the domain of spoken language. Furthermore, we wanted to assess how working memory binding performance might be related to learning of words for novel objects (a more native-like situation) versus for familiar objects (a second language-like situation), in order to identify whether cross-modal working memory binding might play a similar or differential role for new word learning in L1 and L2.
A total of 71 English monolinguals were recruited from Pennsylvania State University. None of the participants had prior experience with Mandarin Chinese or other tonal languages, as assessed by the Language History Questionnaire (Li, Zhang, Tsai, & Puls, 2014). Two participants responded by guessing randomly on more than half of the trials in the working memory binding task, and their data were excluded from further analysis. Thus, data from 69 subjects (47 females; mean age 20.99 years, SD = 4.75, range: 18–44) were included in the present analyses. For the measures of associative recognition retention T1 (1-hour delay) and recognition retention T2 (1-day delay), two participants with a d-prime (d′) score of zero or less were removed from that given measure at T1, as a value of zero indicates inability to distinguish signal from noise, and negative values of d′ can also arise through response confusions (see Stanislaw & Todorov, 1999). At T2, two participants failed to come back for the recognition retention task, and one participant had a d′ score of zero or below in the novel-object condition. This left 66 and 67 participants in the novel-object condition and the familiar-object condition, respectively. The participants took part in the experiment to gain credits for a course or to receive monetary compensation for their time. The study was approved by the Institutional Review Board of the Pennsylvania State University. Consent was obtained from each participant prior to the experiment.
The experiment was carried out on two consecutive days. On Day 1, participants learned 24 disyllabic words in Mandarin in two conditions: in one condition 12 words were paired with familiar objects, and in the other condition another 12 words were paired with novel objects. There was a 3-minute break between conditions. After training, participants could take another 3-minute break. Participants then received a computerized working memory task that included measures of memory for auditory nonwords, visual shapes, and, crucially, for the binding between these features. This was followed by three further cognitive tasks. Results from these three additional tasks are not reported here because they were not the focus of the current study. These three cognitive tasks took approximately 1 hour to finish. Afterwards, participants received an associative recognition test as the first delayed posttest (T1, 1-hour delay). On Day 2, participants came back to the lab and received the Raven’s Progressive Matrices as a measure of nonverbal ability (Raven, Court, & Raven, 2006), and a second associative recognition test as the second delayed posttest (T2, 1-day delay). This associative recognition test was identical to that used at T1, except that the test pairs were presented in a different predetermined random order.
Materials and task
Word learning task
In this task, 24 spoken disyllabic Mandarin words were paired with either familiar or novel objects, yielding a familiar-object condition and a novel-object condition. The 24 spoken Mandarin words were adopted from a Mandarin word-learning study that included a larger number of target words (Lan, Fang, Legault, & Li, 2015; see Appendix 1 for the word list used). The spoken words were recorded by a female native Chinese speaker (see Lan et al., 2015, for details). The familiar and novel objects consisted of line-drawing pictures taken from Verma and Brysbaert (2015; see Appendix 2 for a list of the pictures used). The familiar objects and novel objects were learned in separate blocks. Whether a spoken word was paired with a familiar or a novel object was counterbalanced across participants, as was the presentation order of blocks.
Each condition began with an initial presentation trial followed by five learning blocks, with each block containing 12 learning trials. In the initial presentation trial of each condition, 12 spoken word–object pairs were sequentially presented via a computer, one pair after another (see Fig. 1a). Presentation time for each pair was 2 s. In the subsequent learning blocks, each learning trial (see Fig. 1b for an example trial) started with a black fixation cross presented in the center of the screen. Once participants clicked on the fixation cross, one spoken Mandarin word was presented as a retrieval cue. Simultaneously, participants saw all 12 possible choices of picture items displayed evenly on the screen and were required to identify the target item by mouse clicking. Participants made a response within a self-determined time window (i.e., no time limit given). Once the response was made, the correct picture was shown on the next screen as feedback. Participants pressed the spacebar to go to the next learning trial. Within each learning block, the 12 spoken Mandarin words were sampled randomly for each participant without replacement. The display locations of the 12 picture items were also randomized and changed across trials for each participant to prevent the use of location cues. After all 12 spoken Mandarin words were presented as retrieval cues in a learning block, another learning block began. All participants completed five learning blocks for the same 12 spoken word–object pairs. Proportion of correct responses was used as a dependent variable for each learning block, and mean proportion of correct responses across five learning blocks was used to index immediate learning outcomes for either learning condition. The entire training session on Day 1 lasted for approximately 30 minutes (including a 3-minute break between the two conditions).
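The structure of a learning block and its scoring can be illustrated with a short sketch. This is not the software used in the study; the function names, the integer item codes, and the fixed random seed are assumptions introduced here for illustration only. The sketch draws each of the 12 words once per block in random order, reshuffles the picture positions on every trial, and averages the proportion of correct responses across the five blocks.

```python
import random

N_ITEMS = 12   # word-object pairs per condition (from the text)
N_BLOCKS = 5   # learning blocks per condition (from the text)

def make_block(rng):
    """Return one learning block as a list of (cued_item, picture_layout) trials."""
    # Each word is used as a retrieval cue exactly once, in random order.
    cue_order = rng.sample(range(N_ITEMS), N_ITEMS)
    # All 12 pictures are shown on every trial, in freshly shuffled positions.
    layouts = [rng.sample(range(N_ITEMS), N_ITEMS) for _ in cue_order]
    return list(zip(cue_order, layouts))

def proportion_correct(responses):
    """responses: list of (cued_item, clicked_item) tuples for one block."""
    return sum(cue == click for cue, click in responses) / len(responses)

def immediate_learning_score(block_scores):
    """Mean proportion correct across the learning blocks of one condition."""
    return sum(block_scores) / len(block_scores)

rng = random.Random(0)  # hypothetical seed, fixed only for reproducibility
blocks = [make_block(rng) for _ in range(N_BLOCKS)]
```

Sampling without replacement within a block guarantees that every pair receives the same number of retrieval attempts, so block-level proportions are comparable across participants.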
Associative recognition retention task
Learning retention was assessed via an associative recognition task using a procedure similar to that of Jackson and Schacter (2004). There were 84 test pairs, consisting of 24 targets (intact pairs), 48 foils (24 rearranged pairs and 24 new pairs), and 12 additional trials (four of the targets from either learning condition were repeated twice, and four other targets from either learning condition were repeated once). The 24 targets were intact pairs that were presented unaltered from either training session. The 48 foils consisted of 24 rearranged pairs constructed from separate study pairs and 24 entirely new pairs, in which neither constituent element had appeared in the training sessions. The inclusion of entirely new pairs allowed us to assess whether participants exhibited successful item memory. To create 24 entirely new pairs, 24 new spoken Mandarin words and 24 new picture items were included. The 24 new spoken Mandarin words (see Appendix 1 for a list of words) were also adopted from Lan et al. (2015). The 24 new picture items (12 real and 12 novel) were drawn from the study of Verma and Brysbaert (2015; see Appendix 1 for a list of pictures). Finally, the additional 12 target pairs were included to minimize the possibility that participants would base their responses on how they had responded to similar items that had been presented earlier in the test (Bayley, Wixted, Hopkins, & Squire, 2008; Holdstock et al., 2002). Hence, for example, if a participant had already seen pair A-B, he or she could not assume that any subsequent pair composed of A or B would necessarily be a rearranged pair. Scores were based on responses to the first occurrence of the target pairs (Holdstock et al., 2002).
The 84 pairs were presented one at a time in a predetermined, random order. Participants heard a spoken Chinese word and simultaneously saw an object picture shown in the center of the screen. They had to indicate their memory for each pair by pressing one of four keys corresponding to four response options: intact, rearranged, new, or single. The intact, rearranged, and new response options are described above. The single option denoted pairs composed of one studied item (word or object) and one new item. Note that although the test included a single response option, there were no test pairs of this type. The single response option was included to discourage participants from making rearranged responses after recognizing only a single item of the pair, which would contaminate rearranged responses (see Jackson & Schacter, 2004, for a similar design).
There was no time limit for responding. The same associative recognition task was used as the delayed posttest at T1 and T2, with test pairs presented in a different predetermined random order at each time point; the same order was used for all participants. Dependent variables were calculated as d′ scores based on the response to the first occurrence of the 24 target pairs and on the response to the 24 rearranged pairs (see Footnote 1).
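The composition of the 84-pair test list can be checked with a brief sketch. The code below is illustrative only: the item labels, the function names, and the choice to rearrange across all 24 studied pairs (rather than within each learning condition) are assumptions, since the text does not specify how the rearranged pairs were constructed. Rearranged foils are formed by re-pairing every studied word with an object from a different studied pair.

```python
import random

def rearranged_pairs(study_pairs, rng):
    """Re-pair each studied word with an object from a different studied pair."""
    words = [w for w, _ in study_pairs]
    objects = [o for _, o in study_pairs]
    while True:
        shuffled = objects[:]
        rng.shuffle(shuffled)
        # Accept the shuffle only if no word keeps its original object.
        if all(s != o for s, o in zip(shuffled, objects)):
            return list(zip(words, shuffled))

def build_test_list(study_pairs, new_pairs, repeat_twice, repeat_once, rng):
    """Assemble intact, rearranged, new, and repeated-target trials, then shuffle."""
    trials = [("intact", p) for p in study_pairs]
    trials += [("rearranged", p) for p in rearranged_pairs(study_pairs, rng)]
    trials += [("new", p) for p in new_pairs]
    trials += [("intact_repeat", p) for p in repeat_twice for _ in range(2)]
    trials += [("intact_repeat", p) for p in repeat_once]
    rng.shuffle(trials)
    return trials

# Hypothetical labels standing in for the Mandarin words and line drawings.
study = [(f"word{i}", f"object{i}") for i in range(24)]
new = [(f"new_word{i}", f"new_object{i}") for i in range(24)]
trials = build_test_list(study, new, study[:4], study[4:8], random.Random(1))
```

With four targets repeated twice and four repeated once, the additional trials number 4 × 2 + 4 × 1 = 12, giving 24 + 24 + 24 + 12 = 84 test pairs in total.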
Cross-modal working memory binding task
The computerized binding task designed by Wang et al. (2015) was adjusted to measure working memory for auditory-verbal and visual information binding (binding condition). In order to be able to separate participants’ memory capacity for individual features from their ability to form associations between features in working memory, two corresponding feature memory tasks were also administered to measure memory for constituent auditory-verbal materials (verbal condition) and visual materials (visual condition).
The stimuli consisted of a set of eight English auditory nonwords (fren, bris, cral, tros, drup, srap, prin, grol) used by Jones et al. (2013) and a set of eight abstract and nonnameable six-point shapes (Numbers 3, 7, 13, 19, 20, 21, 29, 30) drawn randomly from the study of Vanderplas and Garvin (1959). The auditory nonwords were recorded by a male English speaker. The visual shapes were presented in black against a white background, with an approximate size of 1.6 cm² (see Fig. 2 for examples). All stimuli were sampled randomly without replacement within each trial and used for all participants.
Each memory condition (auditory verbal, visual, binding) was presented in a separate block. The tasks began with the presentation of the auditory-verbal or visual condition, counterbalanced across participants. To ensure that auditory-verbal materials and visual materials were equally familiar to participants in the binding condition, the binding condition was always administered as the final task (following a 3-minute break after the feature conditions). List length was set at three items in the binding condition and six items in the auditory-verbal and visual conditions, to equate the number of individual features to be remembered between binding and individual memory conditions (Wang et al., 2015). Each condition contained 15 experimental trials, preceded by four practice trials of a two-item sequence.
In the auditory-verbal condition (see Fig. 2a), at the study phase, a sequence of six auditory nonwords was presented via headphones. Each trial began with a black fixation cross presented at the upper center of the screen for 500 ms followed by a 250 ms delay. Each to-be-remembered item was then presented for 1,000 ms, with interstimulus intervals of 250 ms, with the screen remaining blank during presentation. A 1,000 ms delay followed offset of the final item in the sequence and was then followed by the test phase.
At the test phase, all eight nonwords were displayed in their visual forms (fren, bris, cral, tros, drup, srap, prin, grol) as response options in the lower half of the screen, each surrounded by a gray square outline. The participants used the mouse to click the target items in any order. No serial order element was required as this was not an explicit part of the binding task. The gray square around the items turned green once selected and remained green until the end of the test phase as a reminder of which items had been selected. The next trial started automatically once all responses had been made or when the total response time exceeded 36 s, giving 6 s on average for each response. Display locations of the eight response options at test were randomized and changed across trials to prevent the potential use of location cues. The dependent variable was proportion of correct responses.
In the visual condition (see Fig. 2b), the procedure was identical to that employed in the auditory-verbal condition, except that the experimental stimuli were replaced by a set of eight shapes (described above). At study, a sequence of six shapes was presented at the upper center of the screen. At test, the eight possible choices were presented in the lower half of the screen. The participants had to click to select the target items in any order. The dependent variable was proportion of correct responses.
In the binding condition (see Fig. 2c), at study, a sequence of three arbitrary pairs of auditory-verbal and visual stimuli was presented. The task procedure was identical to that employed in the auditory-verbal condition and visual condition, except that the presentation time for the constituent shape of each to-be-remembered pair was extended to 2,000 ms. The presentation time for each auditory nonword remained the same as that in the feature conditions (i.e., 1,000 ms). This gave participants 2,000 ms to process each pair, and hence ensured equivalent feature processing for the individual feature and binding memory conditions. At test, either auditory nonwords or visual shapes were presented one at a time as retrieval cues. Simultaneously, participants saw all eight possible choices of the other feature type that made up the pair in the study phase displayed in the lower half of the screen and were required to identify the target item by mouse clicking. On the trials where visual shapes were presented as retrieval cues, auditory nonwords were displayed in their visual forms as response options (see Footnote 2). The maximum response time for each cue was 6 s. To minimize the contribution of serial order mechanisms, cue items were presented in random order on each trial. The dependent variable was proportion of correct responses. Additionally, guessing error rate was examined as a proportion of the total number of features that did not appear in the presented sequence but were selected. This response type represents participants’ tendency to produce random guesses when performing the binding task (two participants’ data were removed from analysis due to random guesses; see Participants).
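The two binding-task measures can be illustrated with a minimal scoring sketch for a single trial. It is offered under stated assumptions rather than as the scoring procedure actually used: the function name, the shape labels (e.g., "shape3"), and the use of the number of cued responses as the denominator for both measures are hypothetical. A response counts as correct when it matches the cue's studied partner, and as a guessing error when the selected feature did not appear anywhere in the presented sequence.

```python
def score_binding_trial(pairs, cued_responses):
    """
    pairs: list of (nonword, shape) tuples shown at study (three per trial in the text).
    cued_responses: list of (cue, response) tuples; the cue is a nonword or a shape,
                    and the response is the selected feature of the other type
                    (None if no response was made within the time limit).
    Returns (n_correct, n_guessing_errors, n_cues).
    """
    partner = {}
    for nonword, shape in pairs:
        partner[nonword] = shape
        partner[shape] = nonword
    presented = set(partner)  # every feature that appeared in the sequence
    n_correct = sum(resp == partner[cue] for cue, resp in cued_responses)
    # A guessing error selects a feature that was never presented on this trial.
    n_guess = sum(resp is not None and resp not in presented
                  for _, resp in cued_responses)
    return n_correct, n_guess, len(cued_responses)

# Hypothetical trial: one correct response, one binding error, one guessing error.
pairs = [("fren", "shape3"), ("bris", "shape7"), ("cral", "shape13")]
responses = [("fren", "shape3"), ("shape7", "cral"), ("cral", "shape19")]
n_correct, n_guess, n_cues = score_binding_trial(pairs, responses)
```

Separating the two error types matters because selecting a presented feature with the wrong partner reflects a failure of binding, whereas selecting an unpresented feature suggests that neither the feature nor the binding was retained.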
Learning and retention performance
For the initial learning outcomes, mean proportions of correct responses for the five learning blocks were .21 (SD = .14), .36 (SD = .19), .46 (SD = .22), .62 (SD = .23), and .67 (SD = .25) in the novel-object condition and .34 (SD = .18), .47 (SD = .24), .60 (SD = .26), .69 (SD = .22), and .80 (SD = .19) in the familiar-object condition. In each condition, participants’ mean performance levels across learning blocks were averaged as dependent variables and used in the following analyses. The descriptive statistics for all principal measures are shown in Table 1.
The distribution of associative recognition responses is presented in Table 2. Across the four conditions (novel-object conditions across T1 and T2; familiar-object conditions across T1 and T2), participants correctly recognized .75–.87 of all intact pairs. Very few rearranged pairs were mistaken for intact pairs (.10–.16). These results demonstrate that participants achieved relatively accurate associative memory representations. Recognition of individual items was also evidenced by the fact that most rearranged pairs were correctly identified (.74–.80), while few new pairs were labeled rearranged (.01–.03). In the absence of associative recognition for intact pairs, participants were more likely to make a rearranged response (.11–.19) than new (.00–.01) or single (.02–.06) responses, suggesting these responses were also driven by successful item memory. These results, taken together, suggest that the participants demonstrated success in the learning and retention of the target pairs.
Discriminability scores (d′) were calculated separately for the associative recognition retention tasks in each of the four conditions (novel-object conditions across T1 and T2; familiar-object conditions across T1 and T2; see Table 1) and used for the following correlational analyses. Rates of hits and false alarms were calculated based on participants’ responses to the intact trials and rearranged trials, respectively. When the hit or false-alarm rate equaled 0 or 1, rates of 0 were converted to 0.5/n and rates of 1 to (n − 0.5)/n, where n is the number of signal or noise trials (Macmillan & Kaplan, 1985).
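The d′ computation with the extreme-rate correction described above can be sketched as follows. This is a minimal illustration and not the authors' analysis script: the function names and the example counts are hypothetical, and it assumes the standard signal-detection definition d′ = z(hit rate) − z(false-alarm rate), with 24 intact and 24 rearranged trials as in the reported scoring.

```python
from statistics import NormalDist


def corrected_rate(count, n):
    """Return count/n, replacing rates of 0 with 0.5/n and rates of 1 with (n - 0.5)/n."""
    if count == 0:
        return 0.5 / n
    if count == n:
        return (n - 0.5) / n
    return count / n


def d_prime(hits, n_signal, false_alarms, n_noise):
    """d' = z(hit rate) - z(false-alarm rate), using the corrected rates."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = corrected_rate(hits, n_signal)
    fa_rate = corrected_rate(false_alarms, n_noise)
    return z(hit_rate) - z(fa_rate)


# Hypothetical participant: 20 of 24 intact pairs called "intact" (hits),
# 3 of 24 rearranged pairs called "intact" (false alarms).
print(round(d_prime(20, 24, 3, 24), 2))

# A perfect score is still finite because of the correction.
print(round(d_prime(24, 24, 0, 24), 2))
```

Without the correction, a hit rate of 1 or a false-alarm rate of 0 would map to an infinite z score, so d′ could not be entered into the correlational analyses.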
Association between cross-modal working memory binding and learning performance
The simple correlations between measures are shown in Table 3. One-tailed tests were used in the following statistical significance testing, given that we hypothesized a positive correlation between binding memory and word learning. The simple correlations show that the measure of cross-modal working memory binding was significantly correlated with the learning and retention of word–object pairs in all conditions across the different time points (rs = .287–.449).
A set of hierarchical regression analyses was then carried out with word learning outcomes across conditions and time points as dependent variables (see Table 4). For each analysis, nonverbal ability was entered at Step 1 and auditory-verbal memory and visual memory were entered at Step 2, to ensure that any observed correlations between the working memory binding task and word learning were not simply due to effects of general cognitive resources and individual differences in memory capacities for individual features. Results revealed that working memory binding was a significant predictor of successful word learning in the novel-object conditions across the three time points (initial learning outcome; associative recognition retention at T1; associative recognition retention at T2), accounting for a unique 6.6%, 6.8%, and 4.1% of variance in word learning performance, respectively. For the familiar object conditions, the working memory binding task was also a significant predictor, accounting for 9.5%, 5.1%, and 5.4% of variance in word learning performance at the respective time points.
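The logic of the hierarchical regression, in which the unique contribution of a predictor is the increase in R² when it is added after the control variables, can be sketched as below. This is an illustrative sketch only: the data are synthetic, the variable names are hypothetical, and entering the binding measure alone at a final step is an assumption inferred from the description above rather than a detail stated in the text.

```python
import random


def solve(A, c):
    """Solve the linear system A b = c by Gauss-Jordan elimination with pivoting."""
    k = len(c)
    M = [row[:] + [ci] for row, ci in zip(A, c)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


def ols_r2(X, y):
    """R^2 of an ordinary least squares fit of y on the columns of X plus an intercept."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Normal equations: (X'X) b = X'y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    b = solve(A, c)
    pred = [sum(bi * ri for bi, ri in zip(b, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def hierarchical_r2(data, outcome, steps):
    """Enter predictors step by step; return (step, cumulative R^2, change in R^2)."""
    entered, results, prev = [], [], 0.0
    for step in steps:
        entered = entered + list(step)
        X = list(zip(*(data[name] for name in entered)))
        r2 = ols_r2(X, data[outcome])
        results.append((tuple(step), r2, r2 - prev))
        prev = r2
    return results


# Synthetic data for illustration only (NOT the study's data).
rng = random.Random(0)
n = 60
names = ("nonverbal", "auditory_verbal", "visual", "binding")
data = {name: [rng.gauss(0, 1) for _ in range(n)] for name in names}
data["learning"] = [
    0.3 * a + 0.3 * b + 0.2 * c + 0.4 * d + rng.gauss(0, 1)
    for a, b, c, d in zip(*(data[name] for name in names))
]

# Step 1: nonverbal ability; Step 2: single-feature memory; Step 3 (assumed): binding.
steps = [["nonverbal"], ["auditory_verbal", "visual"], ["binding"]]
for step, r2, delta in hierarchical_r2(data, "learning", steps):
    print(step, f"R2 = {r2:.3f}", f"delta R2 = {delta:.3f}")
```

The change in R² at the last step corresponds to the kind of "unique variance" figure reported for the binding task: the variance in word learning explained by binding over and above nonverbal ability and memory for the individual features.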
The goal of this study was to enhance our understanding of the relationship between the ability to temporarily bind information across modalities in working memory and novel word learning in L1 and L2. After controlling for general cognitive abilities and memory for individual features, the results showed that individuals’ ability to bind auditory-verbal and visual information for immediate response in a working-memory task was a unique predictor of learning and retention in long-term memory of phonologically unfamiliar words for novel objects (a context resembling L1 word learning processes) and for familiar objects (a context resembling L2 word learning processes). The finding that cross-modal working memory binding performance uniquely predicted success in the mapping of phonological labels to novel objects thus extends previous findings based on written language (Wang et al., 2015) to the domain of spoken language in the L1 learning context. Moreover, the current results suggest that early stage L2 word learning is also reliant on cross-modal working memory binding to some extent.
In the case of learning phonological labels for novel objects, performance on the working memory binding task where participants were required to bind information across different modalities (and retain it for a few seconds) significantly predicted initial word–object pair learning outcomes and, importantly, retention performance measured approximately 1 hour and then 1 day after learning. Crucially, regression analyses indicate that this link cannot be readily explained by general cognitive skills and memory for individual materials that constitute the binding task. That such a relationship can be observed between tasks examining retention over very different timescales, and using different stimuli and response measures, fits well with the idea that temporary feature-binding ability within working memory may form an important stage in long-term episodic learning (Baddeley, 2003), and is also consistent with Wang et al.’s (2015) finding that the capacity to temporarily integrate auditory-verbal and visual information in working memory is linked to long-term word acquisition in children. It is worth noting that the observed link between working memory binding performance and retention of the learned word–object pairs does not seem to simply arise from similarity between test formats, as the working memory binding task employed a reconstruction paradigm while the associative retention tasks used a single-probe procedure. These results are therefore in line with previous work indicating that learning novel words for objects that are previously unnamed, as is the case for early stages of L1 learning, would force learners to directly map the spoken words to their corresponding objects (Barcroft & Sunderman, 2008). As such, it would inevitably rely on the ability to bind information across verbal and visual modalities.
In addition, the current study demonstrated that cross-modal working memory binding performance was also associated with learning and retention of novel-word–familiar-object pairings. This finding runs counter to the theoretical assumption that, because familiar objects are already named in learners’ first language, L2 word forms are attached to the memory system exclusively by lexical links with L1 rather than through direct conceptual links during early stages of L2 learning (Hernandez et al., 2005; Kroll & Stewart, 1994). The predictive power demonstrated by cross-modal binding for novel-word–familiar-object learning outcomes may instead imply that the direct mapping of L2 words onto their semantic referents (i.e., concepts) is possible even for L2 learners who are still at an early point in their L2 learning (e.g., Poarch et al., 2015).
We would note that our participants learned L2 words via the picture–word paired-associate learning method (i.e., a picture-based method). The use of images may encourage a strategy of direct mapping between phonological and visual information to establish novel-word–familiar-object pairings, which could differ from other strategies such as L1 word–L2 word association. It is also worth noting that the findings from Poarch et al. (2015) were based on child L2 learners of English in the Netherlands, whose word learning contexts were enriched by pictures and speaking/listening experiences, unlike the typical context encountered by adult L2 learners. In future studies, it may be productive to examine whether different types of learning method (e.g., picture-based vs. word-based method) and learning context (e.g., associative learning vs. immersed learning context) would mediate the role of cross-modal working memory binding in early stages of L2 word learning. It may also be worthwhile to assess whether different forms of working memory binding (e.g., word–nonword binding as well as cross-modal binding) differentially contribute to early stages of L2 word learning due to possible variation in learning methods (e.g., picture-based vs. word-based method). Nevertheless, this study provides initial evidence that learning words for both novel objects and familiar objects can be similarly related to the ability to temporarily bind information across modalities in working memory. The results have implications for understanding the common cognitive processing underlying the establishment of word–concept connections that might be shared by learning in L1 and L2.
In summary, the present data indicate that the capacity to form and maintain temporarily bound auditory-verbal and visual information in working memory is related to the learning and retention of phonologically unfamiliar words both for novel objects, a context resembling L1 word learning processes, and for familiar objects, a context resembling L2 word learning processes. This extends evidence concerning the role of cross-modal working memory binding in word learning from the written language (i.e., phonological–orthographic mapping; Wang et al., 2015) to the domain of spoken language (i.e., mapping of spoken word form to lexical semantics) in L1. Our findings also have implications for understanding the impact that different contexts have on early stages of word learning and highlight the need for further investigation of mechanisms underlying the similarities and differences in L1 and L2 word learning and their relations to different forms of working memory binding.
Footnote 1. The results were essentially identical when we also calculated the score based on all occurrences of the target. This was also the case when d′ scores were calculated based on the response to all 48 foils (24 rearranged pairs and 24 new pairs). In order to be concise, the current study only reports d′ scores that were calculated based on the response to the first occurrence of 24 target pairs and on the response to the 24 rearranged pairs.
Footnote 2. Possible contributions of symbol recognition skills to outcomes related to binding performances can be ruled out by statistically controlling for performance on a corresponding auditory nonword task as described in the auditory-verbal condition.
Allen, R. J. (2015). Memory binding. In J. D. Wright (Ed.), International encyclopedia of the social & behavioural sciences (2nd ed.). New York: Elsevier.
Allen, R. J., Baddeley, A. D., & Hitch, G. J. (2006). Is the binding of visual features in working memory resource-demanding? Journal of Experimental Psychology: General, 135(2), 298–313. doi:10.1037/0096-3445.135.2.298
Allen, R. J., Hitch, G. J., & Baddeley, A. D. (2009). Cross-modal binding and working memory. Visual Cognition, 17(1/2), 83–102. doi:10.1080/13506280802281386
Allen, R. J., Baddeley, A. D., & Hitch, G. J. (2014). Evidence for two attentional components in visual working memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(6), 1499–1509. doi:10.1037/xlm0000002
Allen, R. J., Havelka, J., Falcon, T., Evans, S., & Darling, S. (2015). Modality specificity and integration in working memory: Insights from visuospatial bootstrapping. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(3), 820–830.
Baddeley, A. D. (1986). Working memory. Oxford: Oxford University Press.
Baddeley, A. D. (1996). Exploring the central executive. The Quarterly Journal of Experimental Psychology: Section A, 49(1), 5–28.
Baddeley, A. D. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4, 417–423.
Baddeley, A. D. (2003). Working memory and language: An overview. Journal of Communication Disorders, 36(3), 189–208. doi:10.1016/S0021-9924(03)00019-4
Baddeley, A. D. (2012). Working memory: Theories, models, and controversies. Annual Review of Psychology, 63(1), 1–29. doi:10.1146/annurev-psych-120710-100422
Baddeley, A. D., Allen, R. J., & Hitch, G. J. (2011). Binding in visual working memory: The role of the episodic buffer. Neuropsychologia, 49(6), 1393–1400. doi:10.1016/j.neuropsychologia.2010.12.042
Baddeley, A. D., Gathercole, S. E., & Papagno, C. (1998). The phonological loop as a language learning device. Psychological Review, 105(1), 158.
Baddeley, A. D., & Hitch, G. J. (1974). Working memory. In G. H. Bower (Ed.), Recent advances in learning and motivation (Vol. 8). New York: Academic Press.
Barcroft, J., & Sunderman, G. (2008). Learning new words for objects and nonobjects: Theoretical and methodological implications. Mental Lexicon, 3(3), 325–348. doi:10.1075/ml.3.3.05bar
Barrouillet, P., & Camos, V. (2014). Working memory: Loss and reconstruction. Psychology Press.
Barrouillet, P., Bernardin, S., & Camos, V. (2004). Time constraints and resource sharing in adults’ working memory spans. Journal of Experimental Psychology: General, 133(1), 83.
Bayley, P., Wixted, J. T., Hopkins, R. O., & Squire, L. R. (2008). Yes/no recognition, forced-choice recognition, and the human hippocampus. Journal of Cognitive Neuroscience, 20(3), 505–512.
Blomert, L. (2011). The neural signature of orthographic–phonological binding in successful and failing reading development. NeuroImage, 57(3), 695–703. doi:10.1016/j.neuroimage.2010.11.003
Brockmole, J. R., & Franconeri, S. L. (2009). Binding: A special issue of visual cognition. Hove: Psychology Press.
Brown, L. A., & Brockmole, J. R. (2010). The role of attention in binding visual features in working memory: Evidence from cognitive ageing. The Quarterly Journal of Experimental Psychology, 63(10), 2067–2079.
Cowan, N. (1999). An embedded-processes model of working memory. In A. Miyake & P. Shah (Eds.), Models of working memory: Mechanisms of active maintenance and executive control (pp. 62–101). Cambridge: Cambridge University Press.
Cowan, N. (2010). The magical mystery four: How is working memory capacity limited, and why? Current Directions in Psychological Science, 19(1), 51–57.
Ecker, U. K. H., Maybery, M., & Zimmer, H. D. (2013). Binding of intrinsic and extrinsic features in working memory. Journal of Experimental Psychology: General, 142(1), 218–234. doi:10.1037/a0028732
Elsley, J. V., & Parmentier, F. B. (2009). Is verbal–spatial binding in working memory impaired by a concurrent memory load? The Quarterly Journal of Experimental Psychology, 62(9), 1696–1705.
Gathercole, S. E., Willis, C. S., Emslie, H., & Baddeley, A. D. (1992). Phonological memory and vocabulary development during the early school years: A longitudinal study. Developmental Psychology, 28(5), 887–898. doi:10.1037/0012-1649.28.5.887
Gupta, P. (2003). Examining the relationship between word learning, nonword repetition, and immediate serial recall in adults. The Quarterly Journal of Experimental Psychology: Section A, 56(7), 1213–1236.
Hernandez, A., Li, P., & MacWhinney, B. (2005). The emergence of competing modules in bilingualism. Trends in Cognitive Sciences, 9(5), 220–225.
Holdstock, J., Mayes, A., Roberts, N., Cezayirli, E., Isaac, C., O’Reilly, R., & Norman, K. (2002). Under what conditions is recognition spared relative to recall after selective hippocampal damage in humans? Hippocampus, 12(3), 341–351.
Hu, Y., Hitch, G. J., Baddeley, A. D., Zhang, M., & Allen, R. J. (2014). Executive and perceptual attention play different roles in visual working memory: Evidence from suffix and strategy effects. Journal of Experimental Psychology: Human Perception and Performance, 40(4), 1665–1678. doi:10.1037/a0037163
Hulme, C., Goetz, K., Gooch, D., Adams, J., & Snowling, M. J. (2007). Paired-associate learning, phoneme awareness, and learning to read. Journal of Experimental Child Psychology, 96(2), 150–166. doi:10.1016/j.jecp.2006.09.002
Jackson, O., & Schacter, D. L. (2004). Encoding activity in anterior medial temporal lobe supports subsequent associative recognition. NeuroImage, 21(1), 456–462.
Jones, M. W., Branigan, H. P., Parra, M. A., & Logie, R. H. (2013). Cross-modal binding in developmental dyslexia. Journal of Experimental Psychology: Learning, Memory, and Cognition. doi:10.1037/a0033334
Kroll, J. F., & Stewart, E. (1994). Category interference in translation and picture naming: Evidence for asymmetric connections between bilingual memory representations. Journal of Memory and Language, 33(2), 149–174.
Lan, Y.-J., Fang, S.-Y., Legault, J., & Li, P. (2015). Second language acquisition of Mandarin Chinese vocabulary: Context of learning effects. Educational Technology Research and Development, 63(5), 671–690.
Langerock, N., Vergauwe, E., & Barrouillet, P. (2014). The maintenance of cross-domain associations in the episodic buffer. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(4), 1096–1109.
Li, H., Shu, H., McBride-Chang, C., Liu, H. Y., & Xue, J. (2009). Paired associate learning in Chinese children with dyslexia. Journal of Experimental Child Psychology, 103(2), 135–151. doi:10.1016/j.jecp.2009.02.001
Li, P., Zhang, F., Tsai, E., & Puls, B. (2014). Language history questionnaire (LHQ 2.0): A new dynamic web-based research tool. Bilingualism: Language and Cognition, 17(03), 673–680.
Macmillan, N. A., & Kaplan, H. L. (1985). Detection theory analysis of group data: Estimating sensitivity from average hit and false-alarm rates. Psychological Bulletin, 98(1), 185.
Majerus, S., Poncelet, M., Greffe, C., & van der Linden, M. (2006). Relations between vocabulary development and verbal short-term memory: The relative importance of short-term memory for serial order and item information. Journal of Experimental Child Psychology, 93(2), 95–119.
Morey, C. C. (2009). Integrated cross-domain object storage in working memory: Evidence from a verbal–spatial memory task. The Quarterly Journal of Experimental Psychology, 62(11), 2235–2251. doi:10.1080/17470210902763382
Parra, M. A., Fabi, K., Luzzi, S., Cubelli, R., Hernandez Valdez, M., & Della Sala, S. (2013). Relational and conjunctive binding functions dissociate in short-term memory. Neurocase, 1–11. doi:10.1080/13554794.2013.860177
Poarch, G. J., van Hell, J. G., & Kroll, J. F. (2015). Accessing word meaning in beginning second language learners: Lexical or conceptual mediation? Bilingualism: Language and Cognition, 18(03), 357–371.
Raven, J. C., Court, J. H., & Raven, J. (2006). Raven’s progressive matrices [Assessment]. San Antonio: Harcourt Assessment.
Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96(4), 523–568. doi:10.1037/0033-295X.96.4.523
Stanislaw, H., & Todorov, N. (1999). Calculation of signal detection theory measures. Behavior Research Methods, Instruments, & Computers, 31(1), 137–149.
Unsworth, N., & Engle, R. W. (2007). On the division of short-term and working memory: An examination of simple and complex span and their relation to higher order abilities. Psychological Bulletin, 133(6), 1038.
Vanderplas, J. M., & Garvin, E. A. (1959). The associative values of random shapes. Journal of Experimental Psychology, 57, 147–154.
Verma, A., & Brysbaert, M. (2015). A validated set of tool pictures with matched objects and non-objects for laterality research. Laterality: Asymmetries of Body, Brain and Cognition, 20(1), 22–48. doi:10.1080/1357650X.2014.914949
Wang, S., Allen, R. J., Lee, J. R., & Hsieh, C.-E. (2015). Evaluating the developmental trajectory of the episodic buffer component of working memory and its relation to word recognition in children. Journal of Experimental Child Psychology, 133, 16–28. doi:10.1016/j.jecp.2015.01.002
Wang, S., & Gathercole, S. E. (2013). Working memory deficits in children with reading difficulties: Memory span and dual task coordination. Journal of Experimental Child Psychology, 115(1), 188–197. doi:10.1016/j.jecp.2012.11.015
Warmington, M., & Hulme, C. (2012). Phoneme awareness, visual-verbal paired-associate learning, and rapid automatized naming as predictors of individual differences in reading ability. Scientific Studies of Reading, 16(1), 45–62. doi:10.1080/10888438.2010.534832
This research is partially supported by the Aim for the Top University Project and Center of Learning Technology for Chinese of National Taiwan Normal University (NTNU), sponsored by the Ministry of Education, Taiwan, R.O.C. and the International Research-Intensive Center of Excellence Program of NTNU and Ministry of Science and Technology, Taiwan, R.O.C. under Grant No. MOST 104-2911-I-003-301 and MOST 105-2410-H-003-091. Partial support was also provided by the U.S. National Science Foundation (BCS-1338946; BCS-1349110). We thank Nan Zhang for her assistance with running the experiment.
The original version of this article was revised: There was an error in the PDF version of the article that caused Chinese characters presented in groups of two in Table 5 to be presented as single characters (i.e., one of the two characters in each pair was dropped). This error has now been rectified and the PDF version of the article is correct.
An erratum to this article is available at https://doi.org/10.3758/s13421-017-0749-5.
Wang, S., Allen, R.J., Fang, S. et al. Cross-modal working memory binding and L1-L2 word learning. Mem Cogn 45, 1371–1383 (2017). https://doi.org/10.3758/s13421-017-0731-2
- Working memory
- Episodic buffer
- Cross-modal binding
- Word learning
- Second language