
Lexical Availability of Basic and Advanced Semantic Categories in English L1 and English L2

  • Roberto A. Ferreira Campos
Chapter
Part of the Educational Linguistics book series (EDUL, volume 17)

Abstract

The current investigation used lexical availability to assess the performance of Chilean university students who were advanced learners of English (L2), in comparison with native English speakers (L1), in basic (‘Body parts’, ‘Food and drink’) and advanced (‘Terrorism and crime’, ‘Health and medicine’) semantic categories. Three analyses were conducted, looking at the number of words produced, lexical availability values, and correlations between L1 and L2 speakers in all four semantic categories. The results of the first analysis showed that L1 speakers outperformed L2 speakers and that basic categories were more productive than advanced categories in terms of number of words produced. The second analysis showed no group effect but a significant effect of semantic category, with basic categories showing higher lexical availability than advanced categories. The last analysis revealed strong correlations between L1 and L2 speakers in all semantic categories, with stronger correlations for basic than for advanced categories. The most important finding, however, is that both groups retrieved more words for basic than for advanced semantic categories, which seems to point to similar patterns in the organization of the available lexicons of L1 and L2 speakers.

Keywords

Native Speaker; Semantic Category; Basic Category; Mental Lexicon; Word Production
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

2.1 Introduction

Second language learning is a discipline that has become increasingly important around the world, thanks to the numerous opportunities for people to work and travel in multilingual environments. In this context, there is a growing interest in improving current teaching methods and materials in order to facilitate the process of second language acquisition. Overall, policies towards second language learning seem to be pointing in the right direction, since most countries promote this activity by offering a wide range of programs to learn not only the most popular foreign languages such as English, Spanish, French, or Mandarin; but also other less known languages.

When enrolled in any language programme, either within a university or a language institute, learners are usually assessed and then classified according to their initial proficiency in the second language. This is regularly done by using local language tests or standard international tests such as TOEIC1 in English, DELE2 in Spanish, and DELF-DALF3 in French, among others. Once evaluated, second language (L2) learners are typically placed under categories such as beginner, pre-intermediate, intermediate, upper-intermediate, or advanced, which are common labels for the different stages of language acquisition. As L2 learners progress through these different phases, they are supposed to acquire and advance their knowledge about the different components of the language: vocabulary, grammar, syntax, phonology or orthography, in order to become fluent L2 speakers. However, the reality seems to suggest that this might not always be the case since even advanced L2 speakers sometimes fail to use appropriate grammar and accurate vocabulary. The current study focuses specifically on vocabulary and offers an assessment of the lexicon that advanced L2 learners are capable of eliciting during a lexical availability task, in comparison with native speakers.

If people are exposed to a new language in a foreign environment, probably the first words they will try to learn are those corresponding to greetings, places, food and drink, and so forth. Interestingly, some of the words and expressions they will encounter might not even appear in dictionaries or textbooks. This is because human language is very dynamic, constantly changing and incorporating new words to its network (Ferreira and Echeverría 2010). As a result, in order to teach a foreign language properly, it is important to know what vocabulary L2 learners should be exposed to at different stages of language learning and how this vocabulary should be presented. Generally, L2 instructors organize new vocabulary in different semantic categories which fit appropriate proficiency levels. For instance, beginners are likely to be exposed to new vocabulary from categories such as ‘Body parts’, ‘Food and drink’ or ‘Parts of the house’; whereas advanced students are more likely to be taught new words within the categories of ‘Health and medicine’, ‘Politics’ or ‘Economy and finance’. This way of dealing with vocabulary is believed to be very beneficial (e.g., Anwar Amer 1986; Channell 1990) and is widely used in different learning materials such as textbooks for basic and intermediate levels (e.g., McCarthy and O’Dell 2002; Pye 2002; Redman 2001), and for advanced L2 learners (e.g., Richards and Sandy 2008). While there is well-established agreement that organizing the vocabulary into semantic categories is advantageous, the efficiency of the methods and criteria for selecting words within each category can be questionable.

In the process of vocabulary selection for the second language class, most researchers seem to turn to frequency of use for answers. Frequency is a very powerful variable used quite extensively in the cognitive sciences and has been shown to affect reading aloud (Balota et al. 2004), lexical decision (Balota and Chumbley 1984), and object naming (Barry et al. 1997; Ellis and Morrison 1998; Cuetos et al. 1999), among other tasks. Thus, it is not surprising that it has traditionally been used as the main method for word selection in second language teaching. The early compilation of 10,000 words in the English language by Thorndike (1921), followed by the work of Kučera and Francis (1967), the CELEX database by Baayen et al. (1993), and, more recently, the improved word frequency measure of Brysbaert and New (2009) are good examples of the long trajectory of frequency as a well-established reference for vocabulary selection. Despite its important role, there is growing concern that word frequency might fail to capture informal everyday vocabulary and probably overrepresents the formal vocabulary found in the written texts and compilations of spoken language from which frequency is extracted (Hernández-Muñoz et al. 2006). This is potentially disadvantageous for L2 learners, since they are probably not being exposed to the vocabulary that native (L1) speakers use in everyday life.

In view of these facts and as explained by López-Morales in the introductory chapter, another less popular variable, lexical availability, has been regarded as an alternative approach for vocabulary selection and the study of learners’ lexicons. Lexical availability measures are obtained by having participants elicit words from different semantic categories (e.g., ‘Body parts’, ‘Food and drink’), which are similar to those found in second language learning textbooks. After the test is conducted, each generated word is then given a lexical availability value, which is calculated based on the number of participants who produce the word, its position on the list within a given category, and the lowest position the word occupies in any of the lists (see Sect. 2.2 for more detail). Since lexical availability is obtained directly from participants and not from edited written texts (unlike frequency), it might offer a very good representation of the functional everyday vocabulary people actually use in conversations. It is true that while a participant is performing a lexical availability test, he/she sometimes produces rare words. However, as these words are unlikely to be elicited by other participants, they never reach acceptable lexical availability values and end up at the bottom of the list.

To understand the nature of lexical availability, researchers have investigated the contribution of different predictors that can drive the lexical availability effect. Hernández-Muñoz et al. (2006), in a multilevel multiple regression analysis, found that typicality, familiarity, and age of acquisition (AoA) were the only significant predictors of lexical availability. This means that what primarily drives individuals to produce words from a given category is how typical or familiar the items in each category are, and the age at which they learned those items. Unlike the above variables, frequency was not a significant predictor of lexical availability (Hernández-Muñoz et al. 2006). This strengthens the idea that lexical availability and frequency might target slightly different things and could yield divergent results when used as references for selecting words for inclusion in second language learning materials.

In the present study, we used a lexical availability task to compare the size and availability of the vocabulary that L1 and L2 speakers are able to retrieve from different semantic categories, within a time frame of 2 min.

First, we wanted to compare advanced university L2 learners and native speakers regarding number of words produced across basic and advanced semantic categories.4 Second, we were also interested in looking at lexical availability values including speakers and type of semantic category (basic or advanced) as factors. Third, we also carried out a correlational analysis between lexical availability values of words generated by L1 and L2 speakers across basic and advanced semantic categories.

A number of hypotheses were tested. We first assessed the hypothesis that L1 speakers would outperform L2 speakers regarding the mean number of words produced in each semantic category. This might not be particularly surprising since L1 speakers in this study had lived in an English speaking environment since they were born, whereas L2 speakers learned English as a second language primarily in a school setting. We also expected that both L1 and L2 speakers would elicit more words for basic categories (e.g., ‘Body parts’) than for advanced categories (e.g., ‘Terrorism and crime’). This is based on the assumption that words belonging to basic categories are likely to be more familiar, typical or learned earlier in life than words from advanced categories. Predictions regarding a direct comparison of lexical availability in L1 and L2 are not very straightforward. However, in line with the first hypothesis, lexical availability values should be higher in L1 than L2 speakers because the more words produced from a given category, the greater the chance of words to be repeated across participants, which would increase lexical availability values. Similarly, we expected that basic semantic categories would show higher lexical availability values in comparison with advanced categories. Basic categories seem to have higher familiarity and are generally acquired earlier in life, which would benefit word production and, consequently, lexical availability. As stated earlier, this is supported by Hernández-Muñoz et al. (2006)’s study, which found that familiarity and AoA were strong predictors of lexical availability. See also  Chap. 3 by Jiménez Catalán, Agustín, Fernández, and Canga in this volume. Finally, we also expected to find a correlation between lexical availability values of words produced by native speakers and the same words elicited by L2 speakers.

2.2 Method

2.2.1 Participants

The data used in this Chapter is part of a larger data set collected by Ferreira (2006). The investigation included a total of 50 English native speakers (mean age 16.4, SD 0.6) and 50 advanced second language students (mean age 21.4, SD 0.4). All native speakers who qualified for the study were monolingual female students at The Royal School located in Haslemere, Surrey, United Kingdom. Prior to the lexical availability test, all prospective participants were asked orally whether they were able to speak a second language. Students who reported that they did so were excluded from the study before it took place. The fact that L1 speakers in this study were non-specialized secondary school students allowed us to obtain a sample of the average vocabulary English speakers can produce in a timeframe of 2 min. The L2 speakers were all undergraduate students from the University of Concepción, Chile. Prior to enrolment at the university, they had studied English in a school setting for 8 years on average. All the students were in their fourth (last) undergraduate year, so they had completed at least 1,000 h of instruction in English. Their academic program included several general English language courses covering pre-intermediate, intermediate, upper-intermediate, and advanced levels. Other more advanced courses comprised phonetics, English literature, American and British history, applied linguistics, translation (English-Spanish); apart from optional courses such as academic writing, short-story writing, and drama.

2.2.2 Materials and Design

The full data set by Ferreira (2006) included ten semantic categories, which were selected on the basis of previously established categories as part of the Panhispanic Project5 (see also López-Morales 2012, for details) and English as a Second Language (ESL) textbooks such as the Interchange series (Richards et al. 2005). The current investigation only used four categories in order to examine relevant factors and interactions more carefully. In Ferreira (2006) all semantic categories were classified into basic or advanced, depending on the degree of specialization of the lexicon they contained. Here, we revalidated this classification by asking 20 currently employed English teachers to classify all ten categories into basic or advanced. Participants were told to choose five categories or units that they would normally use to teach beginner students and five categories they were more likely to use with advanced students. See Appendix 2.1 for Instructions sheet. All participants agreed that ‘Body parts’, ‘Food and drink’, ‘Holidays’, ‘Clothes’, and ‘Entertainment’ were more likely to be taught at a beginners’ level; whereas ‘Economy and finance’, ‘Terrorism and crime’, ‘Politics’, ‘Pollution and the environment’, and ‘Health and medicine’ were more suited for an advanced audience. For the current publication, we randomly selected two basic (‘Body parts’ and ‘Food and drink’) and two advanced (‘Terrorism and crime’, and ‘Health and medicine’) categories.

2.2.3 Procedure

Both L1 and L2 speakers were given a paper-based lexical availability test in a classroom setting. They were presented with all ten semantic categories in a pseudorandom order, in order to ensure that categories corresponding to the same classification (basic or advanced) never appeared together. Each category was displayed on a different page and participants were told to read the name of the category (appearing on top) and then write as many words as possible from the given category within a time period of 2 min. A table with 50 spaces was provided for the purpose (see Appendix 2.2). Participants were not allowed to move on to the following page until the end of the 2-min period, and were asked to immediately hand in the test after all semantic categories were presented. The complete test lasted around 20 min.
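The ordering constraint described above can be sketched in a few lines of Python. This is only an illustration, not the procedure the study actually used: it assumes that "never appeared together" means that two categories of the same classification are never adjacent, which, with five basic and five advanced categories, leaves only strictly alternating orders. The function name `alternating_order` and the `seed` parameter are hypothetical.

```python
import random

# Category labels as classified in Sect. 2.2.2
BASIC = ["Body parts", "Food and drink", "Holidays", "Clothes", "Entertainment"]
ADVANCED = ["Economy and finance", "Terrorism and crime", "Politics",
            "Pollution and the environment", "Health and medicine"]


def alternating_order(basic, advanced, seed=None):
    """Return a pseudorandom order in which basic and advanced categories alternate.

    Assumption: 'never appeared together' is read as 'never adjacent'.
    """
    if len(basic) != len(advanced):
        raise ValueError("this sketch expects equally sized groups")
    rng = random.Random(seed)
    shuffled_basic = basic[:]
    shuffled_advanced = advanced[:]
    rng.shuffle(shuffled_basic)
    rng.shuffle(shuffled_advanced)
    # Randomly decide which classification opens the booklet
    if rng.random() < 0.5:
        first, second = shuffled_basic, shuffled_advanced
    else:
        first, second = shuffled_advanced, shuffled_basic
    order = []
    for a, b in zip(first, second):
        order.extend([a, b])
    return order


if __name__ == "__main__":
    for page, category in enumerate(alternating_order(BASIC, ADVANCED, seed=2006), start=1):
        print(page, category)
```

Fixing the seed makes a given booklet order reproducible. Whether the original test used one fixed order for all participants or several orders is not stated in the text.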

In order to edit and process the lists of words obtained from the participants, a set of criteria was adopted. Windows XP Notepad (Microsoft Corporation 2007) was used to type in the responses produced by both groups of participants. First, two different number codes were used to differentiate the two types of speakers: L1 speakers were coded as 11111, while L2 speakers were coded as 11112. Each participant in each group was identified with a number ranging from 001 to 050, and a similar procedure was adopted to code semantic categories, which ranged from 01 to 10. Responses from each participant were entered in lower case, with the group code first, followed by the participant’s code and finally the category code. Codes were separated from each other and from the words by a single space, whereas words were separated from each other by a comma followed by a space (e.g., 11111 001 01 leg, arm, hand). Each list of words corresponding to the same category and the same participant was placed on a separate line. Regular nouns and adjectives were typed in singular form, but irregular nouns were kept in their original form. Except for gerunds and participles, all verb forms were converted to the infinitive. Finally, compound nouns, short phrases, and expressions (e.g., orange squash, september eleventh) were hyphenated (e.g., orange-squash, september-eleventh) and treated as a single entry.
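The line format described above (group code, participant code, category code, then comma-separated words) is simple enough to parse mechanically. The following Python sketch is illustrative only and is not part of the original processing pipeline; the names `ResponseList`, `normalize_entry`, and `parse_line` are hypothetical. It covers lower-casing and hyphenation of multiword entries, but not the singular and infinitive conversions, which were editorial decisions.

```python
from typing import List, NamedTuple

# Group codes as given in the text
GROUP_CODES = {"11111": "L1", "11112": "L2"}


class ResponseList(NamedTuple):
    group: str        # "L1" or "L2"
    participant: str  # "001" to "050"
    category: str     # "01" to "10"
    words: List[str]  # responses in the order they were written


def normalize_entry(entry: str) -> str:
    """Lower-case an entry and join multiword expressions with hyphens."""
    return "-".join(entry.lower().split())


def parse_line(line: str) -> ResponseList:
    """Parse one line such as '11111 001 01 leg, arm, hand'."""
    parts = line.strip().split(maxsplit=3)
    if len(parts) < 3:
        raise ValueError(f"expected three codes, got: {line!r}")
    group_code, participant, category = parts[:3]
    raw_words = parts[3] if len(parts) == 4 else ""  # a participant may have written nothing
    words = [normalize_entry(w) for w in raw_words.split(",") if w.strip()]
    return ResponseList(GROUP_CODES[group_code], participant, category, words)


if __name__ == "__main__":
    print(parse_line("11111 001 01 leg, arm, hand"))
    print(parse_line("11112 017 02 orange squash, bread"))
```

Keeping the words in their written order matters, because the availability formula weights a word by the position at which it was produced.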

After all words were entered in Notepad, they were saved in a single .txt file in order to be processed. Data processing was carried out using Dispogen II (Echeverría et al. 2005), which allowed us to obtain lexical availability values for each word elicited in each semantic category. This software is an application created in MATLAB version 7 (The Math Works Inc. 2005) and uses a formula developed by López-Chávez and Strassburguer-Frías (1991) (see Fig. 2.1), which computes lexical availability values according to the position that a word takes in a list, the number of participants who elicit the word at those positions, and the lowest position at which the word is observed in any of the lists (see Hernández-Muñoz et al. 2006). Based on this formula, words that are produced by a large number of participants and appear early on the lists obtain a high lexical availability value, whereas words that are elicited by few participants and appear at the bottom of the lists rank low in lexical availability.
Fig. 2.1

Formula to calculate lexical availability (‘D(Pj)’ represents the lexical availability value of the word j within a semantic category; ‘I’ denotes the total number of participants who performed the test; ‘i’ represents the position of the word j in a given list; ‘f’ is the number of participants who elicited the word j at that position in their list; ‘n’ denotes the lowest position obtained by word j in any list produced for the category; and ‘e’ is the base of the natural logarithm, 2.718281828459045 (see Hernández-Muñoz et al. 2006))
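Because the figure itself is not reproduced in this text, the following Python sketch works from the caption and from the form in which the López-Chávez and Strassburguer-Frías formula is commonly cited, D(Pj) = Σ (i = 1..n) e^(−2.3 × (i − 1)/(n − 1)) × f/I. The constant 2.3 and the exact shape of the exponent are assumptions to be checked against Fig. 2.1. Two further assumptions are labeled in the code: ‘n’ is taken as the lowest position reached in any list for the category, and a word repeated by the same participant is counted only at its first position. This is not the Dispogen II implementation, and the function name `lexical_availability` is hypothetical.

```python
import math
from collections import defaultdict


def lexical_availability(lists, decay=2.3):
    """Compute an availability value per word for one semantic category.

    lists: one ordered list of words per participant (empty lists allowed).
    decay: weighting constant; 2.3 is ASSUMED from the commonly cited formula.
    """
    total_participants = len(lists)  # 'I' in the caption
    if total_participants == 0:
        return {}
    # 'n': ASSUMED to be the lowest (deepest) position reached in any list
    n = max((len(words) for words in lists), default=0)

    # position_counts[word][i] = number of participants with 'word' at position i ('f')
    position_counts = defaultdict(lambda: defaultdict(int))
    for words in lists:
        seen = set()
        for i, word in enumerate(words, start=1):
            if word in seen:  # ASSUMPTION: repeats within one list are ignored
                continue
            seen.add(word)
            position_counts[word][i] += 1

    scores = {}
    for word, counts in position_counts.items():
        value = 0.0
        for i, f in counts.items():
            # Weight is 1 at position 1 and falls to e**(-decay) at position n
            weight = 1.0 if n <= 1 else math.exp(-decay * (i - 1) / (n - 1))
            value += weight * f / total_participants
        scores[word] = value
    return scores


if __name__ == "__main__":
    demo = [["leg", "arm"], ["leg", "hand"], ["arm", "leg"]]
    for word, value in sorted(lexical_availability(demo).items(), key=lambda kv: -kv[1]):
        print(f"{word}\t{value:.3f}")
```

Under these assumptions a word written first by every participant scores 1, and a word at the last position contributes about a tenth of the weight of a first-position word, since e^(−2.3) ≈ 0.10. Reading ‘n’ per word rather than per category, as the caption could also be understood, would change the weights.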

As stated earlier, only four categories out of ten were used in the current analysis. These corresponded to ‘Body parts’, ‘Food and drink’ (basic), and ‘Terrorism and crime’, ‘Health and medicine’ (advanced).

2.3 Results

Results were obtained for three main analyses. The first analysis aimed to examine differences between L1 and L2 speakers and between semantic categories regarding mean number of words produced (see Table 2.1). The second analysis included a direct comparison of lexical availability values across speakers and categories for the 100 most available words in each category and each group of speakers (see Table 2.2). Mixed-factorial analysis of variance (ANOVA) was used in the first and second analyses. When further analyses were required, one-way within-subjects ANOVAs and Bonferroni-corrected t-tests were used. Effect sizes were reported using partial eta squared (ηp²), and when sphericity was violated, the Greenhouse-Geisser correction was used to adjust p values. The third analysis included a correlation (Spearman’s rho) between the lexical availability values of the 100 most available words produced by native speakers in each category and the values of the same words generated by L2 speakers. When a word elicited by native speakers was not generated by L2 speakers, it received a lexical availability value of ‘0’.
Table 2.1 Total number of different words per category and mean number of words per participant in each group of speakers

| Measure | Speakers | BP | F&D | T&C | H&M |
|---|---|---|---|---|---|
| Total number of different words | L1 | 206 | 450 | 413 | 468 |
| Total number of different words | L2 | 109 | 253 | 316 | 276 |
| Mean number of words | L1 | 26.7 | 29.6 | 21.0 | 23.6 |
| Mean number of words | L2 | 21.2 | 23.6 | 14.9 | 15.1 |

Note: BP ‘Body parts’, F&D ‘Food and drink’, T&C ‘Terrorism and crime’, H&M ‘Health and medicine’

Table 2.2 Mean lexical availability values for the first 100 words in L1 and L2 speakers

| | BP (L1) | BP (L2) | F&D (L1) | F&D (L2) | T&C (L1) | T&C (L2) | H&M (L1) | H&M (L2) |
|---|---|---|---|---|---|---|---|---|
| Mean lexical availability | 0.14 | 0.11 | 0.11 | 0.11 | 0.08 | 0.06 | 0.09 | 0.06 |
| SD | 0.18 | 0.17 | 0.08 | 0.09 | 0.09 | 0.06 | 0.11 | 0.09 |
| Range | 0.01–0.79 | 0.00–0.76 | 0.03–0.49 | 0.02–0.38 | 0.03–0.48 | 0.02–0.31 | 0.03–0.75 | 0.02–0.58 |

2.3.1 Analysis 1: Mean Number of Words Produced

As explained earlier, this analysis assessed the number of words retrieved by L1 and L2 speakers in each semantic category, and the possible interactions between group and semantic category. See Table 2.1 and Fig. 2.2.
Fig. 2.2

Mean number of words produced by participants in each group (L1 and L2)

The first step in the analysis was carried out using a 2 × 4 mixed-factorial ANOVA with speaker (L1, L2) and category (‘Body parts’, ‘Food and drink’, ‘Terrorism and crime’, ‘Health and medicine’) as the main factors. The mixed-factorial ANOVA showed a significant main effect of group, with native speakers outperforming non-native speakers, F1(1, 98) = 2808.65, MSE = 68.65, p < .001, ηp² = .39. There was also a significant effect of semantic category, F1(1, 98) = 112.86, MSE = 17.27, p < .001, ηp² = .53, and a significant group × category interaction, F1(1, 98) = 3.31, MSE = 17.27, p = .03, ηp² = .03.

In order to explore the group × category interaction, separate one-way ANOVAs and post hoc tests were conducted for the data from each group of speakers. Results of the ANOVA conducted on the L1 data showed a significant main effect of category, F1(1, 98) = 40.91, MSE = 20.08, p < .001, ηp² = .46. Bonferroni-corrected paired-samples t-tests were used for all post hoc analyses. Results showed that native speakers produced significantly more words for ‘Food and drink’ than for each of the other three categories (p < .001). The second most productive semantic category was ‘Body parts’, for which participants elicited significantly more words than for ‘Terrorism and crime’ and ‘Health and medicine’ (p < .001). The least productive category was ‘Terrorism and crime’, which showed significantly fewer words than ‘Health and medicine’ (p < .01).

Results for the one-way ANOVA run on the data corresponding to L2 speakers also showed a significant main effect of semantic category, F1(1, 98) = 82.58, MSE = 15.49, p < .001, ηp² = .63. Post hoc tests (Bonferroni-corrected) revealed that advanced L2 speakers (like L1 speakers) produced significantly more words for ‘Food and drink’ than for ‘Body parts’ (p < .01), ‘Terrorism and crime’ (p < .001), and ‘Health and medicine’ (p < .001). The second most productive category was also ‘Body parts’, which showed significantly more words than ‘Terrorism and crime’ and ‘Health and medicine’. Unlike the results in the L1 group, ‘Terrorism and crime’ and ‘Health and medicine’ did not differ in the L2 group (p = 1.0).

In summary, native speakers outperformed L2 speakers within all semantic categories. Overall, both native and non-native speakers produced more words for basic semantic categories (‘Body parts’, and ‘Food and drink’) than for advanced categories (‘Terrorism and crime’, and ‘Health and medicine’). The group x category interaction can be explained by the fact that L1 speakers elicited more words for ‘Health and medicine’ than for ‘Terrorism and crime’, whereas these two categories were not significantly different from each other in the group of L2 speakers.

2.3.2 Analysis 2: Lexical Availability

In order to perform Analysis 2, the 100 words with the highest lexical availability values from each group of speakers in each semantic category were selected. See Appendix 2.3 for a sample of ten words in each category and each group of speakers, and Fig. 2.3 for the mean values.
Fig. 2.3

Mean lexical availability for the 100 words with the highest values in L1 and L2 speakers

As in Analysis 1, a 2 × 4 mixed-factorial ANOVA was first conducted on the data and included the same main factors. The factorial ANOVA found no effect of group, F1(1, 98) = 1.79, MSE = 0.05, p = .18, ηp² = .01. However, there was a highly significant effect of semantic category, F1(1, 98) = 45.22, MSE = 0.01, p < .001, ηp² = .19. The group × category interaction did not reach significance. Since there was no effect of group or interaction, but a significant effect of category, post hoc tests (Bonferroni-corrected) were run on the L1 and L2 data combined. No difference was found between the two basic categories, ‘Body parts’ and ‘Food and drink’ (p = .13). However, both of these categories showed significantly higher lexical availability values than either of the advanced categories, ‘Terrorism and crime’ and ‘Health and medicine’ (p < .001). At the same time, ‘Health and medicine’ showed higher lexical availability than ‘Terrorism and crime’ (p = .04).

2.3.3 Analysis 3: Correlation of Lexical Availability Between Native and Non-native Speakers

Although the previous analysis compared lexical availability values across speakers and categories, it did not clarify whether the lexical availability values of words in L1 speakers correlate with the values of the same words in L2 speakers. In order to investigate this, the 100 most available words produced by native speakers in each category were used once again. This time, however, the analysis compared the lexical availability values of these words with the values of the same words produced by L2 speakers. Bivariate nonparametric correlations (Spearman’s rho) were performed on the data from each semantic category. As observed in Table 2.3 and Fig. 2.4, there was a high correlation between lexical availability values in L1 and L2 speakers across the different categories. Both basic categories (‘Body parts’ and ‘Food and drink’) and both advanced categories (‘Terrorism and crime’ and ‘Health and medicine’) showed a highly significant correlation (p < .001). It is also important to note that basic categories, especially ‘Body parts’, showed higher correlations than advanced categories.
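The correlation procedure can be illustrated with a short, dependency-free Python sketch. It takes the top-ranked L1 words, looks up the value of each word in the L2 data, assigns 0 to words the L2 group did not produce (the rule stated at the start of the Results section), and computes Spearman’s rho as the Pearson correlation of average ranks. The function names and the toy values are hypothetical, and the study itself presumably used a statistics package rather than code like this.

```python
import math


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for k in order[pos:end + 1]:
            ranks[k] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


def spearman_l1_l2(l1_scores, l2_scores, top_n=100):
    """Spearman's rho over the top_n most available L1 words.

    Words absent from the L2 data receive a value of 0.
    """
    top_words = sorted(l1_scores, key=l1_scores.get, reverse=True)[:top_n]
    x = [l1_scores[w] for w in top_words]
    y = [l2_scores.get(w, 0.0) for w in top_words]
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Toy values for illustration only; they are not taken from the study.
    l1 = {"head": 0.8, "arm": 0.6, "kidney": 0.3, "torso": 0.2}
    l2 = {"head": 0.7, "arm": 0.5}
    print(round(spearman_l1_l2(l1, l2), 3))
```

Because every missing word receives the same value, zero-filling creates ties in the L2 ranks. The rank-averaging step handles these ties, which is why the sketch does not use the shortcut formula based on squared rank differences.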
Table 2.3 Correlations of lexical availability in L1 and L2 speakers for the first 100 words in each category

| Category type | Semantic category | Spearman’s rho |
|---|---|---|
| Basic | ‘Body parts’ | .79*** |
| Basic | ‘Food and drink’ | .51*** |
| Advanced | ‘Terrorism and crime’ | .45*** |
| Advanced | ‘Health and medicine’ | .47*** |

Note: *** p < .001

Fig. 2.4

Correlation of lexical availability for the 100 words with the highest values in L1 speakers and their translation equivalent in L2 speakers

Despite the fact that there were very significant correlations between L1 and L2 speakers, a substantial number of words with high lexical availability values (among the 100 most available) produced by L1 speakers were not elicited by L2 speakers. Some examples include kidney, fingernail, and torso in ‘Body parts’; crisp, carbohydrate, and protein in ‘Food and drink’; september-eleventh, saddam-hussein, and burglary in ‘Terrorism and crime’; paracetamol, nhs, and penicillin in ‘Health and medicine’. See Appendix 2.4 for full list.

2.4 Discussion

The ultimate aim of this investigation was to compare the lexicon of native speakers and advanced students of English as a second language across different semantic categories, using number of words produced and lexical availability as dependent variables. Another important aim was to examine whether basic (e.g., ‘Food and drink’) and advanced (e.g., ‘Health and medicine’) semantic categories would show divergent results across L1 and L2 speakers.

The first part of the investigation focused on the average number of words produced by each participant in each semantic category: ‘Body parts’, ‘Food and drink’, ‘Terrorism and crime’, and ‘Health and medicine’. The first hypothesis stated that L1 speakers would outperform L2 speakers regarding number of words produced in each semantic category. The results confirmed this hypothesis since L1 speakers clearly produced a higher number of words than L2 speakers across all semantic categories. This was not surprising considering that native speakers are exposed to their mother tongue at all times, so they clearly get more exposure to the language than L2 speakers. However, it is important to notice that this might not be the only reason why L1 performed better than advanced L2 language users. Another important factor could be the fact that native speakers in this study were all monolinguals, so they were able to elicit words in their mother tongue without facing competition from words in another language. There is widespread evidence suggesting that bilingual language processing is nonselective (e.g., Ferreira 2011; Dijkstra 2005; Costa et al. 1999), which means that words from both languages become activated and compete for selection during word production. In this particular case, it is possible that when the L2 participants were asked to produce words from a given semantic category, they encountered more difficulties than monolinguals to select appropriate words in their L2, due to possible interferences from the L1. This implies that L2 speakers might have lost time suppressing words in their L1 in order to only elicit words that are part of the L2 lexicon. This almost unnoticeable phenomenon is also likely to increase the demand on memory resources, which can certainly delay word production or make the process more error-prone (Ferreira 2011).

The second hypothesis in this study predicted that basic semantic categories would show a greater number of words than advanced categories across both groups of participants. This hypothesis was also confirmed since the two basic categories (‘Body parts’ and ‘Food and drink’) showed significantly more words than the advanced categories (‘Terrorism and crime’, and ‘Health and medicine’). This might be because words from basic categories tend to be more familiar than words from advanced categories, thus fostering production and increasing lexical availability. This explanation is in line with the results of a multilevel regression analysis performed by Hernández-Muñoz et al. (2006), which showed that familiarity was one of the strongest predictors of lexical availability. Familiarity has been defined as a measure of how often people think of concepts or things and is obtained by having participants rate different concepts (Cycowicz et al. 1997). Based on these ratings, familiarity has been found to influence word recognition (e.g., Cuetos et al. 2002) and word production (e.g., Ellis and Morrison 1998). In a lexical availability test, where participants are required to produce words from different semantic categories, words with higher familiarity are more likely to get activated and, consequently, elicited than words with lower familiarity. Another factor that can also influence the number of words produced in a lexical availability task is age of acquisition (AoA) or order of acquisition. This variable is a strong predictor of accuracy and speed in different language tasks such as reading (Monaghan and Ellis 2002; Morrison and Ellis 2000) and object naming (Carroll and White 1973; Bates et al. 2001; Ellis and Morrison 1998; Snodgrass and Yuditsky 1996). 
Since words belonging to basic categories are likely to be learned early in life (e.g., head, water), they might be easy to activate and elicit as opposed to words from advanced categories, which are more likely to be learned at a much later stage in life (e.g., murder, drug). Given the above, AoA seems to be another important factor contributing to the difference between basic and advanced semantic categories.

The fact that both L1 and L2 speakers produced more words for basic than for advanced semantic categories indicates that, to some extent, both groups of language users behaved similarly with respect to category type. The only difference was revealed by the significant group × semantic category interaction, which reflects an advantage of ‘Terrorism and crime’ over ‘Health and medicine’ (both advanced categories) that was present only in native speakers. This suggests that, despite the underlying differences in vocabulary production between the two groups, the words acquired by L2 speakers throughout the learning stages follow pathways similar to those of native speakers. Perhaps this reflects similarities in exposure, acquisition, and organization of the words in the mental lexicon.

The second set of predictions involved lexical availability values. In line with the number of words produced, it was expected that lexical availability values would be much higher for native speakers than for L2 speakers. However, this hypothesis was not confirmed since no difference between the two groups of speakers was found. This result might suggest that both native and non-native speakers show similar patterns of organization for the most available vocabulary, independently of whether they know the same words. This result is supported by the fact that no interaction between group and semantic category was found, which implies that both groups of language users show a similar pattern of behavior when producing words from different semantic categories.

It was also predicted that basic categories would show an advantage over advanced categories. This hypothesis was confirmed, since the two basic categories (‘Body parts’ and ‘Food and drink’) showed significantly higher lexical availability values than the advanced categories (‘Terrorism and crime’, and ‘Health and medicine’). This means that words generated from advanced categories varied more across participants than words produced from basic categories: each individual word in the advanced categories was produced by fewer participants. This difference in lexical availability had previously been demonstrated for abstract (‘Intelligence’) versus concrete categories (e.g., ‘Animals’), where ‘Intelligence’ showed lower lexical availability values than four different concrete categories (Hernández-Muñoz et al. 2006), but had never been assessed for proficiency-related categories such as basic and advanced. An important factor that could help explain the underlying differences in lexical availability between semantic categories is familiarity. As stated earlier, basic categories seem to gather words with higher familiarity than advanced categories. This is consistent with the finding that familiarity correlates strongly with lexical availability (Hernández-Muñoz et al. 2006) and has been reported to benefit performance in tasks such as object naming (e.g., Ellis and Morrison 1998; Cuetos et al. 1999) and semantic categorization (e.g., Larochelle and Pineau 1994; Malt and Smith 1982). Age of acquisition (AoA) is another variable that has shown a high (negative) correlation with lexical availability (Hernández-Muñoz et al. 2006) and provides further insight into the nature of the available lexicon.
Along these lines, we can argue that basic categories are more likely than advanced categories to contain words acquired early in life, since words belonging to basic categories showed higher lexical availability than those in advanced categories.
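The link between how many participants produce a word and its availability value can be illustrated with a minimal sketch. The index below weights each mention by its position in the participant's list using an exponential decay; this weighting (including the constant 2.3) is an assumption modeled on the type of index described by López-Chávez and Strassburguer-Frías (1991), and it is not necessarily the exact computation carried out for this study. The function name and the toy response lists are hypothetical.

```python
import math
from collections import defaultdict

def lexical_availability(responses):
    """Return {word: availability} from per-participant word lists.

    Assumed index (illustrative only): a mention at 1-based position i
    contributes exp(-2.3 * (i - 1) / (n - 1)), where n is the length of
    the longest list; contributions are summed per word and divided by
    the number of participants.
    """
    n_participants = len(responses)
    max_pos = max(len(words) for words in responses)
    scores = defaultdict(float)
    for words in responses:
        seen = set()
        for pos, word in enumerate(words, start=1):
            if word in seen:  # count each word once per participant
                continue
            seen.add(word)
            decay = (pos - 1) / (max_pos - 1) if max_pos > 1 else 0.0
            scores[word] += math.exp(-2.3 * decay)
    return {w: s / n_participants for w, s in scores.items()}

# Toy data, invented for illustration only.
body_parts = [["head", "arm", "leg"],
              ["head", "leg", "arm"],
              ["arm", "head", "kidney"]]
crime = [["murder", "bomb"],
         ["theft", "murder"],
         ["gun", "prison"]]

body_scores = lexical_availability(body_parts)
crime_scores = lexical_availability(crime)
```

In this toy example, the words in the ‘crime’ lists are shared by fewer participants, so their values are lower on average than those in the ‘body parts’ lists, which mirrors the pattern described above.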

The third analysis of this study looked at correlations between lexical availability values in L1 and L2 across basic and advanced semantic categories. All categories showed very high correlations between L1 and L2 speakers, but it is important to note that basic semantic categories, especially ‘Body parts’, showed a much higher correlation than advanced categories. This might suggest that basic categories show less lexical variability across speakers, and that L2 learners acquire vocabulary from these categories more accurately. Even though all correlations between L1 and L2 speakers were highly significant, a sizeable number of words with high lexical availability values in L1 speakers were not produced by L2 speakers. This shows that, despite their high proficiency level, advanced L2 speakers still struggle to produce relatively common words (used by native speakers) when a semantic category is presented as a stimulus. The failure to produce these apparently common words might reflect difficulties in retrieving the words’ lexical representations. This is likely to be caused by incomplete word knowledge or by interference from the nontarget language, as has been shown in studies of word production (e.g., Ferreira 2011; Costa 2005; De Bot 1992; Poulisse 1999). It is important to note that available words that were not elicited by L2 speakers might not show differences in performance between L1 and L2 speakers in other tasks such as naming or recognition memory. In a word learning study conducted by Ferreira (2011), L1 speakers clearly outperformed advanced L2 speakers in a production task, where participants were asked to elicit novel words based on orthographic cues and a definition. However, participants did not differ in the two recognition tasks: reading aloud and recognition memory. This implies that advanced L2 speakers are comparable to native speakers in recognition tasks, but they struggle to match native speakers in production tasks.
This might be because production tasks require participants to activate different components of a word in order to retrieve its lexical representation, which is probably harder for L2 speakers, who face competition from their L1.
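The two observations in this analysis (a high L1–L2 correlation alongside highly available L1 words that L2 speakers never produce) can coexist, as the following minimal sketch suggests. It is illustrative only: the availability values are invented, the function names and the 0.1 threshold are hypothetical, and treating a word that one group never produced as having availability 0 is an assumption rather than the procedure used in this study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def compare_groups(l1, l2, threshold=0.1):
    """Correlate availability values and list L1 words absent from L2.

    Assumption: words missing from one group's list count as 0.0.
    """
    words = sorted(set(l1) | set(l2))
    r = pearson([l1.get(w, 0.0) for w in words],
                [l2.get(w, 0.0) for w in words])
    missing = sorted(w for w, v in l1.items()
                     if v >= threshold and w not in l2)
    return r, missing

# Invented availability values for one category (illustration only).
l1_body = {"head": 0.80, "arm": 0.70, "leg": 0.65, "kidney": 0.30}
l2_body = {"head": 0.85, "arm": 0.60, "leg": 0.70, "finger": 0.20}

r, missing = compare_groups(l1_body, l2_body)
print(f"r = {r:.2f}; available in L1 but not produced in L2: {missing}")
```

A positive correlation here is driven by the words both groups share, while the `missing` list isolates the items that a vocabulary syllabus based on frequency alone might overlook.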

The process of word production seems to be hierarchical (Caramazza 1997; Levelt 1989), so activation flows from conceptual representations to phonological and orthographic representations. This also seems to be the case for bilingual word production (Costa 2005), except that lexical representations from both languages can be activated. In a lexical availability task, participants are asked to produce words from a semantic category (e.g., ‘Body parts’). If we assume that word production is hierarchical, then semantic representations related to ‘Body parts’ would first be activated in the speaker’s mental lexicon. Activation would then spread to the lexical level, where lexical representations start competing for selection. At this stage, hierarchical monolingual models propose the activation of several candidates within the target language (e.g., head, leg, and arm for ‘Body parts’). Since bilingual language processing appears to be nonselective (e.g., Costa et al. 1999; De Bot 1992; Poulisse 1999), hierarchical bilingual models of word production propose that lexical competition also includes words from the nontarget language, in this case, Spanish. Thus, L2 speakers in the current investigation probably also activated words such as cabeza, pierna, and brazo when attempting to produce head, leg, and arm. The fact that words from the nontarget language become activated might make retrieval in the target language harder for L2 speakers than for L1 speakers. This, together with the fact that L2 words are also less integrated in the mental lexicon, could produce a decline in performance, with L2 speakers failing to elicit words such as kidney in ‘Body parts’ or plaster in ‘Health and medicine’, which are relatively common words in the English language.

In summary, this investigation has provided new insights into the nature of the lexicon of advanced L2 speakers in comparison with monolingual native speakers. It has been demonstrated that the latter outperform non-native speakers regarding the number of words produced in each semantic category. However, the groups do not differ significantly in lexical availability values for the 100 most available words, which suggests that, despite the difference in number of words produced, both L1 and L2 speakers share similar patterns of integration and organization of the words in the mental lexicon. This is also supported by the fact that lexical availability values correlate highly between native and non-native speakers in each semantic category. It is important to note, however, that despite the similarities across groups in relation to lexical availability values, advanced L2 speakers are still unable to activate and elicit a number of words which are highly available among native speakers. This fact is important to consider when teaching English as a second language. Perhaps the current methodologies used to teach new vocabulary are not completely appropriate, or the vocabulary itself might not entirely correspond to the one used by native speakers. Because most textbooks used at school or even in higher education have been developed on the basis of word frequency and do not take into account other variables such as lexical availability, they might be failing to capture the average vocabulary that native speakers use in everyday life. It is worth noting that L1 speakers in this investigation were not specialized in any particular area, so their ‘available’ lexicon offers a glimpse of the average vocabulary used by native speakers of English. Given the results of our investigation, we propose lexical availability as a complement to word frequency in the selection of words for inclusion in different types of ESL materials.
By combining both methods, we can ensure that ESL students gain access to words that, even though they are highly common in the language, are not captured by word frequency alone.

Regarding basic and advanced semantic categories, it was discovered that both native and non-native speakers produced more words from basic than advanced categories. Lexical availability values were also higher for basic than advanced semantic categories. These two findings are particularly important since they reveal that L2 speakers follow the same pattern of vocabulary growth and organization as native speakers. This implies that traditional teaching methods, which present vocabulary organized in a progression of units or lessons, are fairly accurate in simulating the way native speakers deal with vocabulary, but probably fail to introduce all relevant lexicon commonly used by native speakers.

In conclusion, our findings suggest that even though L2 speakers resemble native speakers in different aspects of vocabulary development, they might still need to incorporate relevant words into their available lexicon. Notably, we need to be cautious about our claims, since the samples used (50 participants in each group) are certainly not representative of the entire population of English native speakers or that of advanced L2 English users. Our research is only a first attempt to directly compare monolingual native speakers and advanced L2 speakers regarding their performance in a lexical availability task, and to provide relevant cognitive explanations of the processes underlying word production. Future studies will require much bigger samples, perhaps from different geographical regions, in order to cover the full spectrum of target populations. It would also be advisable to have tighter control over sociocultural and economic variables that might have an effect on the results.

Footnotes

  1. Test of English for International Communication (Educational Testing Service 2012).

  2. Diplomas de Español como Lengua Extranjera (Instituto Cervantes 2012).

  3. Diplôme d’études en langue française (DELF) and Diplôme approfondi de langue française (DALF) (Centre International d’études pédagogiques 2012).

  4. Basic categories correspond to language units introduced at a beginners’ level, whereas advanced categories represent units students learned at an advanced level.

  5. See Chap. 1 by Humberto López Morales in this volume.

Notes

Acknowledgments

This study was partially supported by Proyecto Fondecyt 1050598 (2005). We would like to thank Linda Salter and Paul Salter for their collaboration in collecting the data from the English speakers. We would also like to thank Prof. Lilian Gómez, who allowed us to use part of her teaching time to conduct the lexical availability test at the University of Concepción. Many thanks to the students from the Royal School Haslemere and the students from the University of Concepción who participated in the study. Finally, we would also like to thank Mariya Bistrina for proofreading this chapter.

References

  1. Anwar Amer, A.A. 1986. Semantic field theory and vocabulary teaching. English Teaching Forum 24: 30–31.
  2. Baayen, R.H., R. Piepenbrock, and H. van Rijn. 1993. The CELEX lexical database (CD-ROM). Philadelphia: Linguistic Data Consortium.
  3. Balota, D.A., and J.I. Chumbley. 1984. Are lexical decisions a good measure of lexical access? The role of word frequency in the neglected decision stage. Journal of Experimental Psychology: Human Perception and Performance 10: 340–357.
  4. Balota, D.A., M.J. Cortese, S.D. Sergent-Marshall, D.H. Spieler, and M.J. Yap. 2004. Visual word recognition of single-syllable words. Journal of Experimental Psychology: General 133: 283–316.
  5. Barry, C., C.M. Morrison, and A.W. Ellis. 1997. Naming the Snodgrass and Vanderwart pictures: Effects of age of acquisition, frequency and name agreement. Quarterly Journal of Experimental Psychology 50A: 560–585.
  6. Bates, E., C. Burani, S. D’Amico, and L. Barca. 2001. Word reading and picture naming in Italian. Memory and Cognition 29: 986–999.
  7. Brysbaert, M., and B. New. 2009. Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods 41: 977–990.
  8. Caramazza, A. 1997. How many levels of processing are there in lexical access? Cognitive Neuropsychology 14: 177–208.
  9. Carroll, J.B., and M.N. White. 1973. Word frequency and age-of-acquisition as determiners of picture-naming latency. Quarterly Journal of Experimental Psychology 25: 85–95.
  10. Centre International d’études pédagogiques. 2012. Diplôme d’études en langue française DELF and Diplôme approfondi de langue française DALF. http://www.ciep.fr/delfdalf/index.php. Accessed 5 Jan 2012.
  11. Channell, J. 1990. Vocabulary acquisition and the mental lexicon. In Meaning and lexicography, ed. J. Tomaszczyk and B. Lewandowska-Tomaszczyk, 21–30. Amsterdam: John Benjamins.
  12. Costa, A. 2005. Lexical access in bilingual production. In Handbook of bilingualism: Psycholinguistic approaches, ed. J.F. Kroll and A.M.B. De Groot, 308–325. Cary: Oxford University Press.
  13. Costa, A., M. Miozzo, and A. Caramazza. 1999. Lexical selection in bilinguals: Do words in the bilingual’s two lexicons compete for selection? Journal of Memory and Language 41: 365–397.
  14. Cuetos, F., A.W. Ellis, and B. Alvarez. 1999. Naming times for the Snodgrass and Vanderwart pictures in Spanish. Behavior Research Methods, Instruments, and Computers 31: 650–658.
  15. Cuetos, F., G. Aguado, C. Izura, and A.W. Ellis. 2002. Aphasic naming in Spanish: Predictors and errors. Brain and Language 82: 344–365.
  16. Cycowicz, Y.M., D. Friedman, M. Rothstein, and J.G. Snodgrass. 1997. Picture naming by young children: Norms for name agreement, familiarity, and visual complexity. Journal of Experimental Child Psychology 65: 171–237.
  17. De Bot, K. 1992. A bilingual production model: Levelt’s speaking model adapted. Applied Linguistics 13: 1–24.
  18. Dijkstra, T. 2005. Bilingual visual word recognition and lexical access. In Handbook of bilingualism: Psycholinguistic approaches, ed. J.F. Kroll and A.M.B. De Groot, 179–201. Cary: Oxford University Press.
  19. Echeverría, M.S., P. Urzúa, and I. Figueroa. 2005. Dispogen II. Programa computacional para el análisis de la disponibilidad léxica. Concepción: Universidad de Concepción.
  20. Educational Testing Service. 2012. Test of English for international communication TOEIC. http://www.ets.org/toeic. Accessed 5 Jan 2012.
  21. Ellis, A.W., and C.M. Morrison. 1998. Real age of acquisition effects in lexical retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition 24: 515–523.
  22. Ferreira, R.A. 2006. Disponibilidad léxica en inglés como lengua materna e inglés como lengua extranjera: Estudio del léxico disponible desde un enfoque psicolingüístico. Unpublished dissertation. Concepción: University of Concepción.
  23. Ferreira, R.A. 2011. Learning the meaning of new words: Behavioural and neuroimaging evidence. Unpublished doctoral dissertation. York: University of York.
  24. Ferreira, R.A., and M.S. Echeverría. 2010. Redes semánticas en el léxico disponible de inglés L1 e inglés LE. Onomazein 21: 133–153.
  25. Hernández-Muñoz, N., C. Izura, and A.W. Ellis. 2006. Cognitive aspects of lexical availability. European Journal of Cognitive Psychology 18: 730–755.
  26. Instituto Cervantes. 2012. Diplomas de español como lengua extranjera. http://diplomas.cervantes.es/es/informacion/inscripcion_fechas_examen_dele.html. Accessed 5 Jan 2012.
  27. Kučera, H., and W.N. Francis. 1967. Computational analysis of present-day American English. Providence: Brown University Press.
  28. Larochelle, S., and H. Pineau. 1994. Determinants of response times in the category verification task. Journal of Memory and Language 33: 796–823.
  29. Levelt, W.J.M. 1989. Speaking: From intention to articulation. Cambridge, MA: MIT Press.
  30. López-Chávez, J., and C. Strassburguer-Frías. 1991. Un modelo para el cálculo del índice de disponibilidad léxica individual. In La enseñanza del español como lengua materna, ed. H. López-Morales, 91–112. Río Piedras: Universidad de Puerto Rico.
  31. López-Morales, H. 2012. El Proyecto Panhispánico. http://www.dispolex.com/. Accessed 12 June 2013.
  32. Malt, B.C., and E.E. Smith. 1982. The role of familiarity in determining typicality. Memory and Cognition 10: 69–75.
  33. McCarthy, M., and F. O’Dell. 2002. English vocabulary in use: Advanced. Cambridge: Cambridge University Press.
  34. Microsoft Corporation. 2007. Windows XP note block. http://windows.microsoft.com/es-XL/windows/products/windows-xp. Accessed 3 Apr 2007.
  35. Monaghan, J., and A.W. Ellis. 2002. What, exactly, interacts with spelling-sound consistency in word naming? Journal of Experimental Psychology: Learning, Memory, and Cognition 28: 183–206.
  36. Morrison, C.M., and A.W. Ellis. 2000. Real age of acquisition effects in word naming and lexical decision. British Journal of Psychology 91: 167–180.
  37. Poulisse, N. 1999. Slips of the tongue: Speech errors in first and second language production. Amsterdam: Benjamins.
  38. Pye, G. 2002. Vocabulary in practice 1. Cambridge: Cambridge University Press.
  39. Redman, S. 2001. English vocabulary in use: Pre-intermediate and intermediate. Cambridge: Cambridge University Press.
  40. Richards, C.J., and C. Sandy. 2008. Passages student’s book, 2nd ed. New York: Cambridge University Press.
  41. Richards, C.J., J. Hull, and S. Proctor. 2005. Interchange, 3rd ed. New York: Cambridge University Press.
  42. Snodgrass, J.G., and T. Yuditsky. 1996. Naming times for the Snodgrass and Vanderwart pictures. Behavior Research Methods, Instruments, and Computers 28: 516–536.
  43. The MathWorks Inc. 2005. MATLAB version 7.1. Natick: The MathWorks Inc.
  44. Thorndike, E.L. 1921. A teacher’s word book of 20,000 words. New York: Teacher’s College.

Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  1. Departamento de Lenguas, Facultad de Educación, Universidad Católica de la Santísima Concepción, Concepción, Chile