Picture-naming tasks have provided critical data for theories of lexical retrieval and language production and have contributed to our understanding of the organization of the mental lexicon (Levelt et al., 1999). Timed picture-naming paradigms have been used as a tool for determining how easily a lexical representation can be retrieved from memory and as a useful method for assessing real-time language processing (for reviews, see Fiez & Tranel, 1997; Johnson et al., 1996; Snodgrass & Vanderwart, 1980). Timed picture naming has also been used in electrophysiological and neuroimaging studies to assess the neural underpinnings of the cognitive processes involved in word production (Cummings et al., 2016; Saccuman et al., 2006; Schmitt et al., 2000; van Turennout et al., 1997). The time it takes to name a picture in spoken language is influenced by several target-name-specific factors, such as lexical class, lexical frequency, or phonological properties (Alario et al., 2004; Barry et al., 1997; Bates et al., 2003; Belke et al., 2005; Bonin et al., 2002; Cuetos et al., 1999; Cummings et al., 2016; Johnson et al., 1996; Karimi & Diaz, 2020; Snodgrass & Yuditsky, 1996; Szekely et al., 2005). Naming behavior is also impacted by stimulus-specific properties, such as nameability, i.e., how much participants agree on a target name, how consistently participants name the pictures, or the visual complexity of the pictures (Alario et al., 2004; Snodgrass & Yuditsky, 1996).

Notably, in spoken languages, the relationship between the word form and the picture depicting the concept is largely arbitrary and thus should have little or no effect on the kind of processing required for lexical retrieval. In contrast, sign languages often exhibit non-arbitrary form-meaning mappings, i.e., iconicity, to a much greater degree than spoken languages (Taub, 2001). This form-to-meaning mapping could have a unique influence on naming behavior in sign languages and might give rise to modality-specific influences on picture naming. Indeed, several picture-naming studies have reported that iconicity facilitated lexical access and sped up response times for pictures that were named with iconic compared to less iconic or non-iconic signs (Baus, Gutierrez-Sigut, et al., 2008b; McGarry et al., 2020; Navarrette et al., 2017; Pretato et al., 2017; Vinson et al., 2015). However, it is unknown how iconicity uniquely contributes to lexical retrieval and production in sign language when other variables are factored into the analysis (e.g., lexical class, lexical frequency, phonological density, and phonological complexity).

There are numerous large databases and data sets of picture-naming norms available for spoken languages. These resources allow researchers to evaluate the contributions of many lexical and phonological factors to lexical retrieval for spoken or written words (Balota et al., 2001; Bates et al., 2003; Bird et al., 2001; Carroll & White, 1973; Cortese & Khanna, 2008; Cuetos et al., 1999; Snodgrass & Vanderwart, 1980; Snodgrass & Yuditsky, 1996; Stadthagen-Gonzalez & Davis, 2006; Szekely et al., 2004; Szekely et al., 2005; Torrance et al., 2018). Picture-naming studies have also been instrumental in developing psychometric assessments of word retrieval and production abilities in clinical populations (Walker et al., 2018). For example, the University of California, San Diego (UCSD) International Picture Naming Project (IPNP) (https://crl.ucsd.edu/experiments/ipnp/) database for seven spoken languages (Bates et al., 2003) and the Multilanguage Written Picture Naming Dataset (Torrance et al., 2017, 2018) have both provided important resources for clinicians and researchers working on language processing and have facilitated cross-linguistic comparisons. Although picture-naming tasks have been used successfully in sign language experiments (Emmorey et al., 2012; Vinson et al., 2015), there are no existing databases or data sets of picture-naming norms suitable for sign language research that parallel those for spoken languages. Further, one cannot assume that pictures selected and standardized for spoken languages are appropriate for use with sign languages, particularly given the possible effects of iconicity. An understanding of how the lexical and phonological properties of signs influence picture naming in American Sign Language (ASL) will provide a valuable contribution to theories of lexical processing and retrieval that have been dominated by evidence from spoken languages. 
We next give an overview of several important factors that are known to influence picture naming in spoken languages and what is currently known about whether and how these factors might influence lexical retrieval in sign language production.

Lexical class is an important organizational principle of lexical knowledge (Shapiro et al., 2006; Vigliocco et al., 2011). Studies of picture naming in spoken languages have revealed potential differences between the mental representations of nouns (i.e., object names) and verbs (i.e., action names). These studies indicate that object names tend to be retrieved faster and with better agreement than action names (Bayram et al., 2017; Khwaileh et al., 2018; Mätzig et al., 2009; Szekely et al., 2005). This pattern holds even when the stimuli are matched for relative difficulty in terms of the target name (e.g., age of acquisition) and picture-related properties (e.g., picture complexity) (Szekely et al., 2005). The semantic representations of verbs tend to be more complex with fewer shared features than nouns (Vigliocco et al., 2004), and verbs tend to be morphologically and syntactically more complex than nouns. These properties may render verbs more difficult to retrieve. It remains unclear whether lexical class similarly impacts picture-naming behavior in sign languages, as no study to our knowledge has compared object and action naming in a sign language. Nonetheless, we hypothesize that the semantic and syntactic properties that distinguish verbs from nouns hold for sign languages as well (e.g., Sandler & Lillo-Martin, 2006), and thus predict parallel results, with faster ASL naming times for object than action pictures.

Lexical frequency of the target name is one of the most robust predictors of the speed of lexical retrieval in picture naming, with high-frequency words being retrieved faster than low-frequency words for both object naming (Barry et al., 1997; Bates et al., 2003; Bonin et al., 2002; Cuetos et al., 1999; Cuetos & Alija, 2003; Oldfield & Wingfield, 1964, 1965; Snodgrass & Yuditsky, 1996; Szekely et al., 2005) and action naming (Cuetos & Alija, 2003). Similarly, the frequency of lexical signs for objects has been shown to impact picture naming across sign languages (Spanish Sign Language (LSE): Baus, Costa, & Carreiras, 2008a; ASL: Emmorey et al., 2013). We expected to replicate frequency effects for ASL with a larger data set, but we note that how lexical frequency is measured differs for signed and spoken languages. Frequency is typically assessed by word counts in corpora that contain millions of words, but no such large corpus currently exists for ASL, or for any other sign language, although smaller corpora do exist (Schembri et al., 2014). For sign languages, frequency is generally assessed by subjective ratings from deaf signers (Carreiras et al., 2008; Caselli et al., 2017).

The phonological properties of the target name, such as length, phonological complexity, and phonological neighborhood density (PND), have been shown to index the ease of lexical encoding in spoken languages. Several studies have found that longer words (measured by the number of syllables or phonemes) lead to longer picture-naming latencies than shorter words (Cuetos et al., 1999; Roelofs, 2002; Santiago et al., 2000; Szekely et al., 2005). Phonologically complex words, e.g., words containing an initial liquid consonant, tend to be retrieved more slowly than phonologically simpler words, e.g., those beginning with a stop consonant (Cummings et al., 2016). Neighborhood density in spoken language is typically calculated as the number of words in the lexicon that share all but one phoneme with the target word (e.g., Garlock et al., 2001; Luce & Pisoni, 1998). Speakers name pictures with words belonging to denser neighborhoods faster than those belonging to sparser neighborhoods (Baus, Costa, & Carreiras, 2008a; Vitevitch, 2002), suggesting that lexical selection and production is facilitated by the number of neighbors. Phonologically related words may boost activation of the target word phonology, which facilitates word retrieval and phonological encoding.

A crucial issue in sign language research is how to optimally determine phonological properties of signs to derive measures such as sign length, complexity, or neighborhood density. Phonological parameters, such as handshape, location, and movement, occur more often simultaneously than sequentially, and most signs are monosyllabic (Brentari, 1998). Sign length may be defined as the time it takes to articulate the sign in milliseconds, i.e., sign duration, which may or may not map onto the phonemic length. For example, a sign with a long path movement (NORTH; see Footnote 1) and a sign with a short movement (ZERO) can have the same number of phonological segments, despite a difference in articulatory duration. Nonetheless, previous studies have found that signs that are shorter in duration tend to be more frequent (Börstell et al., 2016; Caselli et al., 2017; Sehyr et al., 2021). Further, sign length effects have been reported to play a role in working memory such that long signs, which contain a location change, were recalled less accurately from short-term memory than short signs (no location change) (Wilson & Emmorey, 1998). It remains unclear whether and how sign duration impacts picture-naming latencies in ASL. The fact that signs have a more simultaneous phonological structure than spoken words could reduce the predictive power of sign duration on lexical retrieval times.

In contrast to what has been reported for spoken language, phonological complexity was not found to be a significant factor in predicting picture-naming latencies in British Sign Language (BSL) (Vinson et al., 2015), although the authors acknowledged that the lack of effects could have been due to an inadequate measure of phonological complexity (derived from Mann et al., 2010). Here we use a new measure of phonological complexity developed by Morgan et al. (2019), which provides a more nuanced measure of sign form complexity. If form complexity impacts production similarly for signed and spoken languages, then we predict slower naming times for pictures named with signs that are phonologically more complex.

To our knowledge, the effects of PND on sign production have not been studied, although phonological relatedness was found to impact lexical retrieval in a picture–sign interference paradigm. Baus, Gutierrez-Sigut, et al. (2008b) reported faster picture-naming times when a superimposed sign had the same handshape as the target sign, but slower naming times when the superimposed and target signs shared location. Carreiras et al. (2008) found a similar inhibitory effect of location neighborhood density on sign recognition. However, these studies did not examine the effects of phonological density defined as the number of signs that share all but one phonological feature or parameter. Recently, Caselli et al. (2021) investigated the effects of PND on ASL sign recognition using a lexical decision task and taking advantage of the ASL-LEX database, which provides measures of PND for ASL signs, defined in a manner that is parallel to spoken languages (Caselli et al., 2017; Sehyr et al., 2021). Similar to results from spoken languages (e.g., Andrews, 1989; Lim, 2016), Caselli et al. (2021) found that ASL signs with high PND were recognized more slowly than those with low PND, and this effect was strongest for low-frequency signs. Thus, for comprehension, phonologically similar words or signs compete for recognition and slow response times. In contrast, for production, phonologically similar words tend to facilitate lexical retrieval such that picture-naming RTs tend to be faster for words with high PND (Baus, Costa, & Carreiras, 2008a; Vitevitch, 2002). In the present study, we investigated whether signs residing in denser phonological neighborhoods were retrieved faster than those in sparse neighborhoods, or whether the distinct contribution of location neighbors (e.g., inhibition) might alter this pattern for production.

The time required to name a picture also depends on several stimulus properties, such as the “nameability” and the complexity of the picture. The extent to which participants converge on a name has been a robust predictor of naming latencies in spoken languages. Pictures with better target name agreement and fewer alternative names (synonyms) tended to be named faster and more accurately than pictures with low name agreement or pictures with many different possible names (Barry et al., 1997; Bates et al., 2003; Cuetos et al., 1999; Ellis & Morrison, 1998; Griffin, 2001; Snodgrass & Yuditsky, 1996). This finding may be due to inhibitory effects from lexical competitors (Bates et al., 2003). That is, when name agreement is low and multiple names are available to describe a given picture, the increased lexical competition and selection demands make word retrieval more arduous. Conversely, when name agreement is high and few names are available, the lower competition and selection demands might facilitate retrieval. It remains unattested whether name agreement or the number of alternative ASL signs for a given picture would similarly influence response latencies. Based on what we know from spoken languages, we would expect faster naming times for pictures with better consistency of naming. However, it is possible that naming consistency might have a less robust effect on ASL naming compared to spoken languages because sign language lexicons are smaller. There is presently no firm estimate of the average ASL vocabulary size for adults, but ASL dictionaries or databases typically list between 2000 and 5000 signs (see Footnote 2). Compared to English, the ASL lexicon is likely to be smaller due in part to its relative youth: ASL is only a few hundred years old. The size of the English lexicon has grown steadily over the last five centuries (Michel et al., 2011), and estimates of English vocabulary size for adults can be as high as 58,000 words (Nagy & Anderson, 1984).
There could be other reasons why signed language lexicons might be smaller, such as signers’ ability to borrow from the majority spoken language or to modify signs for unique meanings, which means that fewer unique signs are needed. Thus, lexical competition at retrieval might be weaker for ASL than for English. We additionally tested this hypothesis by directly comparing English name agreement and ASL name agreement for the same pictures.

Finally, one of the factors that might influence performance in picture naming is the visual complexity of the image. Visual processing is a necessary initial stage in picture naming including both low-level (e.g., processing shapes and lines) and high-level processes (e.g., object or scene recognition). The objective visual complexity (OVC) of the stimuli is defined as the amount of detail or intricacy of the line in a picture (Snodgrass & Vanderwart, 1980). This type of picture complexity has been found to increase retrieval times even after controlling for a host of other factors (e.g., frequency, length) (Alario et al., 2004; D’Amico et al., 2001; Szekely et al., 2005). We thus included the same measure of visual complexity to examine whether picture complexity influenced sign retrieval differently from spoken word retrieval when other factors were accounted for.

For the purpose of cross-linguistic comparison, we compared the aggregated data for the same pictures from ASL (gathered in this experiment) with the English data gathered in previous studies by Szekely and colleagues (2005). Our goal was to gain further insights into the suitability of these pictures for naming in ASL and to see how the languages compare on action versus object naming more generally. This basic comparative analysis also provides a springboard for a deeper examination of the differences in picture naming in spoken and signed languages.

In sum, we used a timed picture-naming paradigm to examine factors that are likely to be modality-independent and are predicted to exhibit similar effects on lexical retrieval in sign and spoken languages (e.g., lexical class, frequency, name consistency) and factors that may be modality-specific (e.g., phonological variables, iconicity). The goals of this study were (1) to gather picture-naming data in ASL for object and action pictures and establish a normative data set of pictures that correspond to specific signs that can be used by researchers, educators, or clinicians, (2) to examine the influence of theoretically relevant psycholinguistic variables (lexical class, frequency, phonology, and iconicity) and picture properties (target name agreement, visual complexity) on picture-naming behavior assessed by reaction times (RTs) and target name agreement, and (3) to compare ASL naming data with spoken English data obtained from Szekely et al. (2005) for a large subset of overlapping pictures. The pictures (line drawings) used in this study are available for preview on the Open Science Framework (OSF) (https://osf.io/25mga/); some images are subject to copyright, and permission should be sought from the UCSD IPNP database. The ASL picture-naming data set is also available for download from the OSF site. This data set is intended to serve as a resource for researchers designing experiments or developing clinical or vocabulary assessments in ASL and to facilitate further research on lexical retrieval and production in signed languages.



Twenty-one deaf ASL signers (mean age = 32 years, SD = 6, age range 22–49 years; 13 native and 8 early-exposed signers; 12 female) participated in the picture-naming experiment. Participants were deaf from birth or became deaf soon after birth and reported using ASL as their main language for communication. Native signers were exposed to ASL from birth from deaf parents or caretakers, and early signers were exposed to ASL before age 7 from relatives and/or educators. All participants had normal or corrected-to-normal vision and reported no pre-existing neurological or cognitive impairments. All participants were proficient ASL signers as determined by two existing standardized ASL assessments. The signers performed with 90% accuracy (SD = 9%) on the ASL Comprehension Test (ASL-CT) (Hauser et al., 2015), and with 70% accuracy (SD = 13%) on the ASL Sentence Repetition Test (ASL-SRT) (Supalla et al., 2014). This accuracy was comparable to the published results: the average performance of deaf native ASL signers was 85% on the ASL-CT test published in Hauser et al. (2015) and was 74% on the ASL-SRT published in Supalla et al. (2014). Finally, the deaf native and early-exposed signers in this study did not differ on either measure of ASL proficiency (p = .535 and p = .076, respectively; see Footnote 3).


The stimuli were 524 black-and-white line drawings: 426 pictures were taken from the UCSD International Picture Naming Project (IPNP) (https://crl.ucsd.edu/experiments/ipnp/) (Bates et al., 2003; Szekely et al., 2004; Szekely et al., 2005), and the remaining 98 images originated from Caselli et al. (2020). There were 272 images that represented objects and 252 images that depicted transitive and intransitive actions. One action picture from the IPNP data set (act206cut.jpg; see Footnote 4) was excluded completely from the data set because participants reported difficulties recognizing that picture, resulting in a data set of 523 pictures (272 objects and 251 actions). For all stimulus items, we extracted the log10 word frequency from SUBTLEXUS (Brysbaert & New, 2009) based on the empirically determined English target name (Sehyr et al., 2021; Szekely et al., 2005). The difference in log10 word frequency between objects (M = 2.97, SD = .63; CI [2.95; 3]) and actions (M = 3.12, SD = .70; CI [3.09; 3.14]) was not significant, F(1, 518) = 3, p = .072. For the 425 pictures from UCSD-IPNP that were included in the cross-linguistic comparison, we also extracted the OVC values, which were estimated from the file size in JPEG format (Snodgrass & Vanderwart, 1980; Szekely et al., 2005). In this subset, action pictures (M = 23,600, SD = 7717; CI [23,251; 23,767]) were more visually complex than object pictures (M = 16,732, SD = 8801; CI [16,532; 17,112]), F(1, 413) = 86, p < .001.


All participants were seated in a well-lit room in front of a computer, with the keyboard located in front of them. Instructions were provided in ASL and written English (see Appendix for English instructions). Object and action pictures were blocked for naming, and the order of blocks was counterbalanced across participants. Participants were asked to name the pictures as quickly as possible, using a single sign response, and to avoid phrases and descriptions. To start viewing the picture, participants used both hands to hold down the space bar. The use of both hands on the space bar was monitored and reinforced throughout the experiment for consistency. After the participant pressed the space bar, a fixation cross appeared on the screen for 500 ms, followed by a stimulus picture, which disappeared as soon as the participant lifted their hands to name the picture. Otherwise, the stimulus automatically disappeared after 3000 ms, and a blank screen remained on display until the participant released the space bar. Participants were instructed to release the space bar only when they were ready to name the picture. After producing the sign, participants pressed and held the space bar again to view the next picture. A video camera was positioned just above the monitor and continuously recorded the participant’s responses. Signed responses were video-recorded and imported into the ELAN annotation tool (2019 Version 5.8) for scoring and analysis.

Scoring and analysis

Our scoring criteria were modeled on Szekely et al. (2005) and Snodgrass and Vanderwart (1980), with modifications for responses in ASL. For example, the signed responses were coded using English glosses. A team of four proficient ASL signers coded the signed responses: two deaf (one native signer exposed to ASL from birth, one late signer exposed to ASL after age 8) and two hearing proficient ASL signers (one native signer exposed to ASL from birth, one late signer exposed to ASL after age 8). All coders were linguistically trained. Coding proceeded in the following steps. First, each response was annotated with trial attributes (block—object or action, trial number, and picture code) and glossed/transcribed using separate tiers in ELAN. Second, annotated trials were identified as either accepted or rejected responses. A response was accepted if the ASL sign correctly referred to the concept depicted in the picture and was a “clean,” codable production (e.g., no false start). Such trials were referred to as “valid” in Szekely et al. (2005; p. 5). For example, for a picture depicting “a cherry” (obj091cherry.jpg), we accepted all phonological, morphological, and semantic variants of the ASL sign CHERRY, and for a picture depicting “a canoe” (obj078canoe.jpg), we accepted both the sign CANOE and BOAT because they are both semantically appropriate labels for the depicted concept. We determined phonological variants based primarily on entries in ASL Signbank (Hochgesang et al., 2017). If an entry in ASL Signbank was not available, we considered signs to be phonological variants when a parameter modification did not result in a distinct lexical meaning. Examples of phonological variants include the sign for “vacuum,” where the number of hand movements varied from one to three, or the sign for “story,” which can be produced with three distinct handshapes (see ASL Signbank).

A response was rejected if it incorrectly labeled the concept; for example, if a participant produced the ASL sign APPLE for the picture depicting a cherry, the response was coded as rejected (5% of all trials were rejected due to an incorrect name). Further, responses were rejected from the main analysis if: the response was a phrase or multi-sign response (4% of trials), e.g., “MILITARY+MARCH+#MARCH” for a picture depicting the action “march” (act137march.jpg), “SWEATER + HOODIE” for a picture of “a sweater” (obj433sweater.jpg), or “a person is feeling down” for a picture depicting a verb “cry” (act095cry.jpg). Due to the timed paradigm, we rejected responses that contained additional descriptive information because these trials would distort RTs related to the retrieval of the target lexical item itself and the length of the signed responses. Further, we rejected responses that contained false starts, pauses, or hesitations, or that were incomplete (3% of trials) or when the participant produced “don’t know” or no response (2% of trials). We rejected responses that were fingerspelled when a commonly used lexical sign was known to exist (1% of trials; e.g., D-I-N-O-S-A-U-R) since the lexical status of fingerspelled words is not completely clear. Fingerspelling might be stored differently or separately from lexical signs. However, fingerspelling could also signal difficulty or failure to access or retrieve a lexical sign. We felt that treating such fingerspelled responses on par with lexical signs would not be entirely appropriate. We did, however, accept fingerspelled responses that are commonly used to describe the concept, e.g., #DOG, or #BUS.

We also rejected responses that were uncodable or unrecognizable by the coders (1% of trials) or that were a mimetic or gestural depiction (1% of trials, i.e., the participant mimed the action of the agent in the picture). Additionally, 1% of trials were rejected because the participant timed out (i.e., recorded their response, valid or not, outside the 3000 ms response window). A total of 18% of responses were rejected from the analysis.

Note that our procedure for coding accepted responses was simplified from the coding system used by Szekely et al. (2005) in that we did not separate the different categories of lexical codes; that is, their Lexical Codes 1–3 (target word, morphological variant, and semantic variant) were all treated simply as accepted responses in our study. For their Lexical Code 4 (“other”), we accepted semantically related responses (e.g., BOAT for CANOE), but we excluded Lexical Code 4 responses that were hyponyms (e.g., MAN for DOCTOR, or BIRD for EAGLE or ROOSTER), frank visual errors (e.g., APPLE for CHERRY), or responses that were completely unrelated to the target concept (e.g., TALK for MARRY). In essence, we had only two “codes” (accepted responses, including the target, and rejected responses), whereas Szekely et al. (2005) used four Lexical Codes, i.e., 1 = target, 2 = morphological variant, 3 = synonym, and 4 = “other,” which included hyponyms, visual errors, and completely unrelated responses.

In the final step of coding, all accepted responses were cross-referenced with the ASL-LEX database (Sehyr et al., 2021) and ASL Signbank (Hochgesang et al., 2017) and annotated using the corresponding ASL-LEX or Signbank Entry ID labels. If a response was not in ASL-LEX or Signbank, a deaf native ASL signer provided an appropriate English gloss that was distinct from existing glosses in ASL-LEX. Note that the English glosses only loosely capture the meaning of the ASL sign and are used for sign stimulus identification purposes only.

Name acceptability rate and target name agreement

Name acceptability rate (TOT). This measure is the proportion of trials on which participants provided an accepted (“valid”) response with a usable RT (a “clean” production) for a given picture. This value reflects the “acceptability rate” rather than accuracy—the proportion of responses we deemed as valid (accepted) based on the above criteria. Because our criteria for accepting/rejecting responses were decisions about what constitutes usable data for this picture set, a low acceptability rate may indicate that the picture was not easily nameable in ASL, rather than the difficulty in lexical retrieval. There were, in fact, no a priori ASL names for these pictures.

ASL target name and target name agreement (TAR). The ASL target name was determined empirically as the most frequent (dominant) sign produced for each picture (“Lex1dom” in Szekely et al., 2005). Next, the TAR value is the proportion of all accepted trials on which participants provided the most frequent name. This value reflects the extent to which participants agreed on the target name for a given picture and corresponds to “Lex1dom” in Szekely et al. (2005).
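The two proportions defined above can be illustrated with a minimal sketch. It assumes each trial for one picture is stored as a gloss plus an accepted/rejected flag; the function name `naming_measures`, the data layout, and the example counts are hypothetical and are not taken from the study's scoring pipeline.

```python
from collections import Counter

def naming_measures(trials):
    """Compute TOT, the dominant (target) name, and TAR for one picture.

    `trials` is a list of (gloss, accepted) pairs, one per participant.
    This layout is an assumption made for illustration only.
    """
    accepted = [gloss for gloss, ok in trials if ok]
    if not accepted:
        raise ValueError("no accepted responses for this picture")
    tot = len(accepted) / len(trials)            # acceptability rate
    counts = Counter(accepted)
    # Ties are resolved in favor of the name encountered first.
    target, n_target = counts.most_common(1)[0]  # dominant response
    tar = n_target / len(accepted)               # target name agreement
    return {"TOT": tot, "target": target, "TAR": tar}

# Hypothetical picture: 6 CANOE, 2 BOAT, 2 rejected trials.
example = ([("CANOE", True)] * 6 + [("BOAT", True)] * 2
           + [("(false start)", False)] * 2)
result = naming_measures(example)
```

In this made-up example, 8 of 10 trials are accepted and 6 of those 8 use the dominant name, so TAR is computed over accepted trials only, as in the definition above.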

Alternative names and naming consistency. These two measures provide an index of picture “nameability.” The number of alternative names represents the number of accepted ASL names for each picture, including the target name itself. Following Snodgrass and Vanderwart (1980) and Szekely et al. (2005), we calculated the H statistic for each item. The H statistic delimits the naming consistency, i.e., the extent to which participants converged on a name, and is weighted for the number of participants producing each name type. It is computed using the following formula:

$$H=-\sum_{i=1}^{k}{P}_{i}\ln {P}_{i}$$

Here, k is the number of distinct accepted names produced for a picture (the target name and all alternative names), and Pi is the proportion of participants who produced the ith name. The higher the H value, the more diverse the naming responses for a given picture. An H value of “0” indicates perfect name agreement (maximum convergence).
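As a worked illustration of the H statistic, the following minimal sketch computes H from the accepted names given to one picture, using the natural logarithm as in the formula above. The function name `h_statistic` and the example responses are hypothetical.

```python
import math
from collections import Counter

def h_statistic(accepted_names):
    """Naming consistency H = -sum(P_i * ln P_i) over distinct names.

    `accepted_names` holds one accepted gloss per participant.
    """
    n = len(accepted_names)
    if n == 0:
        raise ValueError("at least one accepted response is required")
    counts = Counter(accepted_names)
    # Each distinct name contributes -P_i * ln(P_i).
    return sum(-(c / n) * math.log(c / n) for c in counts.values())

# Everyone produces the same name: perfect agreement.
h_perfect = h_statistic(["CHERRY"] * 20)
# Responses split evenly between two names: H equals ln 2.
h_split = h_statistic(["CANOE"] * 5 + ["BOAT"] * 5)
```

With perfect agreement the single proportion is 1 and H is 0; an even split between two names gives H = ln 2, and H grows as responses spread over more names.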

Reaction time and response duration

Reaction time (RT, milliseconds) was recorded as the time between the picture onset and the time the participant released their hands from the space bar to produce an ASL response. Average RTs and standard deviations were calculated for all trials with an acceptable ASL response. RTs to target (dominant) responses were the main variable of interest in the analyses. Average response duration (milliseconds) was used as a proxy for sign length and represents the time between when participants lifted their hands from the space bar to name the picture and when they placed their hands back on the space bar to initiate the next trial. Because this measure includes transitional movements to and from a rest position, it represents an approximate measure of sign length and is rather different from the word length measures used in spoken languages (e.g., number of letters or syllables). Again, only response duration to target signs was used in the analysis. Importantly, the average response duration of target signs correlated significantly with the sign duration measures from ASL-LEX, which excluded transitional movements (r = .419, p < .001). Thus, response duration appears to be a reasonable proxy for sign length.
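The two timing measures can be sketched from three event times per trial. The `Trial` record, its field names, and the example timestamps below are assumptions made for illustration; they do not describe the format of the recorded data.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Trial:
    picture_onset_ms: float  # picture appears on screen
    release_ms: float        # hands leave the space bar to sign
    repress_ms: float        # hands return to the space bar

def reaction_time(trial):
    """Time from picture onset to release of the space bar."""
    return trial.release_ms - trial.picture_onset_ms

def response_duration(trial):
    """Time from release to re-press; includes transitional movements."""
    return trial.repress_ms - trial.release_ms

# Hypothetical target-response trials for one picture.
trials = [Trial(0, 900, 2100), Trial(0, 1000, 2300), Trial(0, 950, 2200)]
mean_rt = mean(reaction_time(t) for t in trials)
mean_duration = mean(response_duration(t) for t in trials)
```

Because the duration runs from release to re-press, it necessarily includes the movement to and from the keyboard, which is why it is only an approximate measure of sign length.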

Target response properties

Once we determined what the target (dominant) name for each picture was, we retrieved the following lexical and phonological properties for the target signs from the ASL-LEX database (http://asl-lex.org/) (Caselli et al., 2017; Sehyr et al., 2021) (see Table 2 for descriptive statistics). Lexical frequency (Freq) (z-transformed) represents the average subjective frequency rating recorded by a group of deaf ASL signers on a scale of 1–7, where 7 represents a high frequency of occurrence. Iconicity (Icon) of signs (z-transformed) represents the average rating based on the perceived relationship between the form and meaning as rated by a group of hearing non-signers on a scale of 1–7, where 7 represents a high iconicity of a sign. Phonological complexity was calculated based on the criteria set out in Morgan et al. (2019). Specifically, signs in ASL-LEX were scored for complexity in seven categories, receiving one point if they met the category description and zero points if they did not. The maximum complexity score for any given sign is 7. For the phonological neighborhood density (PND) measure, we utilized the generic neighborhood density measure, which defines neighbors as the number of signs that share all but one phonological feature with the target (features included sign type, location, movement, selected fingers, and finger spread, among others; see Footnote 5). For further details and methodological procedures related to obtaining the values for all these lexical properties, see Sehyr et al. (2021) and Caselli et al. (2017). These properties were available for a total of 477 target names. Response duration (ms) was calculated as the average duration of the target (dominant) responses for each item. Only the target (dominant) response properties were included in the regression analyses of the influence of lexical and phonological variables on naming behavior in this study.
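The "all but one feature" definition of neighborhood density can be illustrated with a minimal sketch. The toy lexicon, the placeholder glosses (SIGN_A to SIGN_D), and the four feature names and values are invented for illustration; they do not reproduce the ASL-LEX coding scheme or its actual density values.

```python
def differs_in_one_feature(a, b):
    """True if two signs share all but exactly one feature value.

    Both signs are assumed to be coded on the same set of features.
    """
    return sum(a[f] != b[f] for f in a) == 1

def neighborhood_density(target, lexicon):
    """Count signs in `lexicon` that are neighbors of `target`."""
    return sum(
        differs_in_one_feature(lexicon[target], features)
        for gloss, features in lexicon.items()
        if gloss != target
    )

# Hypothetical toy lexicon with made-up feature codings.
toy_lexicon = {
    "SIGN_A": {"sign_type": "one-handed", "location": "chin",
               "movement": "straight", "selected_fingers": "index"},
    "SIGN_B": {"sign_type": "one-handed", "location": "forehead",
               "movement": "straight", "selected_fingers": "index"},
    "SIGN_C": {"sign_type": "one-handed", "location": "chin",
               "movement": "circular", "selected_fingers": "index"},
    "SIGN_D": {"sign_type": "two-handed", "location": "chest",
               "movement": "circular", "selected_fingers": "all"},
}
density = {g: neighborhood_density(g, toy_lexicon) for g in toy_lexicon}
```

In this toy lexicon, SIGN_A has two neighbors (SIGN_B differs only in location, SIGN_C only in movement), whereas SIGN_B and SIGN_C differ from each other in two features and are therefore not neighbors.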

Results

We first report summary results for overall acceptability (TOT) and target agreement rate (TAR) for all items, as well as separately for object and action naming (Table 1), and provide RT and other descriptive properties of the target (dominant) ASL responses (Table 2). We next examine correlations among target RTs, target agreement, lexical properties of the target name, and picture properties for all items (Table 3), as well as separately for object and action pictures (Table 4). To examine the predictive strength of these variables on target RTs and target agreement (i.e., the proportion of responses on target), we ran linear mixed-effects models (Tables 5 and 6). Finally, we directly compared naming RTs, acceptability rate and target name agreement, number of alternative names, and naming consistency (H statistic) for ASL and English for a subset of pictures (Table 7).

Table 1 Average proportion of acceptability rate (TOT), target agreement (TAR), and naming consistency measures
Table 2 Descriptive characteristics of target TAR (dominant) ASL responses
Table 3 Correlations among target name RTs (RT TAR) and target agreement (TAR), naming consistency (H), the number of alternative names (Alt names), the target name properties, and stimulus properties
Table 4 Correlations among target name RTs (RT TAR) and target agreement (TAR), naming consistency (H), number of alternative names (Alt names), the target name properties, and stimulus properties, separately for object naming (top) and action naming (bottom)
Table 5 Summary of the linear mixed-effects regression model for RTs
Table 6 Results of target agreement analysis (GLMMs)
Table 7 Average acceptability (TOT) rate (proportion), target agreement (proportion), target RTs, and naming consistency (alternative names, H) for ASL and English picture naming

Naming object and action pictures in ASL

Table 1 provides a summary of naming acceptability rate (TOT; i.e., the proportion of accepted responses), target agreement (TAR; i.e., the proportion of dominant responses), the average number of alternative names produced for each picture (one measure of name agreement), and name consistency (H statistic) for all pictures, and for objects and actions separately. A total of 7358 response trials were on target.
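For readers unfamiliar with the H statistic, the sketch below computes it under the standard definition from Snodgrass and Vanderwart (1980), H = Σ pᵢ log₂(1/pᵢ), where pᵢ is the proportion of responses giving name i. The function names and example responses are hypothetical, and the sketch assumes that the number of alternative names is simply the count of distinct names given to a picture (dominant name included).

```python
import math
from collections import Counter


def h_statistic(responses):
    """Name-agreement entropy: 0 when every response is the same name."""
    counts = Counter(responses)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def number_of_names(responses):
    """Distinct names produced for a picture (assumed to include the dominant name)."""
    return len(set(responses))


# Hypothetical responses to two pictures.
perfect_agreement = ["HORSE"] * 10
split_responses = ["NAME_1"] * 5 + ["NAME_2"] * 5

print(h_statistic(perfect_agreement), number_of_names(perfect_agreement))
print(h_statistic(split_responses), number_of_names(split_responses))
```

Lower H values therefore indicate more consistent naming: a picture that every signer names identically has H = 0, and H grows as responses spread more evenly over more names.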

A binomial regression analysis revealed that the acceptability rate (TOT), i.e., the proportion of accepted responses, was 83% and was significantly higher for objects (88%) than actions (76%), B = –.86, SE = .05, Wald χ²(1) = 268, p < .001, exp(B) = .42; 95% CI [.38; .47]; log likelihood = 9856, R² = .03 (Cox & Snell), .04 (Nagelkerke). Target agreement (TAR) was 67% and was also higher for objects (76%) than actions (59%), B = –.78, SE = .04, Wald χ²(1) = 346, p < .001, exp(B) = .46; 95% CI [.42; .50], log likelihood = 13,322; R² = .03 (Cox & Snell), .05 (Nagelkerke). The average number of alternative names was 2.2. An analysis of variance (ANOVA) revealed that signers produced fewer alternative names for objects (M = 1.9, range 1–7) than for action pictures (M = 2.5, range 1–7), F(1, 522) = 528, p < .001. Similarly, name consistency (measured by the H statistic) was better for objects (.39) than for actions (.61), F(1, 522) = 31, p < .001. Examples of pictures that yielded 100% acceptability rate and perfect name agreement (i.e., H = 0) were “horse” (horse.png) and “boxing” (act022box.jpg); examples of pictures with seven alternative ASL names and low name consistency (but good acceptability rate [TOT] > 75%) were “pineapple” (obj320pineapple.jpg) and “explode” (act078explode.jpg). We found that 33 action pictures and 116 object pictures were named with ≥ 90% target name agreement.
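The reported exp(B) values are odds ratios, and the relationship between B, SE, and the confidence limits can be checked with a short Python sketch. It assumes a Wald-type 95% interval, exp(B ± 1.96 × SE); the source does not state how its intervals were computed, although the reported values agree with this assumption to two decimals.

```python
import math


def odds_ratio_with_ci(b, se, z=1.96):
    """Return (odds ratio, lower limit, upper limit) for a logit coefficient.

    Assumes a Wald interval on the log-odds scale, exponentiated afterwards.
    """
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# Coefficients reported in the text for acceptability (TOT) and target agreement (TAR).
tot = odds_ratio_with_ci(-0.86, 0.05)
tar = odds_ratio_with_ci(-0.78, 0.04)

print([round(value, 2) for value in tot])  # [0.42, 0.38, 0.47]
print([round(value, 2) for value in tar])  # [0.46, 0.42, 0.5]
```

An odds ratio of .42 means that the odds of an acceptable response for one lexical class were a little under half the odds for the other, which is consistent with the lower acceptability rate observed for actions.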

Next, Table 2 provides a summary of average RT and response duration (ms) for the target (dominant) ASL names, and a summary of the lexical and phonological characteristics of the target ASL names for each picture.

Objects were named faster than actions, F(1, 7296) = 1267, p < .001, and sign responses to objects were shorter than responses to actions, F(1, 7354) = 376, p < .001. Target object names were less iconic than action names, F(1, 485) = 19, p < .001, and less phonologically complex than action names, F(1, 485) = 9.5, p = .001. The target object and action names did not differ in frequency (p = .324) or neighborhood density (p = .745).

Factors predicting picture-naming behavior in ASL

To characterize variables that influence picture naming in ASL, we first examined the correlational relationships among the target RTs and target agreement, lexical and phonological characteristics of the target ASL responses, and picture properties. Table 3 shows the correlations, and Table 4 shows the correlations separately for objects and actions.

Shorter signs were faster to retrieve, as naming RTs for target names positively correlated with target name response duration, although this result was significant only for action naming. RTs were longer for pictures that yielded worse target name agreement, and this result held separately for objects and actions. Indicators of naming consistency also correlated with RTs; that is, poorer naming consistency (H) and more alternative names were associated with slower RTs, suggesting that pictures with more competitors led to slower retrieval. However, this pattern was only found for action naming.

Pictures named with higher-frequency target names were named faster than pictures with less frequent names, although this result was significant only for object naming. High-frequency signs were also shorter in duration than low-frequency signs. Pictures with more iconic target names were named faster than pictures named with less iconic targets. For action naming, the target sign iconicity also correlated with better naming consistency (H); that is, pictures with iconic targets received fewer alternative names. However, iconicity was unrelated to the duration of signed responses. The phonological characteristics of the target signs (complexity and PND) were also unrelated to the target RTs and the proportion of target names. Phonological complexity was associated with longer response durations for target action names. Finally, objective visual complexity (OVC) was associated with slower RTs and longer response durations, but only for action naming.

Next, we examined the predictive influences of these variables on target-naming RTs and target name agreement. To analyze RTs, we ran linear mixed-effects (lmer) models from the lme4 package (Bates et al., 2011) in RStudio (Version 1.1.456, R Development Core Team, 2018). To analyze binomial agreement data, we ran a generalized linear mixed-effects (GLMM) model. Lmer models can account for the variability in the effects within random effects (e.g., participants, pictures), and are therefore less susceptible to spurious results and provide more generalizable outcomes. To ensure a normal distribution of RT data for these statistical analyses, we applied log transformation on raw RTs (following Snodgrass & Yuditsky, 1996). We centered (z-transformed) all other variables except phonological complexity scores and H values. We also checked for multicollinearity among all independent (fixed) variables included in the linear model using the car package in R (Zuur et al., 2010); variance inflation factor (VIF) values for all independent variables in the model were ≤ 1.333.
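To make the VIF threshold concrete, recall the standard definition of the variance inflation factor (a textbook formula, not one given in the source). Here $R_j^2$ denotes the proportion of variance in predictor $j$ explained by regressing it on all other predictors:

$$\mathrm{VIF}_j=\frac{1}{1-R_j^2},\qquad \mathrm{VIF}_j\le 1.333\ \Longrightarrow\ R_j^2\le 1-\frac{1}{1.333}\approx .25$$

Under this reading, no predictor shared more than about a quarter of its variance with the remaining predictors.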

We included eight fixed variables in the model: lexical class (object, action), frequency, iconicity, response duration, phonological complexity, phonological neighborhood density (PND), naming consistency (H statistic), and objective visual complexity (OVC) of the pictures; the R syntax is shown in (1). A total of 5461 observations that were the target (dominant) responses with valid RTs, and phonological and lexical values available from ASL-LEX, comprised the data set. Significant effects are listed in Table 5 in order of prediction strength and significance.

$$\begin{array}{l}\mathrm{aslrt}=\mathrm{lmer}\big(\mathrm{ASLRTlg}\sim \mathrm{LexicalClass}+\mathrm{H}+\mathrm{FreqZ}+\mathrm{IconZ}+\mathrm{Complexity}+\mathrm{PNDz}+\mathrm{OVCz}+\mathrm{Durationz}\\ \quad {}+\left(1|\mathrm{Participant}\right)+\left(1|\mathrm{Stimulus}\right),\ \mathrm{data}=\mathrm{PicNaming},\\ \quad \mathrm{control}=\mathrm{lmerControl}\left(\mathrm{optimizer}="\mathrm{bobyqa}",\ \mathrm{optCtrl}=\mathrm{list}\left(\mathrm{maxfun}=2\mathrm{e}6\right)\right)\big)\end{array}\qquad (1)$$
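Read as a statistical model, the call in (1) specifies fixed effects for the eight predictors plus crossed random intercepts for participants and pictures. The notation below is introduced here for illustration and does not appear in the source: $i$ indexes participants, $j$ indexes pictures, and the variance symbols are placeholders.

$$\begin{array}{l}\log \mathrm{RT}_{ij}=\beta_0+\beta_1\,\mathrm{LexicalClass}_j+\beta_2\,\mathrm{H}_j+\beta_3\,\mathrm{Freq}_j+\beta_4\,\mathrm{Icon}_j+\beta_5\,\mathrm{Complexity}_j\\ \qquad {}+\beta_6\,\mathrm{PND}_j+\beta_7\,\mathrm{OVC}_j+\beta_8\,\mathrm{Duration}_j+u_i+w_j+\varepsilon_{ij},\\ u_i\sim N\left(0,\sigma_u^2\right),\quad w_j\sim N\left(0,\sigma_w^2\right),\quad \varepsilon_{ij}\sim N\left(0,\sigma^2\right)\end{array}$$

Here $u_i$ captures how fast or slow a given participant is overall, and $w_j$ captures how easy or hard a given picture is to name beyond what the item-level predictors explain.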

The results were consistent with the correlations above. All factors predicted target-naming RTs, except for PND, which was unsurprising given the lack of correlations between PND and RTs. Lexical class was the strongest predictor in the model, with object pictures being named faster than action pictures, followed by target response duration, iconicity, name consistency (H), frequency, and phonological complexity. Frequent, iconic, shorter, and phonologically simpler sign names were retrieved faster than less frequent, non-iconic, longer, and phonologically complex signs. Better name consistency and lower picture complexity also facilitated naming.

We next analyzed target agreement, i.e., the proportion of target (dominant) responses, using a generalized linear mixed model fit by maximum likelihood (Laplace approximation) (the glmer function in the lme4 package) to estimate the likelihood of a model with the same predictor variables except for the H statistic (due to large covariance with agreement), and random intercepts by subjects and stimuli. All independent variables except phonological complexity and PND were centered (z-transformed). The R syntax for the main statistical model for target agreement is reported in (2); data from 6722 observations were included in the analysis of target agreement, and the outcome is summarized in Table 6.

$$\begin{array}{l}\mathrm{acctar}=\mathrm{glmer}\big(\mathrm{TargetMatch}\sim \mathrm{LexicalClass}+\mathrm{FreqZ}+\mathrm{IconZ}+\mathrm{PND}+\mathrm{Complexity}+\mathrm{OVC}+\mathrm{Duration}\\ \quad {}+\left(1|\mathrm{Participant}\right)+\left(1|\mathrm{Stimulus}\right),\ \mathrm{data}=\mathrm{PicNam},\ \mathrm{family}=\mathrm{binomial},\\ \quad \mathrm{control}=\mathrm{glmerControl}\left(\mathrm{optimizer}="\mathrm{bobyqa}",\ \mathrm{nAGQ}=10\right)\big)\end{array}\qquad (2)$$

The results revealed that lexical class, iconicity, phonological complexity, and picture visual complexity significantly contributed to target name agreement. Target agreement was higher for object pictures than for action pictures. Further, higher iconicity and lower phonological complexity of the target names improved target agreement (i.e., more names on target). Target agreement was also higher for visually more complex images. Finally, neither frequency nor phonological neighborhood density nor response duration predicted target agreement. Additionally, we ran two separate models, one including all possible three- and two-way interactions among lexical class, iconicity, and frequency, and another including only two-way interactions; no interactions were significant in either model. Interaction terms were thus eliminated from the final model, which is reported here.

Cross-linguistic comparison of picture naming in ASL and English

We conducted a direct comparison between average RTs to target names, target agreement (TAR), naming consistency (H), and the number of alternative names for picture-naming responses in English (aggregated data retrieved from Szekely et al., 2005) and for ASL for an overlapping subset of 180 object pictures and 245 action pictures from the UCSD-IPNP database. The results are summarized in Table 7. Note that due to our simplified coding criteria, the overall acceptability rate (TOT) corresponds to the “% valid response” values in Szekely et al. (2005).

Both languages patterned similarly to each other concerning object and action naming; that is, in both ASL and spoken English, actions were named more slowly, with fewer target name responses, and less consistently than objects. For both languages, actions yielded a higher number of alternative names than objects. Interestingly, there were also several cross-linguistic differences. Both naming acceptability (TOT) and target agreement (TAR) were higher in English than in ASL, and this pattern held up separately for object and action naming. In contrast, name consistency was better in ASL than English overall and for objects and actions separately. Interestingly, ASL signers were faster at naming pictures overall and for object pictures, but there was no group difference in RTs for action pictures. Finally, there were positive correlations between ASL and English acceptability rates (percent of valid response in Szekely et al., 2005) (r = .458), target agreement rates (cf. “%Lex1dom”) (r = .182), RTs to target names (r = .769), and naming consistency (H) (r = .208), all ps < .001.

Discussion

This is one of the first studies to investigate the effects of multiple linguistic variables on lexical retrieval for a sign language using a large set of both object and action pictures. The study investigated how modality-dependent variables, such as iconicity and sign-based phonology, and modality-independent variables, such as frequency and lexical class, influence naming. The results revealed that iconicity, a modality-dependent factor, facilitated picture naming in ASL, replicating previous small-scale studies. Below we discuss the possible mechanisms that might underlie these results. Secondly, modality-independent factors and picture properties influenced sign retrieval in similar ways as word retrieval in previous research. Thirdly, phonological properties of signs (duration, complexity, neighborhood density) had mixed effects on naming behavior. Finally, the direct comparison of English and ASL picture-naming behavior yielded clear differences in how easily the same pictures were named in the two languages. We discuss this comparison below. First, we consider the effects of each of the lexical and stimulus variables on naming in ASL.

Iconicity

Sign iconicity was a robust predictor of ASL naming times and target name agreement and was associated with improved naming consistency, particularly for action pictures. These results align with previous picture-naming studies that specifically contrasted highly iconic and non-iconic signs (Baus & Costa, 2015; McGarry et al., 2020; Navarrette et al., 2017) and replicate the regression results with BSL picture naming by Vinson et al. (2015). Thus, the facilitatory effect of iconicity on lexical retrieval appears to be general and can be observed even in a large data set in which pictures were not preselected for a significant discrepancy in iconicity.

One possible explanation for the facilitatory effect of iconicity on lexical retrieval is that iconic signs become activated more quickly and robustly than non-iconic signs because iconic signs receive additional activation from the perceptual and action-related semantic features that they encode (Navarrette et al., 2017). That is, iconicity may pattern somewhat like concreteness (see Footnote 6). Many studies suggest that concrete words are recognized more quickly and accurately than abstract words because concrete words have stronger and denser semantic associates and activate more sensorimotor information (see Barber et al., 2013). Support for this hypothesis was reported in a recent event-related potential (ERP) study of picture naming in ASL. McGarry et al. (2020) found that the N400 response (an ERP component sensitive to lexical-semantic processing) was modulated by iconicity in a manner that paralleled the effect of concreteness: pictures named with iconic signs elicited a larger N400 response (greater negativity) than non-iconic signs. Concrete words also elicit a larger N400 amplitude than abstract words (e.g., Barber et al., 2013; Holcomb et al., 1999). This effect is generally attributed to increased activation of perceptual and action-related semantic features associated with concrete words (e.g., Holcomb et al., 1999). Iconic signs often depict perceptual semantic features, for example, distinctive body parts of animals, such as the trunk of an elephant (Hwang et al., 2017), and/or motoric features (e.g., how an object is handled; Padden et al., 2013). McGarry et al. (2020) suggested that the concreteness-like N400 response observed for iconic signs reflects the more robust encoding of sensorimotor semantic features that are depicted by these signs and that are emphasized by the picture-naming task.

The picture-naming task itself may also enhance the facilitatory effects of iconicity, particularly if iconicity is viewed as a structured alignment between a conceptual representation (depicted in the picture) and a phonological form (Emmorey, 2014; Taub, 2001). Specifically, the structure-mapping account of iconicity posits that there are structure-preserving mappings between features of phonological form and semantic representations and that effects of iconicity are observed for tasks that tap into these mappings (Emmorey, 2014). Supporting this hypothesis, Thompson et al. (2009) found that picture–sign-matching decisions were faster when the phonological form of the iconic sign was aligned with the picture (e.g., a picture of a bird with a prominent beak aligns with the ASL sign BIRD, which depicts a bird’s beak) compared to non-aligned pictures (e.g., a bird in flight). Similarly, McGarry et al. (2020) found that pictures that aligned with the iconic sign were named faster than non-aligned pictures and exhibited a reduced N400 component, indicative of facilitation. Thus, pictures that correspond to iconic target signs may be more likely to visually prime the phonological form of the sign, leading to faster picture-naming latencies.

For action naming, iconicity correlated with naming consistency (H statistic), which is particularly interesting given that action naming was less consistent than object naming. One possible explanation for this result is that iconic action signs are more easily depicted in a line drawing than are non-iconic action signs. For example, the sign COMB_2 (iconicity M = 6.2) pantomimically depicts the combing action that is illustrated in the picture (act043comb.jpg), whereas the less iconic sign SING (iconicity M = 2.6) bears little relation to the picture depicting this concept (act202sing.jpg). The mapping between iconic action signs and their illustrations may have boosted the consistency of naming responses across signers.

To explore the potential relationship between picture alignment and iconicity, we coded whether there was a mapping between the phonological form of the target sign and the picture eliciting that sign, i.e., whether there was an alignment between phonological features of the sign and visual aspects of the picture. We conducted a two-way ANOVA with iconicity ratings (Z scores) as the dependent measure and picture type (objects, actions) and picture alignment (aligned, non-aligned) as the independent factors. As expected from the main analysis, target signs for action pictures were rated as more iconic than those for object pictures, F(1, 508) = 20.0, p < .001. There was also a main effect of alignment: when pictures were aligned with the target signs, the signs were more iconic (M = 0.655, SD = 0.775, n = 286) than when the pictures did not align with the target signs (M = 0.143, SD = 0.803, n = 237), F(1, 508) = 44.0, p < .001. There was no interaction between picture type and picture alignment, F(1, 508) < 1, p = .973. Interestingly, a higher proportion of action pictures were aligned with their target signs (65%) compared to object pictures (45%), supporting our hypothesis that picture alignment may have boosted naming consistency for iconic action signs. Note that the positive relationship between sign iconicity and picture alignment was not a foregone conclusion. For example, the iconic sign BABY (mean iconicity rating = 6.8) depicts rocking an infant in one’s arms, but the picture used to elicit this sign was non-aligned and shows a baby sitting up and wearing a diaper. Similarly, the non-iconic sign SKI (mean iconicity rating = 1.2) was elicited by an aligned picture in which the shape of the two skis maps to the two extended index fingers of the sign. However, this post hoc analysis should be viewed with caution because the pictures were not selected to manipulate alignment properties, and the pictures could also vary in the strength of the alignment (e.g., many phonological features or only a single feature could align with elements of the picture).

Iconicity did not interact with other variables in the model, contrary to previous findings by Baus and Costa (2015), who reported an interaction between iconicity and frequency (based on spoken language translations) for object pictures. Baus and Costa (2015) found that it was the lower-frequency signs that benefitted from iconicity during retrieval in hearing bilinguals who knew both Catalan Sign Language (LSC) and Spanish. Given that sign language tends to be the less dominant language for hearing bilinguals (Emmorey et al., 2016), an interaction between frequency and iconicity may only be evident in less dominant or less proficient signers (most participants in the Baus and Costa study had only ~2.5 years of exposure to Catalan Sign Language). Such an observation would be congruent with research showing that iconicity is most helpful for learning the semantics of signs in novice, less proficient L2 signers (see Ortega, 2017 for a review). Further, Vinson et al. (2015) found a relationship between iconicity and age of acquisition for deaf BSL signers, such that iconicity speeded picture-naming times only for signs rated as having a later age of acquisition, leading Vinson et al. to suggest that iconicity may only facilitate naming when there is some degree of difficulty in lexical retrieval, as may be the case for late-acquired signs. However, our results do not seem to align with this explanation. Typically, later-acquired signs tend to be less frequent in the lexicon, and less frequent signs are also more difficult to retrieve (Emmorey et al., 2013). Vinson et al.’s hypothesis would thus predict that iconicity should speed retrieval only for less frequent signs. However, we did not find an interaction between iconicity and frequency. We suggest that the pattern observed in the Vinson et al. study may have been due to the small number of early-acquired signs (as acknowledged by the authors) or due to other confounding factors (Navarrette et al., 2017).

Although the facilitatory effect of iconicity on sign production seems to be relatively well established, particularly in tasks that involve picture naming, the effects of iconicity on sign comprehension have been mixed. Bosworth and Emmorey (2010) found no difference between iconic and non-iconic signs in a lexical decision experiment. Iconicity slowed (i.e., inhibited) decisions about handshape (curved vs. straight fingers) (Thompson et al., 2010) but facilitated decisions about movement (upward vs. downward motion) for deaf signers (Vinson et al., 2015). Iconicity did not appear to modulate the N400 response during sign comprehension for adult deaf signers (Emmorey et al., 2020; Mott et al., 2020). However, iconic signs tend to be more easily learned by second language learners and by deaf signing children (Caselli & Pyers, 2017, 2020; Thompson et al., 2012). More research is needed to understand the role of iconicity in lexical access and representation in sign languages. Also, there is now growing evidence that iconicity can influence lexical representations and processing for spoken languages (see Dingemanse et al., 2015).

Lexical class

Lexical class was a major determinant of target-naming RTs in ASL picture naming. Action pictures were named more slowly than object pictures, as is found for spoken languages (Bates et al., 2003; Bayram et al., 2017; Khwaileh et al., 2018). Naming actions also yielded a higher number of alternative names, leading to poorer name consistency compared with naming objects. Worse name agreement might arise due to a higher number of competing lexical items available for action concepts, which in turn decreases naming agreement and slows RTs. These findings are compatible with the hypothesis that verbs have a more complex semantic organization than nouns, which leads to more difficult action name retrieval (Huttenlocher & Lui, 1979; Vinson & Vigliocco, 2002). Also, the action pictures in our study were more visually complex than the object pictures. This may have contributed to slower naming times for actions. Overall, our results are consistent with the idea that lexical class is an organizational principle of semantic knowledge regardless of language modality.

Lexical frequency

As predicted, lexical frequency was a significant predictor of naming RTs such that pictures with more frequent target names were named faster than pictures with less frequent target names. We found this expected effect of frequency on RTs even though frequency of target names was assessed by subjective ratings rather than by corpus count. Further, although the lexical frequency of target names was associated with better target agreement (higher proportion of responses on target) and better naming consistency, frequency did not correlate with or predict the proportion of target agreement, and correlated positively with target agreement only for object pictures, not actions. The lack of relationship between frequency and proportion of responses on target for action pictures may be because action pictures are harder to name, and after abandoning the search for a specific name, signers might have resorted to a higher-frequency alternative verb instead.

The inverse correlations between lexical frequency and target sign (response) duration and phonological complexity (Tables 3 and 4) indicate that frequent signs tended to be shorter and less phonologically complex. This pattern is consistent with the well-documented notion that as languages evolve, they develop structures that maximize communicative efficiency (Zipf’s law). With frequent use, words or signs become shorter and simpler to maximize the rate of message transmission (Bybee, 2006; Gibson et al., 2019). In addition, more frequent signs tended to be less iconic, replicating Sehyr et al. (2021), and supporting the hypothesis that the iconicity of signs erodes with frequent use over time. Importantly, when other factors were accounted for in the regression model, the frequency of signs uniquely predicted naming latencies, suggesting that a similar mechanism can account for frequency effects in both signed and spoken languages (e.g., variation in resting activation levels or selection thresholds).

Phonological properties

Response duration. First, signed responses were longer for action pictures than for object pictures. ASL verbs permit a greater amount of spatial and temporal modifications (e.g., path movement) than ASL nouns, which could lengthen the time it takes to name an action in ASL. Indeed, in naturally occurring signed language discourse, verbs have been observed to contain longer, larger, and unrestrained movement compared with nouns (Hunger, 2006; Johnston, 2001; Kimmelman, 2009; Supalla & Newport, 1978). Thus, a longer response duration for single sign responses elicited in this study may reflect the specific phonological or phonetic properties of ASL verbs.

Second, the average response duration (our proxy for sign length) was a significant predictor of naming RTs, such that pictures with shorter signs were named faster than pictures with longer signs, and this relationship was particularly apparent in action naming. This result is in line with previous picture-naming studies in spoken languages (e.g., Cuetos et al., 1999; Santiago et al., 2000; Szekely et al., 2005) and suggests that longer signs may take longer to retrieve and/or phonologically encode than shorter signs when other factors are controlled. Importantly, even though the phonological features of signs tend to be articulated simultaneously relative to spoken words, there nevertheless appears to be sufficient variation in sign duration to reveal a correlation with naming latencies. This correlation is also consistent with the hypothesis that signs—like words—are phonologically assembled during language production (with longer signs taking longer to encode) and are not retrieved as holistic gestures.

Phonological complexity of signs predicted naming latencies, such that pictures for which target signs were phonologically more complex had slower RTs. This pattern of findings is congruent with effects of phonological complexity found in spoken languages where words beginning with marked complex sounds, such as liquids, were retrieved more slowly than words beginning with unmarked oral stops (Cummings et al., 2016). The complexity of certain sounds influences motor planning and execution. Production of complex words would therefore elicit longer response times than production of simple words. Our results for ASL align with this argument. Signs containing complex features (e.g., marked handshapes) might require more time to plan and encode than signs consisting of simpler, unmarked features. However, the relationship between RTs and complexity in ASL was rather weak because it disappeared when correlations were conducted separately for objects and actions.

Phonological neighborhood density (PND) did not correlate with naming behavior and had no predictive power for target-naming RTs or target agreement when other variables were in the model (Tables 3 and 5). In addition to the lack of a PND effect, we found no interactions between PND and other variables in the model (beyond phonological complexity, which is expected). Based on evidence from spoken languages and interactive models of lexical processing (Baus, Costa, & Carreiras, 2008a; Vitevitch, 2002), we predicted that the coactivation of neighbor signs would facilitate lexical retrieval. In addition, Caselli et al. (2021) found inhibitory effects of PND on sign comprehension, indicating that phonologically similar signs compete during recognition. Our negative results for sign production are partially congruent with a recent large-scale study of spoken English naming of over 2000 photographs. Karimi and Diaz (2020) also failed to find a main effect of PND on spoken word retrieval (see also Zhang et al., 2020). Nonetheless, they did find that PND interacted with name agreement such that PND facilitated retrieval only for low-agreement words, a pattern that we did not observe. It remains to be established whether the effects of PND and interactions with other variables could be observed with a much larger set of stimuli for ASL. It is also possible that PND plays a different role in sign recognition from that in sign production, perhaps due to the more simultaneous nature of phonological units.

Objective visual complexity

As expected, picture complexity predicted naming RTs overall such that simpler pictures were named faster than complex pictures, as found for spoken languages (Alario et al., 2004; D’Amico et al., 2001; Szekely et al., 2005). This relationship was significant only for action naming, as found for English by Szekely et al. (2005), suggesting that this relationship was weak. Further, we found that more complex pictures were named with longer signed responses. It is possible that signers slowed their articulation when naming visually complex pictures. However, this relationship disappeared when we examined action and object naming separately, suggesting that this relationship was also relatively weak.

Interestingly, picture complexity was negatively correlated with iconicity for object naming (Table 4). The target signs used to name simpler object pictures tended to be more iconic. It seems that iconic signs may lend themselves well to concepts that can be visually depicted in simple drawings, at least for objects. The association between picture complexity and iconicity of target signs might uniquely influence picture naming in ASL, in contrast to spoken English, where there is generally no relationship between the word form and the concept depicted in the picture.

Picture naming in signed versus spoken language

In this section, we discuss the results of the direct comparison between ASL and English picture naming, based on English data reported in Bates et al. (2003) for an overlapping set of 425 pictures (see Table 7). The primary goal of this comparison was to assess how the object and action picture sets, designed specifically for naming in spoken languages, compared to naming performance in a signed language. Both ASL signers and English speakers named action pictures more slowly, less often on target, and less consistently than object pictures. Further, both ASL signers and English speakers produced more alternative names for actions than objects. This shared pattern likely reflects the general finding that verbs have more complex semantic and morpho-syntactic representations than nouns, as discussed above. In addition, the greater visual complexity of the action pictures may have impacted naming similarly for both signers and speakers.

Overall, ASL signers were faster than English speakers and exhibited fewer alternative names and better name consistency (i.e., a lower H statistic). One possible explanation for this result is that ASL has a smaller lexicon relative to English. A smaller lexicon yields fewer available candidates, decreasing competition at retrieval and facilitating retrieval times. That is, fewer lexical competitors lead to faster picture-naming latencies (Alario et al., 2004; Bates et al., 2003; Snodgrass & Yuditsky, 1996). Further, in spoken English naming, even on trials when the speakers provided the dominant (i.e., target) name, the possibility of alternative names for that picture slowed their response times (Bates et al., 2003; Szekely et al., 2005).

The lower acceptability rate and lower target name agreement for ASL could have occurred because the pictures were selected for English naming and may be biased toward the English lexicon. Bates et al. (2003) reported a similar English advantage in comparison with seven other spoken languages when naming object pictures. English speakers were significantly more accurate (target agreement 85%) than speakers of other languages, particularly Hungarian and Chinese (78% and 72% target agreement, respectively). A similar English bias has been observed for the Boston Naming Test when used with Spanish-English bilinguals, in that Spanish vocabulary is underestimated by the pictures used for this test (Gollan et al., 2012). Interestingly, some pictures resulted in better target name agreement in ASL than in English. One example is “walk” (act257walk.jpg): in ASL, the acceptability rate (TOT) was 100% and 71% of responses were on target (TAR), whereas in English only 76% of responses were valid and only 45% were on target. Together, these findings indicate that stimuli developed for use with one language and its associated linguistic community may not be appropriate for use with other languages and communities.

A smaller ASL lexicon could also explain why ASL signers used fewer alternative names (up to seven names; 2.5 names on average) than English speakers, who had many more alternative names (up to 18; five names on average; Szekely et al., 2005, p. 7). Perhaps due to the fewer alternatives available, ASL signers also named pictures more consistently (H = .61) than English speakers (H = 1.2). Nonetheless, some pictures were named less consistently in ASL than in English because the number of possible variants for the concept was higher in ASL. One example is “pineapple” (obj320pineapple.jpg), for which the total acceptability rate was good in both ASL (81%) and English (98%), but in ASL, target name agreement (35%) and naming consistency (H = 1.5) were worse than in English (98%; H = .17). This result likely occurred because there are at least five distinct sign variants for “pineapple,” while English has only one word for this concept. This example illustrates that a smaller lexicon does not necessarily imply less lexical variation.
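The link between the number of name variants and the H statistic can be illustrated with a short sketch. It assumes the standard definition of H used in picture-naming norms (Snodgrass & Vanderwart, 1980), namely H = Σ p_i log2(1/p_i), where p_i is the proportion of responses that used the i-th distinct name. The function name, the response labels, and the counts below are hypothetical and are not taken from the present data set, and the exact treatment of invalid responses in this study may differ.

```python
import math
from collections import Counter


def h_statistic(responses):
    """Return H (in bits) for the names given to a single picture.

    H is 0 when every participant produces the same name and grows as
    responses are spread over more alternative names.
    """
    counts = Counter(responses)
    total = sum(counts.values())
    # p * log2(1 / p), with p = n / total, summed over distinct names
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical responses from 20 participants (illustration only).
single_name = ["name_a"] * 20
five_variants = (
    ["variant_a"] * 7
    + ["variant_b"] * 5
    + ["variant_c"] * 4
    + ["variant_d"] * 2
    + ["variant_e"] * 2
)

print(f"one name:      H = {h_statistic(single_name):.2f}")
print(f"five variants: H = {h_statistic(five_variants):.2f}")
```

Under these assumptions, a picture named identically by all participants yields H = 0, whereas responses spread over several variants yield a higher H even when a single dominant name remains, which is the pattern described for “pineapple” in ASL.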

In sum, these results emphasize the importance of developing language-specific norms in order to obtain a more valid and representative tool to assess lexical retrieval. In addition, further work is needed to identify pictures that are balanced for difficulty across English and ASL (cf. Gollan et al., 2012).


The primary goal of this study was to characterize the linguistic factors that influence lexical retrieval in a signed language, assessed by picture-naming latencies and target name agreement for both object and action pictures. We found that many of the same factors that influence lexical retrieval of spoken words influence sign retrieval: lexical class, lexical frequency, phonological properties (complexity, length), name agreement (number of alternative responses; naming consistency), and picture complexity. Importantly, iconicity was also a robust predictor of naming latencies, suggesting that the relationship between sign form and depicted concept played a role in lexical retrieval. This effect may well be specific to sign languages, but more work is needed to determine the possible impact of iconicity on picture naming in spoken languages. Overall, the results indicate that theoretical models of lexical representation and retrieval developed with data from spoken languages can be adopted for signed languages, with the possible exception that iconicity needs to be included as a factor. Further research is needed to delineate whether iconicity is a property of lexical (or sublexical) representations or plays a more constrained and specific role in picture naming (e.g., mapping a visually depicted concept to a lexical form).

We also directly compared ASL and English object naming to delimit the key similarities and differences between the two languages. ASL and English target name agreement and naming latencies were correlated (.458 and .899, respectively), and lexical retrieval was influenced by the same lexical variables. Nonetheless, there were differences in acceptability rates, target agreement, and response latencies that we hypothesized were related to a difference in lexicon size and the English bias of the selected pictures. These findings highlight the need to develop standardized picture sets that can be reliably and appropriately used for naming in ASL (and other sign languages).

The outcome of this study is an open-source data set consisting of naming data in ASL for 523 black-and-white line drawings of objects and actions. Each picture is associated with its target ASL signs, alternative sign variants, and the associated naming data (data set is available for download on OSF: https://osf.io/7umva/). This is the largest compilation of picture-naming data in any sign language to date. The ASL picture-naming data set provides a much-needed resource for researchers, educators, clinicians, or test developers wishing to utilize picture sets that are suitable for use with ASL signers.