Psycholinguistic research on sign language has traditionally focused on investigating whether spoken and sign language processing are governed by similar or different cognitive mechanisms and underpinned by similar or different neuroanatomical substrates. Studies have examined various aspects of processing in signed languages, and the findings so far show that lexical access in signed languages is broadly affected by the same kinds of features as in spoken languages (for an overview, see Carreiras, 2010). Previous work has confirmed the fundamental distinction between form and meaning through “tip of the finger” experiences (Thompson, Emmorey, & Gollan, 2005), the role of morphological complexity (Emmorey & Corina, 1990) and of phonological parameters (Gutiérrez, Müller, Baus, & Carreiras, 2012), semantic interference effects (Baus, Gutiérrez-Sigut, Quer, & Carreiras, 2008), familiarity and phonological neighborhood (Carreiras, Gutiérrez-Sigut, Baquero, & Corina, 2008), and cross-language interactions in bimodal bilinguals (Kubus, Villwock, Morford, & Rathmann, 2014; Morford, Kroll, Piñar, & Wilkinson, 2014; Morford, Wilkinson, Villwock, Piñar, & Kroll, 2011).

While many of these findings provide parallels for what is already known about spoken languages, results that are puzzling, inconclusive or contradictory to previous findings have also been found. For instance, priming studies with sign languages have shown the expected facilitatory effect of a semantic relation (Mayberry & Witcher, 2005) but less clear-cut effects of the phonological parameters. The phonological parameters (location, handshape, and movement) influence sign recognition in different ways, with some parameters showing an inhibitory effect and others showing facilitation (Carreiras et al., 2008; Gutierrez, Williams, Grosvald, & Corina, 2012; see also Caselli & Cohen-Goldberg, 2014, for a computational model). Furthermore, results are not consistent: for example, some studies have found location to have an inhibitory effect on lexical retrieval (Corina & Hildebrandt, 2002; Carreiras et al., 2008), while others have found a facilitatory effect of location combined with movement (Baus, Gutiérrez, & Carreiras, 2014; Dye & Shih, 2006). In addition to these results that are inconclusive or do not sit well with spoken language findings, there has also been a more recent trend in sign language research to explore modality differences. Specifically, there is a growing line of work on those aspects that lead to differences in processing (see, for example, Gutierrez, Williams, et al., 2012; Marshall, Rowley, & Atkinson, 2014).

The progress of cognitive research into sign languages is hindered by several complicating factors. Firstly, psycholinguistic work on sign languages is a much younger field than its spoken language counterpart and has accumulated a much smaller empirical base. Secondly, the foundational study of signed languages from a linguistic point of view is similarly underdeveloped when compared with the large body of work on spoken languages, and many basic questions have yet to be posed, let alone answered (see Sandler & Lillo-Martin, 2006, for an overview). Finally, sign languages are articulated in a different modality from spoken languages, so considerations and factors that are irrelevant for spoken languages may be of great importance in the visual-gestural domain.

In order to describe in detail, and hence to model theoretically, the processing and brain functioning related to sign language use, these issues must be addressed and more empirical research carried out. The lack of psycholinguistic and linguistic research pertaining to sign languages can only be remedied by more work on these languages. Nonetheless, such rigorous empirical research can succeed only if the stimuli can be described carefully enough to allow meticulous manipulation and control of important variables. In fact, some of the contradictory results mentioned above may be due in part to confounding differences in the stimulus material, such as the physical saliency of parameters, the simultaneous load of information, or even more basic variables such as image quality (grain, focus, perceptibility of handshapes), lighting, and so on. These perceptual factors are all the more relevant for neuroimaging techniques, given the brain’s high sensitivity to such differences.

Related to modality differences and the visual nature of signs, the technological difficulties involved in automating or comparing videos make it difficult to study the influence of different properties of sign languages on lexical access and language processing. This problem operates at two different levels. Technically, there are few resources for working with dynamic visual stimuli, whereas there is a wealth of tools and techniques for creating, manipulating, and analyzing dynamic acoustic material (i.e., spoken words or sounds) or static visual stimuli (printed words or images). Although current technology allows a quicker workflow with videos and better image quality than in previous decades, handling video material is still a complex issue.

At a more conceptual level, there is no commonly used means for quantifying or visualizing the properties of a complex dynamic visual signal, making it difficult to assess a given input stimulus and thus to compare different stimuli. Researchers working with the speech signal are used to examining waveforms and spectrograms, and extracting measurements such as amplitude and formant frequency to quantify a given acoustic signal, but the relative lack of work on dynamic visual linguistic input means that similar ways of characterizing video signals have not been developed and/or used in the field. This is not a trivial matter since, for example, the selection of an experimental stimulus is often made based on the citation form, but this may differ from the actual realization of the sign by the model during the recording session. How those differences should be measured, and what effect they might have on processing, are open questions. These issues depend on a more basic understanding of the visual phonetics and phonology of sign languages, and the nature of categorical perception in the visual domain.

In order to study how signers process individual signs, what is needed is a collection of recordings of signs accompanied by a description of as many variables as possible related to the actual recording, including signer identity and the perceptual conditions of the video itself, such as angle, lighting, background, and so on, so that researchers can control for unwanted variables and manipulate others more easily. In addition to these physical characteristics, it is also necessary to control for psycholinguistic factors inherent in the signal, in other words, properties such as grammatical category, phonological structure, or lexical attributes (frequency, familiarity, age of acquisition, etc.). Finally, to facilitate stimulus selection for experiments, it is very important that all this information can be searched easily with a tool that allows either the selection of stimuli with a specific set of features or the display of the features of a stimulus or set of stimuli. Such a collection of recordings and the corresponding search tool have been created for LSE (lengua de signos española – Spanish Sign Language) and are available in the LSE-Sign database.

Introducing LSE-Sign

LSE-Sign is a lexical database containing 2,400 signs from the most recent standardized Spanish Sign Language dictionary (Fundación CNSE, 2008) and a total of 2,700 nonsigns (items that are sign-like in form but have no meaning in LSE), together with a search tool for selecting stimuli. All signed forms are coded according to formal and grammatical criteria as well as according to their glosses (an approximate Spanish translation of the sign). Searches in LSE-Sign can therefore be carried out from a list of Spanish words or by selecting the formal and grammatical criteria of interest. LSE-Sign is a highly flexible system adapted to the specific characteristics of signed languages and, importantly, provides a straightforward and intuitive visual interface for searching for signs.

Creation of the LSE-Sign database

Signs were taken from the first standardized LSE dictionary published by the Spanish National Association of Deaf People. The entire contents of the dictionary were used (although a small number of entries were lost due to technical problems during the production process). The signs included in this dictionary were selected as a result of a standardization process carried out by Fundación CNSE and represent those signs judged to be the most standard (i.e., commonly used and understood) by members of the Deaf Community throughout Spain (for more information on the selection and standardization procedure for the original dictionary, see Vicente Rodríguez, Fornés Ribes, Costa Rodríguez, Sánchez Moreno, & Pinto Muñoz, 2008). Nonsigns were generated by altering one of the principal phonological parameters (handshape, location, or movement) of a given sign. The majority of the nonsigns (92 %) were legal, “pronounceable” nonsigns, but we also included a small percentage of nonsigns with an illegal phonological parameter (e.g., 4 % with an illegal handshape). For some signs, two nonsigns were created (making the number of nonsigns slightly greater than the number of signs).
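As a concrete illustration of this distributional constraint, the sketch below shows one way the to-be-altered parameter could be assigned so that handshape, location, and movement alterations are spread evenly across the sign list, with a small share of illegal forms. This is a hypothetical Python sketch of the sampling logic only; the actual alterations were produced by the signing models themselves (see the next section), and the function name and rates here are our own illustrative assumptions.

```python
import random

PARAMETERS = ["handshape", "location", "movement"]

def plan_nonsigns(signs, illegal_rate=0.08):
    """Assign each base sign one parameter to alter for its nonsign,
    cycling through the three parameters so that alterations stay
    evenly distributed, and flag a small proportion of items to be
    produced with a phonologically illegal value (rate is illustrative)."""
    shuffled = random.sample(signs, len(signs))
    return [{"base_sign": sign,
             "alter": PARAMETERS[i % len(PARAMETERS)],
             "legal": random.random() > illegal_rate}
            for i, sign in enumerate(shuffled)]
```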

Video recording and editing

The signs were recorded in high definition (50 frames/s) in a video recording studio with controlled lighting conditions and a chroma background, from two different angles. Two simultaneous recordings were made using two cameras: one camera was placed in front of the model while the other faced the model’s right side (perpendicular to the front camera). Signs were produced by two highly proficient native signers born to deaf parents, one male and one female, who each produced half of the signs in the database. The recordings were made over a single week to minimize any changes in appearance, and the models maintained the same appearance (clothing, hairstyle) across the different recording sessions.

All signs were produced within the same carrier sentence, which consisted of the sequence signing target signing (signing is a two-handed sign produced in the central neutral signing space using the unmarked “5” handshape). Models were asked to produce the same sentence twice at a normal signing speed while looking at the front camera. To avoid the unnecessary presence of mouthing (derived from the spoken word associated with the meaning of the sign), models were instructed to include only those non-manual elements that were an integral part of the lexical item; this also avoided the introduction of emotional content through facial expressions. The model produced a given sign based on the video recording of the sign from the LSE dictionary (Fundación CNSE, 2008) and then produced the corresponding nonsign by changing a specific parameter of the sign. The parameter to be changed was provided (in order to ensure an even distribution of the altered parameters across all signs), but the model was free to decide how the parameter was modified. As far as possible, all other elements of the sign (including non-manual features) were kept the same.

Video files were edited by trained video editors. The first frame of the sign was defined as the first frame with a clear and well-defined image showing the initial (dominant) handshape and location; the final frame was the last frame in which the final (dominant) handshape was still recognizable prior to the transitional movement for the rest of the carrier sentence. Clips from both angles were cut at exactly the same start and end frame. Clips in which the model was looking away from the front camera or the handshape was not clearly visible in the first 2–3 frames (due to fast transitional movement) were discarded (since there were two recordings of each sign, a minimal number of signs were lost due to this filtering process). The chroma screen was replaced by a neutral grey background, and a color and a black-and-white version of each video were created.

Coding the entries

The coding was carried out by three deaf signers from different areas of Spain (San Sebastián, Madrid, and Valencia), all of whom had good metalinguistic knowledge of LSE due to extensive experience working with the language (e.g., as teachers). The coding process was coordinated by the second author, who, as a qualified LSE interpreter and trained sign linguist, is competent in LSE. One of the recordings of each entry was coded for a detailed set of information, described in the following section. The coding was based on the actual video, so that the transcription was an accurate reflection of the form signed in the video rather than of an “idealized” citation form which might differ from the exact content of the real recording.

The coding process spanned 5 months and began with a week of training to familiarize the coders with the interface and to standardize criteria and conventions among the coders; this also made it possible to discuss doubts and to clarify issues related to the transcription conventions. A visual interface was designed to facilitate the coding process. A further week-long training session was held 3 months into the coding period to guarantee inter-coder reliability. Although each coder worked on a different set of signs (each transcribing a third of the contents of the database), the high level of interaction and communication among the coding team meant that criteria and conventions were common to all (see “Inter-rater reliability” below for more details). Furthermore, a test of inter-coder reliability using a small sample of signs (n=10) at the beginning of the process revealed a high degree of uniformity across the coders. Additionally, each entry in the database includes an Observations field in which coders could remark upon any issues relating to the coding of the sign (see the “General information” subsection below), so a transparent record is left in case of any doubts.

Once the coding period had concluded and the database contained all the necessary information, a search interface was developed to provide a tool for the end user, namely, an experimenter looking for sets of signs with specific characteristics. The search interface allows the user to search across nearly all the properties coded in the database and to control the amount of information displayed in the results. The interface is both visual and intuitive, and includes additional functionality (such as the ability to modify previous searches) to improve usability.

Contents of the database

The database includes a wide range of detailed phonetic, phonological, and grammatical information for each of the 5,100 entries (2,400 signs and 2,700 nonsigns). The information is divided into six different categories, each of which is described in full in the following subsections. The criteria for selecting the fields and the values for each field were based on several factors. Obviously, existing models of sign language phonology and phonetics provided an initial framework, and the coding used in the CNSE dictionary that provided the LSE signs for this database also served as a starting point. Most importantly, the aim of the database is to provide a tool for searching for and creating sets of LSE sign stimuli, so we have attempted to include as many variables as could be of interest to an experimenter. This means that in certain instances we have attempted to provide a more detailed classification than a phonological description would: a case in point is the parameter of location (see below), for which we decided to use 126 unique locations rather than the ten or so features that a phonological model might use to define a location (e.g., Sandler, 1989). This makes it possible to provide a more exact description of the particular articulation of the sign rather than an idealized citation form of the sign. Furthermore, it does not depend on any one particular model: the corresponding representation for a specific model can be constructed from the detailed surface form coded in the database. In contrast, adopting a specific model would have tied us (and any researchers who wished to use the database) to that model. Finally, the selection of fields and values was also based on our previous experience with encoding and selecting signs for use in psycholinguistic experiments, and on feedback during the initial pilot coding period of this database.

Most of the fields coded for in the database can be used as search criteria in the interface and all of them are available in the search results. Each subsection is displayed as a separate tab in the search interface, and on each tab the user can specify the criteria that limit the search for entries in the database. The options available in the corresponding tab of the search interface are also described at the end of each of the following subsections. The search interface also includes contextual help for each of the search fields in the form of a pop-up text box with a brief explanation and information about how to use that field. A more complete description of the contents of the database and the search interface is available (in Spanish) in the online instructions for the database on the Portal LSE website where the database is available (http://www.bcbl.eu/databases/lse/).

General information

This section includes grammatical and semantic information about the sign and its basic properties. Leme is a unique identifier for the entry, which is a transparent label rather than a random code. In the case of sign entries, Leme is based on the sign’s meaning [e.g., “cabeza3” (“head3”) is the third of various different signs whose meaning is related to the concept ‘head’]; for nonsigns, Leme is the name of the base sign (from which the nonsign is formed) plus a suffix that identifies the parameter that was modified to create the nonsign (e.g., “cabeza3 ns_movimiento”). A specific field indicates whether a given entry is a sign or a nonsign, making it simple to distinguish between these two types of entry (and to restrict a search to one or the other, if necessary). Since nonsigns lack meaning, several of the following fields (Gloss, Leme type, Grammatical category, Semantic field, Sign origin, Dialectal variation, and Geographic area) do not apply.
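The Leme naming convention is regular enough to be generated and checked mechanically. The following minimal Python sketch is our illustration of that convention, not part of LSE-Sign itself; note that the database stores an explicit Sign/Nonsign field, so the suffix check below is purely demonstrative.

```python
def nonsign_leme(base_leme: str, altered_parameter: str) -> str:
    """Build a nonsign Leme from its base sign plus the altered parameter,
    e.g., nonsign_leme("cabeza3", "movimiento") -> "cabeza3 ns_movimiento"."""
    return f"{base_leme} ns_{altered_parameter}"

def looks_like_nonsign(leme: str) -> bool:
    """Recognize the 'ns_' suffix tag; in practice one would rely on the
    database's dedicated Sign/Nonsign field instead."""
    return " ns_" in leme
```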

Gloss is a standardized representation of the meaning of the sign in Spanish. This is the most widespread means of representing signs in written form, using capital letters and hyphenation when more than one word is necessary (e.g., “café-con-leche”). Leme type identifies whether the entry consists of a single sign, or is made up of two elements (i.e., a compound) or more (i.e., a multi-word unit). The vast majority of sign entries are single signs (92.8 %), although the tendency of LSE to create compounds is reflected in the number of compounds present in the database (6.9 %). The Grammatical category of the entry is given both for the sign itself and for the corresponding Spanish word (i.e., the gloss). Generally the two coincide, but they are coded separately for two important reasons. Firstly, the grammatical category of signs tends to be more fluid than in the spoken language, and the distinction between different word classes is far from clear (for an overview, see Meir, 2012). An (apparent) adjective, for example, may behave predicatively and inflect like a verb, such as the sign enfermo (“sick”), which may appear directly with a noun phrase like padre (“father”) to give the meaning “Father is sick,” and may also be modified to show aspect, such as the continuative (“constantly sick”) or the iterative (“often sick”) (Cabeza Perreiro & Fernández Soneira, 2004; Klima & Bellugi, 1979). Secondly, verbs in sign language fall into different categories, namely plain, localizable, and directional verbs (Fischer & Gough, 1978; Padden, 1988), and this distinction is reflected in the options available for the Grammatical category in LSE. The values for the grammatical category in Spanish were based on a standard list of grammatical categories that had been used in the CNSE dictionary. For the LSE grammatical categories, we reviewed the sign language literature and adapted the list accordingly. Generally, this involved removing irrelevant categories (such as gender distinctions on nouns), except in the case of verbs, where we set out to provide a basic taxonomy not committed to any specific theory. As a result, there is a form-based distinction between verbs which cannot inflect (invariable verbs), those which can be articulated at different locations (localizable verbs), and those that can move from one location to another (directional verbs). Semantic field provides a categorization of the meaning of the sign from a closed set of options (animals, food, sports, etc.) based on the contents of the dictionary that provided the entries for the database.

The Number of syllables is based on the hold-movement-hold model (Liddell & Johnson, 1989) and was determined using the following guidelines: a syllable cannot contain more than two handshapes or two orientations; changes in internal movement (i.e., a change in handshape or orientation) or in non-manual markers often coincide with syllable boundaries; and restrained repetition (see the movement section for more information) is considered part of the previous syllable and not an independent syllable.
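For concreteness, the first of these guidelines can be read as a hard constraint on any candidate syllable. The check below is our own hedged encoding of that single guideline, not a tool the coders actually used:

```python
def valid_syllable(handshapes: list[str], orientations: list[str]) -> bool:
    """Hold-movement-hold guideline: a single syllable may contain at
    most two handshapes and at most two orientations."""
    return len(set(handshapes)) <= 2 and len(set(orientations)) <= 2
```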

Sign origin (etimología in Spanish) includes any information about the origin of the sign based on the coder’s knowledge. As such, this field does not provide detailed diachronic evidence for the evolution of a given form, but simply indicates a similar sign that is a likely candidate for the origin of the sign in question (e.g., the origin given for the sign canción [‘song’] is música [‘music’]). Since LSE shows a significant amount of dialectal variation and a single meaning may often have different forms, the database also captures this information: the Dialectal variation field indicates whether alternative signs exist for the same meaning, and Geographic area specifies the regions to which the use of the sign in question is limited (thus, if the field is empty, the sign is used in all LSE regional dialects). The coders provided this information based on their own knowledge of LSE. While all three coders were broadly familiar with different dialects of LSE due to their experience with the language and contact with signers from other regions, their knowledge was not exhaustive and was to some extent idiosyncratic.

The fields for Sign origin, Dialectal variation, and Geographic area were included in the database in order to provide additional information that could be useful to experimenters when selecting signs. The comments provided by the coders in these fields are not reliable for lexicographic or etymological purposes but rather serve to indicate that a given sign may be problematic for use in a psycholinguistic experiment because it is similar in form to another sign, is limited in its use, or changes its meaning from one dialect to another. Any further issues are highlighted in the final two fields in this category, which offer supplementary information in the form of free text. The Notes field provides any relevant additional information about the entry. For signs, this includes remarks indicating similarity to another sign with a different meaning from a specific dialect, or use restricted to a specific age group, for example; for nonsigns, this includes possible confusion with real signs, and discrepancies between the nonsign and its corresponding sign beyond the modified parameter. The Observations field relates specifically to the coding of the entry, and points out any doubts the coder may have had (e.g., an unclear number of syllables), as well as any details which could not be captured in the database (e.g., some nonsigns used a handshape that was not included as an option). In short, the Notes and Observations fields provide supplementary information and metadata about each entry, where relevant.

In the search interface, the first eight fields (Leme, Sign/Nonsign, Gloss, Leme type, Grammatical category in Spanish, Grammatical category in LSE, Semantic field, Number of syllables) are available as search criteria. Both the Leme and Gloss fields permit searches for the exact word (“is”), part of the word (“contains”), or the start of the word (“begins with”). Additionally, the Leme field can be defined using a text file containing a list of lemes, making it easy to recover the details of a previous search whose results have been exported (see below).
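The three text-search modes and the list-from-file option behave like standard string matching. A minimal Python sketch of this behavior follows; the function names are ours, and the actual interface implements the matching internally:

```python
def match_text(value: str, query: str, mode: str = "is") -> bool:
    """Mimic the three text-search modes offered for Leme and Gloss."""
    value, query = value.lower(), query.lower()
    if mode == "is":
        return value == query
    if mode == "contains":
        return query in value
    if mode == "begins with":
        return value.startswith(query)
    raise ValueError(f"unknown search mode: {mode}")

def lemes_from_file(path: str) -> list[str]:
    """Read one leme per line from a text file, as when re-running a
    previously exported search."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```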

Type of sign and iconicity

This category includes information about the involvement of the hands in the sign and the type of iconicity displayed. Type of sign is based on Battison’s (1978) basic taxonomy and distinguishes between one- and two-handed signs, and, within the latter category, between signs in which the hands act together, either simultaneously or symmetrically (in alternating motion), and signs in which one hand acts upon the other, which remains static. Furthermore, the dominant hand may have the same handshape as the non-dominant hand, or the two hands may have different handshapes.

Sign languages show a greater presence of iconicity than spoken languages, even at the lexical level, and many forms have some degree of visual motivation (Perniss, Thompson, & Vigliocco, 2010). The role of iconicity in language processing and lexical access is under debate (Bosworth & Emmorey, 2010), and it is thus important to be able to control for this property when selecting experimental stimuli. However, iconicity is not a simple binary property, and the relation between the form and the meaning of a sign may be of several different types (Taub, 2001). For this database, we devised a taxonomy of 11 categories of Iconicity, set out in Table 1, in order to provide a more fine-grained classification of the different ways in which meaning and form may be related.

Table 1 The different types of iconicity used to classify the entries in the database

The different categories were based both on meaning relations, such as synecdoche (a part refers to the whole) or metonymy (an associate refers to the referent), and on mechanisms known to be used by sign languages for representation, such as tokens and tracing as used by entity and SASS classifiers, respectively (for an overview of classifiers, see Zwitserlood, 2012), or constructed action (Lillo-Martin, 2012). The list is not exhaustive, and the categories are not mutually exclusive: the form-meaning relationship is often complex and may involve several processes that contribute to the construction of meaning (Taub, 2001). For example, the sign cuchillo (“knife”) uses the extended index and middle finger to represent the object (“full token”) but also involves a backward and forward motion on the non-dominant hand to represent cutting (“action metonymy”). Furthermore, the form-meaning relationship depends to a certain extent on the subjective perception of what a given sign represents. For example, the sign mayo (“May”) is considered by some signers to be a representation of the hammer and sickle (associated with International Workers’ Day on 1 May), whereas others view the sign as a representation of the kneeling virgin (May is dedicated to the Virgin Mary in the Catholic calendar). The multilayered and somewhat subjective nature of iconicity became evident during the coding process, and coders were asked to identify the most salient form-meaning relation for each sign. Again, this field is not meant to be a definitive categorization of the iconicity of the sign but rather to alert the experimenter to the fact that a sign involves some degree of iconicity.

If the sign makes reference to an object or action that is not the meaning of the sign, this apparent meaning is recorded in the Referent field (for example, the sign monja [“nun”] makes reference to the veil worn by nuns). Since iconicity depends on the relationship between form and meaning, the fields Iconicity and Referent are not relevant to nonsigns, which have no meaning.

In the search interface, Type of sign and Iconicity are available as search criteria and multiple values may be selected for each, making it possible to limit the search to a specific value or a set of values for a given field (e.g., all types of two-handed signs).

Parameters: location

Location is specified by four fields: Plane, Facial location, Body location, and Point of contact. The original CNSE dictionary provided a very broad coding for location, so we decided to use a more detailed method that could provide a greater number of distinctions. The fields and values selected are based on previous work on the articulatory parameters of LSE (Muñoz Baell, 1999; Rodríguez González, 1992) and on previous experience coding a sample of the LSE lexicon when creating experimental materials (Gutierrez & Carreiras, 2009). The result is a detailed surface description of the place of articulation of each sign.

Plane defines the distance of the sign from the signer’s body; it is particularly useful for signs articulated in neutral space (the space in front of the signer) and may occasionally distinguish between different signs (Muñoz Baell, 1999). The location of the sign is divided between Facial location and Body location (to avoid a single graphic with all the possible points of articulation and thus keep the visual display as clear as possible), which are represented by points on a graphic. The points fall into five color-coded types: green dots indicate contact with the body, with light green representing an area of the body (e.g., “forehead”) and dark green a specific point (e.g., “centre of the forehead”); blue dots involve no contact, with dark blue representing a general area in the signing space (e.g., “right side of the neutral space”) and light blue a specific part of the space (e.g., “upper right neutral space”); finally, orange dots mark those points which are not directly visible on the diagram (e.g., “inner side of the forearm”). For both Plane and Location, separate values are specified for the start and for the end of the sign, since these values may change during the articulation of the sign as a result of movement. If the sign involves contact, Point of contact defines which part(s) of the dominant hand make contact with some other part of the body.
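To summarize how a location is coded, the sketch below gives one possible in-memory representation of these fields. Both the enum and the dataclass are our illustrative assumptions about structure, not the database’s internal format:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class PointType(Enum):
    """Color coding of the points on the location graphic."""
    BODY_AREA = "light green"    # contact, general body area (e.g., forehead)
    BODY_POINT = "dark green"    # contact, specific point on the body
    SPACE_AREA = "dark blue"     # no contact, general area of signing space
    SPACE_POINT = "light blue"   # no contact, specific part of the space
    HIDDEN = "orange"            # point not directly visible on the diagram

@dataclass
class LocationSpec:
    plane_initial: str                      # distance from the body at onset
    plane_final: str                        # may differ due to movement
    location_initial: str                   # one of the 126 coded locations
    location_final: str
    point_of_contact: Optional[str] = None  # part(s) of the dominant hand
```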

In the search interface, individual or multiple selections can be made for each field. This is done using a simple graphic interface in which the relevant points are selected by clicking on them (see Fig. 1). To make the interface as clear as possible, a text label is associated with each point and can be viewed by holding the cursor over that specific point. In the case of Plane and Location, the option “At any moment in the sign” makes it possible to collapse across the initial and final values and to find all those entries that have the specified value(s) regardless of position (see the subsection “Search logic” below for more information).

Fig. 1 Screenshot of the Location tab of the search tool, showing the graphic interface for defining the search criteria

Parameters: handshape

This category provides information about the configuration and orientation of the hand(s) for the leme. For two-handed signs in which the hands have different handshapes, information is given for each hand; for all other types of signs, in which there is only one hand or both hands have the same handshape, only one hand is coded. Handshape is specified as one of 86 different options that are phonologically viable in LSE (based on the contents of the original CNSE dictionary). Additionally, alternative values may be specified in Allophones to reflect allophonic variation (possible handshapes which would not change the meaning of the sign). Orientation is specified as one of 64 different options that reflect the attested range of hand positions in the original CNSE dictionary. The values for orientation were influenced by the SignWriting notation system used in the CNSE dictionary, and provide values for orientations at intervals of 45° or 90° within an ideal geometric space. As such, the orientation values provide a surface description of the absolute position of the hand, as opposed to the relative or relational values used in some phonological models (Brentari, 1998; Liddell & Johnson, 1989; Uyechi, 1996). Both Handshape and Orientation have initial and final values, to reflect any changes that occur in each field during the articulation of the sign. Additionally, an intermediate value may be specified; this is used only for those polysyllabic signs in which a sign-internal handshape or orientation appears that would not be expected during the transition between the initial and final values. Just 6 % of the lemes include an intermediate handshape or orientation.
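As with location, the handshape and orientation slots can be pictured as a small record per hand. The sketch below is an illustrative assumption about structure (the field names are ours):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class HandSpec:
    """One hand's handshape/orientation coding for a leme."""
    handshape_initial: str                 # one of the 86 viable LSE handshapes
    handshape_final: str
    orientation_initial: str               # one of the 64 orientation values
    orientation_final: str
    handshape_mid: Optional[str] = None    # only when a sign-internal value
                                           # is not predictable from the
                                           # initial and final values
    orientation_mid: Optional[str] = None
    allophones: Tuple[str, ...] = ()       # variants preserving the meaning
```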

In the search interface, all the fields are available as search criteria except Allophones. The values for Handshape and Orientation are defined by dragging graphic symbols onto the relevant slot (initial, intermediate, or final) for the sign (see Fig. 2). Values can also be defined for the non-dominant hand; this automatically restricts the search to two-handed signs in which the hands have different handshapes (since these are the only entries which have values for the non-dominant hand). The symbols for the handshapes are transparent cartoons of hands, while the orientation symbols are adapted from the SignWriting transcription method (Parkhurst & Parkhurst, 2001), and a legend with explanatory photographs is provided for clarity. The option “At any moment in the sign” makes it possible to collapse across the initial and final values and to find all those entries that have the specified handshape or orientation value(s) regardless of position.

Fig. 2 Screenshot of the Handshape tab of the search tool, showing the selection of various values for the handshape of the dominant hand, and the use of the “At any moment in the sign” option for searching for orientation values regardless of position

Parameters: movement

This category describes the movement of (the manual part of) the sign, mainly from an articulatory/phonetic point of view but also including phonological considerations, in order to capture as much detail as possible. The articulators involved in the production of the sign are reflected in the field Body part, which specifies both the part of the arm that moves and the type of movement (e.g., “finger adduction”). To give a complete description of the movement, both path movement (from one location to another) and internal movement (which does not involve translational motion of the hand through space but rather a change in the configuration or orientation of the hand) were considered. The path movement is described by Path movement, which specifies the overall shape of the movement from a closed set of options; Zigzag, which shows whether an oscillation is added to the main movement; and the From and To fields, which indicate whether the movement has a specific start and end point, respectively, particularly relevant for directional verbs. Internal movement is captured by the fields Handshape change and Orientation change, both of which include the option “trill” to describe wiggling or fluttering movements, of relevance for phonological models of sign language (Brentari, 1998; Sandler, 1993).

If the sign involves contact, the Contact type may be one of a restricted set of types (tap, brush, grasp, etc.), and the Moment of contact is initial, medial, final, or sustained. If the movement of the sign involves Repetition, a distinction is made between restrained repetition, which involves repeating just the final part of the movement, single (full) repetition, and multiple repetition, and the Number of executions of the movement is also recorded. The quality of the movement is recorded in the Boolean fields Tense and Fluid, and Speed is marked as normal, fast, or slow. These notions do not normally appear in phonological models, but are included in the database as they may be perceptually salient for visual stimuli, and an experimenter may want to ensure that stimulus sets are balanced for these properties. Since the aim of the database is to provide as full a description as possible of these signs with a view to using them as stimulus material in psycholinguistic experiments, we included information that is relevant from a phonological point of view as well as from an articulatory/perceptual perspective, as an experimenter may wish to take into account any combination of these considerations when devising stimulus sets.
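Gathering the fields named in this category, a movement entry can be sketched as the following record; as with the previous sketches, the field names and types are our illustrative assumptions rather than the database schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MovementSpec:
    """Illustrative container for the movement fields described above."""
    body_part: Optional[str] = None          # articulator and movement type
    path: Optional[str] = None               # overall shape, closed set
    zigzag: bool = False                     # oscillation added to main path
    handshape_change: Optional[str] = None   # internal movement ('trill', ...)
    orientation_change: Optional[str] = None
    contact_type: Optional[str] = None       # tap, brush, grasp, ...
    moment_of_contact: Optional[str] = None  # initial, medial, final, sustained
    repetition: Optional[str] = None         # restrained, single, or multiple
    executions: int = 1                      # number of executions
    tense: bool = False                      # movement quality flags
    fluid: bool = False
    speed: str = "normal"                    # normal, fast, or slow
```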

In the search interface, all the fields from this category can be used to define the search. Those fields with more than two possible values give the option of making a multiple selection so that the search criteria can be adjusted as closely as possible to the desired outcome (for example, all signs which involve movement of any type in the fingers), thus avoiding the need to carry out multiple searches.

Parameters: non-manuals

This category includes all those elements of signs not expressed on the hands, which are also relevant at the lexical level for sign languages in general (Sandler & Lillo-Martin, 2006) and for LSE in particular (Herrero Blanco, 2009). There are fields for the Eyes, Eyebrows, Cheeks, Mouth, Head, and Shoulders. Each field has a closed set of possible values, varying from a few options (Eyes and Cheeks have just four values each) to many (Mouth has 34 different values). Additionally, any traces of mouthing derived from a spoken word are captured in the Vocalization field. Spoken components which accompany a sign often undergo a process of reduction (Sutton-Spence & Woll, 1999), so the coders entered an approximate transcription of any mouthing as it could be perceived, not necessarily the full word. Thus, for example, the sign sindicato [‘trade union’] has the value “sinda” in the Vocalization field.

In the search interface, the Non-manuals tab includes all the fields from this category. The fixed values for each field are represented by cartoons which can be dragged into place to make the relevant selection, and multiple selections are possible for each field (see Fig. 3). To make the interface as clear as possible, a text label is associated with each cartoon and can be viewed by holding the cursor over a specific graphic, as shown in Fig. 3. Additionally, since the Mouth field has such a large number of possible values, the cartoons were subdivided into four color-coded groups: yellow (mouth closed), purple (mouth open), pink (tongue visible), and green (vocalizations unrelated to spoken language words). The Vocalization field cannot be searched for specific content but can be used to limit the search to entries with or without some element of vocalization from the spoken language.

Fig. 3 Screenshot of the Non-manuals tab of the search tool, showing the selection of various values for different fields and the use of the popup cursor tip to view a written label for one of the values of the Mouth field

Inter-rater reliability

As mentioned in the description of the encoding process, an initial test of inter-coder reliability on a small sample of signs (n=10) revealed a high degree of uniformity across the three coders. In order to measure the inter-rater reliability (IRR) more thoroughly, a sample of the database was recoded and compared with the original coding. Since the original coders were no longer available to do this, three new coders carried out the recoding. The recoders were hearing researchers highly proficient in LSE and qualified sign language interpreters (one of whom was the coordinator of the original coding process). Each recoder was randomly paired with one of the original coders and assigned 100 entries that the original coder had transcribed. This meant that 300 lemes (both signs and nonsigns) were recoded, representing almost 6 % of the database. The recoders underwent a similar training process to unify coding criteria, and the recoding process and interface were the same as those employed in the original coding.

The results of this process showed a high rate of agreement between coders, with an overall average agreement of 81 % (Cohen’s κ=0.65) and little difference between coder pairs (78–82 % agreement; 0.60<κ<0.68). However, although the overall reliability was high, the values for some specific fields were low. The rest of this section provides details of the IRR scores and comments on those that were low (κ<0.6) across the coder-recoder pairs.
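For readers unfamiliar with the measure, Cohen’s κ corrects raw percentage agreement for the agreement expected by chance. A small Python illustration using scikit-learn, with made-up labels rather than our actual coding data:

```python
from sklearn.metrics import cohen_kappa_score

# Two coders' hypothetical values for one field over the same ten entries.
coder   = ["noun", "verb", "noun", "adj", "noun",
           "verb", "noun", "noun", "adj", "verb"]
recoder = ["noun", "verb", "noun", "adj", "verb",
           "verb", "noun", "noun", "adj", "noun"]

raw = sum(a == b for a, b in zip(coder, recoder)) / len(coder)
kappa = cohen_kappa_score(coder, recoder)  # chance-corrected agreement
print(f"raw agreement = {raw:.2f}, Cohen's kappa = {kappa:.2f}")
```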

The IRR scores for the General Information category (e.g., Sign/Nonsign, Leme type, Grammatical category in Spanish, Grammatical category in LSE) were high (all κ>0.77). The scores for Type of sign showed agreement (all κ>0.62), but those for Iconicity were substantially lower across all three coder-recoder pairs (0.54<κ<0.66). This is doubtless due to the fact that the categorization of iconicity involves a certain degree of subjectivity, as described above in the discussion of iconicity.

The IRR scores for location were relatively low: values for Facial location were marginal (all κ>0.57), but those for Body location were lower across all coder-recoder pairs (0.35<κ<0.48). This lack of consistency may be due to the fact that location includes a large number of options, many of which overlap to some extent. Furthermore, the case of Body location is complicated by the fact that many values lie in the neutral signing space in front of the body and are thus difficult to delimit (in comparison to the anatomically anchored locations on the face and head).

For Handshape, IRR scores were very high (all κ>0.86), but somewhat lower for Orientation (all κ>0.56). This may in large part be due to the fact that the system used to encode the orientation (based on SignWriting notation) gives rise to a certain amount of ambiguity, since orientations that fall between the available values may be classified one way or the other. This also suggests that treating orientation as a relative phonological feature could provide more consistent (and possibly more meaningful) results.

The IRR values for Movement were mixed. The field Path movement was treated as separate binary subfields for each value: although mutual agreement was relatively high (72–99 %), the κ scores were low for all values except one (Circular: all κ>0.78). Even though the low scores may be heavily influenced by the nature of the data (binary values with an uneven distribution), this confirms that this aspect of signs is difficult to categorize. With respect to internal movement, IRR scores for Handshape change were high (all κ>0.87), while those for Orientation change were considerably lower (0.42<κ<0.64), in accord with the difference in reliability described above for Handshape and Orientation. For Contact type and Moment of contact, IRR scores were high (all κ>0.71).

Overall, the inter-rater reliability results based on the recoding of a sample of the database show a high degree of consistency across raters. The fields that had lower scores are of two types. In the case of Location, the large number of slightly overlapping options (especially for Body location) left room for discrepancies between coders; this problem is addressed in the next section. The second type comprises features, such as Orientation or Movement, whose status is debated in the sign language phonology literature, suggesting that the difficulty lies in properly defining the feature in question, or even that the feature may not be relevant for identifying signs.

Search tool

The LSE-Sign database is available via the Portal LSE website (http://www.bcbl.eu/databases/lse/) and requires (free) registration for access. The website is currently available in written Spanish and includes a detailed set of instructions with explanations of all the fields in the database. The search interface is highly graphic and was designed to be easy to use. The selection of search criteria is divided across six different categories, which are presented as separate tabs in the interface. The use of these tabs has been described in the previous section; the following subsections describe the search logic implemented in the search tool and how the results are displayed.

Search logic

When values are selected for different fields, the search is restricted to those entries which fulfill the specifications for each field. However, when different values are selected within one field, the search engine returns all those entries which fulfill any of the specifications for that field. To give an illustrative example, specifying Number of syllables as ‘two’ and the Grammatical category in LSE as ‘noun’ will return all those entries that have two syllables and are also nouns, whereas selecting both ‘one’ and ‘two’ for Number of syllables will return entries with either number of syllables. The ability to select several values for a given field makes it possible for the user to tailor the search according to his or her own categories. To a certain extent, this also overcomes some of the problems with those fields that have lower IRR scores: for fields like Location, with multiple values, the fine-grained encoding meant that coders were more likely to differ in their choice of value (e.g., “High left neutral space” versus “Mid left neutral space”). However, the user can include several values in a search and thus collapse these values into a larger, more inclusive category.

For those measures that have a separate value for different moments of the sign (i.e., Plane, Location, Handshape, Orientation), each moment counts as a separate field. Thus, specifying a particular handshape for both the initial and final moment will return only those signs that start and end with that handshape. To find those signs in which the desired handshape appears at either moment (beginning or end), the option “At any moment in the sign” must be used. For fields which can contain multiple values (e.g., Semantic field), if a given value is selected in the search, all entries which include that value (and may have additional values) will be returned. In summary, the greater the number of fields with specifications, the more restrictive the search; the greater the number of values specified for a given field, the less restrictive the search.
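The whole search logic thus reduces to a conjunction over fields of disjunctions over values. The Python sketch below captures this behavior; the dictionary layout and field names such as handshape_initial are our assumptions, not the database schema:

```python
def matches(entry: dict, criteria: dict) -> bool:
    """AND across fields, OR within a field. `criteria` maps a field name
    to the set of acceptable values; an entry's field may itself hold
    several values (e.g., Semantic field)."""
    for field, accepted in criteria.items():
        values = entry.get(field)
        if not isinstance(values, (list, set, tuple)):
            values = [values]
        if not accepted.intersection(values):  # OR within the field
            return False                       # AND across fields
    return True

def at_any_moment(entry: dict, field: str, accepted: set) -> bool:
    """'At any moment in the sign': collapse initial and final values."""
    return bool(accepted & {entry.get(f"{field}_initial"),
                            entry.get(f"{field}_final")})
```

Selecting several values per field widens the disjunction, which is how the coarser, user-defined categories mentioned above can be recovered from the fine-grained coding.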

Search results

The results table is designed to provide a visual overview of the search results, and includes graphic information where possible (the values for Handshape, Orientation and the Non-manual fields use the cartoon symbols; for Location textual descriptions are used as the graphic would be too small to be informative). Additionally, the results include a Preview video of each sign that can be viewed by clicking on the play button for that sign in the table (see Fig. 4). The video includes the front and side views of the sign (see Fig. 5).

Fig. 4 Screenshot of the results table, showing the default view which gives an overview of the general properties of a sign plus the option to view the video of the sign using the play button on the right

Fig. 5 Screenshot of the video preview of a sign, showing the front and side views

Furthermore, the results table can be adapted by the user to show as much information as desired. By default, 25 items are displayed per page; if there are more than 25 results, the user can browse page by page or increase the number of results displayed per page to 50, 100, or all the results. As the database contains a large amount of information for each entry (over 50 fields), showing all the fields at once would be unwieldy. By default, the table shows eight columns that give an overview of the most general properties of the sign: Leme, Gloss, Initial Location, Final Location, Initial (Dominant) Handshape, Final (Dominant) Handshape, Path movement, and Preview video (see Fig. 4). However, the user can control the number of fields displayed by using the “Filter fields” option, which displays a list of all the available fields and lets the user select which ones should be displayed (see Fig. 6). This provides much greater control over the visual display of the results and allows the user to focus on the specific categories that are of interest.

Fig. 6 Screenshot of the Filter fields window, which allows the user to select which fields are displayed in the results table

Additionally, the user can export the results in text format in order to save a record, or to import the results into an environment that allows further manipulation and filtering, such as R or MS Excel. The “Export” button creates a text file with all the results on the current page, including all the fields (not just the visible fields). Since some of the text fields contain symbols that are typically used as separators, such as the comma or semicolon, the exported text file uses the vertical bar | as a separator. Empty values are blank. Values that have graphic displays in the results table are converted to text in the exported text file: in the case of Handshape and Location, corresponding number values are given; for the Non-manual values, the corresponding text description is used. This functionality complements the on-line results table – designed to provide an at-a-glance overview of the results – with the possibility of obtaining a full record of the results that is machine readable.
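Because the export is a plain pipe-separated text file, it loads directly into standard tools. For example, a minimal Python/pandas sketch; the file name and the column header used for filtering are our guesses and are not guaranteed to match the actual export:

```python
import pandas as pd

# Pipe-separated because commas and semicolons occur inside text fields;
# keep empty values as blank strings rather than NaN.
results = pd.read_csv("lse_sign_export.txt", sep="|", dtype=str,
                      keep_default_na=False)

# Example of further filtering: keep only monosyllabic entries
# (assuming the exported header reads "Number of syllables").
mono = results[results["Number of syllables"] == "1"]
print(len(mono), "monosyllabic entries")
```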

Future directions

As mentioned in the introduction, the need to control for different variables is of utmost importance in experimental psycholinguistic research, and this task is particularly difficult when working with signed languages due to a lack of standardized resources. The LSE-Sign database described here takes an important step towards the goal of controlling important variables in sign language by creating a large collection of stimuli with carefully controlled visual characteristics and an extensive set of associated data that provide a thorough description of the physical and linguistic properties of each item.

Other properties of lexical items, such as sign frequency, familiarity, and age of acquisition, should also be considered, but these are not yet available in the current database. In general, very little information of this type is available for sign languages: although some corpora do exist (see http://www.signlanguagecorpora.org/ for current information on sign language corpora), only a handful of lexical frequency studies have been carried out (in New Zealand Sign Language, American Sign Language (ASL), Auslan, and British Sign Language (BSL); see Fenlon, Schembri, Rentelis, Vinson, & Cormier, 2014, for an overview). For LSE there is currently no suitable corpus available that could provide lexical frequency measures. An alternative approach is to collect subjective ratings as a measure of familiarity or age of acquisition for a set of lexical items. This approach has been used for BSL (Vinson, Cormier, Denmark, Schembri, & Vigliocco, 2008), ASL (Mayberry, Hall, & Zvaigzne, 2014), and LSE (Carreiras et al., 2008; Gutiérrez, Müller, et al., 2012). The BSL study collected measures for three different indices (age of acquisition, familiarity, and iconicity) and draws attention to the fact that, for sign languages, measures that are not relevant to spoken languages, such as iconicity, may need to be taken into account when dealing with sign language material.

We intend to expand the LSE-Sign database to include lexical indices of this type by collecting subjective ratings for various factors, such as age of acquisition, familiarity, imageability, concreteness, iconicity, and transparency. For this second step, we will start with a subset of stimuli similar in size (300–400 signs) to those used for other sign languages.

Another line of work is to use the database to examine the phonological characteristics of the LSE lexicon. The encoded database represents a detailed snapshot of a substantial proportion of the LSE lexicon, given that estimates for the number of lexemes in comparable sign languages are between 3,000 and 4,000 (Johnston & Schembri, 1999). This makes it possible to measure the occurrence of different values of phonological parameters, such as marked or unmarked handshapes (cf. Henner, Geer, & Lillo-Martin, 2013), and to test empirically proposed phonological constraints, such as Battison’s (1978) Dominance Constraint. We have already carried out preliminary work along these lines (Costello & Carreiras, 2013).
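As an indication of how such analyses might run over an exported copy of the database, the sketch below counts handshape frequencies and flags potential Dominance Constraint violations. The field names and the membership of the unmarked handshape set are our assumptions for illustration; the constraint itself follows Battison (1978):

```python
from collections import Counter

# Commonly cited unmarked handshapes (assumed labels for LSE).
UNMARKED = {"B", "A", "S", "C", "O", "1", "5"}

def handshape_frequencies(entries):
    """Count initial dominant handshapes across sign (not nonsign) entries."""
    return Counter(e["handshape_initial"] for e in entries if e["is_sign"])

def dominance_violations(entries):
    """Two-handed signs with different handshapes whose static non-dominant
    hand uses a marked handshape, contra the Dominance Constraint."""
    return [e for e in entries
            if e.get("type_of_sign") == "two-handed, different handshapes"
            and e.get("nondominant_handshape") not in UNMARKED]
```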

Conclusion

LSE-Sign is a free online search tool that offers a flexible and highly visual way of selecting experimental stimuli from 2,400 Spanish Sign Language signs and 2,700 related nonsigns, based on detailed grammatical, phonological, and articulatory information. The interface is designed to allow the user to create customized searches and to control how the results are displayed. The use of such well-controlled stimuli in experiments will help to tease apart which properties of signed languages influence lexical access, and over what time course, providing insight into current theories of human language; it will also contribute to better categorizing and identifying the neural bases of sign language processing.