Behavior Research Methods

, Volume 49, Issue 5, pp 1748–1768 | Cite as

Using leap motion to investigate the emergence of structure in speech and language

  • Kerem Eryilmaz
  • Hannah Little
Open Access


 In evolutionary linguistics, experiments using artificial signal spaces are being used to investigate the emergenceof speech structure. These signal spaces need to be continuous, non-discretized spaces from which discrete unitsand patterns can emerge. They need to be dissimilar from—but comparable with—the vocal tract, in order tominimize interference from pre-existing linguistic knowledge, while informing us about language. This is a hardbalance to strike. This article outlines a new approach that uses the Leap Motion, an infrared controller that canconvert manual movement in 3d space into sound. The signal space using this approach is more flexible than signalspaces in previous attempts. Further, output data using this approach is simpler to arrange and analyze. Theexperimental interface was built using free, and mostly open- source libraries in Python. We provide our sourcecode for other researchers as open source.


Artificial language learning Language evolution Leap Motion Python Signal space proxies Combinatorialstructure 


In evolutionary linguistics, artificial language learning (ALL) experiments are becoming increasingly commonplace (Scott-Phillips and Kirby 2010). How humans in the laboratory learn, use and transmit artificial languages can inform our knowledge of how linguistic structure came about via transmission and communication. Previously, these experiments have focused on the emergence of structure on a morphosyntactic level using artificial mini-languages composed from small discrete building blocks (e.g., Kirby et al. 2008). However, there is a growing body of work that is using artificial signaling paradigms to investigate the emergence of combinatorial structure; the level of structure where meaningless building blocks combine to make morphemes or words. Within this work, it does not make sense to initially construct artificial signals from discrete building blocks, as it is the emergence of discrete building blocks, which is of interest. Accordingly, this work uses continuous signal spaces to investigate how combinatorial building blocks emerge in linguistic signals.

In this paper, we will briefly review existing continuous paradigms before presenting the functionality of our own paradigm, which uses the Leap Motion sensor to produce auditory feedback. Further, we present explicit instructions for implementation as an appendix. You can find the download instructions in “Getting the framework”.

Artificial signal spaces

The ideal artificial signal space to investigate the emergence of combinatorial categories and structure is one that prevents interference from pre-existing linguistic knowledge, whilst having a continuous space from which discrete elements can emerge. Here, we will briefly outline issues with continuous signal spaces that have been used in artificial language experiments to investigate the emergence of combinatorial structure. Specifically, we will discuss the problem of interference from iconicity, the problem of data, which is difficult or labor-intensive to analyze, and the problem of having restrictions on the shape and size of artificial signal spaces. This breakdown will hopefully help illustrate how our own paradigm improves on previous work.

Limiting opportunity for iconicity

The use of graphical signals to investigate trends in how communication systems emerge and evolve started with the use of graphical symbols by Healey et al. (2002), and has since grown into its own field of research: “experimental semiotics” (for a review see Galantucci and Garrod, 2011; Galantucci et al., 2012). Experimental Semiotics has significant overlap with ALL experiments in language evolution and is not confined to only graphical signals. However, many of these experiments investigate the effects of communication and transmission by using graphical pictionary-style communication tasks where participants are given a concept to communicate without the use of words (e.g., Garrod et al., 2007; Fay et al., 2008). These experiments are useful for investigating processes such as conventionalization. However, experiments investigating the emergence of combinatorial structure become difficult to design with graphical paradigms, as participants are very familiar with presenting content graphically, both using written language and creating iconic representations via drawing. In addition to this, graphical interfaces make it easy to utilize iconicity, which has been shown to affect the emergence of combinatorial structure (see for instance Roberts et al., 2015 and Verhoef et al., 2015a). Further, different levels of iconicity are possible using different linguistic modalities (Fay et al. 2014), meaning that trends demonstrated with paradigms offering a lot of available iconicity (drawing) may not be extrapolatable to communication mediums with less available iconicity (e.g., speech).

In order to combat these issues of iconicity, Galantucci (2005) developed an approach that used a graphical interface, but had constraints on what participants could do using the apparatus. The interface is a stylus that writes on virtual paper that has a constant downwards drift, so participants can only control signals on the horizontal dimension. Using this approach, Galantucci has conducted social coordination experiments to investigate how communication systems with combinatorial structure can emerge (Galantucci 2005; Roberts and Galantucci 2012), and more specific experiments have been done, which have looked at how rapidity of fading (how quickly signals disappear after production) affects combinatorial structure in signals (Galantucci et al. 2010), and how the potential for iconicity affects the emergence of combinatorial structure (Roberts et al. 2015). There have also been experiments that have used the apparatus in an iterated learning paradigm, where participants’ outputs were fed to other participants in transmission chains to investigate whether combinatorial structure emerges more reliably via vertical transmission (learning) in contrast with horizontal transmission (communication) (Del Giudice 2012).

Ease of analysis

Verhoef has done several experiments that use slide whistles as a proxy for an articulation space (e.g., Verhoef et al., 2014). Her experiments involve participants learning pre-recorded whistles and reproducing them from memory. This has been implemented in an iterated learning experiment to see if learning biases within transmission chains could influence the emergence of combinatorial structure within an inventory of whistles. Verhoef has implemented this in conditions with meanings to investigate the role of iconicity (Verhoef et al. 2015a) and without meanings, to isolate only the role of learning biases (Verhoef et al. 2014). Results have clearly shown that within transmission chains, combinatorial structure emerges and signals become more learnable. However, the output from these experiments is audio recordings of the acoustic signals, meaning that quite a bit of processing is required in order to extract the relevant data (the pitch values at each time frame) before analysis can start. Further, the relationship between stopper-movement and signal-pitch is not linear, and manipulating the mapping between stopper and pitch is not possible. This not only restricts possible experimental designs, but also complicates analysis if the experiment wishes to calculate something like the stopper position from the pitch data.

Since these initial experiments, a computational alternative to the slide whistle has been developed, which can work with a mouse on a screen, or via touch pads on tablets. In one experiment, participants created signals by placing their finger on a virtual slide whistle app on a tablet (Verhoef et al. 2015b). This experiment explored whether some meanings being more easily mappable than others would facilitate communicative success. This has undoubtedly solved the problem of having data that can be analyzed quickly without much processing1 Verhoef has since used a digital signaling apparatus but without auditory feedback, simply having visual signals represented by a bubble that can be moved up and down using a touchscreen (Verhoef et al. 2016).

Both (Galantucci 2005) and Verhoef’s slide whistle experiments have used gaps in the signals to measure structure. The analysis (e.g., in Roberts et al., 2015) starts with a signal already segmented into “forms” where signal parts are separated by a gap (e.g., the stylus lifting off, or the participant stops blowing into the whistle). In other words, the participants were allowed to leave marks at segment boundaries. These boundaries are problematic because they are pre-conventionalized markers for structure. Instead of relying on statistical regularities or negotiating a boundary marker themselves, they are given an explicit way to segment the signals. Comparable cues are available in written language (such as spaces between words), but not necessarily in human speech, and certainly not at the phonemic level.

Flexibility of the signal space

Using slide whistles, the signal space is difficult to manipulate in its shape. An attempt has been made to affect the size of the signal space using slide whistles by putting a stopper on the plunger (Little and De Boer 2014). Reducing the range of pitches that could be produced using the whistle was hypothesized to make combinatorial structure emerge more quickly. Using the stylus paradigm, it is also difficult to manipulate the size, shape or dynamics of the signal space because it is confined to one dimension, causing researchers to manipulate the meaning space, rather than the signal space, to investigate the effect of things such as mappability (e.g., in Roberts et al., 2015).

Using digital slide whistles, it is easier to manipulate the shape of a signal space, and an experiment has been done that looks at the effects of different biases created by nonlinear mappings between the signal-space and the auditory feedback (Janssen et al. 2016). However, being on a flat surface, participants are still often tempted to produce signals as if they were graphical, focusing on the articulation space, rather than the auditory aspects of the signal. Because of this, the digital paradigms mentioned here have so far been limited to a 1 dimensional space (usually pitch). With a 2 dimensional space on a flat surface, participants are even more tempted to just “draw” their referent, as happened in de Boer and Verhoef (2012).

The Leap Motion framework

The Leap Motion framework uses a commercially available, inexpensive, USB-powered sub-millimeter precision infrared hand-tracking device called Leap Motion (Holz 2014). It is a small rectangular box that sits on a desk with an upward-facing camera. It works by building a skeletal hand model from the infrared images it takes, converting each image to a data frame representing the hand(s) it was able to detect in that image. It is able to keep track of individual hands and the associated fingers, as well as their positions, orientations, and velocities.

The proxy we have developed using Leap Motion is conceptually similar to the musical instrument, the theremin. As the participant moves one hand above the sensor, the position of their hand is translated into an auditory tone. In our experiments, participants were only allowed to use one hand and we tracked their palm location, but there is nothing in the framework to limit the number of hands or tracking of specific fingers. The framework is flexible in what features of each Leap frame (or group of frames) would modulate aspects of an auditory signal. Experiments (or “Extherements”) are not necessarily limited to using the position of hands or fingers, but could also use, for example, the angle of the palm with the horizontal plane, or any arbitrary function of one or more data frames.

Advantages of the Leap Motion approach

We have developed a signal space proxy, which we feel improves on the issues raised with previous methods above. Namely, opportunities for iconicity can be controlled, it improves the ease of analysis and, most importantly, is flexible in its geometry, size and in the nature of the signals it can produce. Further, the framework itself is flexible, allowing for use in many different experimental paradigms. Below is a list of all of the crucial features that the framework has, making it fit for purpose.

Continuous signal space

As already mentioned, a continuous signal space is needed to allow for the discretization of signal building-blocks and the emergence of combinatorial structure.

Minimizes interference from pre-existing linguistic knowledge

The Leap Motion framework converts hand movement into auditory feedback. While it is true that gesture is used in communication almost ubiquitously, the gesture that produces signals using this framework is not similar. For one, precise placement of the palm of one hand is not an important feature of co-speech gesture or gesture in sign language. Further, the use of hand-placement to generate precise auditory feedback is not something that occurs in natural language. However, both visual and acoustic signaling may help contribute to the ecological validity of experiments using the framework.

Limits opportunity for iconicity

The Leap Motion framework generates auditory signals that are less iconic than graphic signals. The framework still has a visual element (i.e., the hand position in front of the participant) and, as a result, participants still use this information to try and generate iconic signals. However, it is the auditory signals that are transmitted, not the visual ones, which makes iconicity a less salient feature in the transmitted signals. Iconicity in signals could be combated by transforming mappings between hand-position and auditory feedback, or by designing the signal space to be less intuitive with a given meaning space. Importantly, the framework offers flexibility to make the opportunity for iconicity more or less possible.

Ease of analysis

The signals in our framework start as raw, numeric data that can be converted into perceivable signals such as sounds, eliminating a whole step in the data analysis pipeline. This also makes it possible to automate analyses once the experimental session is over. The capabilities of data analysis using the Leap Motion framework are covered more extensively in “Data Output”.

Flexible signal space

It is possible to change the shape and dimensionality of the signal space using the Leap Motion framework. Having a flexible signal space makes it easier to make a signal space more or less like a signal modality used in natural languages. For example, slide whistles are more constrained than speech, but the results from these experiments are extrapolated to be relevant to speech. The current framework allows for the signal space to be made to be more or less like speech, or more or less like gesture, in order to answer whether previous signal spaces have ecological validity. Further, it allows for comparison of data from signal spaces that differ in a feature, perhaps relevant to the differences between the spoken and manual modalities or differences between previously used signal space proxies.

Ease of deployment

Experimental tools should be easy to deploy. Most researchers are not software developers, and thus prefer relatively simple, pre-configured frameworks that works out of the box. The Leap Motion framework comes with a default configuration that runs a peer-to-peer experiment for two participants, and it pre-packages all dependencies it legally can. It is designed for the use of non-programmers, and configuration changes are made by editing plain text files.

Flexible experimental structure

Our framework not only allows replicating the experiments that we have done utilizing it, but also allows easy implementation of new peer-to-peer artificial language learning tasks by extending the framework. Care has been taken to keep the implementation as modular and configurable as possible, so that the structure and the flow of the experiment (e.g., how interacting agents are chosen, in what order they interact, what are the phases etc.) can be altered with minimal impact to the rest of the functionality.

Open access and extensibility

A critical property, which functions as a precondition to some of the properties above, is the codebase being open access. This not only enables extensibility of the framework by third parties without asking the original developers, but also frees the user base from having to rely on the original developers for bug fixes. Our framework is open source and is free to use and modify.

Getting the framework

The framework has been developed using a Python-based library to serve as an experimental workbench. The framework requires an ordinary modern desktop or laptop computer to run the experiments. As many computers as there are clients are required for practical reasons, and another computer to run the server, if the experimenter would like to keep track of the experiment throughout on a separate machine, though there’s no reason the server cannot be run on the same machine as a client. The framework does not have any specific hardware requirements apart from a Leap Motion sensor.

The code is available at It is open source, and the readers can use, modify, and distribute it as they wish, as well as contact the authors with questions or suggestions. The GitHub page also links to a quick start implementation video on YouTube.

The signals

Auditory signals are produced by moving one hand above the Leap Motion. The signals can be manipulated along any of the dimensions or features that the Leap Motion can detect. The paradigm records hand position above the Leap in three spatial dimensions and the duration of the signal can also be measured or controlled. Different auditory dimensions can be paired with the different spatial dimensions above the Leap Motion (see Fig. 1). For example if the experimenter wishes to have signals be manipulated on the dimension of pitch, then the pitch of signals can be affected by moving the hand left and right, back and front or up and down. Not every dimension needs to be paired with an auditory dimension, or even to auditory output at all (if one wanted to do a purely gestural or visual experiment for instance).
Fig. 1

The Leap Motion controller showing the spatial dimensions used in one of our experiments. There is also the possibility to use the front-back dimension

The mapping between hand position and feedback can be linear, but does not need to be. Signals can be manipulated by pitch (e.g., low to high from left to right) or volume (e.g., quiet to loud from up to down). In our experiments, we have not used mappings that were linear, because in pilot experiments participants found it much more difficult to perceive signal differences at the quiet and high ends of the signal space when the mappings were linear (the transformations used are available in Appendix “A.4 Signals”). It is also possible to produce the signal based on a non-linear combination of multiple dimensions or features, as long as they can be calculated from the Leap frames.

In the current framework, there is a strategy whereby when a hand is withdrawn from the signal space during an experiment, the amplitude of the signal gradually diminishes to zero within less than half a second, instead of cutting it off right away. This dramatically reduces auditory artifacts such as clicks, and makes the resulting signal sound more continuous-sounding.

There is functionality to have duration constraints on the signals in experiment, and to have a progress bar show participants time elapsing if there is a time constraint on signals (see Fig. 2, also “A.4.2 Manipulation of duration” in the Appendix).
Fig. 2

Signal creation screen for an experiment with limited signal durations. The progress bar indicates the time left until maximum duration is reached, though not all experiments will have a time limit on signals

The Meanings

Usually, in artificial signaling experiments, signals refer to meanings (though this is not always the case, see Verhoef et al., 2014). Within the current framework, any image can serve as a meaning. As long as the meaning files follow a naming convention that follows their features, it is easy to select a subset of the meanings based on specified features.

Experimental paradigms

Artificial language learning experiments generally come in three flavors: 1. individual learning or signal creation experiments, where individual participants create or learn and reproduce signals (e.g., Little et al., 2015; Little et al., 2016b), 2. iterated learning experiments, where the reproductions of one individual are taught to the next participant in a transmission chain (e.g., Kirby et al., 2008; Verhoef et al., 2014), and 3. communication games (e.g., Garrod et al., 2010; Roberts et al., 2015), where two or more participants use a created language to communicate with one another about a set of meanings. Several studies have even included a combination of these, e.g., having generations of communication games (e.g., Kirby et al., 2015; Verhoef et al., 2015b).

From a practical, interface-building perspective, nearly all these paradigms can be characterized as being made up, either partially or wholly, from components of peer-to-peer communication games where peers try to invent, learn or transmit signals through pairwise interactions with one another. All such games typically require at least one participant that produces signals and one that recognizes them, but it is perfectly possible for a single participant to assume both roles. For instance, individual learning tasks are communication games where a single participant acts both as the speaker and the hearer, effectively functioning as their own partner or peer in a pairwise interaction. They produce signals, and later, they are asked to recognize their own signals. Iterated learning tasks can also be described as identical-peer communication games, but with the difference that the peer (who once again is both the hearer and the speaker) is changed several times during the experiment to model vertical transmission. Therefore, throughout the current paper we shall review the communication game as the most general paradigm. However, all instantiations of ALLs are possible with the current framework.

Several experiments have already used the Leap Motion framework. These have mostly been individual signal creation experiments, for example (Little et al. 2015), which looked at the differences in signal structure between signals for meanings that differed along continuous dimensions compared to discrete differences, and (Little et al. 2016a), which looked at the effects of different signal dimensionalities on signal structure. There are also upcoming publications on a communication game (some details given in this manuscript), comparing structure and iconicity in signals created in communication or an individual signal creation task.

Structure of experiments

Different experiments need to be structured in different ways, but for the most part, individual learning, iterated learning and communication experiments have a finite number of possible parts to the experiment. They usually need a window to create or reproduce signals, one to recognize signals, and one to provide feedback. The creation/reproduction tasks may be presented in batches or interleaved with the recognition tasks. In the current implementation, both are possible.

Experiments within the framework operate by exchanging message objects back and forth between a server and the client(s). Both the server and client are limited to sending and receiving a single line at a time, where that single line is a serialized form of the message object. There are mechanisms within the framework to ensure that the system fails when it receives unexpected, out-of-order messages to ensure the experiment is flowing exactly the way it should. The framework allows the experimenter to extend the functionality and flexibility of experiment designs by implementing new message classes without interfering with the rest of the messaging scheme.


Within the framework, the experiment design may be aided by the use of phases (blocks of tasks within an experiment that may be repeated). Between phases, different meaning spaces and different signal spaces can be used. For example, the meaning space might need to grow between phases after a set number of meanings have been seen, or a set number of interactions have happened. In our communication game, whether the meaning space grew was dependent on how successful participants had been at communicating the meanings they had seen up until that point. The idea being, that if they hadn’t established signals for existing meanings, they were unlikely to deal with new meanings well.

Each session can keep track of how successful participants are at communicating each meaning. We call “established meanings” any meaning that has been successfully communicated at least twice in a row. All other meanings, as well as established meanings that have recently been communicated incorrectly, are not “established”. By default, the meaning space expands if and only if all the current meanings are “established”, but the experimenter can add any criteria for progression to a new phase of the experiment. It is possible to add arbitrary logic that changes the signal space (e.g., add dimensions, swap dimensions, expand, shrink, etc.).

The framework also allows the probability of choosing a particular meaning as the topic at a particular round to be dependent on whether or not a meaning is established. At each communication round, there is a probability that an unestablished meaning will be picked (see phases section in appendix for details on how to set the appropriate parameters). For example, the experimenter may want 50 % of topics to be unestablished and 50 % to be established. This is a crucial functionality, as if there are far more established meanings than unestablished one, and all meanings are equally likely to be seen, then it will take a long time for unestablished meanings to become established so that the experiment can progress.

Client side interface

The server must be running when the client is launched. Upon launch, the client immediately starts a connection to the server. At first, participants will see a welcome screen that contains instructions, ideally detailing the structure of the experiment and explaining how to use the Leap Motion device, though we also recommend a live demonstration. The auditory feedback is available during the introduction screen so that the participant can practice and become familiar with the device.

In the communication game example, once there are two participants both of whom confirm they have read and understood the instructions by clicking an onscreen button, the experiment starts. The participants can then take turns in being the speaker and the hearer, producing and recognizing signals, respectively. However, whose turn it is to be speaker can also be assigned randomly, or dependent on other events in the experiment.

Participants may receive feedback about their communicative success between the turns, as well as information on the meaning that was intended by the speaker, and the meaning chosen by the hearer.

The experiment ends when participants have either successfully communicated all meanings, or the experimenter manually ends the session (see “Server side interface”), at which point participants are shown a message telling them the session is over and giving the appropriate instructions, e.g., leave the experiment booth.

Client screens

Screens seen by the participants are designed to be as intuitive, fail-safe and user friendly as possible, even if participants have not read the instructions. All screens are displayed full screen without any window decorations such as minimize or exit buttons.

Simple text screens

Simple text screens can be added at the beginning of the experiment, or once before each experimental phase, if needs be.

Signal creation screen

On a signal creation screen, typically a participant will see a meaning for which they must create or reproduce a signal. This screen features the image, and a “Record” button, which can be pressed to start recording a signal. This turns into a “Stop” button, which can be pressed to stop the recording when the participant is done creating their signal. After the participant has created a signal, the button turns into a “Rerecord” button for if the participant is unhappy with their first recording. Participants can also play back a signal they have just created by pressing “Play”. When participants are happy with their signal they can then press “Submit”.

There is also a volume dial, so participants can readjust the volume should it be at an uncomfortable level. This has no bearing on the experimental data being logged, since that data consists of Leap Motion frames (hand position), not yet converted into audio.

Signal recognition screen

In communication game experiments, communicative success is measured by participants’ ability to correctly identify the meaning referred to by the signal of their partner. Signal recognition screens are also helpful in individual signal creation experiments. If the participant knows that they will be tested on their own signals, this creates an incentive for them to create signals that are distinct from one another. In this situation, if a participant is very bad at recognizing their own signals (at chance level) then this may be an indication that the participant is not taking the experiment seriously.

The signal recognition phase screen features one or two lines of instructions. Typically “Choose the image you think the signal refers to”, or similar. There is a “Play” button, which the participant can press to hear the signal to be recognized. There is also a set of possible meanings, which includes the target meaning. This set of images can have any number of elements, and can be chosen from the whole image collection, or from a finite subset of it. The size of this set can be modified. Finally, there is a “Submit” button to send the chosen meaning to the server.

This screen has certain restrictions to ensure valid responses. The participant is unable to press the submission button without listening to the signal at least once, preventing participants from guessing at random without knowledge of the signal. The participant must select an image before pressing submit, and only one meaning can be chosen before submission. Other dependencies are possible.

The volume dial is also present on this screen (see Fig. 3), but does not need to be.
Fig. 3

Signal recognition screen

Transition screens

When a participant is waiting for the other participant to finish a task (such as creating a signal or recognizing one), they are shown filler screens (see Fig. 4). Since in the communication game participants take turns, it is important to make sure they are aware when it is not their turn, and that they are not allowed to interact with anything until it is. This “screen” is technically a semi-transparent modal dialogue that covers the current window. These screens are particularly useful during issues in networking where message passing might be delayed, and the participants tend to be confused unless they see explicit instructions to wait for the other participant.
Fig. 4

The wait dialogue after signal recognition

Server side interface

On the server side interface, each client connection is listed along with the address and unique ID of the participant (see Fig. 5). Once participants have indicated that they are ready on the client side, a new session starts.
Fig. 5

The server side interface as seen by the experimenter. The red numbers that can be seen on the meanings are only seen on the server side, and are the file names of the meanings

The interface displays a list of rounds played in the experiment in real time, and clicking on each round displays a panel containing details such as the speaker’s intended meaning, hearer’s guess and speaker’s signal. It is also possible to playback the signal. Both the round list and the detail panel are updated live during the experiment.

It is possible to end the experiment at any moment using the provided End Experiment button. The server updates the log file at the end of every round, so ending the experiment early has no bearing on the recording of the data up until that point.

Data Output

The data output is based on a single log file generated by the server. This file is updated at the end of each round. So if a session is aborted half way through for whatever reason, a full log of the session (except the unfinished, last round) is saved.

Log files are created that contain signal data and another containing question data. These objects, can be exported in one of the many formats, including comma separated values (csv). See Tables 1 and 2 for the columns available in the data frames.
Table 1

Columns of the response DataFrame, as returned by toPandas p2p() (see Appendix 1)

Response column



Number of exchange, e.g., first exchange in the experiment is round 0.


Number of phase.


client i d of the signal creator.


What meaning was the signal produced for.


The index of this data point in the trajectory.


Hand position on the x-axis i.e., from left to right in millimeters


Hand position on the y-axis i.e., from down to up in millimeters


Hand position on the z-axis i.e., from front to back in millimeters


Frequency of the signal in Hertz. This is useful if you do not have a linear mapping between hand position and your auditory output.


Frequency of the signal in Mel Scale.


Volume of the signal in range [0,1]. This is useful if you do not have a linear mapping between hand position and your auditory output.

Other possible signal variables can be extracted from this output. For example, knowing the hand-position at each data frame within a signal will allow the experimenter to measure how much movement is in a signal or the mean pitch of a signal.2 Each frame is given an integer index starting from 0, so that the duration of each signal (i.e., the number of data frames) is simply the largest data_index value it is associated with.

Online playback experiments

While it is true that it is more difficult to be iconic with continuous auditory signals than it is with graphical signals, iconicity can still exist, and be measured, in signals produced using the Leap Motion framework. In experimental semiotics, one way to measure iconicity is to let naïve participants see or listen to signals and have them pair them with an array of possible meanings (e.g., Garrod et al., 2007). If participants can pair signals with their intended meanings without any knowledge of how they were established, then those signals can be said to be iconic. For the most part, these playback experiments have been conducted online to allow for a massive number of participants, which increases statistical power. In order to make it easy to integrate the signals produced using the Leap Motion paradigm with such online experiments, we have built an interface that allows the experimenter to convert the log files to .wav files (see Appendix 1A.10). It is a small GUI application that allows the user to select a log file, and export all or some of the signals as wave audio files, allowing the experimenter to only select, for example, signals from a specific phase if they would like to compare iconicity of signals produced at the beginning of the experiment to signals produced at the end of the experiment. The application also allows the user to modify the playback rate (see Appendix 1A.4).

Table 2

Columns of the question DataFrame, as returned by toPandas_p2p() (see Appendix 1A.9)

Question Column



Number of exchange, e.g., first exchange in the experiment is round 0.


Number of phase.


client_id of the signal recognizer.

image0, image1, image2, etc.

Which meanings were in the set being selected from.


The meaning that the signal was produced for.


The meaning selected by the participant.


Are the answer and given_answer the same

Limitations and further development

The primary limitation of this paradigm is its reliance on Leap Motion devices. Although the device itself is quite cheap (around $90 at time of writing), we cannot expect it to be widely available in people’s homes. This limits the applicability of the paradigm to laboratory settings where the hardware can be provided. Online experiments (which are becoming more and more prevalent) will not be feasible.

It is possible to use other, possibly native sensors (such as motion sensors of gaming consoles). However, that would require a major rewrite of the theremin component (see Appendix “A.1 Requirements”), but the component itself is quite small. Conceptually, the only change required is to make sure the callback method that receives data frames from Leap Motion receives the new data frames instead (see Appendix 1A.11 for details). Previous work has used infrared gaming sensors (such as the Xbox Kinect) to measure structure and conventionalization in gestural signals (Namboodiripad et al. 2016). The Leap Motion would be less suited to such work, as it has only one vantage point and doesn’t manage well when hands overlap.

The current framework implementation is limited in having exactly one exchange at a time. While this is often desirable, parallel interactions themselves can also be targets of research. This would require a major redesign of the current codebase to accommodate multiple interactions taking place at once. The main challenge would be to handle the parallel execution, and the increased complexity of tracking which meanings are established among which pairs. Moreover, each server instance can only handle a single session, but one can run as many server instances on a machine as possible, as long as each one listens at a different port.

The framework can accommodate more than two participants in a session, and its basic infrastructure can already initiate sessions with more than two participants. However, this has never been meaningfully tested since no such experiments have been implemented to date, and would require some additional customization to phases, e.g., in peer selection.

This framework is not necessarily tied to auditory signals mapping onto visual meanings: one can just as easily modify the framework to use visual representations of hand trajectories to serve as signals, and/or label sounds instead of images that serve as meanings. However, this would require extending the existing meaning- and signal-related classes and the UI to accommodate data from the new modality. The rest of the framework should work as outlined in this article. One restriction is that the meanings should already be present on all the computers participating in the session, so dynamically generated meanings are not supported by this framework. They need to be static and present from the onset.

Development within the Leap Motion framework has generally been progressing on an as-needed basis, and this is likely to continue to be the case. However, functionality for iterated learning experiments is a likely next step. The only significant change necessary is support for using the output of one session as the initial repertoire to reproduce in the next one, enabling the experiment to mimic generations of learners.


We have developed a new signal space paradigm for conducting artificial language learning experiments investigating the emergence of combinatorial structure. This paradigm has improved on previous paradigms, as it allows for manipulation of the availability of iconicity, generates data that is easy to analyze, and is very flexible in terms of the size and shape of the signal space and the nature of the feedback. Ongoing work by the authors is utilizing the signal space flexibility to investigate the effects of the physical aspects of a signaling space in order to understand modality effects, something that was difficult using previous paradigms. Other uses of the paradigm include comparing visual signals to auditory ones, or replicating previous studies in experimental semiotics using auditory signals.


  1. 1.

    1 In Verhoef et al. (2015b) the researchers analyzed some of the data as it was being produced in real time, which was especially useful because some of the data was collected as part of a public exhibition. Accordingly, it was good for participants or visitors to be able to see the data being produced and analyzed in real time.

  2. 2.

    Note that the data frames produced by Leap Motion contain considerably more information than included here, including timestamps for each frame; see the Leap Motion documentation for a comprehensive list of the types of information these frames contain.

  3. 3.

    3 As a side effect of using twisted for networking, client and server connections are not persistent objects. That’s why all persistent information about a session is kept on the relevant connection factory objects, which itself is always kept in object.factory for any connection object.

  4. 4.

    4 Javascript Object Notation, an extremely ubiquitous data exchange format.

  5. 5.

    5 If you change the constant N_OPTIONS, you need to modify toPandas_p2p() to produce the right number of columns.



Open access funding provided by Max Planck Society. This paradigm was built for experiments that are part of the European Research Council project, ABACUS (283435). The authors would like to thank three anonymous reviewers for their suggestions for the improvement of this paper. All errors belong to the authors.


  1. de Boer, B., & Verhoef, T. (2012). Language dynamics in structured form and meaning spaces. Advances in Complex Systems, 15(3), 1150021–11150021–20.CrossRefGoogle Scholar
  2. Del Giudice, A. (2012). The emergence of duality of patterning through iterated learning: precursors to phonology in a visual lexicon. Language and Cognition, 4(4), 381–418.CrossRefGoogle Scholar
  3. Fay, N., Garrod, S., & Roberts, L. (2008). The fitness and functionality of culturally evolved communication systems. Philosophical Transactions of the Royal Society B: Biological Sciences, 363(1509), 3553–3561.CrossRefGoogle Scholar
  4. Fay, N., Lister, C. J., Ellison, T. M., & Goldin-Meadow, S. (2014). Creating a communication system from scratch: gesture beats vocalization hands down. Frontiers in Psychology, 5, 354.CrossRefPubMedPubMedCentralGoogle Scholar
  5. Galantucci, B. (2005). An experimental study of the emergence of human communication systems. Cognitive Science, 29(5), 737–767.CrossRefPubMedGoogle Scholar
  6. Galantucci, B., & Garrod, S. (2011). Experimental semiotics: a review. Frontiers in Human Neuroscience, 5, 11.CrossRefPubMedPubMedCentralGoogle Scholar
  7. Galantucci, B., Garrod, S., & Roberts, G. (2012). Experimental semiotics. Language and Linguistics Compass, 6(8), 477–493.CrossRefGoogle Scholar
  8. Galantucci, B., Kroos, C., & Rhodes, T. (2010). The effects of rapidity of fading on communication systems. Interaction Studies, 11(1), 100–111.CrossRefGoogle Scholar
  9. Garrod, S., Fay, N., Lee, J., Oberlander, J., & MacLeod, T. (2007). Foundations of representation: where might graphical symbol systems come from?. Cognitive Science, 31(6), 961–987.CrossRefPubMedGoogle Scholar
  10. Garrod, S., Fay, N., Rogers, S., Walker, B., & Swoboda, N. (2010). Can iterated learning explain the emergence of graphical symbols?. Interaction Studies, 11(1), 33–50.CrossRefGoogle Scholar
  11. Healey, P.G., Swoboda, N., Umata, I., & Katagiri, Y (2002). Graphical representation in graphical dialogue. International Journal of Human-Computer Studies, 57(4), 375–395.CrossRefGoogle Scholar
  12. Holz, D. (2014). Systems and methods for capturing motion in three-dimensional space. US Patent, 8(638), 989. Scholar
  13. Janssen, R., Winter, B., Dediu, D., Moisik, S., Roberts, S., & McCrohon, L. (2016). Nonlinear biases in articulation constrain the design space of language. In Roberts, S.G., Cuskley, C., Barcel-Coblijn, L., Feher, O., & Verhoef, T. (Eds.) The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11), (pp. 448–451).Google Scholar
  14. Kirby, S., Cornish, H., & Smith, K (2008). Cumulative cultural evolution in the laboratory: an experimental approach to the origins of structure in human language. Proceedings of the National Academy of Sciences, 105(31), 10681–10686.CrossRefGoogle Scholar
  15. Kirby, S., Tamariz, M., Cornish, H., & Smith, K (2015). Compression and communication in the cultural evolution of linguistic structure. Cognition, 141, 87–102.CrossRefPubMedGoogle Scholar
  16. Little, H., & De Boer, B. (2014). The effect of size of articulation space on the emergence of combinatorial structure. In Cartmill, A., Roberts, E.S., Lyn, H., & Cornish, H. (Eds.) The Evolution of Language: Proceedings of the 10th international conference (EvoLangX), vol. 10, (pp. 479–481). World Scientific.Google Scholar
  17. Little, H., Eryilmaz, K., & De Boer, B. (2015). Linguistic modality affects the creation of structure and iconicity in signals. In Noelle, D.C., Dale, R., Warlaumont, A.S., Yoshimi, J., Matlock, T., Jennings, C., & Maglio, P. (Eds.) The 37th annual meeting of the Cognitive Science Society (CogSci 2015), (pp. 1392–1398). Austin, TX:Cognitive Science Society.Google Scholar
  18. Little, H., Eryilmaz, K., & de Boer, B. (2016a). Differing signal-meaning dimensionalities facilitates the emergence of structure. In Roberts, S., Cuskley, C., McCrohon, L., Barceló-Coblijn, L., Feher, O., & Verhoef, T. (Eds.) The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11), (pp. 182–190).Google Scholar
  19. Little, H., Eryilmaz, K., & de Boer, B. (2016b). Emergence of signal structure: Effects of duration constraints. In Roberts, S., Cuskley, C., McCrohon, L., Barceló-Coblijn, L., Feher, O., & Verhoef, T. (Eds.) The Evolution of Language: Proceedings of the 11th International Conference (EVOLANGX11), (pp. 468–470).Google Scholar
  20. Namboodiripad, S., Lenzen, D., Lepic, R., & Verhoef, T. (2016). Measuring conventionalization in the manual modality. Journal of Language Evolution. lzw005.Google Scholar
  21. Roberts, G., & Galantucci, B. (2012). The emergence of duality of patterning: Insights from the laboratory. Language and Cognition, 4(4), 297–318.CrossRefGoogle Scholar
  22. Roberts, G., Lewandowski, J., & Galantucci, B. (2015). How communication changes when we cannot mime the world: experimental evidence for the effect of iconicity on combinatoriality. Cognition, 141, 52–66.CrossRefPubMedGoogle Scholar
  23. Scott-Phillips, T.C., & Kirby, S. (2010). Language evolution in the laboratory. Trends in Cognitive Sciences, 14(9), 411–417.CrossRefPubMedGoogle Scholar
  24. Verhoef, T., Kirby, S., & Boer, B. (2015a). Iconicity and the emergence of combinatorial structure in language.Google Scholar
  25. Verhoef, T., Kirby, S., & De Boer, B. (2014). Emergence of combinatorial structure and economy through iterated learning with continuous acoustic signals. Journal of Phonetics, 43, 57– 68.CrossRefGoogle Scholar
  26. Verhoef, T., Roberts, S.G., Dingemanse, M., Warlaumont, A.S., & Jennings, C. (2015b). Emergence of systematic iconicity: Transmission, interaction and analogy. In Noelle, D.C., Dale, R., Yoshimi, J., Matlock, T., & Maglio, P. (Eds.) The 37th annual meeting of the Cognitive Science Society (CogSci 2015), (pp. 2481–2487). Austin, TX: Cognitive Science Society.Google Scholar
  27. Verhoef, T., Walker, E., & Marghetis, T. (2016). Cognitive biases and social coordination in the emergence of temporal language. In Papafragou, A., Grodner, D., Mirman, D., & Trueswell, J.C. (Eds.) The 38th annual meeting of the Cognitive Science Society (CogSci 2016), (pp. 2615–2620). Austin, TX: Cognitive Science Society.Google Scholar

Copyright information

© The Author(s) 2016

Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. 1.Vrije Universiteit Brussel, Artificial Intelligence LaboratoryBrusselsBelgium
  2. 2.Max Planck Institute for PsycholinguisticsLanguage and Cognition DepartmentNijmegenThe Netherlands

Personalised recommendations