Introduction

For humans, like many other species, survival depends on the ability to perceive what others are doing and predict what they may be intending to do. Biological motion provides a rich source of information in support of this skill (Blake & Shiffrar, 2007; Johansson, 1973). Human observers have no trouble identifying what an actor is doing in a given point-light display (e.g., Dittrich, 1993; Vanrie & Verfaillie, 2004). Even when the range of potential activities is quite large, they readily recognize individual actions and the associated emotions (Alaerts, Nackaerts, Meyns, Swinnen, & Wenderoth, 2011; Brownlow, Dixon, Egbert, & Radcliffe, 1997; Dittrich, Troscianko, Lea, & Morgan, 1996; Pollick, Paterson, Bruderlin, & Sanford, 2001; van Boxtel & Lu, 2011; Walk & Homan, 1984), are able to understand the intentions of the actor, and can detect a violation of his/her expectations (Runeson & Frykholm, 1983).

These findings highlight the importance of biological motion in the recognition of individual actions, i.e., actions performed by a single agent in isolation. Whether and how humans use biological motion to understand social interactions, however, is far less clear. In an influential study, Neri and colleagues (Neri, Luu, & Levi, 2006) first demonstrated that human observers integrate biological motion information from multiple individuals. Participants observed point-light displays of two agents fighting or dancing together. When the agents interacted in a meaningful synchronized fashion, visual detection of one agent was enhanced by the presence of the second agent. This suggests that the human visual system relies on the interaction dynamics between the two agents to retrieve information relating to each agent individually (see also Thurman & Lu, 2014). Subsequent studies extended these findings by showing that, even without any physical contact between the agents, the gestures of one agent can serve as a predictor of the actions of the second agent (Manera, Becchio, Schouten, Bara, & Verfaillie, 2011a; Manera, Del Giudice, Bara, Verfaillie, & Becchio, 2011b; Manera, Schouten, Verfaillie, & Becchio, 2013). Recent works have begun to explore perception of social interaction from biological motion in infants (Galazka, Roché, Nystrom, & Falck-Ytter, 2014) and pathological populations such as patients with autism spectrum disorder (Centelles, Assaiante, Etchegoyhen, Bouvard, & Schmitz, 2013; von der Luhe et al., submitted) and schizophrenia (Okruszek et al., 2015). The exact characteristics and the neural substrate supporting interpersonal action coding, however, remain unclear.

Although the point-light technique offers many advantages to researchers investigating perception of biological motion, the complexity of constructing stimuli depicting interacting point-light agents has so far limited the use of this technique in studying social interactions. Indeed, while several databases exist for the study of individual actions from point-light displays (Vanrie & Verfaillie, 2004; Ma, Paterson, & Pollick, 2006; Shipley & Brumberg, 2005), only few stimulus sets are available for the study of the actions of interacting agents (Manera et al., 2010; Zaini et al., 2013). Zaini and colleagues (Zaini, White, Fawcett, & Newman, 2013) recently published a database of point-light communicative hand gestures and non-communicative pantomimed hand actions. To the best of our knowledge, however, the only database to present two interacting agents, rather than just one, is the Communicative Interaction Database (CID; Manera, Schouten, Becchio, Bara, & Verfaillie, 2010).

The CID database contains 20 full-body communicative action sequences performed by two female and two male actors. Following Dekeyser, Verfaillie, and Vanrie (2002), stimuli were constructed by combining motion capture techniques and 3-D animation software to provide precise control over the computer-generated actions and allow the actions of the two agents to be independently manipulated.

In the present work, we describe the Communicative Interaction Database-5AFC format (CID-5), a new and updated version of the CID which addressed some limitations of the original CID (see Discussion section). The CID-5 consists of 14 communicative interactions selected from the CID – the best recognized stimuli based on the normative data (Manera et al., 2010) – and seven non-communicative individual actions performed by two agents, not included in the CID. For each action stimulus, we provide coordinate files and movie files depicting the action as seen from four different perspectives. Furthermore, for each action stimulus, we provide five action alternatives, including the correct action description, two communicative and two non-communicative individual alternatives to be employed in forced-choice response paradigms. In the following paragraphs we first describe the method used to construct the action stimuli and generate the response alternatives included in the 5AFC task, then we provide a detailed description of CID-5 database, including all the materials available for download. Finally, we provide normative data collected to assess the recognizability of the stimuli using the 5AFC format, and compare these results with those collected with an open-ended response format by Manera et al. (2010).

Stimulus construction

A detailed and extensive discussion of the technical method has been published previously (Dekeyser et al., 2002; Vanrie & Verfaillie, 2004; Manera et al., 2010), so we limit the description of the stimulus construction to a summary of the major steps in the process.

Motion capturing

The movements of four actors, two females and two males, each wearing 30 reflective spherical markers (placed on anatomical locations specified in Vicon’s Body-Builder 3.5 Manual; Oxford Metrics, 1997) were recorded using a Qualisys MacReflex motion capture system (Qualisys; Gothenburg, Sweden), consisting of six 30-Hz position units (i.e., six cameras and corresponding video processors). We recorded 14 communicative interactions and seven individual actions. For the communicative interactions, the two female and the two male actors worked in pairs and were assigned to a “communicator” and “responder” role. The communicator (agent A) always initiated the interaction by performing a communicative gesture; the responder (agent B) perceived the communicative gesture and acted in response. To ensure that the responder’s action matched the communicator’s gesture in all respects (e.g., timing, position, kinematics), interactions were captured in real time, with the actors facing each other, at a distance of approximately 2 m. Individual actions were performed by agent A acting in isolation. Objects (e.g., table, chair, coins, fruits) were present during the production of actions to aid the actors in producing natural movements.

Data processing

After the capture session, the 2-D data from all the position units were processed offline to calculate the 3-D coordinates of the markers. The data from the markers were imported in Character Studio (Autodesk, 1998), a software package that was created for use with 3D Studio MAX (Autodesk, 1997). This allowed us to animate a biped for each actor, consisting of a transparent skeleton and 13 bright dots attached to the center of the major joints (shoulders, elbows, wrists, hips, knees, and ankles) and the head. Next, 3-D coordinates for each of the 13 dots for all the frames constituting each actor’s action were extracted, and some manual smoothing was performed to avoid any remaining “jumpy” dot movements.

To create the actual movie files, the smoothed data were imported into 3D Studio as moving bright spheres, and all the frames of the action were rendered as .avi files from four different viewpoints. An orthographic projection was used, and there was no occlusion, so no explicit depth cues were available. To create the communicative action stimuli .avi files, data from the two actors of each couple were imported into the same 3D studio environment. To create the individual action stimuli .avi files, we imported into the 3D Studio environment the 3D coordinates for all the frames constituting the action of agent B (the “responder” in the communicative interaction; e.g., “sitting down”) together with the 3D coordinates for all frames constituting an individual, unrelated action performed by agent A (e.g., “drinking”). For each individual action stimulus, the individual action was chosen so as to match the duration of the original communicative action performed by the “communicator,” and was displayed according to the original timing. Additionally, the distance and position with respect to agent B were kept constant. Objects present in the scene during motion capturing were never visible in point-light displays.

An example of a communicative and an individual action stimulus is reported in Fig. 1.

Fig. 1
figure 1

Example of a communicative action stimulus (agent A asks agent B to squat down) and an individual action stimulus (A turns over, B squats down). The grey silhouettes depicting the human form are not visible in the stimulus display

Response alternatives

Based on the normative data collected by Manera et al. (2010), for each stimulus we selected the best recognized version, performed by either the male or the female couple. For each selected stimulus, we then generated five response alternatives, including the correct action description (based on the description provided to the actors) and four incorrect response alternatives, as reported in Table 2. The incorrect response alternatives were generated according to the following criteria. For each action stimulus (e.g., A asks B to walk away), two incorrect communicative alternatives (e.g., A opens the door for B; A asks B to move something) and two incorrect non-communicative alternatives (A stretches; A draws a line) were generated by modifying the description of the action of agent A. The descriptions provided by participants in the CID study (Manera et al., 2010) were used as a starting point, in order to ensure that the predetermined response alternatives included responses people would spontaneously give. All alternative action descriptions were constructed to be physically compatible with the action performed by agent A. For instance, if agent A performed an arm movement, then reference to arm movement was included in all incorrect response alternatives describing the action stimulus. Finally, to avoid that for communicative stimuli the correct alternative was selected simply based on the congruence between the actions of the two agents (i.e., agent A asks B to perform an action, and agent B responds accordingly), for each action stimulus, one of the incorrect communicative alternatives always described a congruent interaction between the two agents. The description of the action of agent B was the same for all response alternatives.

The CID-5 Database

The database consists of a .rar archive that contains 21 folders, one folder for each of the actions listed in Table 1. Each action folder contains coordinate files and movie files depicting the action as seen from four different perspectives. Furthermore, the archive contains a text file with a list of the action alternatives provided for every action stimulus. The database can be retrieved from the supplementary materials of this article, or from the website of the Biology of Social Behavior Laboratory, University of Turin (http://bsb-lab.org/research/).

Table 1 Description and features of the action stimuli included in the CID-5. Please note that according to the taxonomy of speech acts (Searle, 1969, 1979), all the communicative (com) actions included in the CID-5 are classified as directive, meaning that the act of the communicator is intended to cause the responder to perform a certain action. Individual (Ind) action stimuli were not included in the CID

Actions

A brief description of each action stimulus is reported in Table 1. For each stimulus, we report the stimulus classification (communicative vs. individual), a brief description of the actions of agent A and agent B, the stimulus duration (number of frames and milliseconds), the actors’ gender, and the presence of (invisible) objects in the original scene. Furthermore, for the communicative stimuli, we also report the type of act (Searle, 1969, 1979) and the social motivation, representing why an agent performs a specific communicative gesture (e.g., sharing, requesting, asking for information; Tomasello, Carpenter, & Liszkowski, 2007), together with the original stimulus name in the CID database.

Coordinate files

The coordinate files are listings of the 3-D coordinates of the point lights, saved in two different file formats: .txt and .pdf. Each action folder contains two coordinate files, representing the actions of the two actors. The filenames have the following structure: action_role, where action describes the specific action (communicative or individual) performed by agent A (listed in Table 1), and role is the role of the actor in the couple (A or B; in the communicative stimuli, A is the communicator performing the communicative gesture, B the responder acting in response).

The first line in the files provides the values of three parameters: the number of frames making up the action (dependent on the action), the number of markers (always 13), and the frame rate (always 30 Hz). The remainder of the file consists of the 3-D coordinates of the 13 markers for each frame of the action (Z-up coordinate system). The first 13 lines give the three coordinates of each of the 13 markers in the first frame, with the first number of each line indicating the width (x, oriented left to right), the second number indicating the depth (y, oriented towards the viewer), and the last number indicating the height (z, oriented bottom to top). The order of the point lights in the file is as follows: head, shoulders, elbows, wrists, hips, knees, and ankles. The subsequent 13 lines provide the coordinates for the second frame, and so on. The coupled actions of agent A and B always have the same number of frames. The 3-D coordinates are centered. That is, the averages of the horizontal, vertical, and depth coordinates of A and B coincide with the center of the coordinate system (0,0,0).

Movie files

The movie files consist of point-light displays depicting the actions of the two agents in four different depth orientations: two lateral views (A positioned on the right [90°] and A positioned on the left [270°]), in which the coronal plane of the two actors is more or less perpendicular to the projection plane, and two three-quarter views (A on the right and seen from the front [125°] and A on the left seen from the back [305°]; see Manera et al., 2010 for further details). The filenames have the following structure: action_ orientation, where action describes the specific (communicative or individual) performed by agent A (listed in Table 1), and orientation describes the perspective (90°, 125°, 270°, and 305°). The two point-light figures, white against a black background, are approximately equidistant from the center of the screen. Movie files are .avi files with a resolution of 640 × 480 pixels and a frame rate of 30 frames/s.

List of the response alternatives

The ‘Alternatives.doc’ file is a text file reporting the list of the five response alternatives for each action stimulus (see also Table 2).

Table 2 Response alternatives and results collected from 95 participants (percentage of participants who responded correctly to question 1, and who reported each of the action alternatives)

Collection of normative data

In order to validate the 5AFC format of the CID-5 Database, we examined how well each stimulus was recognized by naïve participants and then compared these results with those collected by Manera et al. (2010) on CID stimuli employing an open response format.

Participants

One hundred and thirteen students from the Faculty of Psychology at the University of Turin (Italy) volunteered to take part in this study, and received course credits for their participation. Eighteen participants were excluded from data analysis due to missing responses, leaving the final sample to 95 participants (ten of them male and 85 female; mean age= 22.9 years, SD= 5.6, age range= 19–58). All the participants were naive as to the purpose of the study and had no previous experience with point-light displays.

Methods

Participants were tested in group in a conference room (250 seats arranged in ten rows), with a central projection screen. The 21 action stimuli were presented in a randomized order. Each action stimulus consisted of the same action repeated from two different perspectives (90° and 125°), with the two videos separated by a 500-ms fixation cross. After the second repetition of each video, participants were asked to decide whether the two agents were communicating as opposed to acting independently of each other (question 1), and then to select the correct action description among the five response alternatives, presented in a randomized order (question 2).

Results

For each question and for each stimulus, we calculated whether the proportion of correct responses differed from chance level – that is, from .5 for question 1 (corresponding to 50 % of correct responses) and .2 for question 2 (corresponding to 20 % of correct responses) –by employing binomial tests. Bonferroni corrections were applied to adjust for multiple comparisons (α = .05/21, = .0023).

The percentage of participants who responded correctly to each action stimulus is reported in Table 2. The “Com vs. Ind” column indicates the percentage of correct responses to question 1 (classification of the action as communicative vs. individual). The column “Action” indicates the percentage of responses provided for each of the five response alternatives. The first action alternative (in bold) reports the correct description.

Classification as communicative versus individual

On average, action stimuli were correctly classified as communicative versus individual (question 1) by 91 % of the participants (SD= 8 %; range= 65–100 %; communicative stimuli, M= 90 %, SD= 10 %; individual stimuli, M= 92 %, SD= 6 %). The action that was least consistently recognized was “Move this down” (correctly classified as communicative by 65 % of the participants). Classification of the action as communicative versus individual was above chance level for 20 out of the 21 action stimuli (all ps ≤ .001). ). For the action “Move this down,” the binomial test just approached statistical significance (p = .004).

Action identification

Concerning the selection of the correct action description (question 2), on average, action stimuli were correctly identified by 77 % of the participants (SD= 17 %; range= 26–99 %). Examples of very well recognized stimuli are “Stop” (97 %) and “Imitate me” (99 %) for the communicative action stimuli, and “Jump” (99 %) and “Turn over” (91 %) for the individual action stimuli. Although the overall recognition rate was high, some consistent misidentification occurred. For example, for “Go out of the way,” the incorrect action description “A asks B to come closer. B moves over” was selected by 62 % of the participants. Similarly, 52 % of the participants misidentified “Look under the foot” as “A stretches. B moves something.” Binomial tests revealed that participants performed significantly better than the chance level for 20 out of 21 action stimuli (all ps < .001). Performance was not above the chance level only for the action “Go out of the way” (p = .084).

For action stimuli identified by less than 60 % of the participants and for which more than 20 % of the participants selected the same incorrect action description (n =3; “Go out of the way,” “Sit down,” and “Look under the foot”), in Table 2 we provide an alternative action description to be used as a substitute for the confusable incorrect description. These new action descriptions were employed in another study (Manera et al., submitted), and were only rarely (or never) selected by healthy adults, thus suggesting that they were indeed less misleading compared to the alternatives tested in the present study. [Please refer to Manera et al. (submitted) for more details about results and procedures.]

Comparison between the CID and the CID-5 databases

The main differences in the materials included in the CID database and the CID-5 database are summarized in Table 3.

Table 3 Summary of the materials provided in the CID and CID-5

In order to investigate differences in action identification between the forced-choice paradigm employed in the CID-5 database and the open-ended response format employed in the CID database (where participants were asked to generate a description of each action stimulus, see Manera et al., 2010), action identification accuracy for each action stimulus was submitted to separate chi-square tests, with Group (CID vs. CID-5) as the between-subjects factor. A Bonferroni correction was applied to adjust for multiple comparisons (α = .05/21, = .0023). The results revealed no significant difference between the open response format and the forced-choice format for 19 out of 21 action stimuli (χ2 ranging from .00 to 6.61, φ ranging from − .21 to .19, all ps > .011). A significant difference was found for the actions “Go out of the way” (χ2 = 53.94, φ = −.60, p < .001) and “Look under the foot” (χ2 = 13.94, φ = −.31, p < .001). Specifically, the percentage of correct responses for those actions was significantly lower for the CID-5 forced-choice format (26 % for “Go out of the way,” 48 % for “Look under the foot”) compared to the CID open-ended response format (89 % for “Go out of the way”, 80 % for “Look under the foot”), thus confirming that some response alternatives in the 5AFC task were very misleading.

Taken together, these results indicate that the 5AFC employed in the CID-5 and the open response format employed in the CID (Manera et al., 2010) yield comparable levels of accuracy in action identification for most of the action stimuli.

Discussion

In the present paper we describe the CID-5, a database of 21 full-body point-light stimuli depicting two agents engaged in communicative interactions (N=14) or performing non-communicative individual actions (N=7) as seen from different viewpoints. For each stimulus, we provided five plausible response alternatives (only one being correct), and we collected normative data on 95 naive participants to assess stimulus recognizability. Results confirm that stimuli included in the CID-5 are highly recognizable, thus suggesting that information contained in these point-light displays is sufficient to distinguish communicative interactions from non-communicative individual actions, and to recognize the specific communicative or individual actions performed by the agents.

We are convinced that our stimulus set may represent a useful tool for researchers working on communication and social cognition, since an important factor in the advancement of those studies is the availability of suitable stimulus material. In particular, the CID-5 has several advantages compared to the existing databases of communicative actions, such as the CID (Manera et al., 2010). First of all, the CID-5 contains communicative interactions and non-communicative individual actions directly comparable in terms of the method used for stimulus construction, stimulus format (size, distance between the actors, available viewpoints), and action duration. The presence of individual control stimuli is crucial to test whether human observers are able to discriminate between communicative and non-communicative action stimuli, i.e., to test whether they are sensitive to interaction dynamics (Manera et al., 2011a; b; 2013). Furthermore, the presence of well-matched control stimuli represents a key advantage for studies aiming to investigate the neural correlates of social interaction (Centelles et al., 2011).

Second, in the CID-5 we provided for each action stimulus five possible response alternatives (the correct action description, two incorrect communicative alternatives, and two incorrect non-communicative alternatives) to create a forced-choice response paradigm. Asking participants to provide a free description of the stimuli can capture the spontaneous intention attribution process and is sometimes the best option. However, use of open-ended responses may be problematic in certain experimental setups and populations. For instance, open-ended responses may be difficult to answer for children with developmental expressive language disorder (Simms & Schum, 2011), patients with acquired brain damage (Angeleri et al., 2008), patients with neurodegenerative pathologies such as Alzheimer’s disease (Henry, Crawford, & Phillips, 2004), and patients with autism and schizophrenia (Groen, Zwiers, Vandergaag, & Buitelaar, 2008; Delisi & Lynn, 2001). An additional advantage of the forced-choice questions is that they decrease the number of missing responses, as the participant can be prompted to select one of the alternatives. Furthermore, the forced-choice format can facilitate response scoring, a process that can sometimes be challenging with the open-response format, especially when employing visually degraded stimuli (such as point-light animations) representing complex human actions.

Third, we reported normative data collected on a large sample of participants (N=95), specifying not only the percentage of correct responses for each action stimulus, but also the percentage of participants that selected each of the incorrect response alternatives. These data may be used as a guideline by researchers interested in creating easier or more challenging versions of the task. On the one hand, a 3AFC format may simplify the intention recognition task, especially if the more misleading alternatives are removed from the response list. Similarly, easier versions of the task may be obtained by randomizing the alternatives across different action stimuli, so that the proposed alternatives are physically incompatible with the displayed action. On the other hand, more challenging versions may be obtained by increasing the number of alternatives. A 7AFC task, for example, may be more challenging than the original 5AFC task, especially if the new alternatives are similar to the correct response or to the misleading alternatives.

Limitations and future research directions

Despite the important advantages of the CID-5 compared to the existing databases of communicative actions, some limitations should be noted. First of all, the CID-5 contains a limited number of non-communicative individual stimuli (N=7). This may represent a problem for studies needing to employ a larger number of stimuli (e.g., neuroimaging studies). However, it should be noted that in the CID-5 we also provided listings of the 3-D coordinates of the point lights for each stimulus and for each actor. This allows researchers to create different non-communicative individual stimuli by combining different actions performed by agent B (the respondent in the communicative interactions).

Another limitation lies in the paradigm employed for the collection of normative data assessing stimulus recognizability. Specifically, participants were presented with the same action stimulus as seen from two different perspectives, a lateral view and a three-quarter view, and were asked to select the correct response alternative only after the second repetition of the video. However, we know from previous studies that the representations of both individual actions and social interactions are viewpoint-dependent (Daems & Verfaillie, 1999; De la Rosa, Mieskes, Bülthoff, & Curio, 2013; Verfaillie, 1993, 2000) and that viewpoint modulates the cortical response to visually presented actions (e.g., Kilner, Marchant, & Frith, 2006; Perrett et al., 1989). It would thus be important to collect normative data assessing separately the recognizability of each action perspective. These data may represent further guidelines to be employed for stimulus selection.