Abstract
EMOKINE is a software package and dataset-creation suite for emotional full-body movement research in experimental psychology, affective neuroscience, and computer vision. A computational framework, comprehensive instructions, a pilot dataset, observer ratings, and kinematic feature extraction code are provided to facilitate future dataset creations at scale. In addition, the EMOKINE framework outlines how complex sequences of movements may advance emotion research. Traditionally, such research has often used emotional-‘action’-based stimuli, like hand-waving or walking motions. Here instead, a pilot dataset is provided with short dance choreographies, repeated several times by a dancer who expressed a different emotional intention at each repetition: anger, contentment, fear, joy, neutrality, and sadness. The dataset was professionally filmed, and simultaneously recorded using XSENS® motion capture technology (17 sensors, 240 frames/second). Thirty-two statistics from 12 kinematic features were extracted offline, for the first time in one single dataset: speed, acceleration, angular speed, angular acceleration, limb contraction, distance to center of mass, quantity of motion, dimensionless jerk (integral), head angle (with regard to the vertical axis and to the back), and space (convex hull 2D and 3D). Average, median absolute deviation (MAD), and maximum value were computed as applicable. The EMOKINE software is applicable to other motion-capture systems and is openly available on the Zenodo repository. Releases on GitHub include: (i) the code to extract the 32 statistics, (ii) a rigging plugin for Python for MVNX file conversion to Blender format (MVNX = output file of the XSENS® system), and (iii) Python-script-powered custom software to assist with blurring faces; the latter two under GPLv3 licenses.
Summary & background
Summary
EMOKINE is a software and dataset-creation framework for highly controlled kinematic datasets of emotionally expressive full-body movements. The primary novelty of this framework is that it provides the research community with a set of 12 readily computed kinematic features (32 statistics in total) that can be used out of the box by researchers to model emotional expressivity in full-body movement. For the first time, these 12 features are presented together: speed, acceleration, angular speed, angular acceleration, limb contraction, distance to center of mass, quantity of motion, dimensionless jerk (integral), head angle (with regard to the vertical axis and to the back), and space (convex hull 2D and 3D). A pilot dataset accompanies this article, consisting of realistic full-body movement stimuli that were designed and recorded with all other parameters controlled, so that most of the variability stems from the different expressed emotions. The pilot data for EMOKINE were recorded via the XSENS® system; however, the software is also applicable to data obtained from other motion-capture systems with minimal to no changes.
Based on the creation procedure of this pilot dataset, we describe a process by which such a dataset creation can be scaled up in the future, while ensuring mandatory levels of experimental control that are key for datasets to be used in experimental psychology and affective neuroscience with human participants (Christensen & Calvo-Merino, 2013; Christensen & Jola, 2015), and ensuring the technical detail required for datasets in computer vision and related disciplines. Importantly, instead of following the model of traditional datasets in emotion science that comprise emotional-‘action’-based stimuli, like hand-waving or walking motions, the EMOKINE pilot dataset contains complex sequences of movements: 6-s-long dance choreographies. A dancer choreographed a series of short, simple ballet-movement-inspired dance sequences of approximately 6 s in length, which corresponds to eight counts in dance notation. She then performed each sequence six times, expressing a different emotional intention at each repetition – namely anger, contentment, fear, joy, neutrality, and sadness. Classically, most datasets include only the ‘basic’ emotions proposed by Paul Ekman and colleagues (Ekman, 1973/2015; Ekman & Friesen, 1971) – namely, anger, fear, joy, neutrality, and sadness. We extended the spectrum of expressed emotions in the EMOKINE pilot dataset by adding the emotion ‘contentment’, another positive-valence emotion like joy, but of low arousal; the contentment–joy pair thus mirrors the relationship between sadness (negative valence, low arousal) and anger (negative valence, high arousal).
The EMOKINE suite includes:
- (a) a detailed step-by-step procedure guide to create EMOKINE datasets at scale;
- (b) a pilot dataset (recorded with the XSENS® system) with four different visual presentations for each video stimulus – namely (i) avatars, (ii) full-light displays (FLDs) with blurred face, (iii) point-light displays (PLDs), and (iv) silhouettes;
- (c) the raw kinematic data for each stimulus of the pilot dataset, obtained via the XSENS® motion capture system (via 17 sensors distributed on the body, recorded at 240 frames/second);
- (d) the code to obtain 32 statistics from the 12 kinematic features;
- (e) human observers’ emotion recognition rates and value judgments, which confirm the intended emotional categories of the pilot dataset.
The pilot dataset of EMOKINE is available for download on the Zenodo repository and the software on GitHub. Releases on GitHub include:
- (a) an extensive repository of code to extract the 32 statistics of the 12 kinematic features – namely, speed, acceleration, angular speed, angular acceleration, limb contraction, distance to center of mass, quantity of motion, dimensionless jerk (integral), head angle (with regard to the vertical axis and to the back), and space (convex hull 2D and 3D). Average, median absolute deviation (MAD), and maximum value were computed for each;
- (b) an MVNX rigging plugin for Python that allows Blender to convert MVNX files to a Blender-friendly format (MVNX = output file of the XSENS® motion capture system);
- (c) Python-script-powered custom software to assist with the process of blurring faces. The latter two have been released under GPLv3 licenses, and all are available for download on GitHub (see data availability statement, Section 11).
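As an illustration of the ‘space (convex hull 2D)’ feature listed under (a), the hull of the dancer’s keypoints projected onto the image plane can be computed from (x, y) coordinates alone. The following is a minimal, self-contained sketch (monotone-chain hull plus the shoelace formula), not the EMOKINE implementation itself:

```python
def convex_hull_area(points):
    """Area of the 2D convex hull of (x, y) tuples.

    Illustrative only: builds the hull with the monotone-chain
    algorithm, then computes its area with the shoelace formula.
    """
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                      # lower hull, left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # upper hull, right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]     # counter-clockwise hull vertices

    area = 0.0                         # shoelace formula
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# A unit square of keypoints with one interior point -> area 1.0
area = convex_hull_area([(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)])
```

The 3D variant of the feature replaces the shoelace formula with the volume (or surface) of a 3D hull computed over the (x, y, z) keypoints.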
The GitHub readme file includes an explanation of how to apply the EMOKINE software to data obtained from other systems. In particular, points (a) and (c) can be applied to any data that include keypoint positions, with minimal to no change. Point (b) naturally depends on the MVNX format (which is specific to the XSENS® system), but it can be ignored for other formats.
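To illustrate this system independence, per-feature statistics (average, MAD, maximum) can be computed from any array of keypoint positions, whatever the capture system. A minimal sketch for the speed feature follows; the function and argument names are hypothetical, not the EMOKINE API:

```python
import numpy as np

def kinematic_stats(positions, fps=240.0):
    """Average, MAD, and maximum of keypoint speed.

    positions: array of shape (frames, keypoints, 3) with x/y/z
    coordinates, as exported by most motion-capture systems.
    """
    # Frame-to-frame displacement -> speed in units per second.
    velocity = np.diff(positions, axis=0) * fps       # (frames-1, keypoints, 3)
    speed = np.linalg.norm(velocity, axis=2)          # (frames-1, keypoints)
    per_frame = speed.mean(axis=1)                    # average over keypoints
    return {
        "average": float(per_frame.mean()),
        "mad": float(np.median(np.abs(per_frame - np.median(per_frame)))),
        "max": float(per_frame.max()),
    }
```

The same pattern (compute a per-frame time series, then summarize it with average, MAD, and maximum) applies to the other features computed from positions, such as acceleration or limb contraction.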
Background
Investigating emotion recognition competence is important due to its significance for psychosocial functioning (Cosmides & Tooby, 2000; Darwin, 1872/2009; Ekman, 1973/2015; Ekman & Friesen, 1971). In the fields of emotion psychology, affective neuroscience, and computer vision, such research often relies on datasets that comprise stimuli of pictures or videos of facial and bodily expressions of emotions. Yet, compared to the extensive existing research on the recognition of emotions from facial expressions (Byron et al., 2007; Elfenbein & Ambady, 2002; O'Boyle Jr et al., 2011; Rosete & Ciarrochi, 2005; Rubin et al., 2005; Scherer & Scherer, 2011; Walter et al., 2012; Zuskin et al., 2007), the bodily channel of emotional expression has received less empirical attention, despite important calls to extend emotion perception research to this domain (Aviezer et al., 2012; Bellot et al., 2021; de Gelder, 2006, 2009; Keck et al., 2022; McCarty et al., 2017; Vaessen et al., 2018). Fewer stimulus materials are available, and the available datasets suffer from limitations (discussed in, e.g., Christensen & Calvo-Merino, 2013; Christensen & Jola, 2015; Smith & Cross, 2022). Besides, most full-body datasets of emotional expressions show actors performing different emotional actions; for instance, punching a fist in anger, sinking to the floor in sadness, jumping with joy (Atkinson et al., 2004; Crane & Gross, 2013; Dael et al., 2012; Gross et al., 2010). However, this approach likely measures the ability to recognize familiar prototypical actions indicative of different emotions, rather than perceptual sensitivity to emotional expressions (Shafir, 2016; Shafir et al., 2013).
In order to avoid such confounding effects and to investigate how different emotions manifest in one and the same human movement, a same-sequence approach can be a valid alternative: a set of movements is repeated several times, with each repetition corresponding to the expression of a different emotion. In other words, at every repetition the expressor performs the exact same reference movements, with the only source of intentional variability in the kinematics being the intended expression of a different emotion.
Accordingly, walking, pointing, drinking, knocking, or throwing movements have recently been proposed as valid movement alternatives for emotion recognition and emotion perception research (Crane & Gross, 2007; Dekeyser et al., 2002; Heberlein et al., 2004; Krüger et al., 2018; Ma et al., 2006; Pollick et al., 2001; Roether et al., 2009; Vanrie & Verfaillie, 2004). Yet, such movements are rather simple and may be confounded with stereotypical assumptions about them. For example, in everyday life, walking patterns are typically associated with how much someone is rushing or with the existence of injuries. With the objective of reducing contextual cues in movement stimulus materials, and of moving towards more varied movement patterns, dance movements have recently been proposed as stimulus materials. Dance is, par excellence, an instance of emotionally expressive full-body movement (Boone and Cunningham, 1998; Boone and Cunningham, 2001; Christensen et al., 2019; Kirsch et al., 2016; Dittrich et al., 1996; Orgs et al., 2016; Van Dyck et al., 2017; Van Dyck et al., 2013; Van Meel et al., 1993). Dance choreographies can be created to be more varied than simple walking or throwing movements. Moreover, compared to walking and throwing movements, using choreographies that are novel to participants in emotion perception research reduces possible familiarity effects. In addition, professional dancers are trained to express different emotional states with one and the same dance gesture and can therefore serve as models for stimulus materials (Christensen et al., 2019; Karin, 2016; Karin et al., 2016). It is relevant to note that we are not proposing that same-sequence stimuli should completely replace emotional-action-based stimuli in emotion research. We propose that these are alternative stimulus materials and should be chosen depending on the research question. Here, we focus on the advantages of same-sequence stimulus materials.
This same-sequence, different-emotion expressivity in dance movements is comparable to language. The same sentence can be spoken to sound either angry or happy to a listener, depending on how the expressor modulates their voice with their breathing and the muscles of their vocal tract (Bänziger et al., 2009; Scherer & Scherer, 2011; Scherer et al., 2017). How a dancer performs one and the same dance movement sequence, across several repetitions, with different emotional intentions, has previously been found to convey these intended emotional states to human observers, even to those without dance experience (Christensen et al., 2017; Christensen et al., 2019; Christensen et al., 2014; Christensen et al., 2023). In classical emotion-recognition tasks with dance movements, average recognition rates are generally above chance level and vary between 18.04% (Christensen et al., major revisions) and 48% (Christensen et al., 2023; Smith & Cross, 2022).
With the advances of filming and motion-capture technologies of the past decades, new horizons have opened up for the digitization and analysis of full-body movement.
By combining these recent technologies with the same-sequence-different-emotion approach, the EMOKINE framework offers a novel route for emotion perception research with full-body movement. The stimulus design is based on previous datasets of dance movements which, however, did not include motion-capture technology (Christensen et al., 2023; Christensen et al., 2017; Christensen et al., major revisions; Christensen et al., 2019; Christensen et al., 2014; Smith & Cross, 2022).
Objectives
We had four objectives with EMOKINE:
- (1) To create a pilot dataset of simple dance movement sequences, with each sequence performed by a dancer six times, each time with a different intended emotion out of a pool of six possible emotions – namely anger, contentment, fear, joy, neutrality, and sadness. The novelty of this work is that this same-sequence–different-emotion experimental design was complemented by time-resolved, whole-body kinematic data, measured by motion-capture technology. Please note that this dataset was created to enable the development of the EMOKINE suite and contains portrayals from only one dancer. To ensure generalizability in future research using the EMOKINE software, datasets should include portrayals from more than one dancer.
- (2) To render these pilot stimuli in four visual presentations for further study: (i) videos showing an avatar performing the movements, exported from the XSENS® proprietary software (avatar videos); (ii) videos showing the dancer in full light, but with a blurred face, to avoid emotion recognition from the face (full-light displays; FLDs); (iii) videos showing point-light displays (PLDs), which were rendered with a Blender-based algorithm (Blender Community, 2018); and (iv) videos showing black-and-white silhouettes, in which the dancer was extracted from a greenscreen background so that only a white silhouette is seen moving in front of a black background (silhouette videos).
- (3) To compute a total of 32 statistics of 12 kinematic features, for the first time in one single dataset with same-sequence stimulus materials, and to make the software freely available.
- (4) To obtain emotion-recognition judgments and beauty ratings from human participants for all the created stimuli mentioned above. The emotion-recognition judgments were obtained to validate the pilot dataset in terms of the intended emotional expression of the dancer. Beauty judgments were obtained to encourage the use of aesthetic judgment as an implicit emotion recognition task with future datasets. Previous research has shown that while emotion-recognition rates for a stimulus set may be low, significant differences can usually be found between stimuli intended to express different emotions (Christensen et al., 2023; Christensen et al., 2019), making aesthetic judgment (e.g., beauty, liking, etc.) an interesting implicit emotion-recognition task for the future.
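Of the 12 kinematic features named in objective (3), dimensionless jerk is the least self-explanatory: the squared jerk (third derivative of position) is integrated over the trajectory and normalized by duration and peak speed so that the result is unit-free. The sketch below follows one common smoothness formulation (Hogan & Sternad, 2009); the exact normalization used in the EMOKINE code may differ, so treat it as illustrative:

```python
import numpy as np

def dimensionless_jerk(positions, fps=240.0):
    """Integrated squared jerk, made dimensionless by duration and peak speed.

    positions: array of shape (frames, 3) for one keypoint.
    Illustrative formulation only; not the EMOKINE implementation.
    """
    dt = 1.0 / fps
    vel = np.gradient(positions, dt, axis=0)    # first derivative: velocity
    acc = np.gradient(vel, dt, axis=0)          # second derivative: acceleration
    jerk = np.gradient(acc, dt, axis=0)         # third derivative: jerk
    duration = positions.shape[0] * dt
    v_peak = np.max(np.linalg.norm(vel, axis=-1))
    integral = np.sum(np.linalg.norm(jerk, axis=-1) ** 2) * dt
    # Normalize so the measure has no physical units.
    return integral * duration ** 3 / v_peak ** 2
```

For a perfectly smooth trajectory at constant velocity, jerk is zero everywhere and the measure evaluates to zero; jerkier trajectories yield larger values.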
Method
Ethical approval for the experiment was in place through the Umbrella Ethics approved by the Ethics Council of the Max Planck Society (Nr. 2017_12). For the observer experiment (performed online via the platform Prolific®; Peer et al., 2021; Stanton et al., 2022), informed consent was obtained from all participants, and was given online through a tick-box system. All methods were performed in accordance with the relevant guidelines and regulations.
Open science statement
In accordance with the framework for open and reproducible science (Munafò et al., 2017), all measures that we collected in the study are reported here.
Participants
Participant pilot dataset creation (one dancer)
One female former professional dancer with 20 years of ballet dance experience choreographed and performed the dance movement sequences.
Participants’ online experiment
In total, 172 participants (57 male, one other) participated in the human emotion-recognition task (mean age = 35.89 years, SD = 11.93, range 18–65). From the original sample, 22 participants were excluded due to technical issues (the video stimuli did not play) or for not passing attention checks (on two of the emotion-recognition trials, cartoon videos with very obvious emotional expressions were shown: SpongeBob crying a river of tears, correct response: sad; and Mickey Mouse’s head turning red and exploding, correct response: angry). The experiment took approximately 50 min, and participants were paid via the Prolific platform (£8/h). Participants had an average of 1.5 years of hobby dance experience (SD = 5.05, range 0–40). We had set the Prolific® filter to return only participants whose first language was English, to ensure complete comprehension of the study instructions.
To determine the sample size, we used G*Power 3.1 (Faul et al., 2007). Because the stimuli of the EMOKINE dataset were presented to participants in four different types of visual presentation (avatars, full-light displays (FLDs), point-light displays (PLDs), silhouettes), there was a total of 216 stimuli. Rating this many stimuli could have led to participant fatigue. To avoid this, we opted to divide the stimuli randomly into four sets and determined the sample size for each group of participants rating one of these four sets. Subsequently, we confirmed that the percentage of correct responses given by participants in these four groups was equivalent. We chose a threshold for a large effect size of d = .80 (Cohen, 1988), because large effect sizes indicate that a research finding has practical significance. We had initially planned to compare the percentage of correct responses between the four groups with an independent t test. The suggested sample size for an independent-samples t test (effect size = .80; alpha = .05; power = .90) was 28 per group. However, we tested at least 30 participants on each of the four sets to ensure full randomization (30 can be divided by six emotions, 28 cannot). Due to technical difficulties, the final number of participants in each group was: group 1: N = 36; group 2: N = 32; group 3: N = 33; group 4: N = 31.
Materials
Hardware
Motion capture was performed by means of the MVN Link system (XSENS®, 2020, 2023). Motion capture in this context is the act of recording the motion through time of a set of landmarks (also called keypoints) that are representative of a full-body human pose. This technology has matured mainly via two different approaches: optical solutions, in which markers on the body allow the keypoints to be located, and inertial/magnetic solutions, in which a set of sensors is placed on the body. Both approaches have advantages and disadvantages, but it is generally understood that while optical systems provide very high positional precision, inertial systems are more robust and versatile and provide more stable acceleration readings (Lim et al., 2016; Skogstad et al., 2011). We used the latter approach in the current research.
The XSENS® system combines inertial and magnetic sensors with advanced algorithms and biomechanical models to provide highly reliable and accurate readings with high spatio-temporal resolution (Schepers & Giuberti, 2018). This allows an optimal assessment of kinematic parameters for complex movement such as dance.
In its full-body configuration, the MVN Link system provides kinematic information from 23 keypoints via a set of 17 wireless sensors embedded on different parts of a spandex suit that fits the dancer’s body. This setup is designed to allow for highly free and complex motion.
The keypoints are called “segments” in the XSENS Manual (see Sections 7.2.5 and 15.4 for more details; XSENS Manual, 2020). For an overview of the 17 sensors and how they translate into information about the 23 keypoints, see Table 1 and Fig. 1 (reproduced from the XSENS manual, Section 15.4. (XSENS Manual, 2020), and the XSENS fact sheet about the biomechanical model; XSENS, 2023).
The XSENS® recording and filming took place in the ArtLab foyer of the Max Planck Institute for Empirical Aesthetics in Frankfurt am Main, Germany, in front of a standard 6 × 3 m chroma-key greenscreen background (LTT Junior Truss system with Premium green Buehnenmolton). This allowed for the creation of additional visual presentations of the stimuli, such as silhouette videos. For this, dedo stage lights (AX3 light drop; 15 W RGBW CREE LED) were used to illuminate the entire greenscreen and to minimize shadows.
To produce additional visual presentations of the dataset (FLDs and silhouettes), the dancer was also filmed using a Canon EOS 5D Mark IC camera with a Canon EF 24–105 mm f/4 L IS USM lens (settings: framerate (raw) 50 fps, framerate (output) 25 fps; white balance: 5000 K, shutter speed: 1/100 s, ISO: 400; video format: H.264, aspect ratio: 16:9, resolution: 1920 × 1080).
Postproduction of the video footage was done on a 15-inch MacBook Pro (2017; processor: 2.9-GHz Quad-Core Intel Core i7; memory: 16 GB 2133-MHz LPDDR3; graphics: Radeon Pro 560 4 GB, Intel HD Graphics 630 1536 MB).
Software
The XSENS® company provides proprietary software that allows calibrating and monitoring the setup during recordings, reading the sensors at a framerate of up to 240 Hz, and editing and exporting the recorded data in various formats, including video and MVNX (a form of XML; see the XSENS Manual (2020), chapters 6 to 10, for more details). The MVNX file provides raw sensor data, as well as refined readings for positions, accelerations, and angles of the full-body keypoints (for more details, see the XSENS Manual, 2020, chapter 15). This data format contains all essential information and is open, so it can be further processed without any proprietary restrictions.
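Because MVNX is XML, per-frame readings can be extracted with standard tools. The fragment below uses a deliberately simplified, hypothetical MVNX-like structure (real files carry an XML namespace, segment metadata, and many more per-frame channels) parsed with Python's standard library:

```python
import xml.etree.ElementTree as ET

# Minimal, hypothetical MVNX-like fragment for illustration only;
# real MVNX files are namespaced and far richer.
SAMPLE = """<mvnx>
  <subject>
    <frames>
      <frame time="0"><position>0.0 0.0 1.0</position></frame>
      <frame time="4"><position>0.1 0.0 1.0</position></frame>
    </frames>
  </subject>
</mvnx>"""

def read_positions(mvnx_text):
    """Return one list of floats per <frame>, taken from its <position> element."""
    root = ET.fromstring(mvnx_text)
    return [
        [float(v) for v in frame.find("position").text.split()]
        for frame in root.iter("frame")
    ]

positions = read_positions(SAMPLE)   # one row of coordinates per frame
```

In a real file, each row would hold x/y/z triplets for all 23 segments, which can then be reshaped into a (frames, segments, 3) array for feature extraction.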
The point-light display (PLD) stimuli were rendered using Blender (Blender Community, 2018), an open-source 3D rendering engine that allows flexible creation and editing of scenes, including positioning and configuration of camera viewpoints, and recording of sequential data into various formats (including video). Specifically, we developed a Python plugin that allows Blender to process the MVNX sequences and convert them into a Blender-friendly format, based on hierarchical relationships between different movements in a human skeleton (a process called “rigging”). We have released our MVNX rigging plugin to GitHub under GPLv3 license (see data availability statement, Section 11).
We also developed custom software to assist with the process of blurring faces, in the form of a series of Python scripts that make use of third-party, open-source deep learning models to first detect the dancer's head (Wu et al., 2019), and then identify the pixels that correspond to the face (more details are provided in the following sections). We have also released these scripts to GitHub under GPLv3 license, with the hope that they can be useful to the community (see data availability statement, Section 11).
Finally, the software Adobe After Effects 2019 and Adobe Premiere Pro 2019 were used for rendering the video clips in postproduction. The online survey tool Limesurvey® was used for the observer experiment, and the experiment was launched via the Prolific® online platform.
Overall procedure
The recording of the pilot stimuli was carried out by a team of five researchers and three filmmakers. The recording procedure followed the recommended standard practice of the XSENS company (see the XSENS Manual, 2020, chapters 7 and 8), including body measurements and a calibration routine before the start of each recording session. At the end of each calibration, the dancer was instructed which sequence to perform and which emotion to express, based on an a priori established list of choreographies and emotion orders. The dancer proceeded to say the name of the sequence and the emotion out loud. She then clapped twice to signal the beginning of the sequence and to secure the alignment of the XSENS®, video, and audio recordings. As specified in more detail in section "Stimuli (of the pilot dataset)", 12 separate choreographies were created for the EMOKINE pilot dataset (i.e., 12 different sequences of movements). Each of these 12 choreographies was then performed by the dancer six times, to express a different emotion at each repetition. Hence, each emotion was expressed once for each of the 12 choreographies. The order of the ‘emotion takes’ for each sequence was always: neutrality, joy, contentment, sadness, fear, and anger. If the dancer was not satisfied with the performance (e.g., made a mistake in the choreography, or felt that the emotion was not expressed), a second (or third) ‘emotion take’ of the same movement was performed until the dancer agreed with the performance. If the dancer made a mistake in the choreography, the stimulus was discarded without further analysis. If the dancer felt the emotion was not expressed, the “best” sequence of these duplicate takes was chosen by the dancer.
Subsequently, the recordings were rendered in the four different visual presentations. This coarse parameterization was devised in order to enable research into how emotion recognition and beauty ratings are affected by these four different levels of information detail in the representation of emotional kinematics.
Procedure for human observer experiment
Four separate experiments were set up to allow the 216 stimuli to be rated by four separate groups of participants. Stimuli were divided randomly, but equally, among the experiments, so that each experiment included the same number of stimuli of each type of visual presentation. To ensure that ratings would be equivalent across all four experiments, one sequence (i.e., 6 stimuli × 4 visual presentations = 24 stimuli) was selected and presented in all four experiments, for an interrater reliability check. For the order of stimulus presentation during the experiment, visual presentation was blocked, but stimuli and blocks were randomized across participants.
Participants watched the stimuli one by one, and rated, first, what emotion they recognized in the movement (forced choice task; anger, contentment, fear, joy, neutrality, sadness), and then, how beautiful they found the movement (Likert scale; 0 = not beautiful; 100 = very beautiful). Participants could only watch each stimulus once and were then asked to provide their rating. As this was an online experiment operated via Prolific®, viewing angle and distance were not controlled. However, filters on Prolific were set so that the experiment could only be performed on a computer desktop (not on a tablet or mobile phone). Figure 2 sets out one trial of the observer experiment.
After the four blocks (with breaks in between), participants were asked to fill in three questionnaires: the Aesthetic Responsiveness Assessment (AReA) (Schlotz et al., 2020), the Interpersonal Reactivity Index (IRI) (Davis, 1980), and demographics questions. The questionnaire data are not presented here.
Pilot dataset specifications
Stimuli (of the pilot dataset)
Originally, 12 dance sequences were created by the dancer. However, three of these were deemed not good enough by the dancer and were therefore discarded before any further analysis, yielding a total of nine sequences included in the subsequent computations presented here. Each sequence was performed six times to express a different emotional intention at each repetition, namely anger, contentment, fear, joy, neutrality, and sadness; i.e., 9 sequences × 6 emotions = 54 emotional dance movement stimuli. In addition, for each sequence, the dancer did a seventh repetition, during which she explained the movements while doing them, like an instruction video for a dance class, yielding nine explanation videos (used elsewhere, Schmidt et al., 2023, but provided here as part of the full EMOKINE dataset). Therefore, the total number of stimuli in the EMOKINE dataset is 63 (see Fig. 3).
Seven of the nine choreographed sequences involved only arm movements, while the lower body was held relatively still (only one step to the side). The two remaining sequences involved some side steps (“second position” in the ballet syllabus). As a first step towards quantifying the kinematics of emotional expressivity in dance, the dancer kept most of the emotional expressivity in the EMOKINE dataset to the arms, following previous dataset creations (that did not use motion capture), which focused on the arms only (Sawada et al., 2003) (see Table 2 for an overview of each sequence’s choreography).
In post-production, all 63 pilot stimuli (the 54 emotional stimuli plus the nine explanation videos) were rendered in four different visual presentations (avatars, full-light displays (FLDs), point-light displays (PLDs), silhouettes). None of the stimuli contains facial information, and there is no costume, color, or music in the clips. Each clip was faded in and out and contains one full dance phrase (eight counts in dance notation) (see Fig. 4 for the four visual presentations).
The EMOKINE pilot dataset is available for download online, and a selection of stimuli has been used in published work (Schmidt et al., 2023). However, please note that this dataset was created to enable the development of the EMOKINE software and contains portrayals from only one dancer. To ensure generalizability of results from future research using the EMOKINE software, new datasets should be created that include portrayals from more than one dancer. We hope that the details about the stimuli creation procedure set out above may help this endeavor.
Full-light displays (FLDs) with blurred face
For the full-light displays (FLDs) with blurred face, the raw footage was imported into Adobe Premiere Pro. The videos were trimmed to the start and end points of the movements with the help of a dancer (academic dance sequences have specific start and end points that are only detectable by experts). Each clip was rendered into a separate file in an uncompressed format, and a title was added, as specified verbally by the dancer during the recording. In this saving procedure, the soundtrack (ambient noise) of the clips was removed. Then, all rendered files were imported into Adobe After Effects, and the “Keylight” effect was used to set the background to a shade of grey.
Blurring the face requires locating the pixels that correspond to the face, which can be a very time-consuming task if done manually for video datasets. To speed up the process, we developed custom software for a semi-automated pipeline. Each video was split into consecutive, deinterlaced image frames that were processed separately. For each image, the Detectron2 deep learning model for human keypoint estimation was used (Wu et al., 2019). Since we only had a single person in front of a static background, averaging the detected keypoints for the nose, eyes, and ears provided a very robust estimate of the head position, and given that the dancer was always more or less centered and at the same distance from the camera, a fixed-size frame of 140 × 140 pixels around the estimated head position was used to extract a patch containing the head. This allowed the face segmentation model by Nirkin and colleagues (2017) to produce a binary mask that accurately matched the actual face at pixel level. This binary mask was then translated from the head patch back to the main image. The resulting masks for the whole dataset were then grouped and paired with the corresponding videos in order to blur the faces in the regions where the masks were active.
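The fixed-size head patch described above can be located in a few lines once facial keypoints are available. The following sketch assumes a simple (n, 2) array of facial landmark pixel coordinates; the names and array layout are illustrative, not Detectron2's actual output format:

```python
import numpy as np

def head_patch_bounds(keypoints, image_size=(1920, 1080), patch=140):
    """Estimate a fixed-size crop around the head from facial keypoints.

    keypoints: array of shape (n, 2) with (x, y) pixel coordinates of
    detected facial landmarks (nose, eyes, ears). Returns the patch as
    (x0, y0, x1, y1), clipped so it stays inside the image.
    """
    cx, cy = np.mean(keypoints, axis=0)            # averaged head position
    half = patch // 2
    x0 = int(np.clip(cx - half, 0, image_size[0] - patch))
    y0 = int(np.clip(cy - half, 0, image_size[1] - patch))
    return x0, y0, x0 + patch, y0 + patch          # 140 x 140 pixel patch
```

The patch can then be handed to a face-segmentation model, and the resulting binary mask pasted back at (x0, y0) in the full frame.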
Silhouettes
To render the footage into silhouette videos, all footage was imported into Adobe Premiere Pro as before. Here, the “Keylight” effect was added and its settings adapted to remove the background from each clip, and the “Level” effect (setting: output black = 255) was added to each clip to color the extracted foreground white (the visible dancer silhouette). “Opacity” keyframes were then added to the beginning and the end of each clip to allow for a fade-in and fade-out (eight frames). Finally, each clip was rendered as a separate file in H.264 format (see Fig. 2).
Point-light displays (PLDs)
The point-light display (PLD) videos were created using the XSENS output data, MVNX (Blender Community, 2018; XSENS Manual, 2020). The MVNX file contains information about the skeleton (bone geometry and connections), and each “frame” (240 frames per second) contains kinematic information about the position and angle of the bones. We wrote a custom Blender plugin (Blender Community, 2018) that reads each MVNX file and creates a skeleton with the corresponding geometry and connections. The module then reads the frame information and creates an animation. Based on previous models for marker placement, namely the frontal view of the Plug-In Gait Model (Kainz et al., 2017; Piwek et al., 2016), we identified a series of key landmarks on the skeleton and attached a white sphere to each landmark, in order to create the “light points” that convey the information about the movements. For the video rendering, we positioned a virtual camera “in front of” the PLDs with an angle, position, and focal length that closely resembled those of the video camera (for an example of the result, see Fig. 3). The skeleton was then made transparent (rendering it black on a black background) and the spheres bright white (increasing the contrast), allowing us to extract the pixel position of each point in the rendering. Videos were faded in and out.
XSENS avatar dancer
The XSENS® avatar dancers were extracted from the proprietary software of the XSENS® system (XSENS Manual, 2020).
Data formats (of the pilot dataset)
Beyond the four stimulus modalities already discussed, the EMOKINE pilot dataset includes several modalities of data records that we recommend using in the future, as they may help with downstream tasks. The pilot files are available on Zenodo (see Section 11; Data Availability Statement) and consist of the following file formats: MVNX, comma-separated values (CSV), and camera position (CamPos). Extensive details are provided in the readme files along with the data and software on Zenodo and GitHub.
MVNX files
We include the raw MVNX motion capture recordings, as produced by the XSENS® software for the pilot dataset.
Comma-separated values (CSV) files
To facilitate easy integration with other data analysis tools, we recommend converting a subset of the MVNX files into comma-separated value (CSV) files. For each sequence and emotion, we extract per-keypoint time series for position, orientation, velocity, angular velocity, acceleration, angular acceleration, center of mass and foot contacts. In the EMOKINE software package, we provide the script and instructions to perform this conversion.
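A minimal sketch of such a per-keypoint CSV export using only the Python standard library; the column layout (`<keypoint>_x/y/z` per frame) is a hypothetical example, not necessarily the exact layout produced by the EMOKINE scripts:

```python
import csv

def write_keypoint_csv(path, times, series):
    """Write per-keypoint time series to a CSV file.
    `times` is a list of timestamps; `series` maps each keypoint name to a
    list of (x, y, z) triples, one per frame (hypothetical layout)."""
    names = sorted(series)
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        # header: time column, then x/y/z columns per keypoint
        writer.writerow(["time"] + [f"{n}_{axis}" for n in names for axis in "xyz"])
        for i, t in enumerate(times):
            row = [t]
            for n in names:
                row.extend(series[n][i])
            writer.writerow(row)
```

The same tabular shape (one row per frame, one column block per keypoint and quantity) is what the downstream EMOKINE scripts consume.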
Camera position (CamPos) files
While the positional data in the MVNX files is provided in a global three-dimensional frame of reference, the stimuli are rendered from a specific camera perspective. We use Blender to extract the positions of the bones and PLD spheres relative to the camera, as x/y/depth coordinates, where x goes from 0 (leftmost pixel) to 1 (rightmost pixel), y from 0 (bottom pixel) to 1 (top pixel) and depth is provided in meters. The result, dubbed here CamPos (for camera positions), is provided as JSON files containing the time series in frames at 60 Hz, where each frame contains the camera-relative positions. This can be useful for example in analyzing kinematic features from the perspective of the observer. In the EMOKINE software package, we also provide the script and instructions to produce these files.
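For example, the normalized x/y coordinates can be mapped back to pixel indices of a rendered 1920 × 1080 frame. This sketch assumes a CamPos frame is a dict from keypoint names to (x, y, depth) triples, which is an illustrative simplification of the actual JSON schema:

```python
def campos_to_pixels(campos_frame, width=1920, height=1080):
    """Convert camera-relative (x, y, depth) coordinates to pixel coordinates.
    x runs in [0, 1] from left to right and y in [0, 1] from bottom to top
    (as described above); pixel rows count from the top, so y is flipped.
    Depth (in meters) is passed through unchanged. Field names are assumptions."""
    out = {}
    for name, (x, y, depth) in campos_frame.items():
        col = x * (width - 1)
        row = (1.0 - y) * (height - 1)
        out[name] = (col, row, depth)
    return out
```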
Using the EMOKINE software with data obtained from other motion-capture systems
Although the EMOKINE pilot dataset was recorded via the XSENS system, most of the EMOKINE software provided here can be directly applied to data obtained from other motion-capture systems with little to no modification. Specifically, only the files “1a_mvnx_to_csv.py” and “1b_mvnx_blender.py” are specific to the MVNX-formatted data. These data are then converted to a tabular format (see the GitHub repository for examples), which is consumed by the remaining scripts (together with plain video data whenever needed, e.g., for the file “2b_face_blur.py”). Researchers intending to use this software with motion-capture data from other systems simply need to ensure that their data follow the same tabular format, and that video data are available whenever needed (e.g., for face blurring or silhouette extraction).
Kinematic features
Making use of the Silhouette, MVNX, and CamPos data modalities, we extracted 32 statistics from 12 kinematic features. We group the kinematic features into the following categories: Speed and Acceleration (speed, acceleration, angular speed, and angular acceleration; Section "Speed and acceleration"), Expansion/Contraction (limb contraction, distance to center of mass; Section "Expansion/contraction"), Movement Activity (quantity of motion, QoM, ratio; Section "Movement activity"), Fluidity/Smoothness (dimensionless jerk (integral); Section "Fluidity/smoothness"), Body Tilt (head angle, with regards to the vertical axis and with regards to the back; Section "Body tilt"), and Space (convex hull 2D and 3D; Section "Space"). For each of these, we computed the per-sequence average, median absolute deviation (MAD), and maximum value, as applicable (defined below).
The resulting computed features for each sequence and emotion are provided in the EMOKINE dataset, and the script and instructions to compute them from the raw data are included in the EMOKINE software package. In Section "Validation of kinematic features and results of observer experiments", we demonstrate the usefulness of these features and the meaningfulness of the EMOKINE data through a series of quantitative and qualitative analyses.
Before we outline each kinematic feature in detail, we give an overview of the math behind them. More formally, for each sequence \(s\left(t\right)\in \mathcal{S}\) (from the 63 total in the EMOKINE dataset \(\mathcal{S}\)), we have 12 nonnegative scalar features \(\{{K}_{i}^{\left(s\right)}\left(t\right){\}}_{i=1}^{12}\), where \(t\in \{1,\dots ,{T}_{s}\}\) indicates discrete time over a duration of \({T}_{s}\) frames. Thus, \({K}_{i}^{\left(s\right)}\left(t\right):\hspace{0.17em}\mathcal{S}\mapsto {\mathbb{R}}_{\ge 0}^{{T}_{s}}\) is the \(i\)-th kinematic feature of sequence \(s\), represented by a nonnegative vector of dimension \({T}_{s}\).
Some of the kinematic features were extracted directly from the MVNX files as provided by the XSENS software (see Section "Pilot dataset specifications" for more details), while others were extracted from the CamPos data (see Section 3.3.3.) and the silhouette stimuli videos (see Section "Silhouettes"). In the following sections, we describe in detail how kinematic features were extracted and/or computed. This information is summarized in Tables 3 and 4.
We aggregate each kinematic feature across time \(t\) to obtain a single scalar statistic that summarizes the kinematics of each sequence \(s\). The following aggregation techniques are used in multiple features:
- Average: \({\overline{K} }_{i}^{\left(s\right)}:=\frac{1}{{T}_{s}}{\sum }_{t=1}^{{T}_{s}}{K}_{i}^{\left(s\right)}\left(t\right)\)
- Median absolute deviation (MAD): \({\widetilde{K}}_{i}^{\left(s\right)}:=\underset{t}{{\text{median}}}\left(\left|\underset{t}{{\text{median}}}\left({K}_{i}^{\left(s\right)}\left(t\right)\right)-{K}_{i}^{\left(s\right)}\left(t\right)\right|\right)\)
- Maximum: \({\widehat{K}}_{i}^{\left(s\right)}:=\underset{t}{{\text{max}}}\left({K}_{i}^{\left(s\right)}\left(t\right)\right)\)
Qualitatively, the \({K}_{i}^{\left(s\right)}\left(t\right)\) features tell us “how much” of a given feature is present at each timepoint. The main difference between the three aggregations is their sensitivity to outliers: \({\widehat{K}}_{i}^{\left(s\right)}\) is the most sensitive, \({\widetilde{K}}_{i}^{\left(s\right)}\) is the least sensitive, and the average \({\overline{K} }_{i}^{\left(s\right)}\) lies in between. Varying sensitivity to outliers is important if a sequence relies on brief, strong events to convey crucial information (e.g., a short burst in velocity in a generally slow sequence will still have a large maximum), or, conversely, to recover the underlying information in cases where the sequence contains outliers (e.g., if a movement is supposed to be smooth but is slightly shaky or contaminated with noise).
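The three aggregations can be computed in a few lines of numpy, following the definitions above (the function name is ours):

```python
import numpy as np

def aggregate(k):
    """Average, median absolute deviation (MAD), and maximum of a
    per-frame kinematic feature time series k (1-D array)."""
    k = np.asarray(k, dtype=float)
    avg = k.mean()
    mad = np.median(np.abs(np.median(k) - k))   # median of deviations from the median
    mx = k.max()
    return avg, mad, mx
```

Note how a single outlier dominates the maximum and shifts the average, while leaving the MAD nearly unchanged.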
Speed and acceleration
Speed is one of the most frequently explored kinematic parameters, and research suggests that it plays a substantial role in an observer’s ability to distinguish between specific kinds of emotional expressivity. The majority of this work suggests that slow movements are associated with sadness, and in some cases with expressions of neutrality and fear, while fast movements are typically associated with happiness (or joy) and anger (Bernhardt & Robinson, 2007; Crane & Gross, 2007; Crane & Gross, 2013; Gross et al., 2010; Halovic & Kroos, 2018; Masuda et al., 2010; Montepare et al., 1999; Roether et al., 2009; Smith & Pollick, 2022).
Acceleration is less studied in relation to observer judgements of emotional expressivity in the kinematics literature. However, in a study conducted by Sawada and colleagues, a similar pattern emerged across these movement features: they found that high acceleration in arm movements was associated with anger, and low acceleration with sadness (Sawada et al., 2003). We here provide a series of speed- and acceleration-related features to enrich emotional kinematics research in the future: speed (Section "Speed"), acceleration (Section "Acceleration"), angular speed (Section "Angular speed"), and angular acceleration (Section "Angular acceleration").
Speed
Velocity is a vector provided by the MVNX system that points in a specific 3D direction, and speed is the “length” of that vector. This length tells us how fast a given keypoint is moving in that direction, in meters per second. More formally, the position in meters of a given joint \(j\) in 3D space at time \(t\) is:

\[{p}_{j}\left(t\right)={\left({x}_{j}\left(t\right),\hspace{0.17em}{y}_{j}\left(t\right),\hspace{0.17em}{z}_{j}\left(t\right)\right)}^{\top }\in {\mathbb{R}}^{3},\]

where \(x\) is aligned with (and points to) the magnetic north, \(y\) is aligned with (and points to) the west, and \(z\) points up (for more details, see the XSENS Manual, 2020; section 23.8). The velocity \(v\) is then the derivative of the position with respect to time (\({\nabla }_{t}\)), and our speed feature is the Euclidean norm of the velocity:

\[{v}_{j}\left(t\right)={\nabla }_{t}\hspace{0.17em}{p}_{j}\left(t\right),\hspace{1em}{{\text{speed}}}_{j}\left(t\right)={\Vert {v}_{j}\left(t\right)\Vert }_{2}.\]
Note that, in discrete time, this quantity could be approximated by computing \(\frac{p\left(t+{\Delta }_{{\text{t}}}\right)-p\left(t\right)}{{\Delta }_{t}}\), where \({\Delta }_{{\text{t}}}\) is a small amount of discrete time (e.g., one frame). In this case, however, that is not necessary, since the velocity is provided directly by the MVNX file, estimated by the XSENS system using a proprietary algorithm (see XSENS Manual, 2020; section 23.8). We provide the average, MAD, and maximum speed for each joint \(j\) and sequence \(s\) in EMOKINE.
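For motion-capture systems that do not provide velocities directly, speed can be approximated from positions with the finite-difference formula above. A minimal numpy sketch (the function name is ours):

```python
import numpy as np

def speed_from_positions(p, fps=240):
    """Approximate per-frame speed (m/s) from 3-D positions p of shape (T, 3)
    via forward differences. The XSENS pipeline provides velocities directly,
    so this is only a fallback for systems that do not."""
    v = np.diff(p, axis=0) * fps        # finite-difference velocity, shape (T-1, 3)
    return np.linalg.norm(v, axis=1)    # Euclidean norm -> speed
```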
Acceleration
For each joint and timepoint, we define the acceleration \({a}_{j}\left(t\right)\) as a three-dimensional vector encoding the rate of change of the velocity with respect to time. Our acceleration feature is then the Euclidean norm of that vector, i.e., \({\Vert {a}_{j}\left(t\right)\Vert }_{2}\), with:

\[{a}_{j}\left(t\right)={\nabla }_{t}\hspace{0.17em}{v}_{j}\left(t\right).\]
The acceleration vectors \({a}_{j}\left(t\right)\) are also estimated through the XSENS proprietary algorithm (XSENS Manual, 2020; section 23.8) and provided directly through the MVNX files. Acceleration is conceptually associated with the “force” applied to a joint. As with the speed feature, the Euclidean norm does not convey information about directionality. We provide the average, MAD, and maximum acceleration for each joint \(j\) and sequence \(s\) in EMOKINE.
Angular speed
Joints not only have positions, but they also have orientations. A joint can change position without changing orientation (e.g., walking with a straight head), and vice versa (rotating the neck while standing still). The angular speed focuses on the orientation: It measures the change of “angle” as a function of time, so instead of meters per second, we have radians per second. If a dancer is rotating a full circle per second, then the angular speed of their body would be \(2\pi\) radians per second (\(2\pi\) radians = 360 degrees).
In the XSENS system, each keypoint is considered the beginning of a segment (which can be thought of as a “bone”) with its own local three-dimensional coordinate system. When the subject stands in T-pose, all local coordinate systems are aligned with the global system (see Fig. 92 in section 23.5 of the XSENS Manual, 2020). The segment rotations then follow the Z (flexion/extension), X (abduction/adduction), Y (internal/external) convention (see XSENS Manual, 2020; section 23.6, for exhaustive details). More formally, for each segment \(\upxi\), the orientation in radians \({\upomega }_{\upxi }\left({\text{t}}\right)\) is given as a three-dimensional vector in Euler representation, which varies as a function of time:

\[{\upomega }_{\upxi }\left(t\right)={\left({\upomega }_{\upxi ,x}\left(t\right),\hspace{0.17em}{\upomega }_{\upxi ,y}\left(t\right),\hspace{0.17em}{\upomega }_{\upxi ,z}\left(t\right)\right)}^{\top }\in {\mathbb{R}}^{3}.\]
Then, the angular speed feature is the Euclidean norm of the derivative of \({\omega }_{\xi }\left(t\right)\) with respect to time, i.e., \({\dot{\omega }}_{\xi }\left(t\right)={\Vert {\nabla }_{t}{\omega }_{\xi }\left(t\right)\Vert }_{2}\), given in \(\frac{rad}{s}\). Like the other quantities presented so far, it is estimated by the XSENS proprietary algorithm and provided directly via the MVNX file. Analogously to the linear velocity discussed previously, the Euclidean norm retains the information about the amount of angular speed, but not about the specific directions. We provide the average, MAD, and maximum angular speed for each joint \(j\) and sequence \(s\) in EMOKINE.
Angular acceleration
Analogously to the case of linear acceleration, angular acceleration is the second derivative of angle with respect to time, i.e., \({\ddot{\omega }}_{\xi }\left(t\right)={\Vert {\nabla }_{t}^{2}{\omega }_{\xi }\left(t\right)\Vert }_{2}\). Like the other quantities presented so far, it is estimated by the XSENS proprietary algorithm and provided directly via the MVNX file. The Euclidean norm retains the information about the amount of angular acceleration but not about the specific directions. We provide the average, MAD, and maximum angular acceleration for each joint \(j\) and sequence \(s\) in EMOKINE.
Expansion/contraction
Body expansion and contraction is another kinematic feature of movement commonly explored in emotion perception research. However, unlike with speed, the results in this area do not present such a clear pattern of associations, likely because many datasets used in this area focus on emotional actions (instead of the same-sequence approach proposed in EMOKINE).
Of the available literature, most research in this area seems to agree that expansion is associated with happiness or joy, and some suggest that it also leads to the perception of anger (Camurri et al., 2003; Gross et al., 2010; Gross et al., 2012; Masuda et al., 2010; Montepare et al., 1999; Shafir, 2016; Shikanai et al., 2013; Wallbott, 1998). Castellano and colleagues’ study is a notable exception in that they found anger to be associated with contraction instead (Castellano et al., 2007). However, contraction is more commonly noted to align with the perception of fear and sadness (Camurri et al., 2003; Masuda et al., 2010; Shafir et al., 2016; Shikanai et al., 2013; Wallbott, 1998), and in some cases with neutral expressivity (Montepare et al., 1999). For body expansion/contraction, the EMOKINE framework includes limb contraction (Section "Limb contraction") and distance to center of mass (Section "Distance to center of mass (CoM)").
Limb contraction
Limb contraction regards the positions of five keypoints: head, right hand, left hand, right toe, and left toe (Poyo Solanas et al., 2020), denoted \(\left({p}_{h}\left(t\right),{p}_{a}\left(t\right),{p}_{b}\left(t\right),{p}_{c}\left(t\right),{p}_{d}\left(t\right)\right)\), respectively. At each timepoint \(t\), this metric is the mean Euclidean distance between each of the four extremity endpoints and the head, i.e.,

\[LC\left(t\right)=\frac{1}{4}\sum_{k\in \left\{a,b,c,d\right\}}{\Vert {p}_{h}\left(t\right)-{p}_{k}\left(t\right)\Vert }_{2}.\]
This metric is a proxy for body contraction, with the idea that contracted bodies tend to have shorter distances between the limb endpoints and the head, while expanded poses tend to have longer distances. We provide the average and MAD limb contraction for each sequence \(s\) in EMOKINE.
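A minimal sketch of the limb contraction computation at a single timepoint, following the definition above (the function name is ours):

```python
import numpy as np

def limb_contraction(p_head, extremities):
    """Mean Euclidean distance between the head position (3-vector) and the
    four extremity endpoints (4x3 array: both hands and both toes) at one
    timepoint. Lower values indicate a more contracted pose."""
    d = np.linalg.norm(np.asarray(extremities) - np.asarray(p_head), axis=1)
    return d.mean()
```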
Distance to center of mass (CoM)
At each timepoint \(t\), and together with the joint positions \({p}_{j}\left(t\right)\), the XSENS system also estimates the position of the person's center of mass \(\upmu \left(t\right)\), in meters, which is a “weighted average” over all the points in the body, thus representing its “central point”. For each sequence in EMOKINE and for each keypoint \(j\), we compute the distance between the keypoint position \({p}_{j}\left(t\right)\) and the CoM as follows:

\[{d}_{j}\left(t\right)={\Vert {p}_{j}\left(t\right)-\upmu \left(t\right)\Vert }_{2}.\]
We then compute and retrieve the average and MAD.
Movement activity
Movement activity has been examined in a number of ways in the kinematics literature depending on the particular movement stimuli used. For example, in studies exploring the kinematics of walking motions it is typically measured via step frequency (e.g., Crane & Gross, 2007). More often, however, it is quantity of motion that is assessed. Research in this area, as with speed, suggests that high movement activity is typically associated with the portrayal of joy and anger and low movement activity is associated with sadness, fear, and sometimes neutral expressivity (Bernhardt & Robinson, 2007; Camurri et al., 2003; Crane & Gross, 2007; Crane & Gross, 2013; Gross et al., 2010; Halovic & Kroos, 2018; Masuda et al., 2010; Montepare et al., 1999; Roether et al., 2009; Shikanai et al., 2013). Wallbott (1998) is an exception to this, in that they found variations in movement activity to distinguish between happiness with different levels of intensity; high activity was associated with “elated joy”, but the more general “happiness” was associated with low movement activity. Based on this research, the EMOKINE framework includes Quantity of Motion (QoM) as a measure of movement activity (Section "Quantity of motion (QoM)").
Quantity of motion (QoM)
Unlike the other quantities presented so far, the QoM is not extracted from the MVNX sequential data. Instead, the input is silhouette videos at 25 fps, where each frame is a Boolean matrix \({\text{f}}\left({\text{t}}\right)\in \{\mathrm{0,1}{\}}^{1080\times 1920}\), with pixel values of 0 corresponding to the background and values of 1 to the dancer. The QoM is a time-dependent feature that can be intuitively understood as the ratio between how much the silhouette has moved in the recent past and how big the silhouette is right now (Castellano et al., 2007). More formally, given a time span of \(\updelta\) frames, we make use of the Boolean operations of pixel-wise union (\(\vee\)), intersection (\(\wedge\)), negation (\(\neg\)), and sum (\(|\cdot {|}_{1}\)) to define the QoM \(q\left(t\right)\) as follows:

\[{Q}_{\updelta }\left(t\right)=\left({\bigvee }_{\uptau =1}^{\updelta }f\left(t-\uptau \right)\right)\wedge \neg f\left(t\right),\hspace{1em}q\left(t\right)=\frac{{\left|{Q}_{\updelta }\left(t\right)\right|}_{1}}{{\left|f\left(t\right)\right|}_{1}}.\]
The Boolean array \({Q}_{\updelta }\left(t\right)\in \{\mathrm{0,1}{\}}^{1080\times 1920}\), also called the silhouette motion mask, is active for the pixels that were active immediately before the current time and are not currently active. Thus, it encodes the “recent activity”: it is all zeros if there is no movement, and it contains more active pixels as movement increases.
Then, the QoM is the ratio between the sum of active pixels in \({Q}_{\updelta }\left(t\right)\) and the sum of currently active pixels. Note that the QoM is a full-body quantity and does not depend on a given joint. Therefore, for each sequence, we provide only one scalar average, MAD, and integral QoM. While the average QoM is trivially the integral divided by the number of frames, we included both here for convenience, since the integral depends on the sequence length and is a quantity of interest in the literature.
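The QoM computation can be sketched with numpy Boolean arrays as follows; the frame shapes and the \(\delta = 1\) default are illustrative, not the EMOKINE defaults:

```python
import numpy as np

def quantity_of_motion(frames, delta=1):
    """Per-frame quantity of motion from Boolean silhouette frames of shape
    (T, H, W): pixels active within the previous `delta` frames but inactive
    now (the silhouette motion mask), divided by the currently active pixels."""
    qom = []
    for t in range(delta, len(frames)):
        recent = np.any(frames[t - delta:t], axis=0)   # union of recent frames
        motion_mask = recent & ~frames[t]              # active before, not now
        qom.append(motion_mask.sum() / max(frames[t].sum(), 1))
    return np.array(qom)
```

The `max(..., 1)` guard only avoids division by zero for empty frames; in practice the dancer silhouette is always non-empty.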
Fluidity/smoothness
Comparatively, there is less research examining the role of movement fluidity in the perception of emotional expressivity, but there does exist some evidence to suggest that movement fluidity is associated with happiness or joy, while stiff or low-fluidity motion is associated with anger (Montepare et al., 1999) and other negative valence emotions like grief and fear (Camurri et al., 2003). For the EMOKINE framework, we computed dimensionless jerk (integral), as a measure of movement fluidity/smoothness (Section "Dimensionless jerk (integral)").
Dimensionless jerk (integral)
Based on Hogan and Sternad (2009), this feature is a variation of the jerk, which is the time-derivative of acceleration:

\[{\dot{a}}_{j}\left(t\right)={\nabla }_{t}\hspace{0.17em}{a}_{j}\left(t\right).\]
To quantify the smoothness of trajectories from \({t}_{1}\) to \({t}_{2}\), the integral of the squared jerk is typically used in the literature, i.e., \({\int }_{{t}_{1}}^{{t}_{2}}{\dot{a}}_{j}{\left(t\right)}^{2}dt\). One major problem with this, as pointed out by Hogan and Sternad (2009), is that it has a high-order polynomial unit of \(\frac{{length}^{2}}{{time}^{5}}\), making it sensitive to noise and changes in scale. They propose a dimensionless variant, in which the integral is multiplied by \(\frac{{\left({t}_{2}-{t}_{1}\right)}^{5}}{{\Delta }_{p}^{2}}\), where \({\Delta }_{p}\) is the extent of the movement achieved between \({t}_{1}\) and \({t}_{2}\) (i.e., if the individual moves more in a sequence, \({\Delta }_{p}\) will be larger for that sequence). The result is a notion of movement smoothness that is normalized against duration and size, and (by design) it is also void of any units, hence the “dimensionless” characterization. For each sequence \(s\), we provide the dimensionless jerk between \({t}_{1}=0\) and \({t}_{2}={T}_{s}\), i.e.:

\[\frac{{T}_{s}^{5}}{{\Delta }_{p}^{2}}{\int }_{0}^{{T}_{s}}\dot{a}{\left(t\right)}^{2}\hspace{0.17em}dt.\]
Note that this feature is full-body and does not depend on specific joints.
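A discrete-time sketch of the dimensionless jerk for a 1-D trajectory, under the assumption that jerk is approximated by third-order finite differences and \(\Delta_p\) by the positional range; this illustrates the scaling idea, not the exact EMOKINE implementation:

```python
import numpy as np

def dimensionless_jerk(p, fps=240):
    """Dimensionless integrated squared jerk (after Hogan & Sternad, 2009)
    for a 1-D position trajectory p: the squared third derivative summed over
    time, scaled by duration^5 / amplitude^2 to cancel all units."""
    dt = 1.0 / fps
    jerk = np.diff(p, n=3) / dt ** 3        # third finite difference of position
    duration = (len(p) - 1) * dt
    amplitude = p.max() - p.min()           # movement extent, Delta_p
    return (jerk ** 2).sum() * dt * duration ** 5 / amplitude ** 2
```

Because the scaling cancels length and time units, uniformly rescaling the movement amplitude leaves the value unchanged, which is exactly the invariance Hogan and Sternad argue for.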
Body tilt
Most of the research exploring tilt or angularity within body motion or positioning has focused on the head. There is compelling evidence to suggest that a downward or forward-tilted orientation of the head is uniquely associated with the portrayal of sadness (Crane & Gross, 2007; Masuda et al., 2010; Shafir et al., 2016; Wallbott, 1998), and these works suggest other patterns that may emerge. Wallbott (1998) suggests that having the head oriented backwards is associated with elated joy, raised shoulders are associated with both elated joy and hot anger, and raised arms are associated with each of the following: elated joy, cold anger, hot anger, and terror. Masuda et al. (2010) found that reclined posture is associated with pleasure, and that a mixture of reclined and straight posture is associated with relaxation. Shafir and colleagues (2016), on the other hand, found that a reclined tilt to the upper body is associated with fear. For the EMOKINE framework, we computed the head tilt with regards to the back (Section "Head tilt with respect to back") and with regards to the vertical axis (Section "Head tilt with respect to vertical").
Head tilt with respect to back
For the computation of this kinematic feature, we consider the three-dimensional positions of three keypoints: the T8 vertebra, neck, and head, dubbed here \({p}_{a}\left(t\right)\), \({p}_{b}\left(t\right)\), and \({p}_{c}\left(t\right)\), respectively. We then define the unit vectors going from T8 to the neck, and from the neck to the head, as:

\[{u}_{ab}\left(t\right)=\frac{{p}_{b}\left(t\right)-{p}_{a}\left(t\right)}{{\Vert {p}_{b}\left(t\right)-{p}_{a}\left(t\right)\Vert }_{2}},\hspace{1em}{u}_{bc}\left(t\right)=\frac{{p}_{c}\left(t\right)-{p}_{b}\left(t\right)}{{\Vert {p}_{c}\left(t\right)-{p}_{b}\left(t\right)\Vert }_{2}}.\]
Then, the head tilt with respect to the back \(\mathrm{\alpha }\left(t\right)\) is the angle between \({u}_{ab}\left(t\right)\) and \({u}_{bc}\left(t\right)\), computed in radians as \(\alpha \left(t\right)=co{s}^{-1}\left({u}_{ab}{\left(t\right)}^{\mathrm{\top }}{u}_{bc}\left(t\right)\right)\), since the dot product between two unit vectors yields the cosine of the angle between them. This cosine is always nonnegative, since we do not expect an angle larger than 90 degrees. We provide the average and MAD of \(\alpha \left(t\right)\) across time.
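The angle computation can be sketched as follows; clipping the dot product guards against floating-point values slightly outside \([-1, 1]\) (the function name is ours):

```python
import numpy as np

def head_tilt(p_t8, p_neck, p_head):
    """Angle (radians) between the T8->neck and neck->head unit vectors,
    computed from the arccosine of their dot product."""
    u_ab = (p_neck - p_t8) / np.linalg.norm(p_neck - p_t8)
    u_bc = (p_head - p_neck) / np.linalg.norm(p_head - p_neck)
    # clip guards against tiny numerical overshoot outside [-1, 1]
    return np.arccos(np.clip(u_ab @ u_bc, -1.0, 1.0))
```

The same function applies to the head tilt with respect to the vertical by replacing the first unit vector with the global up direction.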
Head tilt with respect to vertical
This feature is similar to the head tilt with respect to the back, but instead we measure the angle between \({u}_{bc}\left(t\right)\) and the global vertical:

\[{u}_{\uparrow }={\left(0,\hspace{0.17em}0,\hspace{0.17em}1\right)}^{\top }.\]
This yields our desired feature, in radians: \(\beta \left(t\right)=co{s}^{-1}\left({u}_{\uparrow }^{\mathrm{\top }}{u}_{bc}\left(t\right)\right)\). We also provide the average and MAD as aggregation statistics.
Space
The kinematic features relating to space, in the context of this paper, refer to the physical area used by the dancer when performing the movement sequences. Previous research exploring space as a kinematic parameter suggests that large travel distances (i.e., using a high proportion of the movement space) are associated with joy, and small travel distances (i.e., using a low proportion of the space) with sadness (Sawada et al., 2003). It has also been suggested that movement in a variety of directions (i.e., a wider range of the movement space used) is associated with anger (Masuda et al., 2010). As measures of space within the EMOKINE framework, we provide the convex hull 2D (Section "Convex hull 2D") and 3D (Section "Convex hull 3D").
Convex hull 3D
Given the \({p}_{j}\left(t\right)\) locations of all body keypoints \(\mathcal{J}\) at a given time \(t\), the convex hull is the smallest convex envelope that contains all points. For example, if a person is extending their arms and legs in a T-pose and we take a frontal image of their keypoints, the corresponding 2D convex hull would be a convex polygon going from the head to the hands, then from each hand to its respective foot, and then connecting the feet. In 3D, the same principle applies, but depth is also taken into account, yielding a convex polytope whose vertices are the \({p}_{j}\left(t\right)\) keypoint positions. The convex hull can be used as a proxy for how much space the person is effectively occupying.
Formally, given the set of all keypoints \(\mathcal{J}\), the 3D convex hull can be defined as the set (Boyd & Vandenberghe, 2004):

\[{\mathcal{C}}_{3D}\left(t\right)=\left\{\sum_{j\in \mathcal{J}}{\uptheta }_{j}\hspace{0.17em}{p}_{j}\left(t\right)\hspace{0.17em}:\hspace{0.17em}{\uptheta }_{j}\ge 0,\hspace{0.17em}\sum_{j\in \mathcal{J}}{\uptheta }_{j}=1\right\}.\]
The \({c}_{3D}\left(t\right)\) feature we provide is the volume of \({\mathcal{C}}_{3\mathcal{D}}\left(t\right)\), in cubic meters (\({m}^{3}\)), since the MVNX input positions \({p}_{j}\left(t\right)\) are given in meters. We used the SciPy Python library to compute this feature. Apart from the average and MAD aggregated statistics, we provide the following two aggregations:
• Global convex hull \({c}_{3D}\left(1,\dots ,{T}_{s}\right)\): This is the convex hull obtained from all points in a given sequence (as opposed to a specific timepoint), i.e., it covers all locations where any keypoint has been at any time.
• Union of convex hulls \({\bigcup }_{t=1}^{{T}_{s}}{c}_{3D}\left(t\right)\): The main difference from the global convex hull is that the union of convex hulls is a subset of it and is not necessarily convex: if the dancer jumps from the bottom-left corner of the screen to the bottom-right corner, the bottom center of the screen is part of the global convex hull, but not of the union of convex hulls. The reason is that there is no single timepoint at which the bottom center is covered, but when all timepoints are considered at once, the left and right corners must be connected through the bottom center, making it part of the global convex hull.
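Computing the per-frame 3D convex hull volume with SciPy (which the text above names as the library used) can be sketched as:

```python
import numpy as np
from scipy.spatial import ConvexHull

def convex_hull_3d(points):
    """Volume (m^3) of the 3-D convex hull of one frame's keypoint
    positions, given as an array of shape (J, 3), via SciPy's Qhull wrapper."""
    return ConvexHull(points).volume
```

For the global convex hull, the positions of all frames would simply be stacked into one `(J * T, 3)` array before calling the same function.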
Convex hull 2D
We compute this feature analogously to the 3D convex hull described above. The difference is that we use the CamPos two-dimensional coordinates as the source, i.e., two-dimensional vectors \({\uprho }_{j}\left(t\right)\in {\left[\mathrm{0,1}\right]}^{2}\) given in coordinates relative to the camera, going from \(\left(\mathrm{0,0}\right)\) at the bottom-left corner to \(\left(\mathrm{1,1}\right)\) at the top-right corner (i.e., a dimensionless ratio). The resulting convex hull is then a convex polygon (Boyd & Vandenberghe, 2004):

\[{\mathcal{C}}_{2D}\left(t\right)=\left\{\sum_{j\in \mathcal{J}}{\uptheta }_{j}\hspace{0.17em}{\uprho }_{j}\left(t\right)\hspace{0.17em}:\hspace{0.17em}{\uptheta }_{j}\ge 0,\hspace{0.17em}\sum_{j\in \mathcal{J}}{\uptheta }_{j}=1\right\}.\]
The computed feature \({c}_{2D}\left(t\right)\) is the surface area of this polygon. Since the horizontal and vertical CamPos coordinates range from 0 to 1, the result is itself a (dimensionless) ratio between 0 and 1, indicating how much of the total screen is covered by the convex hull of the dancer.
As with \({c}_{3D}\left(t\right)\), we provide four aggregations: average, MAD, global convex hull, and union of convex hulls. We compute the surface of the 2D convex hull using the Python Shapely library.
Validation of kinematic features and results of observer experiments
This section provides a series of validations of our kinematic and observer data. First, we illustrate the distributions of our kinematic features across the stimuli of the dataset. This serves to compare the stimuli with each other, to ensure that they are equivalent where they should be (e.g., between different visual presentations) and that differences emerge where differences were expected (e.g., between different intended emotional expressions) (Section "Computational tests"). Second, we provide statistical tests of the observer ratings. In particular, we test that emotion recognition is above chance level (i.e., that the stimuli transmit the emotions that the dancer intended to transmit) for all visual presentations, and we confirm that the beauty ratings vary as a function of the emotions intended by the dancer (Section "Results of observer experiments").
Computational tests
We here provide a series of illustrations of computational validations, using either silhouette-dependent images or the keypoint-dependent data (that we retrieved from the XSENS® sensors) from our stimulus set. We use:
- Foreground statistics, to show that stimuli are balanced and within frames (Section "Foreground statistics").
- Qualitative histograms (silhouettes), to show that stimuli are aligned with each other (Section "Qualitative histograms (silhouette-dependent kinematics)").
- Kinematic histograms, to show that the stimuli and features yield meaningful signal, not random noise (Section "Qualitative histograms (keypoint-dependent kinematics)").
Foreground statistics
The foreground statistics describe the distribution of space occupied by the dancer (= the foreground) across videos. Using the silhouette images of the stimuli, we see a very homogeneous distribution of the foreground throughout all videos for our three metrics (foreground ratios, and camera position limits for horizontal and vertical distributions). These results indicate that the videos of the stimulus set are equivalent in terms of foreground distribution (see Fig. 5 for an illustration and a short description of the findings).
Qualitative histograms (silhouette-dependent kinematics)
We use qualitative histograms (silhouette images of the videos) to show that the stimuli are aligned with each other. We computed the frequencies of frames occupied for the silhouettes, convex hull, point-light displays (PLDs), and avatar stimuli, which all depend on silhouette images. The silhouette occupies the most space, followed by the convex hull, the avatar, and the PLDs. The histograms show close alignment across all four modalities. However, they also reveal a defect of the avatar (extracted from the XSENS® software): the software automatically adjusts the camera position. In videos 7–9, the dancer turns the upper body, which the software corrects so that the camera position remains frontal throughout. This results in the very symmetric histograms in that column, while the other columns also show movement to the sides. This means that, as long as the movement contains no turns, the four modalities are aligned in terms of the frequencies of frames occupied by the dancer in space (see Fig. 6 for an illustration and a short description of the findings for two of the nine sequences, sequences 1 and 7). The illustrations for the remaining sequences are in the appendix of the paper (see Figs. 15, 16, 17, 18, 19).
We use another series of qualitative histograms to explore visually the kinematics of average limb contraction, mean head angle (with respect to vertical and back), average quantity of motion (QoM), mean convex hull 2D and mean convex hull 3D. We plotted these separately for each emotion (angry, content, fearful, happy, neutral, sad), across all videos (see Fig. 7 for illustration and a short description of the findings).
Qualitative histograms (keypoint-dependent kinematics)
For the keypoint-dependent kinematics (retrieved from the XSENS® sensors), we provide illustrations of the distance to the center of mass (Fig. 8) and the average acceleration (Fig. 9). The figures show histograms for each of the 23 keypoints across the six emotions (angry, content, fearful, joyful, neutral, sad). The distance-to-center-of-mass figure shows that the distribution changes with the keypoint's distance from the center of mass (pelvis = very close; hands = very far). We also see that the legs remain relatively stable throughout, in accordance with the choreographies set out in Table 2 above: the movements were mostly confined to the arms, with little leg movement. Hence, these distributions confirm the choreographies. We also observe that the distributions vary across emotions, especially for the arms, which again is in accordance with the dancer's intention during stimulus creation to confine the expressivity mostly to the arms (see Figs. 8 and 9 for illustrations and a short description of the findings).
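The distance-to-center-of-mass feature can be sketched as follows. For illustration, the center of mass is approximated as the unweighted mean of all keypoints; this is a simplification (XSENS® provides a proper segment-weighted center-of-mass estimate), and the function name is ours.

```python
import numpy as np

def distance_to_com(positions, weights=None):
    """Per-frame distance of each keypoint to the body's center of mass.

    `positions` has shape (frames, keypoints, 3). With no segment weights,
    the center of mass is approximated as the unweighted keypoint mean
    (a simplification for illustration).
    """
    positions = np.asarray(positions, dtype=float)
    if weights is None:
        com = positions.mean(axis=1, keepdims=True)
    else:
        w = np.asarray(weights, dtype=float)
        com = (positions * w[None, :, None]).sum(axis=1, keepdims=True) / w.sum()
    return np.linalg.norm(positions - com, axis=2)   # shape (frames, keypoints)

# one frame, two keypoints 2 m apart -> both 1 m from the midpoint
d = distance_to_com([[[0, 0, 0], [2, 0, 0]]])
```

Collecting these distances per keypoint and emotion yields histograms like those in Fig. 8.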
The remaining keypoint-dependent kinematics are provided in the appendix; for dimensionless jerk, see Fig. 20, for angular acceleration, see Fig. 21, and for angular velocity, see Fig. 22.
Results of observer experiments
As described in Sections "Participants’ online experiment" and "Procedure for Human Observer Experiment", human observers performed an emotion recognition task and an aesthetic judgment task on all pilot stimuli. Here we provide technical tests confirming that:
-
Intended emotional expression was recognized above chance level (Sections "Chance level analysis: Visual presentation", "Chance Level Analysis: Emotion Category", and "Chance Level Analysis: Visual Presentation x Emotion Category").
-
Beauty ratings depended on the intended emotional expression of the dancer (Section "Observer experiment beauty ratings").
Chance level analysis: Visual presentation
Chi-square tests were used to determine whether observer recognition rates were above chance for the four visual presentations (four levels; avatars, full-light displays (FLDs), point-light displays (PLDs), silhouettes; chance level: 100% / 6 emotion categories = 16.67%). Results showed that the stimuli of all four visual presentations had been recognized above chance level (all ps < .001).
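A chance-level test of this kind can be run as a chi-square goodness-of-fit of the observed correct/incorrect counts against the 16.67% chance rate. The sketch below (with hypothetical counts) shows one standard way to do this with only the standard library; it is an illustration of the type of test reported, not the paper's exact analysis code.

```python
import math

def chance_level_chisq(n_correct, n_total, n_categories=6):
    """Chi-square goodness-of-fit of correct/incorrect counts against the
    chance rate 1/n_categories (16.67% for six emotions); df = 1.

    Illustrative sketch of the type of test reported in the text.
    """
    exp_correct = n_total / n_categories
    exp_wrong = n_total - exp_correct
    obs_correct, obs_wrong = n_correct, n_total - n_correct
    stat = ((obs_correct - exp_correct) ** 2 / exp_correct
            + (obs_wrong - exp_wrong) ** 2 / exp_wrong)
    # For df = 1 the chi-square survival function reduces to erfc:
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p

# hypothetical: 120 of 300 trials correct (40% vs. 16.67% chance)
stat, p = chance_level_chisq(n_correct=120, n_total=300)
```

With 40% accuracy against a 16.67% chance level, the test is highly significant (p < .001), mirroring the pattern reported for the four visual presentations.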
To explore whether the four visual presentations differed in emotion recognition accuracy, we performed a 1 × 4 RM ANOVA with the factor visual presentation (four levels; avatars, FLDs, PLDs, silhouettes), and the dependent variable percent of correct responses (‘correct responses’ = when observers guessed the emotion that the dancer was intending while performing the movements). There was a main effect of visual presentation (F(3,393) = 21.352, p < .001, partial η2 = .140). Estimated marginal means showed that FLD videos were recognized best (EMM = 39.82%; SE = 1.12), followed by avatar videos (EMM = 36.45%; SE = .98), then silhouette videos (EMM = 36.20%; SE = 1.10), and finally PLDs (EMM = 30.77%; SE = .98). Bonferroni corrected pair-wise comparisons showed that some of these differences were significant: FLDs were recognized better than all other visual presentations (all ps ≤ .018), avatars and silhouettes were recognized equally well (p = 1.00), and emotions expressed in the PLDs were recognized worse than in all other visual presentations (all ps < .001). These results are illustrated in Fig. 10.
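For readers who want to reproduce this type of analysis on the released per-observer data, a one-way repeated-measures ANOVA can be computed from scratch as below. This is a from-scratch sketch of the standard computation (the paper ran the analysis in standard statistics software); the input here is a hypothetical subjects × conditions matrix, and p-values would come from the F distribution (e.g., scipy.stats.f.sf).

```python
import numpy as np

def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA F statistic.

    `data` is an (n_subjects, k_conditions) array, e.g. per-observer
    percent-correct for the four visual presentations.
    Returns (F, df_effect, df_error).
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    ss_cond = n * ((data.mean(axis=0) - grand) ** 2).sum()    # effect of condition
    ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()    # between-subject variance
    ss_total = ((data - grand) ** 2).sum()
    ss_error = ss_total - ss_cond - ss_subj                   # residual (subject x condition)
    df_cond, df_error = k - 1, (n - 1) * (k - 1)
    F = (ss_cond / df_cond) / (ss_error / df_error)
    return F, df_cond, df_error

# hypothetical toy data: 3 subjects x 2 conditions
F, df1, df2 = rm_anova_oneway([[1, 2], [2, 4], [3, 3]])
```

Partitioning out the between-subject sum of squares is what distinguishes the repeated-measures design from an ordinary one-way ANOVA on the same numbers.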
Chance level analysis: Emotion category
Chi-square tests were used to determine whether observer recognition rates were above chance for the six emotion categories (chance level: 100% / 6 emotion categories = 16.67%). The stimuli of the emotion categories anger, content, joy, neutral, and sad (regardless of visual presentation) had been recognized above chance level (all ps ≤ .014), while fear had not (p = .645).
To explore whether recognition rates differed as a function of the emotion category intended by the dancer, we performed a 1 × 6 RM ANOVA with the factor emotion category (six levels; anger, content, fear, joy, neutral, sad), and the dependent variable percent of correct responses (‘correct responses’ = when observers guessed the intended emotion). There was a main effect of emotion category (F(5,655) = 99.457, p < .001, partial η2 = .432). Observer recognition rates were highest for angry videos (EMM = 53.91%; SE = 1.98) > neutral videos (EMM = 49.43%; SE = 2.17) > sad videos (EMM = 48.04%; SE = 1.86) > joyful videos (EMM = 26.39%; SE = 1.32) > content videos (EMM = 19.76%; SE = 1.24) > fearful videos (EMM = 17.30%; SE = 1.36). Bonferroni corrected pairwise comparisons showed that recognition rates for angry, neutral, and sad stimuli were highest and did not differ from each other (all ps ≥ .227) but differed significantly from all other emotions (all ps < .001). Further, emotion recognition rates for joyful videos were higher than for fearful videos (p < .001). All comparisons are illustrated in Fig. 11.
Chance level analysis: Visual presentation x emotion category
For each visual presentation, Chi-square tests were used to compare emotion recognition rates for each emotion against the chance level of 16.67%. For the avatars, all emotions were recognized above chance (all ps < .050), except for fear (p = .363). For the FLDs, all emotions were recognized above chance (all ps ≤ .016), except for content (p = .071). For the PLDs, emotions were recognized above chance (all ps < .001), except for content (p = .226), fear (p = .315), and joy (p = .898). For the silhouettes, emotions were recognized above chance (all ps < .050), except for fear (p = .643). For an overview of the results, see Fig. 12.
Confusion matrices emotion ratings
Four confusion matrices were computed, one for each visual presentation. They represent the observers’ emotion judgments as a function of intended and decoded emotion. The advantage of confusion matrices is that the ‘confused’ responses (i.e., the wrong emotion judgments) for a stimulus can be compared across all emotion categories at a glance (see Banse & Scherer, 1996, and Scherer & Scherer, 2011, for a detailed explanation). These matrices are presented in Tables 3 and 4 for avatars, in Table 5 for FLDs, in Table 6 for PLDs, and in Table 7 for silhouettes.
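An intended-vs-decoded confusion matrix of this kind can be built from the raw trial labels as sketched below. This is an illustrative re-implementation (function name, label order, and row normalization are our choices, not necessarily those of the published tables).

```python
import numpy as np

EMOTIONS = ["anger", "content", "fear", "joy", "neutral", "sad"]

def confusion_matrix(intended, decoded, labels=EMOTIONS):
    """Row-normalized confusion matrix of intended vs. decoded emotion.

    Rows are the dancer's intended emotions, columns the observers'
    judgments; each nonempty row sums to 1, so off-diagonal cells show
    which emotions were 'confused'. Illustrative sketch.
    """
    idx = {lab: i for i, lab in enumerate(labels)}
    counts = np.zeros((len(labels), len(labels)))
    for i_lab, d_lab in zip(intended, decoded):
        counts[idx[i_lab], idx[d_lab]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1            # avoid division by zero for unused rows
    return counts / row_sums

# hypothetical trials: anger decoded once correctly, once as fear
cm = confusion_matrix(["anger", "anger"], ["anger", "fear"])
```

The diagonal of each matrix gives the per-emotion recognition rates reported above; the off-diagonal cells show the misclassification patterns.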
Observer experiment beauty ratings
A 1 × 4 RM ANOVA was conducted with the factor visual presentation (four levels; avatars, full-light displays (FLDs), point-light displays (PLDs), silhouettes). The dependent variable was ‘beauty rating’ on a scale from 0 (not beautiful) to 100 (very beautiful). There was a main effect of visual presentation (F(3,393) = 35.336, p < .001, partial η2 = .212). Estimated marginal means (EMM) showed that silhouette stimuli were rated as most beautiful (EMM = 54.35; SE = 1.33), followed by FLD stimuli (EMM = 53.91; SE = 1.33), then avatar stimuli (EMM = 50.17; SE = 1.34), and, finally, PLDs (EMM = 47.93; SE = .93). Bonferroni corrected pair-wise comparisons showed that FLD and silhouette beauty ratings did not differ significantly from each other (p = .280), but all other comparisons were significant (all ps < .001): FLDs and silhouettes were rated as more beautiful than avatars and PLDs, with PLDs rated the least beautiful (see Fig. 13).
A 1 × 6 RM ANOVA was conducted with the factor emotion (six levels; anger, content, fear, joy, neutral, sad). The dependent variable was ‘beauty rating’ on a scale from 0 (not beautiful) to 100 (very beautiful). There was a main effect of emotion (F(5,655) = 32.562, p < .001, partial η2 = .199). Descriptively, the observers’ beauty ratings were highest for sad stimuli (EMM = 54.46; SE = 1.33) > content stimuli (EMM = 53.23; SE = 1.31) > joyful stimuli (EMM = 52.46; SE = 1.30) > fearful stimuli (EMM = 52.10; SE = 1.33) > angry stimuli (EMM = 49.32; SE = 1.31) > neutral stimuli (EMM = 48.01; SE = 1.31). Bonferroni corrected pair-wise comparisons showed significant differences in beauty ratings between most categories (all ps < .001, except for the comparison joy–sad, which was p = .017; Bonferroni corrected). There was no significant difference between anger and neutral, nor between fear and joy. Also, beauty ratings for contentment did not differ from those for fear, joy, and sadness (see Fig. 14 for an illustration of these results).
Discussion and conclusion
We provide the EMOKINE software, computational framework, and pilot dataset of emotional movement for research in experimental psychology, affective neuroscience, and computer vision. The key contribution of the project to the wider community is a computational framework comprising a detailed plan, software, and code for the creation of highly controlled emotional body movement datasets at scale in the future. Comprehensive procedure instructions and kinematic feature extraction code are provided via releases on GitHub. The pilot dataset and its renderings into four different visual presentations (avatars, full-light displays, point-light displays, and silhouettes), along with observer ratings and the kinematic data, are available on Zenodo under a Creative Commons Attribution 4.0 International License.
A series of computational validations and an observer experiment confirmed the validity of the EMOKINE pilot dataset and the creation procedure. Beyond the validations provided here, the dataset has already proven useful for research questions in health psychology, for example, about how dance breaks during work hours may improve mood and motivation (Schmidt et al., 2023). Datasets created following the EMOKINE suite may be particularly useful for addressing questions about which kinematic features drive high emotion recognition and/or misclassifications. Yet for future large-scale experiments, we remind researchers that the EMOKINE pilot dataset was created with a single dancer and contains only nine movement sequences. For generalizability and scaling up, creating datasets with several dancers as models and more sequences would be advisable.
Foreground statistics showed that the stimuli were balanced and within frame; qualitative histograms (silhouettes) confirmed that the stimuli were aligned with each other; and kinematic histograms indicated that the stimuli and features yielded meaningful signal, not random noise.
The observer experiment confirmed that emotional expression was recognized above chance level in the pilot dataset. Emotion recognition was higher for full-light displays (FLDs) than for all other visual presentations, while point-light displays (PLDs) had the lowest emotion recognition rates. Observer recognition rates for avatars and silhouettes were higher than those for PLDs and did not differ from each other. With regard to the emotion categories, observer emotion recognition rates were highest for angry videos (> neutral videos > sad videos > joyful videos > content videos > fearful videos). We present confusion matrices, one for each visual presentation, which represent the observers’ emotion judgments as a function of intended and decoded emotion. Confusion matrices allow comparing the ‘confused’ responses (i.e., the wrong emotion judgments) for a stimulus across all emotion categories at a glance, following previous work in the field (Scherer & Scherer, 2011). With regard to aesthetic judgment, observer beauty ratings were highest for silhouette stimuli, followed by FLDs, then avatar stimuli, and, finally, PLDs. Furthermore, aesthetic judgment was highest for sad stimuli (> content stimuli > joyful stimuli > fearful stimuli > angry stimuli > neutral stimuli).
Pilot stimuli intended to express fear were the hardest for observers to recognize; average recognition across fearful stimuli was not above chance for any of the visual presentations except the FLDs, an effect that has previously been reported with other stimulus sets (Atkinson et al., 2007; Camurri et al., 2003; Christensen et al., 2023; Christensen et al., major revisions; Dahl & Friberg, 2007; Pasch & Poppe, 2007; Smith & Cross, 2022). Contentment, the emotion added to this dataset, was recognized above chance in the avatar and silhouette presentations, but surprisingly not in the FLDs and PLDs.
Importantly, as described in previous work, aesthetic judgments (i.e., beauty ratings) in the EMOKINE pilot dataset also differed significantly between emotions. This adds another datapoint to previous findings suggesting that aesthetic judgment can serve as an implicit emotion recognition task (Christensen et al., 2019; Christensen et al., 2023; Christensen et al., major revisions).
Creating future datasets based on the procedure set out above has three main advantages. First, as shown with the EMOKINE pilot dataset creation procedure, we propose to use complex movements. A dancer repeated several choreographies six times each, maintaining the same movements but expressing a different emotional intention at each repetition. Traditionally, emotional ‘actions’ are often used in emotion datasets (e.g., jumping for joy or recoiling in fear), which makes the emotion rather obvious. For EMOKINE, the dancer used exactly the same dance choreography to express six different emotional intentions, thus increasing the usefulness of the dataset for assessing subtle kinematic features in emotional movement beyond emotional actions. Second, the EMOKINE dataset creation procedure proposes to include more emotional intentions. Here, we included six emotional intentions, namely anger, contentment, fear, joy, neutrality, and sadness. Classically, datasets rarely contain the emotion contentment (Ekman, 1973/2015; Ekman & Friesen, 1971); its inclusion therefore increases the usefulness of EMOKINE. ‘Contentment’ is another positively valenced emotion like joy, yet of low arousal; symmetrical to what anger (negative valence, high arousal) is to sadness (negative valence, low arousal). Third, the EMOKINE software provides, for the first time, thirty-two statistics from twelve kinematic features that can be obtained from one single dataset, namely, speed, acceleration, angular speed, angular acceleration, limb contraction, distance to center of mass, quantity of motion, dimensionless jerk (integral), head angle (with regards to vertical axis and to back), and space (convex hull 2D and 3D). Average, median absolute deviation (MAD), and maximum value were computed for each, as applicable.
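To make the feature set concrete, two of the twelve features (speed and acceleration) can be obtained from the keypoint trajectories by finite differences, as sketched below. This is a minimal illustration under assumed inputs, not the released extraction code; the released code additionally computes the per-feature statistics (average, MAD, maximum).

```python
import numpy as np

def speed_and_acceleration(positions, fps=240):
    """Per-keypoint speed and acceleration magnitudes via finite differences.

    `positions` has shape (frames, keypoints, 3) in meters; the default fps
    matches the XSENS recording rate (240 frames/s). Illustrative sketch of
    two of the twelve EMOKINE features.
    """
    positions = np.asarray(positions, dtype=float)
    vel = np.diff(positions, axis=0) * fps          # m/s, shape (frames-1, k, 3)
    acc = np.diff(vel, axis=0) * fps                # m/s^2, shape (frames-2, k, 3)
    speed = np.linalg.norm(vel, axis=2)
    accel = np.linalg.norm(acc, axis=2)
    return speed, accel

# hypothetical: one keypoint moving at constant velocity (fps=1 for clarity)
s, a = speed_and_acceleration([[[0, 0, 0]], [[1, 0, 0]], [[2, 0, 0]]], fps=1)
```

Statistics such as the average, MAD, and maximum would then be taken over the frame axis of `s` and `a` per keypoint and video.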
Future iterations of the dataset creation plan may take into account that the four visual presentations were not parametrically varied; they could be, by using the kinematic data to vary the visual presentation of the stimuli and to control their exact length. Further, the XSENS® avatar rendering did not overlap 100% with the other visual presentations because the positioning of the legs was not fixed by the software, causing the avatar to move slightly unnaturally for less common arm and leg movement combinations. Finally, we acknowledge the WEIRD focus of this dataset creation and suggest exploring non-Western dance with the same procedure, as, e.g., Christensen et al. (major revisions).
The EMOKINE software and pilot dataset are the outcome of a proof-of-principle dataset creation procedure for highly controlled kinematic video datasets of emotionally expressive full-body movement sequences. The pilot data for EMOKINE were recorded via the XSENS® system; however, with the small alterations that we have outlined above and on the GitHub repository, the software can be used with data obtained from other motion-capture systems too. The novelty of EMOKINE lies in the successful integration of the experimental control requirements for psychology and affective neuroscience research involving human participants while simultaneously meeting the technical requirements of datasets for computer vision and related fields.
Data availability
The dataset and all materials can be downloaded from Zenodo: https://zenodo.org/record/7821844 (Zenodo DOI: https://doi.org/10.5281/zenodo.7821844). The software is available on GitHub: https://github.com/andres-fr/emokine. Comprehensive README files accompany the data and software. None of the experiments was preregistered.
References
Atkinson, A. P., Dittrich, W. H., Gemmell, A. J., & Young, A. W. (2004). Emotion perception from dynamic and static body expressions in point-light and full-light displays. Perception, 33(6), 717–746. https://doi.org/10.1068/p5096
Atkinson, A. P., Tunstall, M. L., & Dittrich, W. H. (2007). Evidence for distinct contributions of form and motion information to the recognition of emotions from body gestures. Cognition, 104(1), 59–72. https://doi.org/10.1016/j.cognition.2006.05.005
Aviezer, H., Trope, Y., & Todorov, A. (2012). Body cues, not facial expressions, discriminate between intense positive and negative emotions. Science, 338(6111), 1225–1229. https://doi.org/10.1126/science.1224313
Banse, R., & Scherer, K. R. (1996). Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70(3), 614–636. https://doi.org/10.1037/0022-3514.70.3.614
Bänziger, T., Grandjean, D., & Scherer, K. R. (2009). Emotion recognition from expressions in face, voice, and body: The Multimodal Emotion Recognition Test (MERT). Emotion, 9(5), 691–704. https://doi.org/10.1037/a0017088
Bellot, E., Garnier-Crussard, A., Pongan, E., Delphin-Combe, F., Coste, M.-H., Gentil, C., Rouch, I., Hénaff, M.-A., Schmitz, C., Tillmann, B., & Krolak-Salmon, P. (2021). Blunted emotion judgments of body movements in Parkinson’s disease. Scientific Reports, 11(1), 18575. https://doi.org/10.1038/s41598-021-97788-1
Bernhardt, D., & Robinson, P. (2007). Detecting affect from non-stylised body motions. In International Conference on Affective Computing and Intelligent Interaction. Berlin, Heidelberg: Springer.
Blender Online Community (2018). Blender – a 3D modelling and rendering package. Stichting Blender Foundation, Amsterdam. Retrieved from http://www.blender.org. Accessed 20 May 2022
Boone, R. T., & Cunningham, J. G. (1998). Children’s decoding of emotion in expressive body movement: The development of cue attunement. Developmental Psychology, 34(5), 1007–1016.
Boone, R. T., & Cunningham, J. G. (2001). Children’s expression of emotional meaning in music through expressive body movement. Journal of Nonverbal Behavior, 25(1), 21–41. https://doi.org/10.1023/a:1006733123708
Boyd, S., & Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.
Byron, K., Terranova, S., & Nowicki, S., Jr. (2007). Nonverbal emotion recognition and salespersons: Linking ability to perceived and actual success. Journal of Applied Social Psychology, 37(11), 2600–2619. https://doi.org/10.1111/j.1559-1816.2007.00272.x
Camurri, A., Lagerlof, I., & Volpe, G. (2003). Recognizing emotion from dance movement: comparison of spectator recognition and automated techniques. International Journal of Human-Computer Studies, 59(1–2), 213–225.
Castellano, G., Villalba, S., & Camurri, A. (2007). Recognizing human emotions from body movement and gesture dynamics. Lecture Notes in Computer Science, 4738, 71.
Cheng, B., Xiao, B., Wang, J., Shi, H., Huang, T. S., & Zhang, L. (2019). HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation. arXiv:1908.10357 [cs.CV].
Christensen, J. F., Bruhn, L., Schmidt, E. M., Bahmanian, N., Yazdi, S. H. N., Farahi, F., Sancho-Escanero, L., & Menninghaus, W. (2023). A 5-emotions stimuli set for emotion perception research with full-body dance movements. Scientific Reports, 13(1). https://doi.org/10.1038/s41598-023-33656-4
Christensen, J. F., & Calvo-Merino, B. (2013). Dance as a subject for empirical aesthetics. Psychology of Aesthetics, Creativity, and the Arts, 7(1), 76–88. https://doi.org/10.1037/a0031827
Christensen, J. F., Gaigg, S. B., & Calvo-Merino, B. (2017). I can feel my heartbeat: Dancers have increased interoceptive accuracy. Psychophysiology, 55(4), 1–14. https://doi.org/10.1111/psyp.13008
Christensen, J. F., & Jola, C. (2015). Towards ecological validity in empirical aesthetics of dance. In M. Nadal, J. P. Huston, L. Agnati, F. Mora, & C. J. Cela-Conde (Eds.), Art, Aesthetics, and the Brain. Oxford University Press.
Christensen, J. F., Frieler, K., Vartanian, M., Khorsandi, S., Yazdi, S. H. N., Farahi, F., Smith, R.A., Walsh, W. (major revisions). A joy bias: Perception of expressive body language is modulated by enculturation.
Christensen, J. F., Lambrechts, A., & Tsakiris, M. (2019). The Warburg Dance Movement Library—The WADAMO Library: A validation study. Perception, 48(1), 26–57. https://doi.org/10.1177/0301006618816631
Christensen, J. F., Nadal, M., Cela-Conde, C. J., & Gomila, A. (2014). A norming study and library of 203 dance movements. Perception, 43(2/3), 178–206. https://doi.org/10.1068/p7581
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates Inc.
Cosmides, L., & Tooby, J. (2000). Evolutionary psychology and emotions. In M. L. J. M. Haviland-Jones (Ed.), Handbook of emotions (pp. 91–115). Guilford.
Crane, E., & Gross, M. (2007). Motion Capture and Emotion: Affect Detection in Whole Body Movement. Affective Computing and Intelligent Interaction, Berlin, Heidelberg.
Crane, E. A., & Gross, M. M. (2013). Effort-shape characteristics of emotion-related body movement. Journal of Nonverbal Behavior, 37, 91–105. https://doi.org/10.1007/s10919-013-0144-2
Dael, N., Mortillaro, M., & Scherer, K. R. (2012). Emotion expression in body action and posture. Emotion, 12(5), 1085–1101. https://doi.org/10.1037/a0025737
Dahl, S., & Friberg, A. (2007). Visual perception of expressiveness in musicians’ body movements. Music Perception, 24(5), 433–454. https://doi.org/10.1525/mp.2007.24.5.433
Darwin, C. (1872/2009). The Expression of the Emotions in Man and Animals. Oxford University Press, Anniversary edition.
Davis, M. H. (1980). A multidimensional approach to individual differences in empathy. JSAS Catalog of Selected Documents in Psychology, 10(85).
de Gelder, B. (2006). Towards the neurobiology of emotional body language. Nature Reviews Neuroscience, 7(3), 242–249. https://doi.org/10.1038/nrn1872
de Gelder, B. (2009). Why bodies? Twelve reasons for including bodily expressions in affective neuroscience. Philosophical Transactions of the Royal Society London Series B: Biological Sciences, 364(1535), 3475–3484. https://doi.org/10.1098/rstb.2009.0190
Dekeyser, M., Verfaillie, K., & Vanrie, J. (2002). Creating stimuli for the study of biological-motion perception. Behavior Research Methods, Instruments, & Computers, 34(3), 375–382. https://doi.org/10.3758/BF03195465
Dittrich, W. H., Troscianko, T., Lea, S., & Morgan, D. (1996). Perception of emotion from dynamic point-light displays represented in dance. Perception, 25(6), 727–738.
Ekman, P. (1973/2015). Darwin and Facial Expression: A Century of Research in Review. Malor Books.
Ekman, P., & Friesen, W. V. (1971). Constants across cultures in the face and emotion. Journal of Personality and Social Psychology, 17(2), 124.
Elfenbein, H. A., & Ambady, N. (2002). Predicting workplace outcomes from the ability to eavesdrop on feelings. Journal of Applied Psychology, 87(5), 963–971.
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175–191. https://doi.org/10.3758/bf03193146
Gross, M. M., Crane, E. A., & Fredrickson, B. L. (2010). Methodology for Assessing Bodily Expression of Emotion. Journal of Nonverbal Behavior, 34(4), 223–248. https://doi.org/10.1007/s10919-010-0094-x
Gross, M. M., Crane, E. A., & Fredrickson, B. L. (2012). Effort-Shape and kinematic assessment of bodily expression of emotion during gait. Human Movement Science, 31(1), 202–221. https://doi.org/10.1016/j.humov.2011.05.001
Halovic, S., & Kroos, C. (2018). Not all is noticed: Kinematic cues of emotion-specific gait. Human Movement Science, 57, 478–488. https://doi.org/10.1016/j.humov.2017.11.008
Heberlein, A. S., Adolphs, R., Tranel, D., & Damasio, H. (2004). Cortical regions for judgments of emotions and personality traits from point-light walkers. Journal of Cognitive Neuroscience, 16(7), 1143–1158. https://doi.org/10.1162/0898929041920423
Hogan, N., & Sternad, D. (2009). Sensitivity of smoothness measures to movement duration, amplitude, and arrests. Journal of Motor Behavior, 41(6), 529–534. https://doi.org/10.3200/35-09-004-rc
Kainz, H., Graham, D., Edwards, J., Walsh, H. P. J., Maine, S., Boyd, R. N., Lloyd, D. G., Modenese, L., & Carty, C. P. (2017). Reliability of four models for clinical gait analysis. Gait Posture, 54, 325–331. https://doi.org/10.1016/j.gaitpost.2017.04.001
Karin, J. (2016). Recontextualizing dance skills: Overcoming impediments to motor learning and expressivity in ballet dancers. Frontiers in Psychology, 7. https://doi.org/10.3389/fpsyg.2016.00431
Karin, J., Haggard, P., & Christensen, J. F. (2016). Mental Training. In V. Wilmerding & D. Krasnow (Eds.), Dancer Wellness. Human Kinetics.
Keck, J., Zabicki, A., Bachmann, J., Munzert, J., & Krüger, B. (2022). Decoding spatiotemporal features of emotional body language in social interactions. Scientific Reports, 12(1), 15088. https://doi.org/10.1038/s41598-022-19267-5
Kirsch, L. P., Urgesi, C., & Cross, E. S. (2016). Shaping and reshaping the aesthetic brain: Emerging perspectives on the neurobiology of embodied aesthetics. Neuroscience & Biobehavioral Reviews, 62, 56–68. https://doi.org/10.1016/j.neubiorev.2015.12.005
Krüger, B., Kaletsch, M., Pilgramm, S., Schwippert, S. S., Hennig, J., Stark, R., Lis, S., Gallhofer, B., Sammer, G., Zentgraf, K., & Munzert, J. (2018). Perceived intensity of emotional point-light displays is reduced in subjects with ASD. Journal of Autism and Developmental Disorders, 48(1), 1–11. https://doi.org/10.1007/s10803-017-3286-y
Lim, S., Case, A., & D'Souza, C. (2016). Comparative Analysis of Inertial Sensor to Optical Motion Capture System Performance in Push-Pull Exertion Postures. Proceedings of the Human Factors and Ergonomics Society ... Annual Meeting. Human Factors and Ergonomics Society. Annual Meeting, 60(1), 970–974. https://doi.org/10.1177/1541931213601224
Ma, Y., Paterson, H. M., & Pollick, F. E. (2006). A motion capture library for the study of identity, gender, and emotion perception from biological motion. Behavior Research Methods, 38(1), 134–141. https://doi.org/10.3758/bf03192758
Masuda, M., Kato, S., & Itoh, H. (2010). A laban-based approach to emotional motion rendering for human-robot interaction. Paper presented at the Entertainment Computing - ICEC 2010, Berlin, Heidelberg.
McCarty, K., Darwin, H., Cornelissen, P. L., Saxton, T. K., Tovée, M. J., Caplan, N., & Neave, N. (2017). Optimal asymmetry and other motion parameters that characterise high-quality female dance. Scientific Reports, 7, 42435. https://doi.org/10.1038/srep42435
Montepare, J., Koff, E., Zaitchik, D., & Albert, M. (1999). The use of body movements and gestures as cues to emotions in younger and older adults. Journal of Nonverbal Behavior, 23(2), 133–152. https://doi.org/10.1023/A:1021435526134
Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie du Sert, N., Simonsohn, U., Wagenmakers, E.-J., Ware, J. J., & Ioannidis, J. P. A. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1(1), 0021. https://doi.org/10.1038/s41562-016-0021
Nirkin, Y., Masi, I., Tran, A. T., Hassner, T., & Medioni, G. (2017). On Face Segmentation, Face Swapping, and Face Perception. arXiv:1704.06729 [cs.CV].
O’Boyle, E. H., Jr., Humphrey, R. H., Pollack, J. M., Hawver, T. H., & Story, P. A. (2011). The relation between emotional intelligence and job performance: A meta-analysis. Journal of Organizational Behavior, 32(5), 788–818. https://doi.org/10.1002/job.714
Orgs, G., Caspersen, D., & Haggard, P. (2016). You move, I watch, it matters: Aesthetic Communication in Dance. In Sukhvinder S. Obhi & E. S. Cross (Eds.), Shared representations: Sensorimotor foundations of social life. Cambridge University Press
Pasch, M., & Poppe, R. (2007). Person or Puppet? The Role of Stimulus Realism in Attributing Emotion to Static Body Postures. Affective Computing and Intelligent Interaction, Berlin, Heidelberg.
Peer, E., Rothschild, D., Gordon, A., Evernden, Z., & Damer, E. (2021). Data quality of platforms and panels for online behavioral research. Behavior Research Methods, 1–20. https://doi.org/10.3758/s13428-021-01694-3
Piwek, L., Petrini, K., & Pollick, F. (2016). A dyadic stimulus set of audiovisual affective displays for the study of multisensory, emotional, social interactions. Behavior Research Methods, 48(4), 1285–1295. https://doi.org/10.3758/s13428-015-0654-4
Pollick, F. E., Paterson, H. M., Bruderlin, A., & Sanford, A. J. (2001). Perceiving affect from arm movement. Cognition, 82(2), B51-61. https://doi.org/10.1016/s0010-0277(01)00147-0
Poyo Solanas, M., Vaessen, M. J., & de Gelder, B. (2020). The role of computational and subjective features in emotional body expressions. Scientific Reports, 10(1), 6202–6202. https://doi.org/10.1038/s41598-020-63125-1
Roether, C. L., Omlor, L., Christensen, A., & Giese, M. A. (2009). Critical features for the perception of emotion from gait. Journal of Vision, 9(6), 15.11–32. https://doi.org/10.1167/9.6.15
Rosete, D., & Ciarrochi, J. (2005). Emotional intelligence and its relationship to workplace performance outcomes of leadership effectiveness. Leadership & Organization Development Journal, 26(5), 388–399. https://doi.org/10.1108/01437730510607871
Rubin, R. S., Munz, D. C., & Bommer, W. H. (2005). Leading from within: The effects of emotion recognition and personality on transformational leadership behavior. Academy of Management Journal, 48(5), 845–858. https://doi.org/10.5465/AMJ.2005.18803926
Sawada, M., Suda, K., & Ishii, M. (2003). Expression of emotions in dance: Relation between arm movement characteristics and emotion. Perceptual and Motor Skills, 97(3), 697–708.
Schepers, M., & Giuberti, M. (2018). Xsens MVN: Consistent Tracking of Human Motion Using Inertial Sensing. Technical Report.
Scherer, K. R., & Scherer, U. (2011). Assessing the ability to recognize facial and vocal expressions of emotion: Construction and validation of the Emotion Recognition Index. Journal of Nonverbal Behavior, 35(4), 305. https://doi.org/10.1007/s10919-011-0115-4
Scherer, K. R., Sundberg, J., Fantini, B., Trznadel, S., & Eyben, F. (2017). The expression of emotion in the singing voice: Acoustic patterns in vocal performance. J Acoust Soc Am, 142(4), 1805. https://doi.org/10.1121/1.5002886
Schlotz, W., Wallot, S., Omigie, D., Masucci, M. D., Hoelzmann, S. C., & Vessel, E. A. (2020). The Aesthetic Responsiveness Assessment (AReA): A screening tool to assess individual differences in responsiveness to art in English and German. Psychology of Aesthetics, Creativity, and the Arts, 15(4), 682–696. https://doi.org/10.1037/aca0000348
Schmidt, E.-M., Smith, R. A., Fernández, A., Emmermann, B., & Christensen, J. F. (2023). Mood induction through imitation of full-body movements with different affective intentions. British Journal of Psychology, 115(1), 148–180. https://doi.org/10.1111/bjop.12681
Shafir, T. (2016). Using movement to regulate emotion: Neurophysiological findings and their application in psychotherapy. Frontiers in Psychology, 7, 1451–1451. https://doi.org/10.3389/fpsyg.2016.01451
Shafir, T., Taylor, S. F., Atkinson, A. P., Langenecker, S. A., & Zubieta, J. K. (2013). Emotion regulation through execution, observation, and imagery of emotional movements. Brain and Cognition, 82(2), 219–227. https://doi.org/10.1016/j.bandc.2013.03.001
Shafir, T., Tsachor, R. P., & Welch, K. B. (2016). Emotion regulation through movement: Unique sets of movement characteristics are associated with and enhance basic emotions. Frontiers in Psychology, 6, 2030–2030. https://doi.org/10.3389/fpsyg.2015.02030
Shikanai, N., Sawada, M., & Ishii, M. (2013). Development of the Movements Impressions Emotions Model: Evaluation of movements and impressions related to the perception of emotions in dance. Journal of Nonverbal Behavior, 37(2), 107–121. https://doi.org/10.1007/s10919-013-0148-y
Skogstad, S. A., Nymoen, K., & Høvin, M. (2011). Comparing inertial and optical MoCap technologies for synthesis control. Proceedings of SMC 2011, 8th Sound and Music Computing Conference "Creativity Rethinks Science" (pp. 421–426).
Smith, R. A., & Cross, E. S. (2022). The McNorm library: Creating and validating a new library of emotionally expressive whole body dance movements. Psychological Research. https://doi.org/10.1007/s00426-022-01669-9
Smith, R. A., & Pollick, F. E. (2022). The role of dance experience, visual processing strategies, and quantitative movement features in recognition of emotion from whole-body movements. In C. Fernandes, V. Evola, & C. Ribeiro (Eds.), Dance data, cognition, and multimodal communication. London: Routledge.
Stanton, K., Carpenter, R. W., Nance, M., Sturgeon, T., & Villalongo Andino, M. (2022). A multisample demonstration of using the Prolific platform for repeated assessment and psychometric substance use research. Experimental and Clinical Psychopharmacology. https://doi.org/10.1037/pha0000545
Vaessen, M. J., Abassi, E., Mancini, M., Camurri, A., & de Gelder, B. (2018). Computational feature analysis of body movements reveals hierarchical brain organization. Cerebral Cortex, 29(8), 3551–3560. https://doi.org/10.1093/cercor/bhy228
Van Dyck, E., Burger, B., & Orlandatou, K. (2017). The communication of emotions in dance. In M. Lesaffre, P.-J. Maes, & M. Leman (Eds.), The Routledge companion to embodied music interaction (pp. 122–130). Routledge. https://doi.org/10.4324/9781315621364-14
Van Dyck, E., Maes, P.-J., Hargreaves, J., Lesaffre, M., & Leman, M. (2013). Expressing induced emotions through free dance movement. Journal of Nonverbal Behavior, 37(3), 175–190. https://doi.org/10.1007/s10919-013-0153-1
Van Meel, J., Verburgh, H., & De Meijer, M. (1993). Children’s interpretations of dance expressions. Empirical Studies of the Arts, 11(2), 117–133.
Vanrie, J., & Verfaillie, K. (2004). Perception of biological motion: A stimulus set of human point-light actions. Behavior Research Methods, Instruments, & Computers, 36(4), 625–629. https://doi.org/10.3758/BF03206542
Wallbott, H. G. (1998). Bodily expression of emotion. European Journal of Social Psychology, 28(6), 879–896. https://doi.org/10.1002/(SICI)1099-0992(1998110)28:6%3c879::AID-EJSP901%3e3.0.CO;2-W
Walter, F., Cole, M. S., van der Vegt, G. S., Rubin, R. S., & Bommer, W. H. (2012). Emotion recognition and emergent leadership: Unraveling mediating mechanisms and boundary conditions. The Leadership Quarterly, 23(5), 977–991. https://doi.org/10.1016/j.leaqua.2012.06.007
Wu, Y., Kirillov, A., Massa, F., Lo, W., & Girshick, R. (2019). Detectron2. Retrieved 16 May 2022 from https://github.com/facebookresearch/detectron2
XSENS. (2023). MVN Biomechanical Model. XSENS. Retrieved 16 May 2023 from https://www.movella.com/applications/entertainment
XSENS. (2020). MVN user manual. Xsens. https://www.xsens.com/hubfs/Downloads/usermanual/MVN_User_Manual.pdf
Zuskin, E., Schachter, E. N., Mustajbegovic, J., Pucarin-Cvetkovic, J., & Lipozencic, J. (2007). Occupational health hazards of artists. Acta Dermatovenerologica Croatica, 15(3), 167–177.
Acknowledgements
All authors of this project were funded by the Max Planck Society, Germany. In addition, AF was supported by the International Max Planck Research School for Intelligent Systems (IMPRS-IS). RAS was funded by The Economic and Social Research Council (ESRC), UK. We would like to express our gratitude to Professor Melanie Wald-Fuhrmann, director of the Music Department at the Max Planck Institute for Empirical Aesthetics (MPIEA), for her strong support and for lending us the XSENS® Motion Capture suit. We would also like to thank Rainer Pollack, Stefan Redeker, Holger Stenschke, Nikita Kudakov, Lena Blis, Stefan Strien, Nancy Schön, Norbert Bergermann, Patrick Ulrich, and the team from the MPIEA graphics department, all for invaluable support at various stages of this project. Finally, and importantly, we are indebted to the directors at the MPIEA, Professor Fredrik Ullén, and emeritus Professor Winfried Menninghaus for their kind support at different stages of this project.
Funding
Open Access funding enabled and organized by Projekt DEAL. International Max Planck Research School for Intelligent Systems (IMPRS-IS), Max-Planck-Gesellschaft, Economic and Social Research Council.
Author information
Authors and Affiliations
Contributions
Julia F. Christensen: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data Curation, Writing – original draft, Writing – review & editing, Visualization, Supervision, Project administration, Funding acquisition. Andrés Fernández: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data Curation, Writing – review & editing, Visualization. Rebecca A. Smith: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Writing – review & editing, Visualization. Georgios Michalareas: Conceptualization, Methodology, Software, Writing – review & editing. Sina H.N. Yazdi: Conceptualization, Methodology, Software, Validation, Resources. Fahima Farahi: Conceptualization, Methodology, Software, Validation, Resources. Eva-Madeleine Schmidt: Methodology, Formal analysis, Data Curation. Nasimeh Bahmanian: Methodology, Formal analysis, Data Curation. Gemma Roig: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Writing – review & editing, Visualization, Supervision.
Corresponding author
Ethics declarations
Open practices statement
The dataset and all materials can be downloaded from Zenodo: https://zenodo.org/record/7821844 (DOI: https://doi.org/10.5281/zenodo.7821844). The software is available on GitHub: https://github.com/andres-fr/emokine. Comprehensive Readme files accompany the data and software. None of the experiments was preregistered.
Competing interest
The authors have no conflict of interest to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
JFC and AF contributed equally to the article (shared first authorship).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Christensen, J.F., Fernández, A., Smith, R.A. et al. EMOKINE: A software package and computational framework for scaling up the creation of highly controlled emotional full-body movement datasets. Behav Res 56, 7498–7542 (2024). https://doi.org/10.3758/s13428-024-02433-0