Introduction

Body movements convey a wealth of emotional information that is essential for social communication. Johansson (1973) first used 10–12 bright spots placed on the main joints to show that such point-light displays (PLDs) can depict human walking, running, and dancing; such motion patterns are referred to as biological motion. This technique neatly separates body movement from body shape. Emotional biological motion stimuli, as depicted by PLDs, have been widely used in psychiatry (e.g., Jimenez et al., 2018; Nackaerts et al., 2012; Okruszek, 2018), developmental psychology (e.g., Ogren et al., 2019; Pavlova et al., 2001; Ross et al., 2012), psychophysics (e.g., Ye et al., 2019), and social neuroscience (e.g., Atkinson et al., 2012; Mazzoni et al., 2017; Pavlova, 2012).

Early studies in this field created stimulus sets by placing reflective tape on actors dressed in black and using a camera to record their emotional performances. For example, Walk and Homan (1984) recruited two female performers who were required to dance or walk. Twelve white cotton balls were attached to their shoulders, elbows, wrists, hips, knees, and ankles, and all performances were recorded with a TV camera. Six emotional videos were finally created, each lasting 15 or 20 seconds and representing happiness, fear, surprise, disgust, anger, or sadness, but this set did not place a light spot on the head. Dittrich et al. (1996) employed two dancers (one female and one male) and attached 13 light points to their main joints. The dancers were instructed to perform fear, anger, grief, joy, surprise, and disgust, and 24 emotional dance expressions of five seconds each were recorded in four versions (upright versus inverted and full-light versus point-light). These studies pioneered the investigation of emotion recognition from biological motion, but the number of PLDs they produced was too small to allow adequate stimulus selection.

Atkinson et al. (2004) may have constructed the most popular material set in this field. Ten actors with masked faces and bodies were recruited to express five emotions (anger, disgust, fear, happiness, and sadness) and a neutral state. Two sets (i.e., PLD and full-light display) of 150 emotional videos were made from the same digital material, along with static images corresponding to the peak of each video. The authors examined recognition accuracy for each condition and found that all emotions except disgust were correctly classified above the chance level (20%). They also reported that exaggerated body movements produced more accurate classification and higher emotional intensity ratings. Atkinson and colleagues later refined this stimulus set (Atkinson et al., 2007; Atkinson et al., 2012), and this set of dynamic and static biological motion has been widely used in neuroimaging research (e.g., Mazzoni et al., 2017; Peelen et al., 2010) and with clinical populations (e.g., Philip et al., 2010; Strauss et al., 2015).

With the development of technology, motion capture systems have been widely used to collect human kinematic data (e.g., the position and rotation data of anatomical nodes), which can then be processed in software such as MATLAB to produce PLDs. For example, Pollick et al. (2001) investigated the visual perception of PLDs of arm movements. Two actors were instructed to read emotional scenarios and then perform drinking and knocking movements to convey ten emotions: afraid, angry, excited, happy, neutral, relaxed, sad, strong, tired, and weak. Dynamic three-dimensional data for the head, right shoulder, elbow, wrist, and first and fourth metacarpal joints were recorded, yielding 120 arm movements. Pollick et al. (2001) also found that these emotional arm movements could be represented by a two-dimensional circumplex structure of activation and pleasure. However, this set did not include whole-body data. Ma et al. (2006) used a high-speed optical motion capture camera system to create a library of 4080 movements performed by 30 nonprofessional actors. This library comprised 240 walking recordings, 3600 arm movements (knocking, lifting, and throwing), and 240 sequences of arm movements separated by walking, covering happy, neutral, and sad states. They also built three types of models in 3D Studio MAX based on the dynamic data of 15 main joints of the body. Unfortunately, these two relatively large material sets did not publicly provide rating indicators (e.g., recognition accuracy), which are essential to experimental design and operation; the absence of such indicators also hinders cross-cultural research.

Recently, some researchers have created similar material sets when studying the relationship between emotional biological motion and other factors. For example, Alaerts et al. (2011) recruited one male and one female actor, each of whom performed five actions (walking, jumping, kicking, drinking, wiping) under four emotions (neutral, happiness, sadness, anger), yielding 40 motion scenarios. Each scenario was rendered from three views (front, side, intermediate), resulting in 120 PLDs. They also found that female participants recognized the emotions more accurately than male participants, whereas male participants responded more slowly, revealing a female advantage in the recognition of emotional biological motion. Halovic and Kroos (2018b) created 423 PLDs with 17 point-lights covering five emotional states (happiness, sadness, anger, fear, neutral) and examined the influence of the actor’s gender on participants’ emotion categorization. Ross et al. (2012) created 28 PLDs with 15 point-lights and four emotions (happiness, sadness, fear, anger) and investigated developmental changes in the recognition of emotional biological motion, reporting that the critical developmental period for emotion recognition is around 8.5 years of age. Note that although these researchers created materials for their studies, the sets were not made public, and no access information was provided.

In our prior study, we created a public emotional kinematic dataset (Zhang et al., 2020). Specifically, a portable wireless motion capture system with 17 wearable sensors was used, and 22 semi-professional actors performed movements reflecting happiness, sadness, anger, fear, disgust, surprise, and neutral states, based on standardized guidance and preferred daily events. We collected a total of 1402 recordings at 125 Hz, consisting of the position and rotation data of 72 anatomical nodes. Recently, Ghaleb et al. (2021) used our dataset to test a proposed model framework (i.e., graph convolutional networks and spatial attention mechanisms) and reported accurate classification performance. However, this dataset had not been visualized and therefore could not be used directly as psychological experiment material.

Therefore, the present study aimed to produce a novel, comprehensive stimulus set of neutral and emotional PLDs rendered from three views (i.e., frontal 0°, left 45°, and left 90°). The Dalian Emotional Movement Open-source Set (DEMOS) has several advantages: (1) It contains 2664 whole-body videos with equal proportions of female and male models, whereas most of the previous sets mentioned above are considerably smaller. The relatively large number of emotional biological motion stimuli in the DEMOS greatly widens researchers’ options when selecting materials for experiments. (2) It provides three views of the whole body. In daily life, people do not always communicate face-to-face and sometimes need to identify each other’s emotions from a particular viewpoint. Previous studies have also highlighted the strong influence of viewpoint on the recognition of human faces (Almeida et al., 2020; Foster et al., 2022; Goeleven et al., 2008; Thoma et al., 2013) and bodies (Foster et al., 2022; Gross et al., 2012; He et al., 2020; Moors et al., 2015; Pollux et al., 2019; Thoma et al., 2013). Although Alaerts et al. (2011) considered viewpoint, they did not collect kinematic data for the head, a key cue in the perception of the human body (Arizpe et al., 2017; Brandman & Yovel, 2010; Minnebusch et al., 2009; Yovel et al., 2010) and of emotional body movement (Witkower & Tracy, 2019). (3) The DEMOS provides four mainstream indicators in the field of emotional biological motion (i.e., recognition accuracy, emotional intensity, subjective movement, and objective movement), which researchers can directly consider and control in their studies. Some prior sets did not publicly provide these indicators or reported only recognition accuracy. Indeed, these indicators can be treated as extraneous variables that should be controlled; for example, recent research has kept subjective movement similar across emotions (Mazzoni et al., 2017).

In summary, we describe the construction of the DEMOS and provide validation data for use. We also investigate the relationships between these indicators.

Methods

Construction stage

Raw kinematic data for creating biological motion videos were sourced from our recent work (Zhang et al., 2020). In the present study, we only selected happy, sad, angry, fearful, disgusted, and neutral recordings, and did not include surprised items because of their ambiguous emotional valence (Reisenzein et al., 2019). Therefore, 1190 recordings served as raw kinematic data.

All operations in the construction stage (see Fig. 1) were conducted by three senior psychology graduate students with extensive experience in emotion research, following consistent standardized guidelines. First, the original data in RAW format were converted to FBX format using the synchronized MotionBuilder (https://www.autodesk.com/products/motionbuilder) and Axis Neuron (https://www.neuronmocap.com/content/axis-neuron) software. Second, in-house Unity (https://unity.com) programs were used to render each recording as a light-point model with three views (i.e., frontal 0°, left 45°, and left 90°), 13 key white nodes (i.e., the head and, on both sides, the shoulders, upper arms, hips, knees, feet, and hands), and a black background, which was then exported to MP4 format. Finally, we used Adobe Premiere (https://www.adobe.com/products/premiere.html) to cut each video to two seconds, a length in line with the material used in some prior studies (Atkinson et al., 2012; Bellot et al., 2021; Mazzoni et al., 2017). Each two-second clip had to contain at least the onset and peak of the corresponding emotion in the original performance; otherwise, it was flagged as a “bad video” and discussed by the three graduate students. If at least two of them judged the clip acceptable, it was retained; otherwise, it was excluded. Clips with distinctly low-quality signals were also excluded. Based on previous studies (e.g., Kret et al., 2011) and the features of software commonly used in experimental psychology (e.g., E-Prime and Psychtoolbox), the video parameters were set to MP4 format, 720 × 540 pixels, 25 frames per second (fps), and approximately 100 KB file size. The DEMOS includes 2664 valid videos (see Table 1).
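
As a convenience for users, exported clips can be checked against these parameters programmatically. The following minimal Python sketch is not part of the original pipeline; it assumes OpenCV is installed, and the demos_videos folder name is hypothetical:

```python
import os
import cv2  # OpenCV: pip install opencv-python

# Target export parameters from the construction stage.
WIDTH, HEIGHT, FPS, N_FRAMES = 720, 540, 25.0, 50  # 2 s at 25 fps

def check_video(path: str) -> bool:
    """Return True if a clip matches the DEMOS export parameters."""
    cap = cv2.VideoCapture(path)
    ok = (
        int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) == WIDTH
        and int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) == HEIGHT
        and abs(cap.get(cv2.CAP_PROP_FPS) - FPS) < 0.5
        and int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) == N_FRAMES
    )
    cap.release()
    return ok

# Flag any non-conforming clips in a (hypothetical) stimulus folder.
for name in sorted(os.listdir("demos_videos")):
    if name.endswith(".mp4") and not check_video(os.path.join("demos_videos", name)):
        print("non-conforming clip:", name)
```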

Fig. 1
figure 1

The construction flow of the DEMOS

Table 1 The number of videos under all conditions

Validation stage

Participants

Forty-seven college students from Liaoning Normal University participated in the present validation stage as paid volunteers after providing written informed consent. All participants were right-handed with normal or corrected-to-normal vision, and none self-reported any severe physical disease or mental disorder. The study was approved by the Human Research Institutional Review Board at Liaoning Normal University and followed the tenets of the Declaration of Helsinki (1991). Five participants dropped out, so 42 participants (26 female; aged 18–25 years; M ± SD = 21.62 ± 2.00) were included in the data analysis.

Procedure

The validation procedure was conducted using E-prime 2.0 (Psychology Software Tools, Inc.) with a 6 (emotion: neutral, happiness, sadness, anger, fear, disgust) × 3 (view: 0°, 45°, 90°) within-subjects design. Participants completed three sessions, each presenting the 888 videos of a single view (0°, 45°, or 90°) and lasting 1.5–2 hours; consecutive sessions were separated by at least 24 hours to prevent fatigue. The order of the three view sessions was counterbalanced across participants. Within each session, the 888 videos were randomly and evenly assigned to eight blocks, and participants took a full break between blocks. Participants were seated in a soundproof room with their eyes approximately 70 cm from a 19-inch screen (1440 × 900 pixels, 60 Hz refresh rate). All stimuli were displayed at the center of the screen.

For each trial (see Fig. 2), a 300–600 ms fixation was presented first, followed by the two-second emotional video. Participants then completed three tasks using the mouse with their right index finger: (1) a six-alternative forced-choice task (neutral, happiness, sadness, anger, fear, disgust); (2) an emotional intensity rating on a 9-point scale (1 = very low intensity, 9 = very high intensity); and (3) a subjective movement rating on the same scale (1 = very low movement, 9 = very high movement), referring to the perceived amount of movement in the clip (Mazzoni et al., 2017; Vaessen et al., 2019). The inter-trial interval lasted 500–1000 ms. The order of the six options in the forced-choice task and the order of the three tasks were counterbalanced across participants.
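
To make the trial logic concrete, the following schematic Python sketch reproduces the timing and task structure. It is not the E-Prime implementation; video playback and response collection are replaced by console stand-ins, and the clip name is hypothetical:

```python
import random
import time

OPTIONS = ["neutral", "happiness", "sadness", "anger", "fear", "disgust"]

def wait_ms(lo: int, hi: int) -> None:
    """Jittered delay, e.g., the 300-600 ms fixation or 500-1000 ms ITI."""
    time.sleep(random.uniform(lo, hi) / 1000.0)

def run_trial(video: str, task_order: list) -> dict:
    """One validation trial: fixation, 2-s clip, three tasks, ITI."""
    wait_ms(300, 600)                          # jittered fixation cross
    print(f"[playing 2-s clip: {video}]")      # stand-in for video playback
    responses = {}
    for task in task_order:                    # task order counterbalanced
        if task == "choice":
            responses[task] = input(f"Which emotion {OPTIONS}? ")
        else:                                  # "intensity" or "movement"
            responses[task] = int(input(f"{task} rating (1-9)? "))
    wait_ms(500, 1000)                         # inter-trial interval
    return responses

print(run_trial("demos_0001.mp4", ["choice", "intensity", "movement"]))
```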

Fig. 2
figure 2

An example trial sequence in the validation stage. ITI, inter-trial interval

Before the formal experiment, a practice session was performed and comprised six trials to ensure that the participants had a good understanding of the task.

Low-level action features (i.e., objective movement) may contribute to neural differences between movement styles and have been considered in several brain imaging studies (e.g., Cross et al., 2012; Pichon et al., 2008, 2009; Ross et al., 2019; Ross et al., 2020; Schippers et al., 2010; Williams et al., 2020). We used custom MATLAB code from Cross et al. (2012) and Williams et al. (2020) to quantify the objective movement of the videos in the DEMOS. Specifically, a difference image was calculated for each pair of consecutive frames in a video, and any pixel whose luminance changed by more than ten units was labeled a “moving pixel.” The number of moving pixels was averaged across frames to represent each video’s objective movement. In theory, the total number of moving pixels in a video ranges from 0 to 19,051,200 [720 pixels × 540 pixels × (2 s × 25 fps − 1)]: the minimum indicates that no pixel changed between any two consecutive frames, and the maximum that every pixel changed between every pair of consecutive frames (the final frame yields no difference image, hence the 49 difference images).
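
For illustration only, the described procedure can be re-implemented outside MATLAB. The following Python sketch is not the authors’ code; it assumes OpenCV is available, uses grayscale intensity as the luminance measure, and the clip name is hypothetical:

```python
import cv2
import numpy as np

THRESHOLD = 10  # luminance change (in units) defining a "moving pixel"

def objective_movement(path: str) -> float:
    """Average number of moving pixels per consecutive-frame difference image."""
    cap = cv2.VideoCapture(path)
    ok, prev = cap.read()
    if not ok:
        raise IOError(f"cannot read {path}")
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY).astype(np.int16)
    counts = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.int16)
        # A pixel counts as "moving" if its luminance changed by > 10 units.
        counts.append(int(np.count_nonzero(np.abs(gray - prev) > THRESHOLD)))
        prev = gray
    cap.release()
    return float(np.mean(counts))  # mean over the 49 difference images

print(objective_movement("demos_0001.mp4"))  # hypothetical clip name
```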

Data analyses

Analyses of variance (ANOVAs) with a 6 (emotion type: neutral, happiness, sadness, anger, fear, disgust) × 3 (view: 0°, 45°, 90°) design were performed on recognition accuracy, emotional intensity, subjective movement, and objective movement. Pairwise and multiple comparisons were Bonferroni-corrected. One-sample t tests against the chance level (i.e., 0.167 in the present experiment) examined whether recognition accuracy in each condition was reliably above chance.
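
For readers who wish to reproduce such analyses outside SPSS, the following Python sketch illustrates the design; the file name and column layout are hypothetical:

```python
import pandas as pd
from scipy import stats
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# One row per video, with columns 'emotion', 'view', 'accuracy' (hypothetical layout).
df = pd.read_csv("demos_accuracy.csv")

# 6 (emotion type) x 3 (view) ANOVA on recognition accuracy.
model = ols("accuracy ~ C(emotion) * C(view)", data=df).fit()
print(anova_lm(model, typ=2))

# One-sample t tests against the chance level (1/6) in each condition.
for (emotion, view), cell in df.groupby(["emotion", "view"]):
    t, p = stats.ttest_1samp(cell["accuracy"], 1 / 6)
    print(f"{emotion}, {view}: t = {t:.2f}, p = {p:.4g}")
```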

To investigate participants’ misclassifications, a similar univariate ANOVA across the five incorrect choices was conducted for each stimulus emotion in each view condition. Multiple comparisons were Bonferroni-corrected. One-sample t tests against 0.167 on the misclassification rate for each condition examined whether participants selected a given confusable emotion at a rate far above chance.

Pearson’s correlations were computed to explore the relationships among recognition accuracy, emotional intensity, and subjective movement under all conditions. To test the reliability of the subjective movement ratings, Pearson’s correlations were also used to examine the relationship between subjective movement and objective movement under all conditions.
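
A corresponding sketch for the correlation analyses, again assuming a hypothetical per-video table with one column per indicator:

```python
import pandas as pd
from scipy import stats

# Hypothetical per-video table with one column per indicator.
df = pd.read_csv("demos_indicators.csv")
pairs = [
    ("accuracy", "intensity"),
    ("accuracy", "subjective_movement"),
    ("intensity", "subjective_movement"),
    ("subjective_movement", "objective_movement"),  # reliability check
]

# Pearson's r within each emotion-view condition.
for (emotion, view), cell in df.groupby(["emotion", "view"]):
    for a, b in pairs:
        r, p = stats.pearsonr(cell[a], cell[b])
        print(f"{emotion}/{view}: r({a}, {b}) = {r:.3f}, p = {p:.4g}")
```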

All statistical analyses were performed using SPSS 26.0 for Windows.

Results

Recognition accuracy

Mean recognition accuracies in the DEMOS were between 0.414 and 0.776 (see Table S1). The main effects of view [F(2, 2646) = 3.97, p = 0.019, \(\eta^{2}_{\mathrm{p}}\) = 0.003] and emotion type [F(5, 2646) = 84.49, p < 0.001, \(\eta^{2}_{\mathrm{p}}\) = 0.138] were both significant (see Fig. 3). Specifically, recognition accuracy of the 90° view (M ± SD, 0.591 ± 0.272) was lower than those of the 45° view (0.622 ± 0.266, p = 0.027) and the 0° view (0.620 ± 0.269, p = 0.043), but there was no significant difference in recognition accuracy between the latter two view conditions (p > 0.05). Neutral stimuli (0.755 ± 0.139) were the easiest to identify, followed by happy (0.672 ± 0.305), angry (0.670 ± 0.271), fearful (0.637 ± 0.247), disgusted (0.506 ± 0.252), and sad ones (0.455 ± 0.232). There were no significant differences in recognition accuracy among happiness, anger, and fear (ps ≥ 0.451); the other multiple comparisons were significant (ps ≤ 0.035). The interaction effect between view and emotion type was not statistically significant [F(10, 2646) = 1.37, p = 0.187, \(\eta^{2}_{\mathrm{p}}\) = 0.005]. The one-sample tests demonstrated that participants were able to reliably recognize all emotions above the chance level under all view conditions (ts ≥ 13.91, ps ≤ 0.001).

Fig. 3
figure 3

Violin plot of the distribution of recognition accuracy under all conditions. Dashed lines show the upper and lower quartiles. Solid lines depict median values. Dotted lines indicate the chance level (0.167)

Misclassification performance

The ANOVAs showed that the effect of incorrect choice was significant in every condition [Fs ≥ 3.93, ps ≤ 0.004, all \(\eta^{2}_{\mathrm{p}}\) ≥ 0.017]. Specifically, in all view conditions, happy expressions were most easily misclassified as anger (ps < 0.001; see Fig. 4), sad expressions were most easily confused with neutral (ps < 0.001), and neutral expressions were likewise most often confused with sadness (ps < 0.001). For angry (ps ≥ 0.060), fearful (ps ≥ 0.058), and disgusted (ps ≥ 0.068) stimuli, no single response was most easily misclassified, regardless of the view condition.

Fig. 4
figure 4

Mean probability of misclassification under all conditions. Red squares represent the probabilities significantly higher than the chance level. Diagonal entries denote recognition accuracies

The results of one-sample t tests showed that only the probabilities of misclassifying sad expressions as neutral were significantly higher than the chance level regardless of the view condition (ts ≥ 6.13, ps < 0.001), suggesting that sadness was quite highly confused with neutral.

Emotional intensity

The main effects of view [F(2, 2646) = 15.38, p < 0.001, \(\eta^{2}_{\mathrm{p}}\) = 0.011] and emotion type [F(5, 2646) = 355.10, p < 0.001, \(\eta^{2}_{\mathrm{p}}\) = 0.402] were both significant (see Fig. 5). Specifically, emotional intensity in the 0° view (M ± SD, 5.81 ± 0.92) was rated higher than in the 45° (5.69 ± 0.98, p = 0.001) and 90° (5.62 ± 0.94, p < 0.001) views; the latter two conditions did not differ significantly (p = 0.169). Happy stimuli (6.54 ± 0.98) were rated with the highest intensity, followed by angry (6.00 ± 0.81), fearful (5.98 ± 0.72), disgusted (5.59 ± 0.65), sad (5.19 ± 0.70), and neutral ones (4.58 ± 0.27); all multiple comparisons were significant (ps ≤ 0.001) except for the difference between the angry and fearful conditions (p > 0.05). The interaction effect between view and emotion type was not statistically significant [F(10, 2646) = 1.41, p = 0.172, \(\eta^{2}_{\mathrm{p}}\) = 0.005].

Fig. 5
figure 5

Violin plot of the distribution of emotional intensity under all conditions. Dashed lines indicate the upper and lower quartiles. Solid lines represent median values

Subjective movement

The main effect of emotion type was significant [F(5, 2646) = 527.45, p < 0.001, \(\eta^{2}_{\mathrm{p}}\) = 0.499] (see Fig. 6). Specifically, happy stimuli (M ± SD, 6.11 ± 1.10) were rated with the highest movement, followed by angry (5.36 ± 0.91), fearful (4.98 ± 0.81), disgusted (4.41 ± 0.66), sad (3.88 ± 0.71), and neutral items (3.74 ± 0.55); all multiple comparisons were significant (ps ≤ 0.001) except for the difference between the sad and neutral conditions (p = 0.333). Neither the main effect of view [F(2, 2646) = 1.02, p = 0.361, \(\eta^{2}_{\mathrm{p}}\) = 0.001] nor the interaction effect between view and emotion type [F(10, 2646) = 0.46, p = 0.919, \(\eta^{2}_{\mathrm{p}}\) = 0.002] reached statistical significance.

Fig. 6
figure 6

Violin plot of the distribution of subjective movement under all conditions. Dashed lines indicate the upper and lower quartiles. Solid lines represent median values

Objective movement

The main effect of emotion type was significant [F(5, 2646) = 315.95, p < 0.001, \(\eta^{2}_{\mathrm{p}}\) = 0.374] (see Fig. 7). Specifically, the objective movement of happy stimuli (M ± SD, 737.76 ± 143.94) was the highest, followed by angry (630.73 ± 133.76), fearful (621.80 ± 119.08), disgusted (520.13 ± 103.59), sad (463.51 ± 142.87), and neutral ones (453.20 ± 144.64); all multiple comparisons were significant (ps ≤ 0.001) except for the differences between the angry and fearful conditions (p > 0.05) and between the sad and neutral conditions (p > 0.05). The main effect of view was also significant [F(2, 2646) = 27.48, p < 0.001, \(\eta^{2}_{\mathrm{p}}\) = 0.020]: objective movement in the 90° view (551.95 ± 154.94) was significantly lower than in the 0° view (592.69 ± 170.07; p < 0.001) and the 45° view (591.25 ± 167.00; p < 0.001), with no significant difference between the 0° and 45° views (p > 0.05). The interaction effect between view and emotion type did not reach statistical significance [F(10, 2646) = 0.56, p = 0.846, \(\eta^{2}_{\mathrm{p}}\) = 0.002].

Fig. 7
figure 7

Violin plot of the distribution of objective movement under all conditions. Dashed lines indicate the upper and lower quartiles. Solid lines represent median values

Correlation analyses

We first conducted correlation analyses based on participants’ performance, and significant positive correlations were found among recognition accuracy, emotional intensity, and subjective movement for happy (rs ≥ 0.683, ps ≤ 0.001), angry (rs ≥ 0.414, ps ≤ 0.001), fearful (rs ≥ 0.241, ps ≤ 0.002), and disgusted (rs ≥ 0.245, ps ≤ 0.002) stimuli, regardless of view (see Fig. 8). For the sad videos under all views, recognition accuracy was significantly positively correlated with emotional intensity (rs ≥ 0.486, ps ≤ 0.001), and emotional intensity was significantly positively correlated with subjective movement (rs ≥ 0.704, ps ≤ 0.001), but recognition accuracy did not correlate significantly with subjective movement (rs ≤ 0.138, ps ≥ 0.095). For the neutral videos under all views, recognition accuracy was significantly negatively correlated with emotional intensity (rs ≤ −0.267, ps ≤ 0.004), and emotional intensity was significantly positively correlated with subjective movement (rs ≥ 0.218, ps ≤ 0.021), but recognition accuracy did not correlate significantly with subjective movement (rs ≤ 0.123, ps ≥ 0.193).

Fig. 8
figure 8

Correlation matrix representing the relationships among recognition accuracy, emotional intensity, and subjective movement under all conditions. Color scale indicates the value of correlation. ns = not significant, *p < 0.05, **p < 0.01, ***p < 0.001

The correlations between subjective movement and objective movement under all conditions were significantly positive (rs ≥ 0.603, ps ≤ 0.001; see Table 2).

Table 2 Correlations between subjective movement and objective movement under all conditions

General discussion

In summary, we developed and validated a new stimulus set of emotional biological motion comprising many high-quality PLD videos (i.e., the DEMOS), each displaying happiness, sadness, anger, fear, disgust, or a neutral state. The DEMOS also provides three views and four indicators.

Recognition accuracy is undoubtedly the most important indicator of the reliability of an emotional material set. The present validation study showed that recognition accuracy for each emotion and each view in the DEMOS was relatively high and significantly above the chance level (0.167). At first glance, mean recognition accuracies in the DEMOS (0.414–0.776) seem lower than those for the PLDs of Atkinson et al. (2004) (0.6306–0.8417), Walk and Homan (1984) (71–96%), and Ross et al. (2012) (overall 81.1% in adults), but comparable to or higher than those of Pollux et al. (2016) (40–90% in young adults), Alaerts et al. (2011) (44.2–58.6%), and Halovic and Kroos (2018b) (9–26%). It should also be noted that the number of response options typically depends on the number of emotions studied, resulting in different chance levels: we used a six-alternative forced-choice task, whereas Atkinson et al. (2004) used five emotions (i.e., anger, disgust, fear, happiness, sadness) and Ross et al. (2012) used four (i.e., happiness, sadness, scare, anger). The distributions of recognition accuracy in Fig. 3 suggest that more than 75% of the videos in the DEMOS are recognized above the chance level under all conditions. Moreover, the loss of hand form information in PLDs should be considered. Previous work has found that the hands contribute more to the recognition of some emotions than others, particularly when they serve as shields in fear, fists in anger, or hold the nose in disgust (Fridin et al., 2009; Ross & Flack, 2020). PLDs lack this hand form information that contributes to emotion recognition (Johansson, 1973); indeed, in the data of Atkinson et al. (2004), the recognition rate drops significantly between full-light and point-light displays only for those three emotions. Therefore, the overall recognition accuracy of the DEMOS is comparable to that of other validation studies, and this set provides researchers with many emotional whole-body PLDs with high recognition rates.

We also found that, aside from neutral PLDs, happy PLDs were recognized best. Regarding differences in recognition accuracy among basic emotions, prior results were inconclusive. For example, Ross et al. (2012) did not observe significant differences in recognition scores for happiness, sadness, fear, or anger in either children or adults, and Alaerts et al. (2011) found that angry PLDs were the easiest to recognize. However, the superiority of happy PLDs in the present study is consistent with most previous sets, regardless of the number of forced-choice options (Atkinson et al., 2004; Halovic & Kroos, 2018a, 2018b; Lee & Kim, 2017; Pollux et al., 2016; Walk & Homan, 1984). Happiness had the highest subjective movement, objective movement, and emotional intensity. Higher subjective movement ratings indicate that participants may perceive greater speed, emotional intensity, and exaggeration when watching happy PLDs. This is partially supported by our correlation results, in which recognition accuracy, emotional intensity, and subjective movement were all positively correlated for happiness. Previous studies have likewise shown that perceived speed or amount of body movement affects participants’ intensity ratings and recognition accuracy for emotional expressions (Atkinson et al., 2004; Wallbott, 1998).

From the perspective of affective computing, researchers tend to quantify body movements based on form, kinematic (e.g., velocity), and dynamic (e.g., force) information (de Gelder & Poyo Solanas, 2021; Witkower & Tracy, 2019). This field has found that velocity, acceleration, and jerkiness strongly influence the perception of emotional arm movements (Paterson, 2001; Pollick et al., 2001; Sawada et al., 2003) and whole-body movements (Halovic & Kroos, 2018a; Poyo Solanas et al., 2020a; Poyo Solanas et al., 2020b; Roether et al., 2009; Vaessen et al., 2019). Although we only calculated a general value (i.e., objective movement) based on luminance changes across frames, this index captures the amount of motion within PLDs and broadly reflects the quantitative features above (Cross et al., 2012; Pichon et al., 2008, 2009; Ross et al., 2019; Ross et al., 2020; Schippers et al., 2010; Williams et al., 2020). The current results also showed a significant positive correlation between subjective movement and emotional intensity for happy PLDs. The brain basis of behavioral and computational features of emotional body movements has recently been explored (Poyo Solanas et al., 2020a; Vaessen et al., 2019). Therefore, the characteristic subjective and objective features discussed above might contribute to the recognition superiority of happy PLDs in the present study.

We also found that happy PLDs tended to be misclassified as anger, but not vice versa. Most previous PLD sets did not report the misclassification performance (e.g., Alaerts et al., 2011; Halovic & Kroos, 2018b; Pollux et al., 2016; Ross et al., 2012), so it was hard to compare our results with theirs. However, this result is partially consonant with those of Dittrich et al. (1996) and Atkinson et al. (2004), who found that participants often mixed up angry and happy dynamic PLDs in both directions. Angry and happy body expressions share similar features in the body coding system, such as arms out and fast and energetic movement (Witkower & Tracy, 2019). The current results also showed that happy and angry PLDs had more subjective and objective movements than other basic emotions, which supports the above explanation. Moreover, both angry and happy performances in the DEMOS frequently included fists shaking and arms forward or upward.

Sad expressions had the lowest recognition accuracy in the DEMOS. This finding is inconsistent with several previous PLD sets: sad PLDs had the highest recognition accuracy in both Atkinson et al. (2004) and Ross et al. (2012) (in adults), the second-highest in young adults in Pollux et al. (2016), and ranked in the middle in other studies (Alaerts et al., 2011; Halovic & Kroos, 2018b). The confusion analyses showed that participants misclassified sadness as neutral above chance in all view conditions, and neutral was most often confused with sadness. This may be due to the small amount of movement conveyed by sad and neutral PLDs. As shown in Figs. 6 and 7, sadness and neutral had less subjective and objective movement than the other emotions, with no significant movement differences between the two, leaving less information available to tell them apart. This likely explains the relatively low recognition accuracy for sad PLDs.

We found that PLDs in the 90° view generally had the lowest recognition accuracy. Body orientation is an important social interaction cue (Foster et al., 2022; Moors et al., 2015): when someone faces us, they may want to communicate with us; otherwise, they may be interacting with someone else. On the one hand, spatial overlap might contribute to the recognition disadvantage we observed for PLDs in the 90° view. Human body movements are often not three-dimensionally symmetrical in daily life, and body movements perceived from different views show some spatial overlap, especially for the hands, arms, and feet, which hampers inferring and recognizing the performer’s action intentions and emotions (Poyo Solanas et al., 2020a; Poyo Solanas et al., 2020b). Ghaleb et al. (2021) recently analyzed the raw kinematic data of the DEMOS and showed that the hands and arms carry the greatest weight in emotion recognition. Some studies have indirectly indicated that the spatial overlap of body movements is highest in the side 90° view (Dael et al., 2012a).

On the other hand, a disconnect between the observer and the actor across view conditions should be considered. When the actor faces the camera, the portrayed emotion is easier to simulate because observers feel like active participants in the interaction (Ross & Atkinson, 2020; Wood et al., 2016). A similar disconnect may arise between directed emotions and the side view compared to other views: if an angry actor does not face the observer, the observer can hardly be the target of the anger, making it harder to recognize. Note that Alaerts et al. (2011) did not find significant differences in recognition accuracy for emotional PLDs among the same three views, presumably because their PLDs did not include the head, an important factor in the recognition of emotional body movement. The head plays a key role in holistic body processing (Arizpe et al., 2017; Brandman & Yovel, 2010; Minnebusch et al., 2009; Yovel et al., 2010) and emotional body coding (Witkower & Tracy, 2019), and head orientation serves as a significant cue in the judgment of emotion (Dael et al., 2012b; Ekman & Friesen, 1967; Van Cappellen & Edwards, 2021). Taken together, these results suggest that future research should quantify the dynamic spatial overlap of PLDs in the DEMOS and measure participants’ feelings of disconnect to clarify our inference.

In terms of objective and subjective movement, fear and anger were more similar than anger and happiness. This result seems inconsistent with prior analyses of kinematic data; for instance, happy and angry point-light walkers both display increased arm swing and faster walking, whereas fearful walkers show fast, short strides and less arm movement (Halovic & Kroos, 2018a). A possible explanation is that the scripts used for angry and fearful PLDs had similar emotional intensity. All performances in the DEMOS are based on standardized daily events (Zhang et al., 2020), and we may have provided the actors with scripts in which anger and fear carried similar emotional intensity during the performance phase, resulting in similar body movements. Indeed, emotional intensity ratings did not differ significantly between angry and fearful PLDs (mean 6.00 versus 5.98), whereas all other multiple comparisons reached statistical significance. Cross-cultural research on the emotion recognition of biological motion is also worth exploring in the future.

Regarding the relationships among the three subjective indicators, we first observed significant positive correlations under most conditions: the more subjective movement a PLD conveyed, the more easily it was perceived as emotionally intense and recognized. This result is in line with Atkinson et al. (2004), who reported that more exaggerated body movements enhanced recognition accuracy for all emotions except sadness and also induced higher emotional intensity ratings for PLDs. Subjective movement in the present study to some extent reflects the speed, power, or force that a PLD conveys (Mazzoni et al., 2017; Vaessen et al., 2019). Although we did not directly extract these physical features of the PLDs, the objective movement index might tell part of the story, because subjective movement correlated with objective movement under all conditions in the present study. A greater perceived quantity of movement may give participants more information about action intention and emotion. Some studies have also observed positive correlations between emotional intensity ratings and recognition accuracy for body expressions in both frontal and side views (Banziger et al., 2012; Banziger & Scherer, 2010).

For sad and neutral PLDs, recognition accuracy did not correlate significantly with subjective movement. In Atkinson et al. (2004), increased movement actually reduced recognition accuracy for expressions of sadness. Sad body expressions are often characterized by the head tilting down, the head in the hands, and less or slower movement (Halovic & Kroos, 2018a; Michalak et al., 2009; Witkower & Tracy, 2019). The distribution of subjective movement in Fig. 6 also shows that more than 75% of sad PLDs received low movement ratings (i.e., below the scale midpoint of 5). This subjective experience may underlie the present finding. For neutral videos under all view conditions, emotional intensity was significantly negatively correlated with recognition accuracy. Neutral body movements in the DEMOS consist of drinking, opening a door, squatting and standing up, tapping both sides of the thighs, and marking time (Zhang et al., 2020), and they had the lowest objective and subjective movement; thus, they were easily identified in the present study. Because neutral actions convey no emotional information, they were rated lowest in emotional intensity. Therefore, for neutral PLDs, the negative relationship between emotional intensity and recognition accuracy is not surprising.

We also found significant correlations between subjective movement ratings and objective movement under all conditions, suggesting good reliability of the subjective movement ratings. The quantity of movement within PLDs is a low-level feature that can affect brain activity when investigating the neural basis of body movements, and it has been controlled in some functional magnetic resonance imaging and transcranial magnetic stimulation studies (Cross et al., 2012; Huis in't Veld & de Gelder, 2015; Mazzoni et al., 2017; Pichon et al., 2008, 2009; Ross et al., 2019; Ross et al., 2020; Williams et al., 2020). Prior studies have also reported high correlations between these two indicators (Ross et al., 2019; Ross et al., 2020). Taken together, although several previous studies have explored the relationships between these indicators, the present study not only considers all of these mainstream indicators but also extends their relationships to different views.

Several limitations of the DEMOS should be noted. First, the objective movement index used here reduces the complexity of body movements and limits our conclusions. Although this feature is essential to the early general processing of the human body and has been taken into account in some neuroimaging research (e.g., Pichon et al., 2008, 2009; Ross et al., 2019; Ross et al., 2020), many studies have examined the distinct contributions of velocity, acceleration, jerkiness, and force to social cognition from the dynamic human body (e.g., Bronner & Shippen, 2015; Castellano et al., 2007; Gross et al., 2012; Poyo Solanas et al., 2020b). Reducing all these specific aspects of movement to a single “objective movement” value may limit the usefulness of this work; users can instead extract the original kinematic data of our PLDs (Zhang et al., 2020) or use other software (e.g., OpenPose; Cao et al., 2017) to quantify these measures. Second, we did not collect physiological data from the participants. Physiological responses (e.g., heart rate, galvanic skin response) directly reflect physical arousal and can be used to predict the perception of emotional body expressions (e.g., Huis in't Veld et al., 2014a, 2014b). Physiological validation of the DEMOS is therefore worth investigating.

The DEMOS can be applied in many fields. Researchers can use our materials to study the dynamic weights of different joints or parts in emotional body movements from different views and to explore the interaction with other forms of emotional material (e.g., voice, face). The clinical application of DEMOS will also facilitate the investigation of emotional and social dysfunction in psychiatric disorders (Okruszek, 2018).

Conclusions

The DEMOS that we constructed and validated consists of 2664 emotional PLDs, comprising three views, five basic emotions, and neutral. We also provide users with four indicators. The DEMOS can be downloaded free of charge from https://osf.io/83fst/. To our knowledge, this is the largest multi-view emotional biological motion set based on the whole body. Researchers can choose appropriate materials based on standardized indicators to design and conduct experiments in many fields, including affective computing, social cognition, and psychiatry.