Introduction

Summary

Much emotion perception research has focussed on emotion recognition from the face1,2,3,4,5,6,7,8. However, several studies have shown that emotional expressions of faces and bodies are not always aligned9,10,11, and that interindividual differences modulate emotion recognition for faces versus bodies2,10. Therefore, there have been calls for more research into emotion recognition competence for full-body movements, both in clinical and in non-clinical settings12,13,14,15,16,17,18. Refined tests that measure individual differences in emotion recognition ability objectively, without relying on self-report, are of broad interest. The usefulness of such tests hinges on suitable stimulus materials.

We here present a novel type of full-body stimuli for experimental psychology of emotion: expressive dance movements. We created a stimulus set comprising 150 6-s long, high-quality videos of one dancer performing sequences of full-body movements (30 sequences of choreographed Western contemporary and ballet dance). The dancer repeated each of these 30 sequences five times each, with one of five different emotional intentions at each repetition (joy, anger, fear, sadness, and one neutral state; 30 sequences x five emotions = 150 stimuli). A validation experiment with a normative sample of N = 90 participants showed that the intended emotional expression of the dancer was recognized above chance in 139 of these stimuli. The stimuli set is open access and includes normative emotion recognition rates and subjective value judgments (aesthetic and emotional intensity ratings) for each stimulus. As we outline at the end of “Background literature” section, one novelty of the stimuli set is that the stimuli can be used both for explicit emotion recognition tasks (e.g., for forced-choice emotion recognition paradigms), as well as for implicit emotion recognition tasks (e.g., a distractor rating task that implicitly measures the sensitivity of the individual to the different emotion categories of the stimuli).

Background literature

Emotion recognition accuracy is commonly assessed by means of perceptual tasks in which participants are asked to decode or guess the emotional intention of other people from stimuli showing faces, bodies, situations, stories, music, etc. (e.g., the Multi-Factor Emotional Intelligence Scale (MEIS)19 or the Diagnostic Analysis of Nonverbal Accuracy (DANVA2)20). ‘Accurate’ emotion recognition on these tasks is defined by an objective test. A normative sample of participants is asked to guess the emotion intended by a person acting as expressor in the stimuli (e.g., through facial or bodily expression of emotion). For example, if the emotion intended by the expressor is “anger” and “anger” is guessed above chance by participants, then “anger” is taken as the ‘correct’ response for this stimulus; in other words, the stimulus “works”. Stimuli for which the recognition rate of the intended emotion is below chance in a normative sample should be discarded from a stimulus set, as this is evidence that the stimulus does not “work”. Subsequently, if a participant in a new experiment does not label as “anger” a stimulus that (a) was intended by the expressor to express anger and (b) was recognized as such above chance by a normative sample, their answer is defined as ‘wrong’. A single person’s emotion recognition accuracy across all stimuli can then be compared against the emotion recognition accuracy of the normative sample.
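The scoring logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stimulus identifiers, the normative rates and the function names are hypothetical and are not taken from any published test.

```python
CHANCE = 1 / 5  # five response options, as in a five-alternative forced choice

def valid_stimuli(norms, chance=CHANCE):
    """Keep stimuli whose intended emotion was recognized above chance.

    norms maps a stimulus id to (intended emotion, normative recognition rate).
    Returns a scoring key: stimulus id -> 'correct' emotion.
    """
    return {sid: emo for sid, (emo, rate) in norms.items() if rate > chance}

def accuracy(responses, key):
    """Proportion of a participant's answers that match the scoring key."""
    scored = [responses[sid] == emo for sid, emo in key.items() if sid in responses]
    if not scored:
        raise ValueError("no scorable responses")
    return sum(scored) / len(scored)

# Hypothetical normative data: the fear stimulus falls below chance and is dropped.
norms = {
    "seq01_anger": ("anger", 0.55),
    "seq01_fear": ("fear", 0.15),
    "seq01_joy": ("joy", 0.48),
}
key = valid_stimuli(norms)
responses = {"seq01_anger": "anger", "seq01_fear": "fear", "seq01_joy": "neutral"}
print(accuracy(responses, key))  # 0.5
```

The participant's answer to the discarded stimulus does not enter the score, which is why one correct answer out of two remaining stimuli yields 0.5.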

For example, the Geneva Emotion Recognition Test Short (GERT-S)21 comprises 42 video stimuli showing the upper body and face of actors expressing 14 different emotions with their facial expression while saying a nonsensical sentence with different emotional intonations. Similarly, the Emotion Recognition Index (ERI) measures emotion recognition accuracy for four emotions in face and voice stimuli6. It is based on the picture stimuli set by Ekman and Friesen22, and on voice recordings from a published corpus23.

Full-body emotion recognition research has, so far, relied to a large extent on video stimuli of ‘emotional actions’ (e.g., bending the head down in sadness, clenching a fist in anger, jumping for joy, recoiling in fear, etc.). Available full-body emotion stimuli likely measure the cognitive recognition of emotional actions, rather than the sensitivity to the kinematics of emotional intentions in full-body movements, as discussed in previous work24,25. Besides, emotions are not always expressed as such fully fledged emotional actions. Especially in the first stages of the development of an emotion, they are rather implied in the subtle kinematics of an individual’s movements; a person can wave angrily, happily, sadly, etc. The ability to detect these subtle kinematic differences in full-body movements could be argued to be genuine emotion recognition ability.

A new line of research, therefore, focusses on requiring participants to recognise emotions from stimuli showing individuals performing the same simple transitive movements (walking or throwing) with different emotional intentions (e.g., joy, sadness, fear, anger, and a neutral state)26,27,28,29,30. Expanding this approach, we propose that it is possible to generate phrases of more complex full-body movements or full-body gestures. Choreographed sequences of dance movements afford exactly this. Dance is, in its essence, a kind of human expressive body movement31. Professional dancers are ideal models for the creation of dance stimulus materials in emotion science because they are trained to express different emotional intentions with one and the same dance movement32,33,34. Subtle variations in how a dancer performs a dance movement with different emotional intentions convey these intentions to observers35,36,37.

This phenomenon is comparable to language, where a single sentence can be pronounced with different emotional qualities (intonation) (e.g., angry or happy), depending on how the expressor modulates their voice with their breathing and the muscles of their vocal tract. For instance, stimuli for the Multimodal Emotion Recognition Test (MERT)38, and the Test for Emotion Recognition from the Singing Voice39 were created with actors and singers that either spoke or sang a pseudo-linguistic sentence (“Ne-Kalibam-Sut-Molaine”) at several repetitions with different emotional intentions. Computational analyses of the physical speech contours of these utterings revealed that these voice stimuli vary according to specific physical parameters of the sound. These parameters are picked up by human listeners and the intended emotions accurately decoded40,41,42.

The Warburg Dance Movement Library (WADAMO Library)32 was the first movement stimulus library that was created following this rationale from the research on the perception of emotional speech, but with dance movements. It contains 245 6-s-long video clips of Western ballet and contemporary dance of two different expressive categories. Four dancers were instructed to perform several short dance choreographies of eight counts twice, once with, and once without emotional expressivity. Across several experiments, participants without dance experience accurately identified the dancers’ intended emotional expressivity (expressive versus neutral state, i.e., no expressivity)32,43. The McNorm dance movement library44 was the first library to contain five different emotional expressions for each dance movement sequence. One dancer performed Western contemporary dance movement sequences five times, with a different emotional expressive intention at each repetition (joy, sad, angry, fearful, and a neutral state). The neutral category consisted of the same movements, technically correct, but without any emotional expressivity. This latter neutral category is comparable to the “inexpressive” category of the WADAMO library, and to the “neutral” emotion stimuli category of all stimulus corpora since Ekman and Friesen22,45 (e.g., Atkinson et al.46). The McNorm library contained 73 video stimuli of varying lengths (6.6–42.8 s) and stimuli were rendered as point lights to maximally reduce visual information about the dancer. Average emotion recognition by participants was 48.96%.

Importantly, in addition to serving in an explicit emotion recognition task, the WADAMO library was also used to assess individuals’ sensitivity to the emotional expressiveness implicitly. Namely, different groups of participants were asked to make simple aesthetic judgments about the video clips (i.e., liking and beauty judgments). Participants systematically liked videos intended to be expressive more and found them more beautiful. Orlandi and colleagues (2020) used a similar approach, contrasting observers’ aesthetic judgment to emotionally expressive and inexpressive dance movement sequences47. Also here, participants rated the videos intended to be expressive as more beautiful than the inexpressive versions of the same sequences. The results of these experiments32,43,47 form the basis for the idea that dance movement stimuli could be used to assess emotion recognition accuracy implicitly. If observers—who are unaware of the intended emotional expressivity (i.e., they have not been told about the different intentions, like in an explicit emotion recognition task)—systematically provide higher aesthetic judgments (e.g., beauty or liking ratings) for expressive than for inexpressive versions of the same sequence, then the aesthetic judgment is an implicit measure of the person’s sensitivity to the intended expressivity in the movement.

Objectives

The objectives of this project were, first, to create a new stimulus set with a high level of experimental control. Dance movement sequences and visual characteristics of the stimuli were controlled, and stimuli length was equalized as much as possible to 6 s. Second, we set out to provide normative values of emotion recognition and aesthetic judgment for all created stimuli. Third, we identified the stimuli with highest emotion recognition rates and that were recognized above chance to provide a stimuli table with all values for future stimuli selection. Fourth, we explored interindividual differences in emotion recognition and beauty ratings (personality traits and aesthetic responsiveness).

The present study

We designed and created a new dance movement stimuli set based on the groundwork from previous stimulus creation procedures of dance stimuli sets32,37,48,49,50,51,52,53, which ensured requirements for experimental control31,54. During the subsequent norming experiment, 90 participants watched the stimuli (video clips of 6 s length), performed a forced-choice emotion recognition task and provided ratings for how beautiful and how intense they thought the stimuli were. A short video about the stimuli creation is available here: https://www.youtube.com/watch?v=Eij40jtw8WE.

See Fig. 1 for an illustration of the stimuli creation procedure and the norming experiment.

Figure 1

Stimuli Creation Procedure. The stimuli creation procedure was based on previous work32,37,48,49,50,51,52,53, and respected requirements of experimental control for dance stimulus materials31,54. Choreography of the 30 sequences (of Western contemporary and ballet dance) took place prior to the recording session and was led entirely by the dancer in conversation with two of the authors with professional dance experience (JFC and LSE). Filming of the dance sequences took place at the Max-Planck-Institute for Empirical Aesthetics in Frankfurt/M. For filming, a Canon EOS 5D Mark IV camera was used, with a Canon EF 24–105 mm f/4 L IS USM lens (settings included: raw framerate, 50 fps; output framerate, 25 fps; white balance, 5000 K; shutter speed, 1/100 s; ISO, 400; video format, H.264; aspect ratio, 16:9; resolution, 1920 × 1080). A standard 6 × 3 m chroma-key greenscreen background was used to allow for the creation of additional visual preparations of the stimuli, such as silhouette videos and blurred faces. For this, dedo-stage lights (7 dedo heads, dimmers and stands kit) were required to illuminate the entire greenscreen and to minimise shadows. Postproduction was done using Adobe After Effects 2019 and Adobe Premiere Pro 2019. All footage was trimmed to the exact start and end points of the movements. Each clip was rendered into a separate file in an uncompressed format and the title was added, as specified verbally by the dancer during the recording. Before saving, the sound tracks (speech and ambient noise) of the clips were removed. Using Adobe After Effects, the “Keylight” effect was added to all files, and the background was removed from each clip. The “Level” effect (setting: output black = 255) was further applied to each clip to colour the extracted foregrounds white (the visible dancer silhouette). “Opacity” keyframes were then added to the beginning and the end of each clip to allow for a fade-in and fade-out of each clip (8 frames). Finally, each clip was rendered as a separate file in H.264 format. The dancer was Ms Anne Jung, and her informed consent for publication of identifying information, images and film in an online open-access publication was obtained. A short video of the creation process is available here: https://www.youtube.com/watch?v=Eij40jtw8WE.

Results

Emotion recognition was calculated for all 150 stimuli as an objective test. “Correct” emotion recognition was set to be when the participant had selected the emotion that the dancer intended while dancing (see also “Background literature” section). Emotion recognition accuracy for each emotion was obtained for each participant. All data and code are available on the OSF: https://osf.io/uecg9/?view_only=e5a5661b89104701aca750101325d30f.

Preliminary data analyses

During stimuli creation, some sequences were performed more than once. These were cases where the dancer was unsatisfied with her performance and asked to repeat the sequence. Therefore, the total number of stimuli was 173 (including 23 duplicates that were deleted once emotion recognition rates were obtained). The 173 stimuli were divided into three sets for three separate online experiments. Fifteen videos of the stimuli set were randomly selected and included in all three online norming experiments. To confirm that emotion recognition rates were equivalent between the three sets of stimuli, we performed comparative analyses, which are set out in the supplementary materials (section 1). These showed equivalent emotion recognition rates and aesthetic judgments; hence, data from the three experiments were aggregated and duplicates were removed, keeping the take with the highest emotion recognition rate.

Emotion recognition accuracy

Data were non-normally distributed, so non-parametric tests were performed.

A one-sample Wilcoxon signed rank test was used to determine whether emotion recognition accuracy was above chance. On average, participants recognised the emotion intended by the dancer in 46.8% of trials (± 19.04), significantly above the chance level of 20% (100/5 emotions = 20%), across all emotions (V = 97,137, p < .001, h = .579). The same was true for each emotion separately, i.e., participants recognised the intended emotion above chance level when the dancer expressed joy (V = 3891, p < .001, h = .565), anger (V = 3969, p < .001, h = .615), fear (V = 3662.5, p < .001, h = .397), sadness (V = 4080, p < .001, h = .62), and a neutral state (V = 3950, p < .001, h = .698). See Table 1 and Fig. 2.
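As a minimal sketch of the two quantities reported above, the following Python computes the signed-rank statistic V (the sum of the ranks of the positive differences from the chance level) and Cohen's h for two proportions. The function names and the toy accuracies are hypothetical, the p value is not computed, and the analysis code on the OSF may differ. With the overall accuracy of 46.8% and the chance level of 20%, the h formula reproduces the reported overall value of .579, which suggests, but does not confirm, that h was obtained in this way.

```python
import math

def signed_rank_v(values, mu):
    """Wilcoxon signed-rank statistic V for a one-sample test against mu.

    Zero differences are dropped; tied absolute differences get average ranks.
    """
    diffs = sorted((v - mu for v in values if v != mu), key=abs)
    v_stat = 0.0
    i = 0
    while i < len(diffs):
        j = i
        while j < len(diffs) and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        v_stat += avg_rank * sum(1 for d in diffs[i:j] if d > 0)
        i = j
    return v_stat

def cohens_h(p1, p2):
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Toy accuracies in percent for four hypothetical participants, chance = 20.
print(signed_rank_v([50, 30, 10, 60], 20))   # 8.5
print(round(cohens_h(0.468, 0.20), 3))       # 0.579
```

With N = 90 participants, V can be at most 90 × 91 / 2 = 4095, which is consistent with the per-emotion V values reported above.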

Table 1 Summary emotion recognition accuracy for each emotion.
Figure 2

Emotion recognition accuracy across intended emotions. Mean and variability of emotion recognition accuracies for each emotion, based on participant-specific emotion recognition rates. p values are Bonferroni-corrected. The dotted line illustrates chance level (100%/5 emotions = 20%); all emotion categories were recognized above chance level on average. Stimuli expressing fear were recognized significantly less well than all other emotional categories, but still above chance.

Subsequently, a Friedman’s ANOVA was used to determine whether emotion recognition accuracy differed between stimuli of different categories of intended emotions (joy, anger, fear, sadness, neutral state) (χ2(4) = 31.61, p < .001). Wilcoxon signed rank tests with Bonferroni correction (significance level at .005) were used to follow up the significant main effect. Emotion recognition accuracy for stimuli expressing fear was significantly lower than for all other emotional categories (joy (V = 813.5, p = .007, h =  − .168), anger (V = 687.5, p < .001, h =  − .218), sadness (V = 649.5, p < .001, h =  − .223) and neutral state (V = 722.5, p < .001, h =  − .301)). There were no significant differences between stimuli of any of the other intended emotion categories (all ps > .391). See Fig. 2.
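The reported significance level of .005 is consistent with a Bonferroni correction over all pairwise comparisons among the five categories. The short sketch below shows the arithmetic; the familywise alpha of .05 is an assumption, as it is not stated explicitly in the text.

```python
from itertools import combinations

emotions = ["joy", "anger", "fear", "sadness", "neutral"]

# All unordered pairs of the five intended-emotion categories.
pairs = list(combinations(emotions, 2))

alpha_family = 0.05  # assumed familywise error rate
alpha_per_test = alpha_family / len(pairs)

print(len(pairs))  # 10
```

Ten comparisons at a familywise alpha of .05 give a per-test threshold of .05 / 10 = .005.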

Besides, we explored emotion recognition accuracy across the different categories of intended emotions in terms of correct classifications and misclassifications. The most frequent confusions were as follows: stimuli intended to express joy were most often misclassified as neutral state, in 23.1% of trials (correct classifications: 49.2%); anger as joy in 24.2% of trials (correct classifications: 50.5%); fear as sadness in 25.5% of trials (correct classifications: 39.5%); sadness as neutral state in 23.8% of trials (correct classifications: 47.5%); and neutral state as sadness in 20.4% of trials (correct classifications: 52.61%). See Table 2 for a confusion matrix.
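A confusion matrix of this kind can be tabulated from trial-level data as in the sketch below. The input format (pairs of intended and chosen emotion) and the toy trials are hypothetical; the published values come from the authors' own data and code.

```python
from collections import Counter, defaultdict

EMOTIONS = ["joy", "anger", "fear", "sadness", "neutral"]

def confusion_matrix(trials):
    """Row-normalised confusion matrix in percent.

    trials is an iterable of (intended, chosen) pairs. Each row (intended
    emotion) sums to 100 if that emotion occurs at least once, else to 0.
    """
    counts = defaultdict(Counter)
    for intended, chosen in trials:
        counts[intended][chosen] += 1
    matrix = {}
    for intended in EMOTIONS:
        total = sum(counts[intended].values())
        matrix[intended] = {
            chosen: (100 * counts[intended][chosen] / total if total else 0.0)
            for chosen in EMOTIONS
        }
    return matrix

# Toy data: four fear trials and one joy trial.
trials = [
    ("fear", "fear"), ("fear", "sadness"), ("fear", "fear"),
    ("fear", "neutral"), ("joy", "joy"),
]
m = confusion_matrix(trials)
print(m["fear"]["fear"], m["fear"]["sadness"])  # 50.0 25.0
```

The diagonal entries correspond to the correct classification rates, and the largest off-diagonal entry in each row is the most frequent confusion for that intended emotion.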

Table 2 Confusion matrix for emotion recognition accuracy across intended emotions.

Intensity ratings

A Friedman’s ANOVA showed a main effect of Intended Emotion on participants’ Intensity ratings (χ2(4) = 48.49, p < .001), suggesting differences between categories of intended emotion. Follow-up Wilcoxon signed rank tests with Bonferroni correction (significance level at .005) revealed that neutral state stimuli were rated as less intense than all other stimuli (joy (V = 812, p < .001, d = .383), anger (V = 562.5, p < .001, d = .494), fear (V = 1209, p = .007, d = .205), and sadness (V = 1092, p = .001, d = .292)). Besides, stimuli intended to express anger were rated as significantly higher in intensity than stimuli intended to express fear (V = 3207.5, p < .001, d = .294), and sadness (V = 2749.5, p = .048, d = .204). No other comparisons were significant (all ps > .142). See Fig. 3.

Figure 3

Average intensity ratings for 5 emotion categories. Mean and variability of Intensity ratings for stimuli of each emotion category as intended by the dancer and the total. p values are Bonferroni-corrected.

Beauty ratings

A Friedman’s ANOVA showed a main effect of Intended Emotion on participants’ Beauty ratings (χ2(4) = 39.68, p < .001), suggesting that participants experienced the movements intended to express some emotions as more beautiful than others. Follow-up Wilcoxon signed rank tests with Bonferroni correction (significance level at .005) revealed that stimuli intended to express joy were rated as more beautiful than angry (V = 2901.5, p < .001, d = .208), fearful (V = 3260, p < .001, d = .241), and neutral state (V = 3071, p < .001, d = .241) stimuli. In addition, stimuli expressing sadness received higher Beauty ratings than fearful (V = 3088.5, p < .001, d = .241) and neutral state stimuli (V = 3152.5, p < .001, d = .242). See Fig. 4.

Figure 4

Average beauty ratings for 5 emotion categories. Mean and variability of Beauty ratings of dance movements, shown for all emotions as intended by the dancer. p values are Bonferroni-corrected.

Subjective emotion recognition

To explore how intensity and beauty ratings were distributed when using participants’ subjective emotion judgment (i.e., participants’ subjective perception of emotion, regardless of intended emotional expression by the dancer), the above analyses were repeated with the subjective emotion perception as grouping variable. No large differences between the two types of classifications were observed, as subjective perception and intended expression mostly overlapped. See supplementary materials section 2, for those analyses.

Interindividual differences in emotion recognition and aesthetic judgement

We next explored how interindividual differences modulated emotion recognition, intensity ratings and aesthetic judgment. Only the personality trait conscientiousness predicted emotion recognition accuracy (conscientious individuals scored higher on the emotion recognition task). Intensity and beauty ratings were positively predicted by our overall engagement variable (“how interesting did you find this task?” 0 = not at all; 100 = very much; see “Methods” section). Beauty ratings were additionally predicted negatively by the personality trait negative emotionality. These regression analyses are set out in the supplementary materials (section 3). Our sample had not been specifically recruited with the variable dance experience in mind. However, because important previous research with dance professionals has shown links between dance experience and other neurocognitive processes35,55,56,57,58,59, dance experience data were collected as a means of experimental control. Participants’ average dance experience was very low (1.6 years; SD = 4.55), with many participants having none at all (81.1%, range = 0–30). As could be expected, this variable showed no effects on emotion recognition, or on beauty or intensity ratings (see supplementary materials, section 3).

Technical test

As a ‘technical test’ of the stimuli, we proceeded to inspect the emotion recognition rate for each stimulus. Of the 150 final stimuli, 139 had been recognized above chance level of 20%. We propose that any stimulus that was not recognized at least at 20% should not be used in subsequent experiments.

For stimulus selection in subsequent experiments that keep sequences intact (i.e., in which all five stimuli of a sequence have been recognized above chance level), we provide a table with all information about each sequence and each of the stimuli composing a sequence. We propose that only sequences in which the intended emotional expression of all five stimuli has been recognized above chance level should be included in an experiment. In a total of 22 sequences, all five stimuli were recognized above chance level, i.e., a total of 110 stimuli.

Table 3 shows the N = 150 stimuli of the stimuli set with their average Emotion Recognition Accuracy, Intensity Rating and Beauty Rating. Emotion Recognition Accuracies of stimuli were tested against chance level of 20% (100/5 = 20) by Boolean testing “Average Emotion Recognition Accuracy > 20?”. Krippendorff’s alpha was computed for each sequence to assess interrater reliability. See Table 3 for this data.
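The Boolean test and the selection of intact sequences can be sketched as follows. The data layout, the sequence identifiers and the accuracy values are hypothetical; only the 20% chance level and the "all five stimuli above chance" rule are taken from the text.

```python
CHANCE = 20.0  # percent; 100 / 5 response options
EMOTIONS = ["joy", "anger", "fear", "sadness", "neutral"]

def intact_sequences(accuracy, n_emotions=5, chance=CHANCE):
    """Return sequences whose stimuli were all recognized above chance.

    accuracy maps (sequence id, intended emotion) to the mean emotion
    recognition accuracy of that stimulus in percent.
    """
    by_seq = {}
    for (seq, _emotion), acc in accuracy.items():
        by_seq.setdefault(seq, []).append(acc > chance)  # the Boolean test
    return sorted(
        seq for seq, flags in by_seq.items()
        if len(flags) == n_emotions and all(flags)
    )

# Toy values: the fear stimulus of seq02 is below chance, so seq02 is dropped.
accuracy = {("seq01", e): a for e, a in zip(EMOTIONS, [55.0, 48.0, 31.0, 52.0, 60.0])}
accuracy.update({("seq02", e): a for e, a in zip(EMOTIONS, [47.0, 51.0, 18.0, 44.0, 58.0])})
print(intact_sequences(accuracy))  # ['seq01']
```

Note that the comparison is strict, so a stimulus recognized at exactly 20% does not count as above chance in this sketch.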

Table 3 Emotion recognition accuracies per stimulus and per sequence.

Discussion

We created an emotional dance movement stimuli set for emotion psychology and related disciplines. It contains 30 dance sequences performed five times each, with one of five intended emotional expressivities at each repetition (joy, anger, fear, sadness, and a neutral state), i.e., a total of 150 stimuli. All five emotion categories, as intended by the dancer, were recognized above chance level (chance: 20%; joy: 45%, anger: 48%, fear: 37%, sadness: 50%, neutral state: 51%). Fear had significantly lower emotion recognition rates than the other emotion categories, but was still above chance. This finding is in accordance with previous literature, in which the difficulty of recognizing fear from full-body movements has been reported44. One hundred and thirty-nine of the 150 stimuli were recognized above chance level. Respecting sequence membership, the data showed that in 22 sequences all five stimuli were recognized above chance level. This means that, to leave sequence membership intact, a set of 110 stimuli (22 sequences × 5 emotions) can be used from this emotional dance movement stimuli set.

Importantly, as a manipulation check, the neutral state stimuli (neutral expressivity), were rated as less intense than all other emotions, confirming that these neutral state stimuli were less emotionally expressive in intensity, as had been intended by the dancer. Thus, this category can be used as a control condition. We found no difference between anger and joy in terms of intensity, as has been reported before. Anger was rated as more intense than the stimuli intended to express sadness and fear, and joy was rated as more intense than neutral (joy = anger; joy > neutral; anger > fear/sadness > neutral).

Regarding our conjecture about implicit emotion recognition via aesthetic judgment, we found that participants’ aesthetic judgment (beauty ratings) was indeed sensitive to the emotion intended by the dancer. Stimuli expressive of joy and sadness received the highest beauty ratings, whereas fear and neutral expressivity received the lowest (joy > anger > fear > neutral, and sad > fear > neutral). Interestingly, the high arousal emotions anger and joy were rated as equally intense, but participants’ beauty ratings differed between the two emotions, with joyful movements being rated as more beautiful than angry movements. On the other hand, low-intensity stimuli expressing sadness were rated as more beautiful than other low-intensity stimuli, including neutral state and fearful stimuli. These results suggest that aesthetic judgment could indeed be conceptualized as a type of implicit emotion recognition task.

Interindividual difference measures of personality and aesthetic responsiveness did not significantly predict emotion recognition accuracy, except for conscientiousness that predicted higher emotion recognition accuracy. Our engagement measure ‘interest in task’ predicted intensity ratings and beauty judgments, while beauty judgments were also negatively predicted by the personality trait negative emotionality.

Overall discussion and conclusion

It has long been argued that accurate emotion recognition from conspecifics confers an evolutionarily adaptive advantage to the individual22,45,60,61, yet results remain mixed62,63. Importantly, while there are different channels of emotional expressivity (face, voice, and the body), few validated full-body stimuli sets are available to test for emotion recognition effects and their possible links to broader cognitive function. This is an important gap, especially as some research suggests that high recognition accuracy specifically for bodily expressions of emotion (as opposed to facial expressions of emotion) could be associated with negative psychosocial outcomes2,10.

Therefore, we here propose dance movements as stimuli for emotion science, to answer a range of questions about human full-body emotion perception13,14,64,65,66,67,68. Echoing this, we created and validated a set of 150 full-body dance movement stimuli for research in emotion psychology, affective neuroscience and empirical aesthetics. We provide emotion recognition rates, intensity ratings and aesthetic judgment values for each stimulus, and have demonstrated emotion recognition rates above chance for 139 of the 150 stimuli. We also provide first data to suggest that aesthetic judgment to this carefully controlled stimuli-set could serve as a useful implicit emotion recognition task.

Methods

Ethical approval for the experiment was in place through the Umbrella Ethics approved by the Ethics Council of the Max Planck Society (Nr. 2017_12). Informed consent was obtained from all participants and/or their legal guardians. The informed consent was given online through a tick-box system. All methods were performed in accordance with the relevant guidelines and regulations.

Participants: the dancer

One professional dancer from the Dresden Frankfurt Dance Company, Germany, collaborated and was remunerated as model for all stimuli. The dancer was a professional dancer trained in classical ballet technique, but working in a professional dance company where Western contemporary dance was the main mode of expression.

Participants: the observers

Participant characteristics of the 90 participants are set out in Table 4.

Table 4 Sociodemographic characteristics of participants.

The sample size was determined as follows. The final number of stimuli (n = 173 including duplicates; see “Stimuli” section) would have been too many for participants to rate in one experiment. Therefore, stimuli were divided into 3 sets. Each set was rated by a different group of participants, and we planned to compare these three groups in terms of their ratings of 15 shared stimuli to evaluate interrater reliability. Sample size was determined separately for these groups, using G*Power 3.1 (ref. 69). Choosing the threshold of a large effect size of d = .80 (ref. 70), our sample size calculation for an independent samples t-test (effect size = .80; alpha = .05; power = .90) suggested a sample size of 28 per group. We tested 30 participants per group to ensure full randomization (30 is divisible by 5 emotions, 28 is not).
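For orientation, the order of magnitude of this calculation can be checked with the textbook normal approximation for comparing two independent means, n per group ≈ 2(z_alpha + z_beta)² / d². This is not the exact noncentral t computation that a power analysis program performs, and the text does not state whether the test was one- or two-tailed, so both are shown. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.90, tails=2):
    """Approximate per-group n for an independent samples t-test.

    Uses the normal approximation; exact t-based values are slightly larger.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / tails)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

print(n_per_group(0.80, tails=1))  # 27
print(n_per_group(0.80, tails=2))  # 33
```

Because the approximation runs slightly below exact t-based results, the one-tailed value of 27 lies close to the reported 28 per group.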

Materials

Stimuli

We used N = 173 video clips of 6 s length of a white silhouette dancer on black background. Stimuli contained no facial information, no costume, nor music. Each clip was faded in and out.

A dancer choreographed 30 sequences of dance movements. Of the 30 sequences, five were Western classical ballet; the rest were Western contemporary dance. The length was 8 counts in dance theory, ~ 8 s. The dancer performed each sequence five times, with a different emotional expressivity at each repetition: joy, fear, anger, sadness and neutral state. A total of 173 stimuli were recorded instead of 150 (30 sequences × 5 emotions = 150 stimuli): when the dancer was not satisfied with her performance of a sequence, she asked to repeat it. Therefore, some of the stimuli were repeated. All 173 stimuli were included in the experiment to be able to select the “best” stimuli based on emotion recognition data. The 23 additional takes were removed before analysis by keeping, among duplicates, the stimulus with the highest emotion recognition rate. See Fig. 1 for an illustration of the stimuli creation process and a sample stimulus.
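The selection rule for duplicate takes can be sketched as follows. The record layout and the values are hypothetical; the rule itself (keep the take with the highest emotion recognition rate for each combination of sequence and intended emotion) follows the description above. How ties were handled is not stated in the text; this sketch keeps the first take encountered.

```python
def keep_best_takes(takes):
    """Keep one take per (sequence, emotion): the one with the highest accuracy.

    takes is a list of dicts with keys 'sequence', 'emotion', 'take', 'accuracy'.
    On ties, the take that appears first in the list is kept (an assumption).
    """
    best = {}
    for t in takes:
        key = (t["sequence"], t["emotion"])
        if key not in best or t["accuracy"] > best[key]["accuracy"]:
            best[key] = t
    return list(best.values())

# Toy data: two takes of the same stimulus plus one stimulus without duplicates.
takes = [
    {"sequence": "seq07", "emotion": "anger", "take": 1, "accuracy": 41.0},
    {"sequence": "seq07", "emotion": "anger", "take": 2, "accuracy": 56.0},
    {"sequence": "seq07", "emotion": "joy", "take": 1, "accuracy": 48.0},
]
kept = keep_best_takes(takes)
print(len(kept))  # 2
```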

Questionnaires

Participants provided demographic information and interindividual difference measures were collected. First, the personality measure Big Five Inventory Short version (BFI-S)71,72 that contains five subscales, namely Agreeableness, Conscientiousness, Extraversion, Negative Emotionality and Open-mindedness. Second, the Aesthetic Responsivity Assessment (AReA)73 that screens for sensitivity and engagement with the arts. It contains 14 items (answers were given on a 5-point Likert scale between 0 (never) and 4 (very often)) that split into three first-order factors: Aesthetic Appreciation (AA; how much an individual appreciates different types of art, like poetry, paintings, music, dance), Intense Aesthetic Experience (IAE; an individual’s propensity to experience a subset of more intense aesthetic experiences like being moved, awe or the sublime), and Creative Behaviour (CB; an individual’s propensity to actively engage in creative processes like writing, painting, music making or dancing).

Participants had an average of 1.6 years (SD = 4.55) of dance experience, with many participants having no dance experience at all (81%, range 0 – 30).

Attention and engagement checks

A series of attention checks controlled for engagement. On two trials of the questionnaires, participants were asked “please press the central circle”, and non-compliance led to exclusion. On two of the emotion recognition trials, cartoon videos were shown with very obvious emotional expressions (SpongeBob crying a river of tears; correct response: sad; and Mickey Mouse’s head turning red and exploding; correct response: angry). Participants who rated these incorrectly were excluded. Finally, a question was added after the emotion and aesthetics rating tasks: “Did the videos play alright?” (0 = not at all; 5 = yes, all good). Participants who rated 3 or less were excluded.
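Taken together, the exclusion criteria amount to a simple filter, sketched below. The field names and response labels are hypothetical and do not reflect the actual survey export.

```python
def passes_checks(p):
    """True if a participant record passes all attention and playback checks."""
    return (
        p["circle_checks_passed"] == 2     # both 'press the central circle' trials
        and p["cartoon_crying"] == "sad"   # obvious sadness cartoon
        and p["cartoon_exploding"] == "angry"  # obvious anger cartoon
        and p["video_playback"] > 3        # ratings of 3 or less lead to exclusion
    )

# Hypothetical participant records.
participants = [
    {"id": "p01", "circle_checks_passed": 2, "cartoon_crying": "sad",
     "cartoon_exploding": "angry", "video_playback": 5},
    {"id": "p02", "circle_checks_passed": 2, "cartoon_crying": "sad",
     "cartoon_exploding": "angry", "video_playback": 3},
    {"id": "p03", "circle_checks_passed": 1, "cartoon_crying": "sad",
     "cartoon_exploding": "angry", "video_playback": 5},
]
included = [p["id"] for p in participants if passes_checks(p)]
print(included)  # ['p01']
```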

A final question in the experiment asked participants how interesting they found the task they had just participated in. This is because previous research suggests that the personal interest in the task modulates task engagement and quality of responses32,43,74. We included this variable in the regression models.

Procedure

See Fig. 1 for the stimuli creation procedure.

To obtain normative values, the N = 173 video clips were divided into three sets and presented to three separate groups of 30 participants. Three randomly chosen sequences (= 15 stimuli) were included in all three sets for interrater reliability assessments between the three groups. Including the three ‘shared’ sequences, the resulting three stimuli sets were as follows: Set 1 included only the ballet sequences (seven sequences) and consisted of 39 stimuli (including 4 additional takes). Set 2 included contemporary dance sequences (15 sequences) and consisted of 84 stimuli (including nine additional takes), and Set 3 included contemporary dance sequences (14 sequences) and consisted of 80 stimuli (including 10 additional takes).
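The set sizes above can be checked with a few lines of arithmetic. The sketch assumes that each of the three shared sequences is counted once in every set and that none of the additional takes belongs to a shared sequence; the second assumption is not stated in the text.

```python
EMOTIONS = 5
SHARED_SEQUENCES = 3

# (number of sequences, number of additional takes) per set, as reported.
sets = {"Set 1": (7, 4), "Set 2": (15, 9), "Set 3": (14, 10)}

# Stimuli per set: five emotions per sequence plus the additional takes.
sizes = {name: seqs * EMOTIONS + extra for name, (seqs, extra) in sets.items()}

# Shared sequences appear in all three sets, so subtract two of the three copies.
unique_sequences = sum(seqs for seqs, _ in sets.values()) - 2 * SHARED_SEQUENCES
additional_takes = sum(extra for _, extra in sets.values())
unique_videos = unique_sequences * EMOTIONS + additional_takes

print(sizes)             # {'Set 1': 39, 'Set 2': 84, 'Set 3': 80}
print(unique_sequences)  # 30
print(unique_videos)     # 173
```

Under these assumptions, the reported set sizes agree with the totals of 30 sequences, 23 additional takes and 173 video clips.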

The experiment was set up on Limesurvey®, where participants were also asked to read an information sheet and sign the consent form. Participants signed up for the rating experiment online via the Prolific© platform. The experiment began with the demographics questionnaire, followed by the emotion recognition task including beauty and intensity ratings, followed by the remaining questionnaires.

On each trial, participants were shown one dance video stimulus (randomized presentation), and then a forced-choice paradigm was used where participants were asked to select one emotion the dancer was intending to express (joy, anger, fear, sadness or neutral state). It was not possible to repeat the video after it had played one time. Two slider questions from 0 (not at all) to 100 (very much) probed for perceived intensity of the emotional expression and beauty of the movement (i.e., “How intensely was the emotion expressed?”/“How beautiful did you find the movement?”). “Intensity” was added as a proxy measure of “power” commonly used in emotion research. However, research participants find it difficult to rate “power” and we opted for “intensity” instead.

For a qualitative assessment, we added an open question, where participants were invited to indicate any other emotions that they perceived in the movement, by writing the emotion in a box (this data is not analysed in this manuscript). Participants were debriefed about the objectives of the experiment at the end.