Communication for coordination: gesture kinematics and conventionality affect synchronization success in piano duos

Bishop, Laura; Goebl, Werner

doi:10.1007/s00426-017-0893-3

Original Article
Open access
Published: 21 July 2017

Volume 82, pages 1177–1194, (2018)

Psychological Research

Abstract

Ensemble musicians often exchange visual cues in the form of body gestures (e.g., rhythmic head nods) to help coordinate piece entrances. These cues must communicate beats clearly, especially if the piece requires interperformer synchronization of the first chord. This study aimed to (1) replicate prior findings suggesting that points of peak acceleration in head gestures communicate beat position and (2) identify the kinematic features of head gestures that encourage successful synchronization. It was expected that increased precision of the alignment between leaders’ head gestures and first note onsets, increased gesture smoothness, magnitude, and prototypicality, and increased leader ensemble/conducting experience would improve gesture synchronizability. Audio/MIDI and motion capture recordings were made of piano duos performing short musical passages under assigned leader/follower conditions. The leader of each trial listened to a particular tempo over headphones, then cued their partner in at the given tempo, without speaking. A subset of motion capture recordings were then presented as point-light videos with corresponding audio to a sample of musicians who tapped in synchrony with the beat. Musicians were found to align their first taps with the period of deceleration following acceleration peaks in leaders’ head gestures, suggesting that acceleration patterns communicate beat position. Musicians’ synchronization with leaders’ first onsets improved as cueing gesture smoothness and magnitude increased and prototypicality decreased. Synchronization was also more successful with more experienced leaders’ gestures. These results might be applied to interactive systems using gesture recognition or reproduction for music-making tasks (e.g., intelligent accompaniment systems).

Introduction

Interpersonal communication is critical for joint action tasks like playing piano duets, playing team sports, or dancing, which require collaborators to align their intentions and coordinate their actions in time. Communication during these tasks is continuous and interactive, with collaborators constantly adapting their own intentions and actions in response to the signals they receive from each other (Schiavio & Høffding, 2015). The signals exchanged are typically multimodal (e.g., auditory and visual) and multilayered (e.g., involving facial expressions and body movements simultaneously), and can be subtle, comprising only a raised eyebrow or a brief moment of eye contact (Davidson, 2012). Given these complexities, it is often difficult to determine from a research standpoint exactly what is being communicated and how group members are assimilating incoming information.

Many researchers have used group music-making paradigms to investigate the communication processes underlying interpersonal coordination. Group music-making presents an intriguing context in which to study communication and coordination, since precise coordination must be achieved under inherently ambiguous temporal conditions (even for notated music, timing is only loosely defined by the score). Moreover, the possible means of communication are constrained by the task of performing an instrument, which can limit freedom of movement for much of the body, as well as conventions of public performance, which may prohibit, for example, counting out loud or using a metronome. In recent years, researchers have been applying their knowledge of the communication processes involved in group music-making to computer systems that replicate or react to performer movements (Dahl, 2014; Hoffman & Weinberg, 2011). Such an application requires a detailed understanding of gesture kinematics and how they relate to performers’ intentions.

During skilled ensemble performance, most communication through audio and visual channels is nonverbal. Usually, perception of jointly-produced sound gives sufficient information for performers to coordinate, but visual communication can be important too (Bishop & Goebl, 2015). Visual communication is only rarely a matter of one performer giving directions to another; rather, even if there is a designated leader, collaborating musicians’ body movements interrelate (Chang, Livingstone, Bosnyak, & Trainor, 2017; Moran, Hadley, Bader, & Keller, 2015) and can be mutually influential (Badino, D’Ausilio, Glowinski, Camurri, & Fadiga, 2014). Research has shown that musicians move more predictably when performing with others than when performing alone (Glowinski et al., 2013), a finding that parallels observations made elsewhere in the joint action literature (Hart, Noy, Feniger-Schaal, Mayo, & Alon, 2014; Vesper, van der Wel, Knoblich, & Sebanz, 2011). Visual communication is particularly important in more ambiguous temporal contexts (e.g., at abrupt tempo changes or following long pauses), when co-performers’ interpretations are less certain to align (Bishop & Goebl, 2015; Kawase, 2013).

The current study investigates the gestures that ensemble musicians use to coordinate piece entrances. Typically, at piece entrances, in the absence of a conductor, one musician in an ensemble acts as the leader and gives the others an audio-visual signal to begin. This visual signal should indicate the timing of the first beat as well as the starting tempo for the piece. Ensemble musicians coordinate piece entrances with varying degrees of success. While professionals typically synchronize their first notes with near-perfect precision (at least in concert), students may require several attempts to begin. Synchronization success can vary depending on a range of factors, including musicians’ expertise and familiarity with each other’s playing style, the genre and tempo of the music, the number and combination of instruments, and the presence or absence of a conductor. The aim of our study was to identify factors that contribute to successful coordination at piece onset during piano duo performance. Specifically, we examined how cue gesture kinematics relate to note synchronization.

Kinematics of effective cueing gestures

Successful coordination is partly dependent on the quality of the visual signal given by the leader—particularly at piece onset, where there is no prior audio to indicate when the first notes should fall or at what tempo. Musicians commonly use head gestures to signal piece onsets, regardless of their instrument; head gestures were therefore our focus here, though we acknowledge that much of the upper body, as well as facial expressions and breathing, can be involved. The current study investigated how head movement kinematics communicate beats, and tested four kinematic properties of head gestures that we predicted could help observers detect communicated beats more successfully. This section of the paper develops these hypotheses.

For both conductors and instrumentalists, the kinematics rather than trajectories of cueing gestures have been shown to indicate the position of the beat, or tactus (Luck & Toiviainen, 2006). “Followers” attempting to synchronize with instrumentalists’ cueing gestures tend to perceive beats as aligning with major peaks in the leader’s head acceleration, rather than with points of direction change in head trajectories (Bishop & Goebl, 2017). This was observed among pianists and violinists in a study of synchronization in duo performance. Performers’ head movements were measured using accelerometers and Kinect sensors as they took turns cueing each other in at the starts of short passages. An aim of the current study was to replicate these findings (H1) using an expanded version of the same procedure and a more precise motion measurement system.

The easiest gestures to synchronize with are presumably those that convey beat position clearly and reliably. If followers aim to align their starting notes with peaks in leaders’ head acceleration, then leader/follower synchronization should be most successful when the leader’s starting notes align precisely with his or her own head acceleration peaks. The current study tested this hypothesis (H2) by calculating the offset of leaders’ first note onsets from major peaks in leaders’ head acceleration curves, and relating the magnitude of these offsets to success in note synchronization.
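As a minimal sketch of the offset measure behind H2, the snippet below finds the acceleration peak nearest to a leader's first note onset and returns the signed difference. The function names and the `min_height` threshold used to decide which peaks count as "major" are illustrative assumptions; the text does not specify how major peaks were selected.

```python
def local_maxima(values):
    """Indices where a sample exceeds its predecessor and is not below its successor."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] >= values[i + 1]]


def onset_peak_offset(times, accel, onset_time, min_height):
    """Signed offset (onset time minus peak time) to the nearest qualifying peak.

    `min_height` is a hypothetical threshold for "major" peaks.
    Returns None when no peak reaches the threshold.
    """
    peaks = [i for i in local_maxima(accel) if accel[i] >= min_height]
    if not peaks:
        return None
    nearest = min(peaks, key=lambda i: abs(times[i] - onset_time))
    return onset_time - times[nearest]


# Toy acceleration curve sampled every 100 ms (values are invented).
times = [i * 0.1 for i in range(7)]
accel = [0, 1, 5, 1, 0, 2, 0]
offset = onset_peak_offset(times, accel, onset_time=0.25, min_height=3)
```

A positive offset here means the note onset fell after the acceleration peak; the magnitude of the offset is what would be related to synchronization success.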

The clarity of a gesture, and how readily others synchronize with it, might also be influenced by its articulation. Wöllner, Parkinson, Deconinck, Hove, and Keller (2012) found that observers synchronized finger-taps more successfully with quantitatively averaged conductor gestures, which were low in jerk, than with individual conductor gestures, which were higher in jerk. Jerk, the third derivative of position, indicates the smoothness of acceleration changes. The authors also observed more successful synchronization with marcato gestures, where the differences between acceleration maxima and minima were large, than with legato gestures, where the differences between acceleration maxima and minima were small. Here, we hypothesized that gestures high in smoothness (low in jerk) would be clearer and synchronized with more successfully than gestures with high jerk (H3).

We also tested the possibility that musicians synchronize more successfully with gestures marked by a larger-magnitude indication of the beat than with gestures marked by a smaller-magnitude indication of the beat (H4). Gesture magnitude was quantified in terms of how far the head travelled along the forwards–backwards axis while indicating the beat. Instrumentalists sometimes exaggerate their gestures at piece entrances and other places where exchanging visual cues benefits synchronization, and a test of how gesture magnitude affects synchronization should indicate whether this is an effective strategy.

Observers’ success at synchronizing with instrumentalists’ cueing gestures might also relate to the prototypicality of those gestures (Wöllner, Parkinson, Deconinck, Hove, & Keller, 2012). People synchronize most successfully with gestures that are similar to those they produce themselves (Keller, Knoblich, & Repp, 2007; Wöllner & Cañal-Bruland, 2010). This effect is generally attributed to the strengthening of action prediction mechanisms that occurs with experience. According to this theoretical perspective, people use their own action planning systems to simulate observed movements, a process that may or may not yield overt motor output (Calvo-Merino, Grezes, Glaser, Passingham, & Haggard, 2006; van der Wel, Sebanz, & Knoblich, 2013). They then predict the course of observed gestures using the same mechanisms that they use to predict the course of their own gestures. When the observed or performed gesture is similar to gestures a person has performed in the past, action-perception links are stronger and prediction is facilitated. In the current study, we expected that highly prototypical cueing gestures would be more likely than highly idiosyncratic cues to align with followers’ own gesture tendencies, and would therefore be easier to predict and synchronize with (H5).

Gesture mimicry to facilitate synchronization

Theories of embodied music cognition posit that we use our own bodies to interpret the musical gestures produced by others (Leman, 2012). In other words, we understand others’ motor intentions by overtly or internally mirroring aspects of their actions (Jacob & Jeannerod, 2005; Jeannerod, 2003). Our ability to internally simulate others’ gestures is thus central to the concept of embodiment. Simulation mechanisms facilitate the translation of gestures into sound and the translation of sound into expressive gestures (Leman & Maes, 2014).

As stated above, observed actions can be simulated without overt replication, though in some cases the process clearly shapes motor output. For example, an imitation bias is observed among people who are asked to make speeded movements that are either congruent or incongruent (e.g., in terms of magnitude or direction) to irrelevant movements viewed simultaneously on a computer screen (Grosjean, Zwickel, & Prinz, 2009; Schubö, Aschersleben, & Prinz, 2001). Incongruent movements are performed less accurately than congruent movements, indicating an unintentional assimilation of observed motion parameters into the observer’s own performed actions.

At times, people overtly mimic each other’s behaviour. This mimicry has cognitive benefits: when people perform gestures that are high in similarity and coordinated in time, their attention is drawn towards each other and their perception and memory for each other’s behaviour is facilitated (Macrae, Duffy, Miles, & Lawrence, 2008). Furthermore, moving in rhythmic coordination with others can promote social bonding, improving participants’ ratings of partner trust and likeability (Hove & Risen, 2009) and increasing the likelihood of prosocial behaviour (Wiltermuth & Heath, 2009).

The current study included a test of whether coordination of body gestures occurs between duo performers at piece onset. Previously, some correlation in head and upper body sway has been observed within pairs of duo pianists. Goebl and Palmer (2009) found that duo pianists’ head movements were more synchronized when they performed under reduced auditory feedback conditions (unable to hear themselves or unable to hear their partner) than when they performed with normal two-way auditory feedback. Despite the heightened synchrony of head movements, however, note synchronization under reduced auditory feedback conditions was poor. Keller and Appel (2010) tracked the upper body movements of piano duos and found that the further the body movements of the primo performer (who usually plays the higher-pitched part) lagged behind those of the secondo (who usually plays the lower-pitched part), the greater note asynchrony became. Since primo note onsets consistently led secondo note onsets, the authors suggested that congruence between leader/follower relations at the levels of note onsets and body sway may be important for successful ensemble coordination.

Still unclear is whether leader/follower coordination of cueing gestures occurs at piece onset, and to what extent this coordination of body movement relates to note synchronization. In line with theories of embodiment, we hypothesized that coordinating cueing gestures would help performers gauge each other’s intended timing and, therefore, facilitate note synchronization (H6). Followers’ gestures were expected to mimic the form of leaders’ gestures and follow a parallel timecourse.

Current study

This study aimed to assess how kinematic measures affect the synchronizability of ensemble musicians’ cueing-in gestures. Motion capture recordings were made of nine piano duos performing short musical passages under alternating leader and follower conditions. The assigned leader for each trial was responsible for cueing in the follower at a particular tempo, with the aim of synchronizing their performance of the passage as precisely as possible. During a subsequent gesture-following task, a subset of the motion capture recordings of leader performances were presented (with audio) to an independent sample of 10 skilled musicians, who tapped in time with the leaders’ performed beats. Using data from this test, a measure of “synchronizability” (i.e., average leader–follower first beat asynchrony) was obtained for each leader gesture.

The alignment between followers’ first taps (for gesture-following task participants) or performed beats (for interactive duo performance task participants) and extremes in leaders’ head position, velocity, and acceleration curves was examined. Our focus was exclusively on leader–follower synchronization at piece onset, though a similar investigation of how gesture kinematics affect synchronization across the first few beats of a piece could also be made. It was expected that followers’ first taps would align with acceleration peaks in leaders’ gestures (H1), confirming previous findings (Bishop & Goebl, 2017). It was also expected that the precision of alignment between leaders’ first note onsets and their own head acceleration peaks (H2), as well as increased gesture smoothness (H3), magnitude (H4), and prototypicality (H5) would improve the synchronizability of leaders’ gestures. Finally, the hypothesis that increased similarity in the movements made by leader–follower pairs at the time of piece onset relates to improved leader–follower note synchronization was tested (H6), using data from the recording sessions.

Methods

Interactive duo performance experiment

Our first experiment investigated pianists’ synchronization with their duo partners’ cueing-in gestures under interactive conditions, while two-way communication was possible. The hypothesis that followers synchronize their piece onsets with peaks in leaders’ head acceleration was assessed. We also tested for coordination in duos’ body movements around piece onsets.

Participants

Eighteen pianists (10 female) recruited from among the students at the University of Music and Performing Arts Vienna completed the experiment. Our sample size was set with the aim of obtaining enough recorded performances to carry out the gesture-following task. Six pianists had minimal experience playing the piano in small ensembles, six had extensive experience, and six were completing a degree in either choral or orchestral conducting. Some pianists (10 of the 18) had completed a similar version of the task for the experiment reported in Bishop and Goebl (2017). Some of the pianists knew their assigned partner, but none had performed together before. Participants provided informed consent before completing the experiment and received a small travel reimbursement.

“Conductor”, “ensemble-experienced”, and “ensemble-inexperienced” groups did not differ significantly in terms of age (conductors M = 28.0, SD = 7.5; ensemble-experienced M = 27.2, SD = 4.7; ensemble-inexperienced M = 25.2, SD = 3.4; F(1, 13) = 0.37, p = 0.55) or years of piano-playing experience (conductors M = 17.0, SD = 7.4; ensemble-experienced M = 22.0, SD = 5.6; ensemble-inexperienced M = 17.8, SD = 3.4; F(1, 13) = 1.64, p = 0.22). However, the ensemble-experienced group had more experience playing in duos and other small ensembles (self-rated M = 12.7 out of 15, SD = 1.4; conductors M = 8.5, SD = 2.1; ensemble-inexperienced M = 8.2, SD = 1.3; F(1, 13) = 18.45, p = 0.001, \(\eta ^2=0.59\)). Only the conductors had prior conducting experience (M = 3 years, SD = 1.7).

Table 1 Musical stimuli are listed with their starting tempi and meters

Stimuli and equipment

Pianists performed 15 passages adapted from the starts of pieces in the Western classical repertoire (Table 1). Some further details on these pieces are given in Bishop and Goebl (2017). A sample piece is shown in Fig. 1. All passages were in duple meter, 2–4 bars in length, multi-voiced (to be played with both hands), and adapted so that the two performers would always start in unison on the first downbeat. Pieces that were likely to be unfamiliar to participants were chosen to encourage communication between duo members and to ensure that they would not have preexisting expectations regarding the tempo. A tempo was selected for each passage based on the original tempo indications in the score; these ranged from 45 to 220 bpm, with approximate mean interbeat intervals of 1000 ms at the slowest tempo and 111 ms at the fastest tempo.

Pianists performed on two Yamaha CLP470 Clavinovas and faced each other directly, as shown in Fig. 2. Audio and MIDI from the Clavinovas were recorded via a Focusrite Scarlett 18i8 sound card in Ableton Live, along with audio from a standing microphone placed between the two performers (44.1 kHz sampling rate).

Pianists’ upper body movements were recorded using an eight-camera (Prime 13) OptiTrack motion capture system. Each pianist wore a jacket and cap, to which 25 spherical markers were affixed (including three on the head). Marker positions were recorded at a rate of 240 frames per second.

To synchronize audio/MIDI and motion capture data, a film clapboard was placed on top of one of the Clavinovas with an OptiTrack marker attached and struck once at the start and end of each block. These claps were clearly discernible in the OptiTrack data and in the audio recorded by the standing microphone, which was recorded in synchrony with the audio and MIDI from the Clavinovas.

Procedure

Pianists were given hard copies of the passage scores at the start of the recording session and had time to practice together. The recording phase began once both performers could play the passages without error.

Recording sessions were divided into two blocks. In each block, the performers played once through each of the 15 passages (in a pseudorandomized order, structured so that passages with a similar tempo were played consecutively). Each performer was instructed to play either the part labelled “A” or the part labelled “B”; these indicated primo and secondo lines and were assigned pseudorandomly, so that each performer played about the same number of primo and secondo parts (7 or 8 of each). Thus, each participant played a total of 30 trials, going once through the 15 passages in each block.

Leader/follower roles were assigned on an alternating basis, so each performer led each passage once. At the start of each trial, the assigned leader was handed a pair of headphones and listened to a metronome beat indicating the tempo for the passage. They then returned the headphones to the experimenter before beginning to play. The leader’s task was to coordinate the entrance of the passage without speaking (e.g., counting out loud). Duos were instructed to focus on playing together and to ignore pitch errors as much as possible, but if major timing or pitch errors made it impossible to continue, they were allowed to redo the trial. Once the recordings were finished, pianists completed a musical background questionnaire.

Analysis

Alignment of audio/MIDI and motion capture data

The experimenter struck a film clapboard at the start and end of each block (see “Stimuli and equipment”). The initial strike was used as a reference “time 0”, and the timestamps for all recordings were recoded to indicate elapsed time since this point. To check the precision of this inter-recording alignment, for each recording, the interval between first and second clapboard strikes was calculated, and discrepancies between recording devices in interval lengths were assessed. The mean discrepancy was minimal, less than the duration of one sample of motion capture data (M = 2.9 ms, SD = 2.4).
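The recoding and precision check described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, clap times are assumed to have already been detected in each recording, and the example timestamps are invented.

```python
def recode_to_clap_zero(timestamps, first_clap):
    """Express timestamps as elapsed time since the first clapboard strike."""
    return [t - first_clap for t in timestamps]


def clap_interval_discrepancy(device_a_claps, device_b_claps):
    """Absolute between-device difference in the first-to-second clap interval.

    Each argument is a (first_clap, second_clap) pair in that device's own clock.
    """
    interval_a = device_a_claps[1] - device_a_claps[0]
    interval_b = device_b_claps[1] - device_b_claps[0]
    return abs(interval_a - interval_b)


# Invented clap times in seconds on each device's clock.
audio_claps = (2.0, 602.0)
mocap_claps = (5.5, 605.503)

discrepancy = clap_interval_discrepancy(audio_claps, mocap_claps)
mocap_frame = 1 / 240  # duration of one motion capture sample at 240 fps
recoded = recode_to_clap_zero([5.5, 6.5, 7.0], mocap_claps[0])
```

Comparing `discrepancy` with `mocap_frame` mirrors the check reported in the text, where the mean discrepancy was smaller than one motion capture sample.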

Gesture position, velocity, acceleration, and jerk

Motion capture data comprised series of x, y, and z axis coordinates for 25 upper body markers, indicating forwards/backwards, left-right, and up/down movement, respectively. Here, we report only on the motion of the front-most head sensor (positioned slightly above the forehead), as motion was also measured from this location in Bishop and Goebl (2017), and the current study was partially designed to validate our earlier findings. For analyses of position and velocity, only forwards−backwards (x axis) data were used. For analyses of acceleration, a 3D measure was computed using the square root of the sum of squares for x, y, and z axes, with gravity added into the y dimension (gravity was included, again, for the purpose of equating our measures with the earlier work).
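A minimal sketch of the 3D acceleration measure follows, assuming per-axis accelerations in m/s² and a gravity constant of 9.81 m/s² added to the y component before taking the root sum of squares, as the text describes. The function name, the units, and the sign convention for gravity are assumptions made for illustration.

```python
import math

G = 9.81  # m/s^2; assumed value and units


def accel_magnitude(ax, ay, az, gravity=G):
    """Root sum of squares of x, y, z acceleration, with gravity added to y."""
    return [math.sqrt(x * x + (y + gravity) ** 2 + z * z)
            for x, y, z in zip(ax, ay, az)]


# A motionless head yields a constant magnitude equal to gravity.
at_rest = accel_magnitude([0.0], [0.0], [0.0])
moving = accel_magnitude([3.0], [4.0 - G], [0.0])
```

Including gravity means the curve sits on a constant offset at rest, which is what an accelerometer-based measure would also show.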

Gesture position data were smoothed using functional data analysis (Ramsay & Silverman, 2002; Goebl & Palmer, 2008). Order-7 b-splines were fit to the trajectories with knots every 50 ms, applying a roughness penalty on the fifth derivative (\(\lambda = 10^{-18}\)), which smoothed the third derivative (jerk). The functional data were then converted back for further analysis with samples every 5 ms.
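Once position has been smoothed and resampled on a uniform grid, velocity, acceleration, and jerk are successive time derivatives. The sketch below uses repeated central finite differences as a simple stand-in; it is not the penalized B-spline approach described above, which yields derivatives directly from the fitted functions. Function names are hypothetical, and the default step of 5 ms matches the resampling interval in the text.

```python
def derivative(samples, dt):
    """Central-difference derivative on a uniform grid (one-sided at the ends)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    out = [0.0] * n
    out[0] = (samples[1] - samples[0]) / dt
    out[-1] = (samples[-1] - samples[-2]) / dt
    for i in range(1, n - 1):
        out[i] = (samples[i + 1] - samples[i - 1]) / (2 * dt)
    return out


def kinematics(position, dt=0.005):
    """Return (velocity, acceleration, jerk) for uniformly sampled position."""
    velocity = derivative(position, dt)
    acceleration = derivative(velocity, dt)
    jerk = derivative(acceleration, dt)
    return velocity, acceleration, jerk


# Quadratic trajectory x(t) = t^2 sampled at unit steps:
# interior velocity is 2t, acceleration is 2, and jerk is 0.
position = [float(t * t) for t in range(8)]
v, a, j = kinematics(position, dt=1.0)
```

Finite differences amplify measurement noise with each differentiation, which is one reason a smoothing step such as the spline fit is applied before jerk is examined. Values near the ends of the series are less reliable because of the one-sided differences.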

Motion data were segmented into trials, based on visual analysis of the motion capture recordings. A “cue window” was then identified in each trial, comprising the two interbeat intervals prior to the leader’s first note onset and the leader’s first performed interbeat interval. Interbeat intervals were defined as the duration of a quarter note for pieces written in 4/4 and as the duration of a half note for pieces written in 2/2. Any cueing-in gestures that were given would fall within that window.
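The cue window can be expressed as a pair of time bounds. The sketch below assumes that the two pre-onset interbeat intervals are taken at a nominal interbeat interval supplied by the caller and that the window ends at the leader's second performed beat; the function names and the choice of nominal interval are assumptions, since the text does not state which tempo estimate defined the pre-onset span.

```python
def cue_window(first_onset, second_beat_onset, nominal_ibi):
    """Return (start, end) of the cue window in seconds.

    start: two nominal interbeat intervals before the leader's first onset.
    end:   the leader's second beat, so the first performed interval is included.
    """
    if second_beat_onset <= first_onset:
        raise ValueError("second beat must follow the first onset")
    return first_onset - 2 * nominal_ibi, second_beat_onset


def samples_in_window(times, values, window):
    """Keep only the (time, value) pairs that fall inside the window."""
    start, end = window
    return [(t, v) for t, v in zip(times, values) if start <= t <= end]


window = cue_window(first_onset=10.0, second_beat_onset=10.52, nominal_ibi=0.5)
kept = samples_in_window([8.9, 9.0, 10.0, 10.6], [1, 2, 3, 4], window)
```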

Primo-secondo note asynchronies

MIDI data from the Clavinovas were aligned with the corresponding notation using the performance-score matching system developed by Flossmann, Goebl, Grachten, Niedermayer and Widmer (2010), which pairs MIDI pitches with score notes according to pitch sequence information. Only pitch sequence is considered, so rhythm errors are not penalized. Mismatched pitches resulting from performer error or incorrect interpretation of the pitch sequence by the matching system can be corrected via a graphical user interface. Matched performances thereby include only correctly performed and correctly matched notes. The mean pitch error rate across all completed performances (i.e., excluding false starts, but including all other notes) was 9.5% (SD = 8.8%). Using these matched performances, primo-secondo asynchronies were calculated for notes that should have been synchronized, according to the score. Asynchronies were calculated for the entirety of each performance, but for the analyses presented here, we used the asynchronies achieved on the first chord of each piece as our main dependent variable. Asynchronies were not normally distributed, so non-parametric tests were used.
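To illustrate the dependent variable, the sketch below computes a signed primo-secondo asynchrony for the first chord of each trial and summarizes the absolute values with a median, in keeping with the non-parametric treatment. Collapsing each part's chord to its mean onset is an assumption made here for illustration, as are the function names and the example onset times.

```python
from statistics import mean, median


def chord_asynchrony(primo_onsets, secondo_onsets):
    """Signed asynchrony in seconds: mean primo onset minus mean secondo onset."""
    return mean(primo_onsets) - mean(secondo_onsets)


def first_chord_asynchronies(trials):
    """One signed asynchrony per trial; each trial is (primo_onsets, secondo_onsets)."""
    return [chord_asynchrony(p, s) for p, s in trials]


# Invented onset times (seconds) for the matched first-chord notes of three trials.
trials = [
    ([1.000, 1.010], [1.030, 1.020]),
    ([2.000], [1.990]),
    ([3.000, 3.000], [3.050, 3.030]),
]
signed = first_chord_asynchronies(trials)
median_abs = median(abs(a) for a in signed)
```

Negative values indicate that the primo part sounded before the secondo part.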

Gesture-following experiment

A second experiment was carried out with the aim of identifying the kinematic properties that improve cueing gesture synchronizability. Audio-visual recordings of pianist performances, collected during the first experiment, were used as stimuli for a beat-tapping task, which was completed by a sample of 10 musicians. The average accuracy of these musicians’ synchronization was taken as an indicator of gesture synchronizability, serving as a more reliable measure than the accuracy of individual followers’ synchronization during the interactive duo performance experiment, due to the larger sample size.
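One way to picture the synchronizability measure is as an average, across tappers, of the asynchrony between each first tap and the leader's first onset. The sketch below uses absolute asynchronies, so that lower values mean a gesture was easier to synchronize with; the use of absolute rather than signed values, the function name, and the example tap times are all assumptions.

```python
from statistics import mean


def synchronizability(leader_onset, first_taps):
    """Mean absolute asynchrony (seconds) between first taps and the leader's onset."""
    if not first_taps:
        raise ValueError("need at least one tap")
    return mean(abs(tap - leader_onset) for tap in first_taps)


# Invented first-tap times from three followers for one leader gesture.
score = synchronizability(leader_onset=2.0, first_taps=[2.05, 1.95, 2.10])
```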