Indicators of beat position
A central aim of this study was to test the hypothesis that points of peak acceleration in instrumentalists’ cueing gestures communicate beat position (H1). This prediction was addressed with an analysis of how followers’ first note onsets aligned with leaders’ gestures. More specifically, we assessed the alignment between followers’ first onsets and extremes in leaders’ head position, velocity, and acceleration curves. If beats were to be communicated via head position, it is logical to expect that beat locations would align with points of path reversal, as these occur in all gestures, regardless of their trajectory. If beats were to be communicated via head velocity or acceleration, they would likely coincide with either maxima or minima in the velocity or acceleration curves.
Peaks and troughs were therefore identified in the cue window of each leader’s position, velocity, and acceleration curves. Peaks were defined as points preceded by five consecutively increasing observations and followed by five consecutively decreasing observations that were outside the 99% confidence interval for a surrounding window of 300 ms. Troughs were defined as points preceded by five consecutively decreasing observations and followed by five consecutively increasing observations that were likewise outside the 99% confidence interval for a surrounding 300 ms window. A constant rather than tempo-adjusted window size was used, as adjusting for tempo would have required a subjective judgement of the hierarchical beat level at which each performer had gestured (e.g., 2 vs. 4 beats per bar).
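The peak and trough criteria above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the sampling rate is a hypothetical parameter, and "outside the 99% confidence interval" is interpreted here as the candidate point lying outside the 99% confidence interval of the mean of a 300 ms window centred on it. The function name find_extrema is illustrative.

```python
import math
from statistics import mean, stdev

Z_99 = 2.576  # two-sided 99% quantile of the standard normal distribution


def find_extrema(x, fs_hz, run=5, window_ms=300.0):
    """Return (peaks, troughs) as lists of sample indices into x.

    A peak is preceded by `run` strictly increasing steps and followed by
    `run` strictly decreasing steps; a trough is the mirror image. The
    candidate must also lie outside the 99% confidence interval of the mean
    of a window of `window_ms` centred on it (an assumed interpretation).
    """
    half = int(round(window_ms / 1000.0 * fs_hz / 2))
    peaks, troughs = [], []
    for i in range(run, len(x) - run):
        before = x[i - run:i + 1]  # run + 1 samples ending at i
        after = x[i:i + run + 1]   # run + 1 samples starting at i
        rises_in = all(a < b for a, b in zip(before, before[1:]))
        falls_out = all(a > b for a, b in zip(after, after[1:]))
        falls_in = all(a > b for a, b in zip(before, before[1:]))
        rises_out = all(a < b for a, b in zip(after, after[1:]))
        if not ((rises_in and falls_out) or (falls_in and rises_out)):
            continue
        win = x[max(0, i - half):min(len(x), i + half + 1)]
        m = mean(win)
        margin = Z_99 * stdev(win) / math.sqrt(len(win))
        if rises_in and falls_out and x[i] > m + margin:
            peaks.append(i)
        elif falls_in and rises_out and x[i] < m - margin:
            troughs.append(i)
    return peaks, troughs


# Usage: a 1 Hz sine sampled at a hypothetical 100 Hz for 2 s.
fs = 100.0
curve = [math.sin(2 * math.pi * n / 100) for n in range(200)]
peaks, troughs = find_extrema(curve, fs)
```

Using the confidence interval of the mean (rather than of individual observations) keeps smooth, broad extrema detectable, since a true local maximum usually sits well above the mean of its surrounding window.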
Cue gestures were assumed to have at least two points of path reversal, so peak-trough pairs separated by no more than one beat were identified. Since we expected the cue gesture to be more prominent than other movements made during the cue window, the peak-trough pair spanning the greatest range in position, velocity, or acceleration values was selected. The intervals from the selected peak and from the selected trough to the follower’s first note onset were calculated as indications of alignment precision. Peak-to-onset and trough-to-onset intervals were averaged across trials to produce mean intervals for each follower. Interval durations were divided by performers’ average interbeat intervals to obtain normalized values in units of interbeat intervals (IBIs). Sample head position, velocity, acceleration, and jerk curves are given in Fig. 4.
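Pair selection and interval normalization might be sketched as below. The sketch assumes landmarks are supplied as hypothetical (time_s, value) tuples and that intervals are signed so that a landmark preceding the onset yields a positive value; the function names are illustrative only.

```python
from statistics import mean


def select_cue_pair(peaks, troughs, ibi_s):
    """peaks, troughs: lists of (time_s, value) landmarks in the cue window.

    Returns the (peak, trough) pair separated by at most one IBI whose values
    span the greatest range, or None if no pair qualifies.
    """
    best, best_range = None, -1.0
    for p in peaks:
        for t in troughs:
            if abs(p[0] - t[0]) <= ibi_s:
                span = abs(p[1] - t[1])
                if span > best_range:
                    best, best_range = (p, t), span
    return best


def onset_interval_ibi(landmark_time_s, onset_time_s, ibi_s):
    """Landmark-to-onset interval in IBIs (positive if the landmark comes first)."""
    return (onset_time_s - landmark_time_s) / ibi_s


# Usage with hypothetical landmark times (s), values, and onset time.
ibi = 0.5
pair = select_cue_pair([(0.2, 5.0), (1.5, 3.0)], [(0.6, 1.0), (2.8, -4.0)], ibi)
(peak_t, _), (trough_t, _) = pair
peak_iv = onset_interval_ibi(peak_t, 0.7, ibi)
trough_iv = onset_interval_ibi(trough_t, 0.7, ibi)
# Averaging such intervals across trials gives one mean interval per follower.
follower_mean = mean([peak_iv, 0.9, 1.1])
```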
Table 2 Peak-to-onset and trough-to-onset distribution medians and standard deviations (in IBIs) for followers from both experiments
If either peaks or troughs in a given dimension indicate beats, then peak- or trough-to-onset intervals could be expected to cluster around two points: interval lengths of approximately 1 IBI would occur if the selected gesture feature preceded the follower’s onsets by one IBI (communicating a preparatory beat), while interval lengths of approximately 0 IBIs would occur if the feature and the follower’s onsets were synchronized. For our purposes, a clustering of intervals around either value was taken as an indication that the point communicated beat position. To assess the reliability of alignment between the selected peaks and troughs and followers’ first onsets, the proportion of average intervals approximating either 0 IBIs or 1 IBI (±0.2 IBIs) was calculated. Separate analyses were done for followers from the interactive duo performance task and gesture-following task.
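The reliability measure described above amounts to counting how many per-follower mean intervals fall within the tolerance band around either target value. A minimal sketch, with hypothetical function names and input values:

```python
def near_target(value, targets=(0.0, 1.0), tol=0.2):
    """True if value lies within tol of any target (all in IBIs)."""
    return any(abs(value - t) <= tol for t in targets)


def proportion_aligned(mean_intervals, tol=0.2):
    """Share of per-follower mean intervals that approximate 0 or 1 IBI."""
    hits = sum(near_target(v, tol=tol) for v in mean_intervals)
    return hits / len(mean_intervals)


# Hypothetical per-follower mean peak-to-onset intervals (IBIs).
share = proportion_aligned([0.05, 0.95, 0.5, 1.3])
```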
Statistics for peak- and trough-to-onset interval distributions are presented in Table 2. To make sense of these data, we have to consider both (1) how reliably followers’ onsets aligned with each landmark and (2) around which values each distribution centered. Reliable alignment with a particular landmark (high percentages in columns 3 and 6 of Table 2), plus a median value near 0 or 1 IBI, would be evidence that the landmark communicates beat position.
Interval distributions for the interactive duo performance and gesture-following tasks are shown in Figs. 5 and 6, respectively. For head position, we found that followers in both tasks aligned their first onsets more closely with peaks than troughs. Medians for peak-to-onset distributions were near 1 IBI, while medians for trough-to-onset distributions were not. For head velocity, neither peaks nor troughs seemed to communicate beats, as alignment percentages were low and medians were not reliably near to either 0 or 1 IBI. For head acceleration, the results were more complex: followers aligned their onsets more closely with peaks than troughs in the interactive task, while in the gesture-following task, onsets aligned slightly more closely with troughs than peaks. The period of deceleration between acceleration peak-trough pairs might have communicated beats to participants in the gesture-following task, a possibility that is considered in the discussion. For both tasks, however, followers aligned their first onsets more reliably with acceleration landmarks than with position or velocity landmarks.
Across motion parameters, a timing difference was noticeable between the interactive duo performance and gesture-following tasks: gesture-following participants’ first taps tended to align with a later point on leaders’ gesture curves than did interactive duo followers’ first onsets. For example, peaks in position and acceleration preceded interactive duo followers’ first onsets by slightly less than one beat and gesture-following participants’ first taps by slightly more than one beat. Correspondingly, leader–follower asynchronies were greater for gesture-following participants than for interactive duo participants, Z = 20.44, p < 0.001, r = 0.56 (gesture-following task M = −0.16 IBIs, SD = 0.64 IBIs; interactive duo task M = −0.01 IBIs, SD = 0.17 IBIs; see Footnote 2). This timing difference could reflect better anticipation of the beat among interactive duo followers or a task-dependent difference in how beats were perceived.
The potential effects of leader gesture/follower onset alignment on note synchronization were assessed as an additional test of which gesture parameters were most useful in communicating beat position during the gesture-following task. Only position peak-to-onset, velocity peak-to-onset, and acceleration peak- and trough-to-onset interval distributions were considered, since their medians were close to 1 IBI (Table 2, column 7). For each distribution, trials with intervals approximating 0 or 1 IBI (±0.1 IBIs; “aligned”) were compared to trials without intervals approximating 0 or 1 IBI (“not aligned”), using mean absolute asynchronies of first tapped beats as the dependent variable. Significantly improved synchronization (at \(\alpha =0.01\)) was observed when first taps aligned with position peaks, Z = 2.95, p = 0.003, r = 0.11 (aligned M = 0.33 IBIs, SD = 0.29; not aligned M = 0.41 IBIs, SD = 0.33), acceleration peaks, Z = 4.83, p < 0.001, r = 0.15 (aligned M = 0.33 IBIs, SD = 0.32; not aligned M = 0.41 IBIs, SD = 0.34), and acceleration troughs, Z = 7.13, p < 0.001, r = 0.22 (aligned M = 0.27 IBIs, SD = 0.27; not aligned M = 0.42 IBIs, SD = 0.34). No significant difference was observed for velocity peaks, Z = 2.02, p = 0.04. These findings provide evidence that leaders’ head trajectories and acceleration patterns are used as cues to beat position.
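The reported Z and r values are consistent with a two-sample rank-sum (Mann–Whitney) comparison using the effect size r = |Z|/√N, although the test is not named here; the sketch below assumes that test, with average ranks for ties and the usual tie-corrected normal approximation. Function names and input values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(a, b):
    """Two-sided Mann-Whitney/Wilcoxon rank-sum test (normal approximation).

    Returns (z, p, r) with r = |z| / sqrt(N) as the effect size.
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = list(a) + list(b)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - ties))
    z = (u1 - n1 * n2 / 2) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p, abs(z) / math.sqrt(n)


# Hypothetical mean absolute asynchronies (IBIs): aligned vs. not aligned trials.
z, p, r = rank_sum_test([0.1, 0.2, 0.3], [0.4, 0.5, 0.6])
```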
Gesture properties that support successful synchronization
This section presents analyses of data from the gesture-following task that test the potential effects of gesture kinematics and leader expertise on leader–follower synchronization. Because asynchronies obtained from the gesture-following task were not normally distributed, the results of non-parametric tests are reported.
Alignment between leaders’ gestures and sounded performance (H2)
Increased precision in the alignment between leaders’ first note onsets and their own cueing gestures was expected to facilitate leader–follower note synchronization. To test this hypothesis, the time intervals between leaders’ first note onsets and peaks and troughs in their head position, velocity, and acceleration curves were assessed, using the same analysis procedure as described in the previous section. This analysis had the additional effect of clarifying which kinematic landmarks correspond to leaders’ intended beats.
Interval distribution statistics are presented in Table 3. As we saw for followers, leaders’ first onsets aligned more closely with peaks than troughs in head position—only the peak-to-onset interval distribution median was close to 1 IBI. Leaders’ first onsets did not reliably align with either velocity peaks or troughs. For acceleration, alignment was more precise and reliable with peaks than troughs, as evidenced by the peak-to-onset distribution median near 1 IBI and the relatively high proportion of leaders whose average peak-to-onset intervals approximated 1 IBI in length. Leaders’ onsets aligned slightly more reliably with acceleration peaks than with position peaks, as we saw in the previous section for followers’ onsets.
We also tested whether note synchronization in the gesture-following task was more successful on trials where leaders’ head position or acceleration peaks either aligned with or preceded their own first onsets by 1 IBI (±0.1 IBIs) than on other trials. The difference in mean absolute note asynchronies was significant (at \(\alpha =0.03\)) for position peaks, Z = 2.44, p = 0.01, r = 0.08 (aligned M = 0.32 IBIs, SD = 0.22; not aligned M = 0.40 IBIs, SD = 0.33), but not for acceleration peaks, Z = 1.01, p = 0.31 (aligned M = 0.36 IBIs, SD = 0.27; not aligned M = 0.41 IBIs, SD = 0.35). Alignment of leaders’ first onsets with peaks in their own head position curves was therefore associated with improved note synchronization.
Table 3 Peak-to-onset and trough-to-onset distribution means and standard deviations (in IBIs) for leaders
Gesture smoothness and magnitude (H3–4)
Better synchronization was expected with smooth gestures than with gestures high in jerk, and with gestures that gave a large rather than small indication of the beat. For each trial, an average value of 3D gesture jerk was calculated (using the root sum of squares of jerk values in the x, y, and z dimensions), and a measure of gesture magnitude (the spatial distance between the leader’s maximum and minimum head positions) was obtained. The correlations between these values and the mean absolute asynchronies achieved by participants in the gesture-following task on their first tap of each trial were then assessed. There was a positive correlation between mean gesture jerk and mean absolute asynchronies, \(\tau =0.22\), z = 3.26, p = 0.001 (significant at \(\alpha =0.03\)), suggesting a tendency for asynchrony to increase with increasing jerk. Gesture magnitude showed a weak but significant negative correlation with mean absolute asynchronies, \(\tau =-0.19\), z = 2.48, p = 0.01, indicating that asynchronies decreased as gesture magnitude increased.
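The two gesture measures could be computed roughly as follows. This sketch rests on three assumptions not stated in the text: head positions are sampled at a fixed rate, jerk is estimated with unsmoothed third-order forward differences, and the distance between maximum and minimum head positions is taken as the Euclidean distance between the per-axis minima and maxima. Function names are hypothetical.

```python
import math


def mean_jerk_3d(positions, fs_hz):
    """Mean Euclidean norm of 3D jerk from third forward differences.

    positions: list of (x, y, z) head samples taken at fs_hz.
    """
    norms = []
    for n in range(len(positions) - 3):
        p0, p1, p2, p3 = positions[n:n + 4]
        jerk = [(d - 3 * c + 3 * b - a) * fs_hz ** 3
                for a, b, c, d in zip(p0, p1, p2, p3)]
        norms.append(math.sqrt(sum(v * v for v in jerk)))  # root sum of squares
    return sum(norms) / len(norms)


def gesture_magnitude(positions):
    """Distance between per-axis minimum and maximum head positions (assumed)."""
    lo = [min(axis) for axis in zip(*positions)]
    hi = [max(axis) for axis in zip(*positions)]
    return math.dist(lo, hi)


# Hypothetical trajectory x = t**3 sampled at 10 Hz: its jerk is 6 everywhere.
fs = 10.0
traj = [((n / fs) ** 3, 0.0, 0.0) for n in range(11)]
jerk = mean_jerk_3d(traj, fs)
size = gesture_magnitude(traj)
```

In practice, motion-capture data would normally be low-pass filtered before differentiating three times, since finite differences amplify measurement noise.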
Gesture prototypicality (H5)
Gestures that followed prototypical patterns of motion were expected to encourage more successful synchronization than gestures that followed idiosyncratic patterns of motion. To obtain a measure of “gesture prototypicality”, we evaluated how similar each gesture was to all other gestures in the stimulus set. Cross-correlations were calculated between all recorded leaders’ cue gestures, within and between duos, and for each gesture the mean absolute lag-0 correlation with all other gestures was computed. The acceleration curves with the lowest and highest prototypicality (i.e., the lowest and highest mean correlation magnitudes, respectively) are shown in Fig. 7.
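The prototypicality score can be sketched as the mean absolute Pearson correlation at lag 0 between one gesture curve and every other curve in the set. The sketch assumes all curves have already been resampled to a common length; function names and example curves are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prototypicality(curves):
    """Mean absolute lag-0 correlation of each curve with all other curves."""
    scores = []
    for i, ci in enumerate(curves):
        others = [abs(pearson(ci, cj)) for j, cj in enumerate(curves) if j != i]
        scores.append(sum(others) / len(others))
    return scores


# Hypothetical acceleration curves resampled to a common length.
curves = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [1, 3, 2, 4]]
scores = prototypicality(curves)
```

Taking absolute values means a mirror-image gesture (correlation of −1) counts as just as prototypical as an identical one.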
Correlations were calculated between average lag-0 correlation magnitudes and the mean absolute asynchronies achieved by participants in the gesture-following task, on the first beat of each trial. Positive correlations (at \(\alpha =0.02\)) were observed for head position, \(\tau =0.19\), z = 2.87, p = 0.004, velocity, \(\tau =0.38\), z = 5.81, p < 0.001, and acceleration, \(\tau =0.25\), z = 3.89, p < 0.001, indicating that as gesture prototypicality increased, mean asynchronies also increased. Thus, contrary to our hypothesis, followers synchronized less successfully with leaders who gave more prototypical gestures.
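The τ statistics above are Kendall rank correlations. Because the variant and tie handling are not stated, the sketch below assumes τ-a with no tied values and the standard normal approximation z = S/√(n(n−1)(2n+5)/18), where S is the number of concordant minus discordant pairs. The function name and data are hypothetical.

```python
import math


def kendall_tau_z(x, y):
    """Kendall's tau-a with a normal-approximation z and two-sided p.

    Assumes no tied values in x or y.
    """
    n = len(x)
    s = 0  # concordant minus discordant pairs
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    tau = 2 * s / (n * (n - 1))
    z = s / math.sqrt(n * (n - 1) * (2 * n + 5) / 18)
    p = math.erfc(abs(z) / math.sqrt(2))
    return tau, z, p


# Hypothetical prototypicality scores vs. mean absolute asynchronies (IBIs).
tau, z, p = kendall_tau_z([0.61, 0.64, 0.70, 0.72, 0.80],
                          [0.30, 0.25, 0.41, 0.38, 0.45])
```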
Evaluating predictors of synchronization success
The potential value of the gesture attributes discussed above as predictors of followers’ synchronization success was evaluated via multiple regression. A (non-hierarchical) model was constructed that included (1) leader experience group (ensemble-inexperienced, ensemble-experienced, conductor-pianists), (2) leaders’ note alignment with their own head acceleration peaks, (3) gesture jerk, (4) gesture magnitude, and (5) gesture prototypicality as predictors. Mean absolute asynchronies achieved by participants in the gesture-following task on their first taps served as the dependent variable.
The overall model was significant, F(5, 720) = 27.32, p < 0.001, (adjusted) \(R^2=0.18\). It accounted for a low proportion of variance, but this is not surprising given the number of factors involved in synchronizing with visual cues. Significant effects at an adjusted \(\alpha =0.01\) were observed for gesture magnitude, t(720) = 3.73, p < 0.001, \(\eta ^2=0.02\), gesture jerk, t(720) = 2.96, p = 0.003, \(\eta ^2=0.01\), and gesture prototypicality, t(720) = 10.24, p < 0.001, \(\eta ^2=0.12\). We also found a significant effect of leader experience: synchronization was more successful with ensemble-experienced pianists’ gestures than with ensemble-inexperienced pianists’ gestures, t(720) = 4.77, p < 0.001, \(\eta ^2=0.03\), and more successful with conductor-pianists’ gestures than with ensemble-inexperienced pianists’ gestures, t(720) = 3.35, p < 0.001, \(\eta ^2=0.81\). The effect of leader gesture-note alignment, t(720) = 1.61, p = 0.11, was not significant. These results suggest that increased ensemble and conducting experience, increased gesture smoothness and magnitude, and decreased gesture prototypicality contribute to improved follower synchronization.
Gesture coordination and note synchronization in interactive duo performance task
Similarity in leader–follower gesture patterns (H6)
It was hypothesized that, during the interactive duo performance task, some followers would make gestures similar in timing and form to those made by leaders. To assess this similarity, cross-correlation functions were calculated between leaders’ and followers’ head position, velocity, and acceleration curves for each trial, in lag steps of 15 ms up to a maximum lag of three IBIs.
For each trial, the lag with the strongest absolute correlation was identified. Positive correlations indicated that the leader and follower were moving in-phase; negative correlations indicated that they were moving in anti-phase. Correlation values and corresponding lags are reported in Table 4. Moderate negative correlations were observed between absolute maximum correlation values and their corresponding lags for position, \(\tau =-0.27\), z = 6.56, p < 0.001, velocity, \(\tau =-0.26\), z = 6.33, p < 0.001, and acceleration curves, \(\tau =-0.25\), z = 6.28, p < 0.001 (all significant at \(\alpha =0.02\)), suggesting that when leader and follower gestures aligned more closely in time, the degree of similarity in their movements also increased.
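The lag search can be sketched as below, assuming both curves have been resampled to a common 15 ms step so that one lag step equals one sample, and that correlations are computed over the overlapping portion at each lag. The helper near_zero_lag illustrates the ±0.3 IBI criterion applied in the comparison that follows; all names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def max_abs_xcorr(leader, follower, ibi_s, step_s=0.015, max_ibis=3.0):
    """Return (lag_s, r) for the lag with the strongest absolute correlation.

    Both curves are assumed to be sampled every step_s seconds; a positive lag
    means the follower's movement trails the leader's.
    """
    max_lag = int(round(max_ibis * ibi_s / step_s))
    best_lag, best_r = 0, 0.0
    for k in range(-max_lag, max_lag + 1):
        a = leader[:len(leader) - k] if k >= 0 else leader[-k:]
        b = follower[k:] if k >= 0 else follower[:len(follower) + k]
        n = min(len(a), len(b))
        if n < 3:
            continue
        r = pearson(a[:n], b[:n])
        if abs(r) > abs(best_r):
            best_lag, best_r = k, r
    return best_lag * step_s, best_r


def near_zero_lag(lag_s, ibi_s, tol_ibis=0.3):
    """True if the best lag lies within tol_ibis IBIs of zero."""
    return abs(lag_s / ibi_s) <= tol_ibis


# Hypothetical bump-shaped curves; the follower trails by 7 steps (105 ms).
leader = [math.exp(-((n - 40) / 5) ** 2) for n in range(100)]
follower = [math.exp(-((n - 47) / 5) ** 2) for n in range(100)]
lag_s, r = max_abs_xcorr(leader, follower, ibi_s=0.1)
```

Keeping the sign of the winning correlation distinguishes in-phase (positive) from anti-phase (negative) movement, as in the interpretation above.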
Table 4 Leader–follower cross-correlations
We also examined whether greater temporal alignment in performed gestures related to note synchronization. As a measure of temporal alignment between gestures, we used the lag that corresponded to the maximum correlation value; maximum correlations at lags close to 0 indicate high temporal alignment between leader and follower. Mean absolute note asynchronies achieved on trials in which maximum correlations occurred close to lag 0 (±0.3 IBIs) were compared to those achieved on all other trials. None of these tests yielded significant results (at \(\alpha =0.02\)), Z = 0.71, p = 0.48 (position), Z = 2.07, p = 0.04 (velocity), Z = 0.05, p = 0.96 (acceleration), indicating that note synchronization success did not depend on the temporal alignment of leaders’ and followers’ head gestures.
Effects of ensemble and conducting experience on note synchrony in duo performance
Leader experience was found to affect the quality of synchronization by participants in the gesture-following task (see above). To test whether the same between-group differences would emerge in the interactive duo performance task, an ANOVA was run on the mean absolute asynchronies achieved by participants in that task on the first onset of each piece. It was expected that the experience shared by members of the conductor-pianist and ensemble-experienced duos would enable both better leading and better following than was the case for ensemble-inexperienced duos, resulting in more successful synchronization among conductor-pianist and ensemble-experienced groups. The effect of ensemble experience was not significant, however, F(1, 265) = 2.93, p = 0.09. Figure 8 shows the mean asynchronies achieved by duos in each group.