Introduction

In musical ensembles, musicians perform a musical piece by ensuring precise synchronisation1,2,3,4. However, there are many time delays between the musicians involved. For instance, delays arise as sound travels through the space5 between the musicians, as well as during sensory information processing and sensorimotor coordination. In addition, there is a time delay between a musician’s movement and the moment the sound is actually produced6. Although these time delays sometimes reach 100 ms or more, musicians succeed in presenting accurate performances through precise synchronisation5,7,8,9,10,11,12. However, the characteristics of rhythm coordination in the face of time delays as well as the mechanisms for overcoming the effects of time delay in musical ensembles remain unclear.

In recent years, several researchers have conducted ensemble hand-clapping experiments13,14,15,16,17 as well as experiments on instrumental performance5,18,19,20,21,22 to investigate the effects of time delays on the performance of musical ensembles. Chafe et al.13 conducted an ensemble hand-clapping experiment in which paired participants were required to match their partners’ clapping sounds under a time delay. Synchronisation accuracy was optimal in the time delay conditions of 8 to 25 ms and performance accuracy deteriorated significantly for time delays of 55–66 ms. Farner et al.16 conducted experiments similar to those of Chafe et al.13. The tempo decreased when the time delay was 15–23 ms, whereas it accelerated when the time delay was below this threshold. The timing variability increased, and the subjective judgements of the ensemble performances among the participants involved dropped after a time delay of 25 ms. In the study conducted by Bartlette et al.5, the duets of clarinets and strings were influenced significantly after the time lag increased over 100 ms.

As mentioned above, in previous experiments involving instrumental performance and hand-clapping tasks5,13,16, various factors associated with time delays have affected the nature of the synchronisation in musical ensembles. For instance, the tolerance to time delays shifted from 25 to 100 ms when one keyboardist changed from the accordion to the piano23. Essentially, the tolerance for the time delay threshold differs depending on the task conditions provided in previous studies on musical ensembles. Washburn et al.24 conducted piano-duo ensembles with time delays of 10, 20, or 40 ms. The tempo of the performance was faster when the musical pieces of the duo were similar with respect to pitch range and melodic contour for a time delay of 10 ms. By contrast, for a time delay of 40 ms, the tempo was faster for the dissimilar condition with respect to pitch range and melodic contour than that for the similar condition. In addition, the tempo of participants who started the performance as soloists was slower than that of participants who joined the performance later as a part of a duo. However, the similarity of the music parts and starting/joining roles had no effect on the asynchronies of note onset for all the time delays, while the melodic and harmonic complexity of the accompaniment part in the piano duet without a time delay did affect the asynchrony25. Therefore, depending on the musical ensemble and intended performance, it is difficult to determine the basic timing control mechanisms affected by the time delay in a musical performance.

In the current literature associated with this topic, through the use of external event stimuli, finger-tapping experiments have often been conducted to investigate the characteristics of and timing control mechanisms behind synchronisation depending on the simplicity of the tasks involved26,27,28,29,30,31. Single synchronisation and/or continuation tapping tasks are used to determine the timely coordination of people’s actions through external auditory stimuli. In such tasks, the participant must synchronise their finger taps with a metronome tone or maintain the tempo of the metronome through finger tapping after the metronome stops. However, in these synchronisation tapping tasks, tapping tends to precede the target metronome tones by tens to a hundred milliseconds or more. This phenomenon, termed negative mean asynchrony (NMA), is well known in such studies26,32,33,34. Considering that people rarely recognise NMA, this suggests that the points of objective and subjective synchrony do not generally coincide35. The timing of the participants’ finger taps is mainly controlled by three factors: the previous inter-onset intervals (IOIs) of the metronome, the previous own inter-tap intervals (ITIs), and the previous synchronisation errors (SEs) between own tap timing and corresponding metronome stimuli36,37,38,39,40.

Dyadic synchronisation-continuation tasks have often been used to investigate the characteristics of rhythm coordination with other people. In such tasks, two participants synchronise their finger taps with metronome tones at a certain tempo in the first synchronisation phase. After the metronome’s tone stops, the participants tap their fingers to synchronise with the auditory feedback stimuli of their partner’s tap timing, while maintaining the tempo of the metronome (continuation phase). Different characteristics have been found in dyadic finger-tapping tasks compared with solo finger-tapping tasks. For example, in a solo synchronisation-continuation tapping task, the ITIs showed a gradual increase or decrease in the continuation phase depending on the tempo of the metronome in the synchronisation phase41,42. Collyer et al.41 found that the ITIs tended to decrease in the continuation phase when the tempo of the metronome was 250–413 ms but increase when the tempo was 513–748 ms. In a dyadic synchronisation-continuation task, Okano et al.43 found the tempo of the pair to accelerate in the continuation phase when the tempo of the metronome was set at 300, 500, or 800 ms in the synchronisation phase. Konvalinka et al.28 reported “hyper-followers” in their dyadic finger-tapping task. In their study, the participating pair mutually adapted their subsequent ITIs to the partner’s previous ITIs. In other words, both followed their partner’s tempo.

Kimura et al. conducted a dyadic synchronisation-continuation tapping task with various tempos of the metronome in the synchronisation phase (700–3200 ms) to investigate the way in which participants changed their strategy to synchronise with their partner’s tempo29. Although they investigated the effect of the partner’s ITIs instead of the IOIs of the metronome on timing control, they established that the timing control strategy for dyadic synchronisation also depended on the aforementioned three factors: previous IOIs, ITIs and SEs. Specifically, they used a model that determined the subsequent tap timing by a linear combination of the previous own ITIs, partner’s ITIs, and SEs multiplied by their respective impact coefficients. In all tempo conditions, the authors found the following strategies. First, for the influence of the previous own ITIs, a short/long ITI followed a long/short ITI, respectively. This strategy could maintain the tempo. Second, the participants adapted their subsequent ITIs to the previous partner’s ITIs. Finally, they tended to change the next tap timing to decrease the previous SEs. However, only the effect of the previous partner’s ITIs on the subsequent tap timing increased when the metronome tempo in the synchronisation phase was slower, while those of the previous own ITIs and SEs did not change.

This study aimed to investigate the fundamental characteristics and timing-control strategies of dyadic rhythm coordination depending on the time delay. Using the dyadic finger-tapping task can reveal the effects of time delays on synchronisation with other people without the complexity involved in music ensembles such as different instruments, roles (leader/follower; melody/accompaniment), and temporal structure in the music. To achieve this objective, we conducted a dyadic synchronisation-continuation tapping task with a time delay, in which the auditory feedback stimuli of their partner’s tap timing were delivered to participants with a certain time lag. The time delay conditions ranged from 0 to 240 ms in increments of 40 ms.

First, this study investigated the basic properties of synchronisation with time delays by focusing on the SEs and ITIs involved. We expected that tolerance to a longer time delay would be observed in the dyadic finger-tapping task than in the clapping and musical-ensemble tasks of the previous studies5,13,16. Specifically, small SEs would be observed for longer time delays in this study than in the previous ones. As mentioned above, NMA was observed in the synchronisation using finger tapping with a metronome26,32,33,34, which would offset the effect of the time delay13. As the musical context such as rhythmic complexity decreased NMA26,44, the longer tolerance to the time delay would be shown in the dyadic finger-tapping task with less musical context than in the clapping and music ensemble tasks. In addition, the tempo of the participants would change from acceleration to deceleration under longer time delays in this study than in those using musical ensembles16 because the tendency of people to tap early to external stimuli seen under NMA could cause the acceleration of the tempo43. Furthermore, the stability of the tempo in the longer time delay conditions would accompany the stability of the synchronisation or small variability of the SEs.

Second, the mechanisms behind timing control in rhythm coordination with a time delay were investigated. Specifically, the effects of one’s own and one’s partner’s ITIs and SEs on timing control in dyadic rhythm coordination were investigated using a linear model from a previous study29 based on the previous own ITIs, partner’s ITIs, and SEs. To investigate the mechanisms for sensorimotor synchronisation, many models have been proposed, among which the validity of models with the two error correction processes of phase and period correction has been demonstrated26,27,36,37,45. Phase correction is a process to reduce the asynchrony between one’s own timing and an external signal, while period correction is a process to match the period or tempo to the external signal. A large (small) degree of period correction indicates that people tend to strongly match their periods to the external ones (strongly maintain their tempo). However, as mentioned above, Kimura et al. suggested that the process for maintaining one’s own tempo is separate from the process of adjusting one’s own tempo to that of one’s partner29. Therefore, we used Kimura’s model to investigate the mechanisms in dyadic synchronisation with time delay. We expected that the effect of the previous partner’s ITIs would increase as the time delay increases because the longer the time delay, the slower the tempo in dyadic synchronisation16,24. However, if the tempo of finger tapping were to stabilise under a particular time delay, participants would no longer need to correct their tap timing using their partner’s ITIs. In this case, the effect of the previous partner’s ITIs would decrease under that time delay.

Figure 1

The accuracy and stability of synchronisation. (a) Means of the absolute values of the standardised global SEs in each condition. The values were lower in the 40–160 ms time delay conditions than in the other conditions. (b) Means of the standard deviations of the standardised global SEs. The variability of the asynchrony was also smaller in the 40–160 ms time delay conditions than in the other conditions. The error bars show the standard deviations between the participants. *, **, and *** represent \(p < 0.05\), \(p < 0.01\), and \(p < 0.001\), respectively.

Results

Sixteen participants (eight pairs) participated in the experiment. The mean ITI of each participant was calculated as the index for maintaining the tempo of the metronome. First, the SEs between the paired participants’ tap timing were calculated. Additionally, the SEs between the participants’ tap timing and auditory feedback stimuli of their partners’ tap timing under a constant time delay were calculated. Hereafter, the former is referred to as global SEs, and the latter is referred to as the local SEs (gSE and lSE, respectively). To investigate the accuracy and stability of synchronisation between the paired participants, the mean of the absolute gSEs and standard deviation (SD) of the gSEs were calculated. In addition, the mean lSE was calculated to investigate synchronisation accuracy from the participants’ perspective. Multiple regression analysis on the timing control model was conducted to investigate the degrees of tap-timing correction using the previous own and their partner’s ITIs as well as their previous lSEs29. The lSE was the SE that only the participants could perceive, whereas only a third party could perceive the gSE. Therefore, the lSE was selected as the explanatory variable to calculate the correction strength of the SE.
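The distinction between the two error measures can be illustrated with a minimal sketch. It assumes that taps are paired by index, that an SE is computed as one's own tap time minus the reference time (so that a negative value means the tap came first), and that the delayed feedback arrives exactly `delay_ms` after the partner's tap. The function names and the tap times are hypothetical and are not taken from the experiment.

```python
def global_ses(taps_a, taps_b):
    """gSE: asynchrony between the two participants' actual tap times (ms)."""
    return [a - b for a, b in zip(taps_a, taps_b)]


def local_ses(own_taps, partner_taps, delay_ms):
    """lSE: asynchrony between one's own taps and the delayed partner feedback (ms)."""
    return [own - (partner + delay_ms) for own, partner in zip(own_taps, partner_taps)]


# Hypothetical tap times in ms, for illustration only.
taps_a = [0.0, 800.0, 1600.0]
taps_b = [10.0, 805.0, 1590.0]
delay_ms = 80.0

gse = global_ses(taps_a, taps_b)             # what a third party would observe
lse_a = local_ses(taps_a, taps_b, delay_ms)  # what Participant A perceives
lse_b = local_ses(taps_b, taps_a, delay_ms)  # what Participant B perceives
print(gse, lse_a, lse_b)
```

With these illustrative values the gSEs stay within 10 ms of zero while both participants' lSEs are negative, which is the situation in which each participant hears their own tap as preceding the partner's sound.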

Synchronisation accuracy and stability

Figure 1a shows the means of the absolute gSEs. Detailed values are shown in Supplementary Table S1. The gSEs were standardised using the corresponding ITIs because gSEs increase as ITIs increase46,47,48,49. The graph was U-shaped, with the lowest values in the 80–120 ms time delay conditions. Further, the standardised gSEs in the 0 and 240 ms time delay conditions were significantly larger than those in the 40–160 ms time delay conditions. Additionally, the standardised gSEs in the 200 ms time delay condition were significantly larger than those in the 80–160 ms time delay conditions. There were no significant differences between the pairs comprising two of the 40, 80, 120, and 160 ms time delay conditions. Thus, below the 80 ms time delay condition, the gSE decreased as the lag increased, whereas the gSE increased as the lag increased above 160 ms. Supplementary Table S2 shows the results of the statistical tests for each combination of time delay conditions.

Figure 1b shows the means of the SDs within one trial of the gSEs standardised using the corresponding ITIs for each condition50,51,52,53,54. The values of the averaged SDs are shown in Supplementary Table S1. The graph was also U-shaped, with the lowest values in the 80–160 ms time delay conditions. Further, the SDs in the 0 ms time delay condition were significantly higher than those in the 40–120 ms time delay conditions, whereas those in the 40 ms time delay condition were significantly lower than those in the 120 ms time delay condition. The SDs in the 200 ms time delay condition were significantly higher than those in the 80–160 ms time delay conditions, while those in the 240 ms time delay condition were significantly higher than those in the 40–160 ms time delay conditions. There were no significant differences between the pairs comprising two of the 80, 120, and 160 ms time delay conditions. Therefore, as the time lag increased, the SDs decreased before the 120 ms time delay condition, whereas they increased thereafter. Please see Supplementary Table S3 for the details of the statistics.
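The accuracy and stability indices described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code: it assumes that "standardised using the corresponding ITIs" means dividing each SE by the ITI with the same index, and that the SD is the sample standard deviation within one trial. The function names and input values are hypothetical.

```python
from statistics import mean, stdev


def standardise(ses, itis):
    """Divide each SE by its corresponding ITI (assumed pairing: same index)."""
    return [se / iti for se, iti in zip(ses, itis)]


def accuracy_and_stability(ses, itis):
    """Return (mean absolute standardised SE, SD of standardised SEs) for one trial."""
    z = standardise(ses, itis)
    return mean(abs(v) for v in z), stdev(z)


# Hypothetical gSEs and ITIs in ms for one short trial.
ses = [-40.0, 20.0, -20.0, 40.0]
itis = [800.0, 800.0, 800.0, 800.0]
accuracy, stability = accuracy_and_stability(ses, itis)
print(accuracy, stability)
```

Lower values of the first index correspond to more accurate synchronisation and lower values of the second to more stable synchronisation.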

Figure 2a shows the means of the standardised lSEs using the corresponding ITIs for each participant of the pairs in each condition. Supplementary Table S1 shows the values in each time delay condition. The participants from each pair were randomly assigned to be Participants A and B. In the 0 ms time delay condition, the means of the standardised lSEs were close to zero. For both participants, the negative values of the standardised lSEs increased as the time lag increased. A two-way mixed ANOVA revealed a significant main effect of the time delay condition (\(p<0.001\)) but no significant main effect of the participants and no significant interaction (\(p=0.145\) and \(p=0.976\), respectively). The post hoc test showed significant differences among all the time delay conditions (\(p=0.001\) between the 160 ms and 240 ms time delay conditions and \(p<0.001\) between all the others), except between the 200 ms and 240 ms time delay conditions (\(p=0.070\)). These results revealed that the participants tapped earlier than the stimuli from their partners as the time lag increased.

Figure 2

The accuracy and stability of the standardised lSEs. (a) Means of the standardised lSEs of each participant. The same tendency as for the lSEs was observed for the standardised lSEs with respect to the time delay. However, there was no significant difference between the 200 and 240 ms time delay conditions. (b) Means of the SDs of the standardised lSEs. The SDs were higher in the 200 and 240 ms time delay conditions than in the others. ** and *** represent \(p < 0.01\) and \(p < 0.001\), respectively. The error bars show the SDs between the participants.

Figure 2b shows the SDs of the standardised lSEs for each participant in each time delay condition. The values of these SDs are shown in Supplementary Table S1. As with the standardised gSEs (Fig. 1b), the graph of the standardised lSEs was U-shaped. A two-way mixed ANOVA showed a significant main effect of the time lag (\(p<0.001\)) but no main effect of the participants and no interaction (\(p=0.931\) and \(p=1.000\), respectively). The variability of the lSEs increased for time delays over 200 ms from the perspective of the participants. Supplementary Table S4 shows the results of the post hoc test for the time delay conditions.

Figure 3

Means of the ITIs for each condition. There were significant differences between all the conditions. The longer the time delay, the larger the average ITIs were. The red horizontal line indicates the tempo of the metronome. The error bars indicate the SDs between the participants.

Tempo-keeping accuracy

Figure 3 shows the mean ITIs for each condition. The values of the mean ITIs in each condition are shown in Supplementary Table S1. The statistical tests found significant differences between all the combinations of the time delay conditions (\(p = 0.008\) between the 160 ms and 200 ms time delay conditions and \(p < 0.001\) for the other combinations). The values were significantly higher as the time lag increased. At time delays of 120 ms or less, the participants’ tempos increased, whereas they decreased at time delays above 200 ms. The participants maintained the tempo of the metronome accurately in the 160 ms time delay condition. The means of the SDs of the ITIs and results of the statistical tests for those are presented in Supplementary Tables S1 and S5, respectively. The graph of the SDs of the ITIs was U-shaped, as was that of the SDs of the standardised gSEs and lSEs.

Timing control mechanisms of tapping with time delays

We used multiple regression analysis to estimate the degree to which tap timing was affected by the previous own and partner’s ITIs as well as the previous lSEs. To prevent trends in the ITI time-series data from influencing the regression analysis, we detrended the ITI time series. Based on the detrended ITIs, we then reconstructed the tap timing data and recalculated the resulting lSEs and \(\delta\)ITIs. The ITIs, lSEs, and \(\delta\)ITIs were analysed using the augmented Dickey–Fuller (ADF) test to determine their stationarity. As a result, 116 of the 448 data items were non-stationary and were removed. No combination of participants and conditions was discarded throughout the trials. Of the remaining 332 data items, only six did not have an adjusted R-squared (\(R^2\)) value above 0.5.
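The detrending and reconstruction steps can be sketched as below. The detrending method is not specified in this section, so the sketch assumes removal of a least-squares straight line while keeping the series mean, and it assumes that tap times are rebuilt by cumulatively summing the detrended ITIs from the first tap. The function names are hypothetical, and the ADF stationarity test is not reproduced here.

```python
def linear_detrend(series):
    """Remove a least-squares straight line from a series, keeping its mean."""
    n = len(series)  # requires at least two points
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(series))
    slope = sxy / sxx
    return [y - slope * (x - x_mean) for x, y in enumerate(series)]


def taps_from_itis(first_tap, itis):
    """Rebuild tap times (ms) by accumulating ITIs from the first tap."""
    taps = [first_tap]
    for iti in itis:
        taps.append(taps[-1] + iti)
    return taps


# Hypothetical ITIs (ms) that shorten by 2 ms per tap, i.e. a steady acceleration.
itis = [800.0 - 2.0 * i for i in range(10)]
detrended = linear_detrend(itis)
rebuilt_taps = taps_from_itis(0.0, detrended)
print(detrended[0], detrended[-1], rebuilt_taps[-1])
```

Once the tap times are rebuilt in this way, the lSEs can be recomputed from them, which is why the detrending has to precede the calculation of the regression variables.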

The parameters \(\alpha\), \(\beta\), and \(\gamma\) are the coefficients of the previous own ITIs, previous partner’s ITIs, and previous lSEs, respectively. These values indicate the degrees of the individual effects of the previous own and partner’s ITIs and those of the previous lSEs on the tap-timing correction. Figure 4a shows the mean \(\alpha\) for each condition. The negative values mean that the participants increased (decreased) their ITIs if the previous own ITIs were short (long). There were no significant differences between any of the combinations of the time delay conditions (see Supplementary Table S6). Therefore, the values were steady in all the conditions. Figure 4b shows the mean value of \(\beta\). The positive values mean that the participants tended to match their ITIs to their previous partner’s ITIs. The \(\beta\) in the 80 ms time delay condition was significantly lower than those in the 0, 120, 200, and 240 ms time delay conditions (see Supplementary Table S7 for the details). The longer the lag, the lower the values below the 80 ms time delay condition, but the higher the values above the 80 ms time delay condition. Figure 4c shows the mean value of \(\gamma\). The negative values mean that the participants corrected their tap timing to reduce the lSEs. There were no significant differences between any of the combinations of the conditions (see Supplementary Table S8). The values remained steady in all the time delay conditions.
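How three such coefficients can be estimated is sketched below with ordinary least squares. The exact form of the regression model is defined elsewhere in the paper, so this sketch makes its own assumptions: the response is \(\delta\)ITI, the three predictors are the previous own ITI, the previous partner's ITI, and the previous lSE (each expressed as a deviation), and there is no intercept. The function names are hypothetical, and the data are synthetic and noise-free, generated from arbitrary coefficients whose signs merely mirror those reported above.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_timing_model(own_iti, partner_iti, lse, delta_iti):
    """Least-squares estimates of (alpha, beta, gamma) via the normal equations."""
    cols = [own_iti, partner_iti, lse]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * y for u, y in zip(ci, delta_iti)) for ci in cols]
    return tuple(solve_linear_system(xtx, xty))


# Synthetic predictors (ms deviations); the true coefficients are arbitrary.
rng = random.Random(0)
own = [rng.uniform(-20, 20) for _ in range(50)]
partner = [rng.uniform(-20, 20) for _ in range(50)]
lse = [rng.uniform(-20, 20) for _ in range(50)]
delta = [-0.4 * o + 0.3 * p - 0.2 * e for o, p, e in zip(own, partner, lse)]

alpha, beta, gamma = fit_timing_model(own, partner, lse, delta)
print(round(alpha, 3), round(beta, 3), round(gamma, 3))
```

Because the synthetic data contain no noise, the fit recovers the generating coefficients; with real tapping data the estimates would carry uncertainty, and the adjusted \(R^2\) would indicate how much of the variance the model explains.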

Figure 4

The degrees of the effects of the previous own and partner’s ITIs as well as the previous lSEs. (a) Means of \(\alpha\) in the regression model in each condition. The values represent the strength of the influence of the participant’s own ITIs on their tap timing. This impact was steady in all the conditions. (b) Means of \(\beta\) in the regression model in each condition. The values represent the strength of the influence of the partner’s ITIs on their tap timing. The influence was less in the 80 ms time delay condition than in most of the other conditions. (c) Means of \(\gamma\) in the regression model in each condition. The values represent the strength of the influence of the SEs on their tap timing. There were no significant differences between all the conditions. The error bars indicate the SDs between the participants. * means \(p < 0.05\).

Discussion

To investigate the influence of time delays on dyadic rhythm coordination, we conducted a dyadic synchronisation-continuation tapping task with time delays. The results of the standardised gSEs showed that the synchronisation was the most accurate in the 40–160 ms time delay conditions and the most stable in the 40–120 ms time delay conditions. Accuracy and stability deteriorated when the time delay exceeded 200 ms. The negative lSEs increased as the time lag increased and the variability of the lSEs increased in the 200 and 240 ms time delay conditions. The means of the ITIs revealed that the tempo was faster (slower) than the tempo of the metronome when the time lag was shorter than 120 ms (longer than 200 ms). The length of the time lag did not affect the degree of tap-timing correction using the participant’s previous ITIs and lSEs, but it did affect the degree of tap-timing correction using the partner’s previous ITIs. The effect of the partner’s previous ITIs on the \(\delta\)ITIs was the smallest in the 80 ms time delay condition.

Synchronisation errors

The results of the gSEs revealed that the participants could synchronise with each other more accurately in the 40–160 ms time delay conditions than in the 0 ms time delay condition. Using hand-clapping tasks and experiments to investigate instrumental performance, previous studies have demonstrated that slight delays in sound transmission enable increasingly accurate synchronisation5,13,14,15,16,17,18,19,20,21,22. Furthermore, NMA has been shown in synchronisation tapping tasks in which participants synchronise their finger taps with a metronome, as mentioned above26,32,33,34. In this study, the time delay could compensate for this tendency to tap slightly earlier than the external rhythmic signals. As a result, the accuracy and stability of synchronisation increased in certain time delay conditions. However, the large tolerance to the time delay could not have been caused by the offset from NMA alone. NMA was found to be 40 ms in a synchronisation finger-tapping task with a tempo of the metronome of 1000 ms55. Other studies have also shown that NMA decreased as the tempo of the metronome decreased46,47,48. The average tempo in the present study was under 1000 ms—even in the 240 ms time delay condition. Thus, the degree to which the participants tended to tap earlier than the stimuli was insufficient to compensate for the 40–160 ms time delay.

Another possible cause was the larger tolerance for negative asynchrony than positive asynchrony. Studies have found that while participants corrected the asynchrony immediately when the tap followed the stimuli from a metronome (positive asynchrony), they showed tolerance to negative asynchrony34,56. In addition, Pollok et al. found that NMA increased to 100–150 ms when the tempo of the metronome accelerated under the perceptual threshold from 900 to 816 ms57. In the present study, at a time lag of 0 ms, the tap of one participant followed that of their partner if the asynchrony was not zero. When the participants tapped later than their partners, they would correct the positive asynchrony more strongly than their partners who received negative asynchrony. This asymmetry could have caused the higher mean and greater variability of the gSEs in the 0 ms time delay condition than in the time delay conditions. However, the time lag allowed both participants of the pair to tap earlier than the stimuli from their partners, as shown by the fact that the means of the lSEs were negative when the time lag was added (see Fig. 2a). Thus, for both participants of the pair, the degree of synchrony would be higher up to a certain time delay from their perspective than that at a time lag of 0 ms because people have a tolerance for negative asynchrony, which is also related to the stability of the synchrony. Furthermore, at that time, synchronisation accuracy and stability also increased from a third-party perspective (Fig. 1). In other words, the time delay made subjective and objective simultaneity compatible. However, in the time delay conditions over 200 ms, the asynchrony would be over the tolerance to negative asynchrony because the means and variability of the gSEs increased in the 200 ms and 240 ms time delay conditions. Thus, the mean lSEs at time delays of 160 ms and 200 ms suggest that the tolerance to negative asynchrony would be from 160 to 200 ms.

In addition, the increase in negative lSEs due to the time lag could be related to the fact that NMA indicates that people anticipate rather than react to pacing signals26,27. Although tolerance to NMA in the synchronisation with an isochronous metronome is 60 ms34, NMA has been found to increase when the timing of metronome stimuli changes57,58. For example, Wing et al. added temporal noise (jitter) to the isochronous metronome in a finger-tapping task, finding that as the noise increased, NMA also increased58. This result suggests that people are more anticipatory when the timing of external stimuli is unreliable or difficult to predict. In addition, Malcolm et al. showed that inhibiting the left superior temporal-parietal region using repetitive transcranial magnetic stimulation increased NMA in the finger-tapping task with an auditory metronome59, implying that this region is responsible for the phase correction. Moreover, the authors discussed the possibility that the inhibition of the neural input to central anticipatory processing causes the greater anticipatory response. For our participants, the stimuli timing from the partner would be more unreliable than the isochronous metronome. In addition, time delays could make the timing from the partner much more unreliable. Thus, the participants would tap with more anticipation as the time delay or unreliability increased in the dyadic task.

The results of this study suggest a threshold of synchronisation tolerance at time delays of 160–200 ms. However, this threshold was higher than that demonstrated in previous studies. Chafe et al.13 revealed that coordination using hand clapping broke down in the 55–66 ms time delay condition. Bartlette et al.5 showed that in a musical ensemble, the tempo decreased and its variability increased significantly at a time delay of over 106 ms, after which the musicians could not perform musically and interactively. This difference in the tolerance thresholds between this study and previous ones could be because the musical contexts in previous studies decreased or displaced NMA. For example, the auditory feedback of participants’ taps reduced NMA34,60. In addition, trained musicians show lower NMA than untrained individuals26,27,34. Furthermore, rhythmic complexities or subdivisions decrease NMA or remove it entirely44. The findings of the present study suggest that the time delay allowed both participants to tap ahead to their partner’s stimulus, thus showing tolerance to time delays. In such a case, the lower the NMA in musical contexts, the shorter the allowed time delay would be. In the future, the effects of these musical contexts on dyadic synchronisation using finger tapping with time delays should be investigated, especially for tolerance to lSEs. If these musical contexts reduced tolerance to lSEs, tolerance to time delays would also decrease.

In this study, the tempo differed from that used in previous studies and this aspect could have affected the tolerance thresholds. Chafe13 and Farner et al.16 employed similar hand-clapping tasks in which the tempo was 86–94 bpm. The interval of the eighth note to which their participants clapped corresponded to 319–349 ms. When converted to an eighth note interval, the musicians performed the two scores prepared by Bartlette et al.5 at tempos of 200–300 ms and 450–600 ms, respectively. Compared with these, the tempo of the metronome in the current experiment, which was 800 ms, was slower. The threshold of synchronisation tolerance for time delays is lower when the tempo is fast than when it is slow7. Additionally, at the same time lag, a high density of the notes involved resulted in a greater deceleration of the tempo than a low density of notes21. Therefore, compared with previous studies, the increased thresholds in this study partly result from slower tempos. As the tempo of the metronome affects the degree of NMA in synchronisations using finger tapping with a metronome46,47, the tolerance threshold to negative lSEs could also change as the tempo changes. When the tempo of the metronome in the synchronisation phase increases, the tolerance threshold to negative asynchrony would decrease in the continuation phase, which could result in the deceleration of the tempo at a shorter time lag than in the present experiment. In future studies on this topic, the effects of the dependency of the tempo on synchronisation tolerance depending on time delay should be investigated using dyadic finger-tapping tasks.
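The conversion from beats per minute to an eighth-note interval used in the comparison above can be checked with a short calculation. It assumes that the stated bpm refers to quarter-note beats, so that an eighth note lasts half a beat; the function name is hypothetical.

```python
def eighth_note_interval_ms(bpm):
    """Interval between eighth notes in ms, assuming the beat is a quarter note."""
    beat_ms = 60_000 / bpm  # one minute is 60,000 ms
    return beat_ms / 2


for bpm in (86, 94):
    print(bpm, round(eighth_note_interval_ms(bpm)))
```

Under this assumption, 94 bpm and 86 bpm correspond to eighth-note intervals of roughly 319 ms and 349 ms, which matches the 319–349 ms range quoted above and is well below the 800 ms metronome interval used in the present experiment.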

Tempo keeping

As in previous studies43,61,62, the means of the ITIs decreased from the target tempo of 800 ms in the 0 ms time delay condition. The asymmetry of the responses to positive and negative asynchrony34,56 would have caused this acceleration of the tempo. The participants who tapped later would correct their tap timing more strongly on the subsequent tap than their partners who tapped early. This asymmetry of the tap-timing correction could make both participants tap earlier and their tempo could gradually accelerate. Alternatively, Wolf et al.62 proposed a phase advance mechanism based on an oscillating timekeeper to explain the acceleration of the tempo in rhythm coordination activities. The oscillating timekeeper generates a signal if a threshold is exceeded, following which the phase of the oscillator is reset. If an external signal arrives slightly earlier than when the oscillating timekeeper is set to generate the signal, the timekeeper reduces the time it takes to reach the threshold to decrease the asynchrony between the timekeeper’s signal and external one. This results in an advancement of the phase and a shortened period. Wolf et al. indicated that this shortened period would cause the acceleration of the tempo via the period correction process.

In our experiment, when the time lag allowed both participants to tap earlier than the stimuli (negative lSEs), the asymmetry disappeared. Thus, as the time lag increased, the participants would be less likely to tap later than their partners, which would cause higher mean ITIs in the longer time delay conditions. In addition, because both participants tapped earlier than the stimuli from their partner, the phase advance mechanism would not have operated; this mechanism comes into play only when the tap follows the stimulus. In this case, the acceleration of the tempo was also suppressed or stopped. Indeed, the acceleration of the tempo stopped on average in the 160 ms time delay condition in which the mean lSE was around 160 ms. Thus, the participants were able to stop the acceleration of the tempo by tapping 160 ms earlier than the stimuli from the partners on average. This result suggests a strong tendency for the tempo to accelerate in dyadic finger-tapping tasks undertaken by non-musicians.

Timing control mechanism

Although \(\alpha\) and \(\gamma\) did not show significant differences as the time delays extended, \(\beta\) tended to decrease in the 0–80 ms time delay conditions, whereas it increased in the 120–240 ms time delay conditions. However, there was a significant difference between the 80 ms and 120 ms time delay conditions. Therefore, the increased levels of asynchrony and instability in the short and long time delay conditions could be associated with the increased dependency of timing control on the partner’s previous ITIs compared with the intermediate time delay conditions. In musical and clapping ensembles with time delays, adopting a strategy to ignore the signals from a partner can enhance synchronisation performance13,63. Fairhurst et al.64 investigated the effect of this strategy on sensorimotor synchronisation using an adaptive virtual partner, which is an auditory pacing signal that implements period correction. They found that the virtual partner shortened (lengthened) the subsequent IOI when the participants showed negative (positive) asynchrony. Participants who responded weakly to the asynchrony between the tap and stimuli from the virtual partner showed more stable ITIs than those who responded strongly to the asynchrony. Returning to the experiments conducted in this study, although the low dependency on the partner’s ITIs could have resulted from the compensation of negative lSEs and the tendency to decrease the tempo, the results suggest that the less the participants followed the partner’s ITIs or tempo, the better the synchronisation performance of the pair.

According to previous studies65,66,67, the timing control mechanism involves automatic and intentional processes. Therefore, the different dependencies of \(\alpha\), \(\beta\), and \(\gamma\) on the length of the time delay could have resulted from either an automatic or an intentional timing control. The phase correction mechanism in which the timing is controlled using the synchronisation between the tap and external stimuli is an automatic process65. Therefore, our participants could not modify \(\gamma\), which represents the degree of dependence on the previous lSEs. On the contrary, the period correction mechanism in which the tap intervals are modified is an intentional process26,27,65. People refer to the previous asynchrony for the period correction, and the correction occurs when the asynchrony and/or tempo change is consciously recognised27,68. Thus, the participants would have been aware of the change in the lSEs and/or tempo in the short and long time delay conditions and then increased \(\beta\). In the 0 ms and short time delay conditions, they could have intentionally recognised the positive lSEs and/or acceleration of the tempo. As mentioned above, people are more sensitive to positive asynchrony than negative asynchrony34,56. Moreover, they could have been aware of the overly large lSEs and/or deceleration of the tempo in the long time delay conditions. However, we did not directly investigate whether the participants consciously recognised the lSEs and changes in tempo. Hence, the intention toward dyadic synchronisation with time delays should be investigated in future research.

In the present study, the participants changed \(\beta\) along with the time delay, but not \(\alpha\). As mentioned above, Kimura et al. reported that the slower the tempo, the higher \(\beta\) is. These results suggest that people separately control how much to match their own tempo to their partner’s and the extent to which to maintain their own tempo. In fact, different neural circuits were engaged in the continuation tapping task and synchronisation tapping task with a metronome26,27. For example, in the continuation tapping task, the contralateral sensorimotor cortex, supplementary motor area, and basal ganglia were recruited, which would relate to the internal tempo or representation of the presented tempo69,70. By contrast, the cerebellar-premotor network for sensorimotor and audio-motor coordination was engaged in the synchronisation task71,72. The different tendencies toward \(\alpha\) and \(\beta\) along with the tempo29 and time delay could have prompted a modification of the dual process model, which is an internal model for sensorimotor synchronisation using phase and period corrections36,37. As noted earlier, period correction is the mechanism used to match one’s own tempo to that of external signals using the difference between them36 or SEs68. Thus, this model represents the dependence on the external tempo and one’s own tempo as a single parameter. Although it should be noted that the model used in this study represented a phenomenon and was different from the dual process model, which is an internal model for sensorimotor synchronisation, these different dependences could have been separately represented in the internal model.

Conclusions

As expected, a longer tolerance to the time delay was found in the dyadic finger-tapping task than in the musical contexts of previous studies. However, contrary to our expectation that NMA would offset the time delay, the tolerance to the time delay was longer than the average NMA in the synchronisation task with the constant-tempo metronome. This could have been because the time delay allowed both participants of the pair to tap ahead of the stimuli from their perspective. In addition, the unreliability in the timing of the stimuli from the partner under a time delay could have increased the tolerance to NMA, which could have caused the large tolerance to the time lag. Consistent with our second expectation regarding the timing control process, the participants decreased their dependency on their partner’s ITIs up to a certain time lag. This weak dependency on the partner’s ITI could relate to better synchronisation performance under certain time delays than in the other time delay conditions. Uncertainty in the timing of the partner’s taps could result in greater anticipatory responses, which would lead to tolerance to longer time delays and the stabilisation of the tempo in dyadic synchronisation using finger tapping.

Methods

Participants

Sixteen people (eight pairs, five women and 11 men, \(24.25 \pm 2.89\) years old) participated in the experiments. Two of the participants were ambidextrous and the rest were right-handed. None of them had auditory or movement disorders, and none had received professional musical training. The experiments were authorised by the Research Ethics Review Committee of the Tokyo Institute of Technology and conducted in accordance with the Declaration of Helsinki. Written informed consent was obtained from all the participants in advance.

Stimuli and apparatus

A micro-computer (Arduino Mega2560 Rev3, Arduino, United States) was used to control the timing of the stimuli presented to the participants and record the timings of the tapping tasks. The time resolution of signal processing was less than 1 ms. The tap timings were detected using pressure sensors (FSR-406, Interlink Electronics, United States). The auditory stimuli, namely the metronome tones and the partner feedback, were 500 Hz square waves with a duration of 100 ms, and they were delivered to the participants via earphones (SHE3010WT, Philips, Netherlands). Additionally, white noise was presented to the participants via headphones (HPH-50B, YAMAHA, Japan) to mask the surrounding noise and the noise generated by the finger taps.

Task and conditions

This study involved a dyadic synchronisation-continuation tapping task. In the synchronisation phase, both participants in a pair were simultaneously presented with 10 metronome tones at a tempo of 800 ms. As mentioned in the Introduction, the tempo of dyadic tapping with or without time delays could become faster or slower. Thus, 800 ms was chosen as an initial tempo at which it would be possible to tap stably, even if the tempo became somewhat faster or slower in the continuation phase. The participants began finger tapping from the fifth tone to achieve synchronisation with the metronome. After the metronome stopped, the continuation tapping task started immediately. In this task, both the participants were presented with auditory stimuli delivered with a specific time lag after the partner’s finger taps. The participants did not receive auditory feedback from their own taps. They were instructed to prioritise synchronising with the feedback auditory stimuli and to maintain the tempo of the metronome as much as possible. Seven time delay conditions (0, 40, 80, 120, 160, 200, and 240 ms) were prepared.

Procedure

This experiment was conducted in a quiet room. The two participants sat back-to-back at two separate desks. They wore eye masks to prevent visual information from influencing their performance. The participants tapped the centre of the pressure sensor using the index finger of their dominant hand. The volume of the auditory stimuli and white noise was set at a comfortable level for each participant and adjusted throughout the experiment. After an auditory cue for the first trial, 10 metronome tones at a tempo of 800 ms were presented to both the participants simultaneously. After the metronome tones stopped, the participants performed the continuation tapping task. Each trial ended when both participants completed 100 taps after the metronome stopped. Each pair conducted four sessions, each comprising one trial per time delay condition. Therefore, seven trials were conducted in one session and 28 trials were conducted in total. In each session, the order of the trials was randomised and counterbalanced between the pairs. Before starting the experiment, practice trials shortened to 20 taps were conducted for all seven conditions. Throughout the experiment, the participants were allowed to rest for one minute between trials.
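The session structure can be sketched as follows. The text does not specify the counterbalancing scheme, so the sketch below only randomises the order within each session; the names `DELAYS_MS`, `N_SESSIONS`, and `make_schedule` are hypothetical.

```python
import random

DELAYS_MS = [0, 40, 80, 120, 160, 200, 240]  # seven time delay conditions
N_SESSIONS = 4


def make_schedule(seed=None):
    """Return one randomised condition order per session for a single pair."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(N_SESSIONS):
        order = DELAYS_MS[:]   # every condition appears once per session
        rng.shuffle(order)     # randomisation only, no counterbalancing here
        schedule.append(order)
    return schedule


schedule = make_schedule(seed=1)
n_trials = sum(len(session) for session in schedule)
print(n_trials)  # 28
```

Counterbalancing between pairs would require an additional constraint across the eight pairs, for example a Latin-square assignment, which is not attempted here.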

Analysis

The tap-timing data were analysed using MATLAB (MATLAB R2020a, MathWorks, United States) and R (Ver. R 4.0.0, R Development Core Team). We tested the effect of the delays on the means and SDs of the gSEs, means and SDs of the ITIs, and parameters (\(\alpha\), \(\beta\), and \(\gamma\)) using the Wilcoxon signed-rank test. The means and SDs of the standardised lSEs were tested using a two-way mixed ANOVA. Adjusted p-values were calculated using the Benjamini–Hochberg method.
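For readers unfamiliar with the Benjamini–Hochberg adjustment, the following minimal Python sketch shows the standard step-up computation. The analysis in this study was run in MATLAB and R; the function name `benjamini_hochberg` and the example p-values are illustrative assumptions, not values from the experiment.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values from four hypothetical pairwise comparisons.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.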

Variables

As the indicator of timing synchronisation accuracy, the absolute value of the gSE, which is the timing difference between the paired participants’ taps, was calculated for each time delay condition. The two participants in each pair were identified as Participants A and B. Let the n-th tap timing of Participant A be \(R_A(n)\) and that of Participant B be \(R_B(n)\). The first tap was defined as the tap just after the metronome stopped. According to these definitions, the variables were calculated as shown in Eq. (1). In the equations, i and j represent the identifications of the participants involved. The gSEs increased as the ITIs of the participants increased46,47,48,49. Therefore, the gSEs were standardised by dividing them by the corresponding ITIs (Eq. 2) and they were averaged between the participants in each time delay condition. To investigate the stability of the synchronisation between the two participants, the SDs of the gSEs were also calculated. Variations in the gSEs increased when the ITIs of the participants increased50,51,52,53,54. Therefore, the gSEs were also standardised using the participants’ ITIs when the SDs were calculated. In addition, we calculated the SEs from the perspective of each participant (i.e. lSEs). The timing of the feedback auditory stimuli to Participant B, which was triggered by the n-th tap of Participant A, was denoted as \(S_A (n)\) and the timing of the auditory feedback stimuli from Participant B to Participant A was denoted as \(S_B (n)\) (Eq. 3). The lSEs were then calculated, as shown in Eq. (4). We also calculated the standardised lSEs with the corresponding ITIs (Eq. 5). As an indicator of how well the participants could maintain the tempo of the metronome, their ITIs (Eq. 6), which represented the tempos of each participant’s tap timings, were investigated.

$$\begin{aligned}&gSE_i(n) = R_i (n) - R_j (n) \end{aligned}$$
(1)
$$\begin{aligned}&Standardised\> gSE_i(n) = \frac{gSE_i(n)}{ITI_i(n)} \end{aligned}$$
(2)
$$\begin{aligned}&S_i (n) = R_i (n) + lag \end{aligned}$$
(3)
$$\begin{aligned}&lSE_i (n) = R_i (n) - S_j (n) \end{aligned}$$
(4)
$$\begin{aligned}&Standardised\> lSE_i(n) = \frac{lSE_i(n)}{ITI_i(n)} \end{aligned}$$
(5)
$$\begin{aligned}&ITI_i (n) = R_i (n) - R_i (n-1) \end{aligned}$$
(6)
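Equations (1)–(6) can be computed directly from the two tap-time series. The sketch below is a plain-Python illustration under two assumptions: tap times are given in milliseconds as equal-length lists, and the first tap of each series is dropped from the ITI-based variables because its preceding tap is not included. The function name `tap_variables` and the example tap times are hypothetical.

```python
def tap_variables(r_own, r_partner, lag):
    """Compute gSE, lSE, ITI and their standardised forms for one participant."""
    s_partner = [r + lag for r in r_partner]                 # Eq. (3)
    gse = [a - b for a, b in zip(r_own, r_partner)]          # Eq. (1)
    lse = [a - s for a, s in zip(r_own, s_partner)]          # Eq. (4)
    iti = [r_own[n] - r_own[n - 1] for n in range(1, len(r_own))]  # Eq. (6)
    # Standardised errors exist only where an ITI is defined (second tap on).
    std_gse = [g / t for g, t in zip(gse[1:], iti)]          # Eq. (2)
    std_lse = [e / t for e, t in zip(lse[1:], iti)]          # Eq. (5)
    return {"gSE": gse, "lSE": lse, "ITI": iti,
            "std_gSE": std_gse, "std_lSE": std_lse}


# Made-up tap times (ms) for Participants A and B with a 40 ms time lag.
r_a = [0, 800, 1600, 2400]
r_b = [10, 790, 1620, 2390]
vars_a = tap_variables(r_a, r_b, lag=40)
print(vars_a["gSE"])  # [-10, 10, -20, 10]
print(vars_a["lSE"])  # [-50, -30, -60, -30]
```

In this example Participant A's lSEs are all negative, meaning that A tapped before hearing the delayed feedback from B, even on taps where A tapped later than B in absolute time.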

Analysis of the timing control mechanism

To investigate the influence of the participants’ own and their partners’ previous ITIs and SEs on the subsequent tap timing, we performed multiple regression analysis using a timing control model similar to that proposed in a previous study29. Instead of the gSE, which was used in the previous study, we used the lSE because the participants in the current experiment did not perceive the difference between the two participants’ tap timings (gSE) but rather the difference between their own tap timing and the auditory feedback stimuli (lSE). The model can be expressed as follows:

$$\begin{aligned}&\delta ITI_i (n) = \alpha _i ITI_i (n-1) +\beta _i ITI_j (n-1) +\gamma _i lSE_i (n-1) \end{aligned}$$
(7)
$$\begin{aligned}&\delta ITI_i (n) = ITI_i (n) - ITI_i (n-1) \end{aligned}$$
(8)

where we set the \(\delta\)ITIs (Eq. 8) as a dependent variable and the participant’s own as well as their partner’s ITIs and lSEs as explanatory variables. The parameters \(\alpha\), \(\beta\), and \(\gamma\) represent the degree of each variable’s influence on the \(\delta\)ITIs. To ensure that the parameters showed the influence of each factor, we standardised all the variables by converting them to a distribution with a mean of zero and a variance of one.
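A fit of Eq. (7) on standardised variables can be sketched in plain Python as ordinary least squares without an intercept, which is appropriate once every variable has zero mean. This is an illustrative sketch rather than the MATLAB and R code used in the study; the helper names `zscore`, `solve`, and `fit_timing_model` and the example series are assumptions.

```python
from statistics import fmean, pstdev


def zscore(xs):
    """Standardise to zero mean and unit (population) variance."""
    m, s = fmean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_timing_model(iti_own, iti_partner, lse_own):
    """Return (alpha, beta, gamma) of Eq. (7) from standardised variables."""
    # Eq. (8): change in the participant's own ITI.
    d_iti = [iti_own[n] - iti_own[n - 1] for n in range(1, len(iti_own))]
    # Predictors are the values one step before each delta-ITI.
    cols = [zscore(iti_own[:-1]), zscore(iti_partner[:-1]), zscore(lse_own[:-1])]
    y = zscore(d_iti)
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    return tuple(solve(xtx, xty))   # normal equations, no intercept


# Made-up series (ms) for one participant in one condition.
own = [800, 796, 788, 793, 783, 789, 790, 785]
partner = [800, 790, 805, 780, 810, 795, 785, 800]
lse = [-20, -35, -10, -40, -25, -15, -30, -5]
print(fit_timing_model(own, partner, lse))
```

Because all variables are standardised, the three coefficients are directly comparable in magnitude, which is what allows \(\alpha\), \(\beta\), and \(\gamma\) to be interpreted as relative degrees of influence.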

In multiple regression analysis, the dependent and explanatory variables must be stationary. However, in dyadic synchronisation-continuation tapping tasks, the tempo often accelerates29,43. Moreover, hand clapping between two people with a time delay shows both acceleration and deceleration of the tempo13. These findings indicate that the time-series data of paired synchronisation-continuation tapping tasks with time delays may contain a long-term trend. In this study, we therefore conducted third-order polynomial detrending on the ITI series, reconstructed the time-series data based on these detrended ITIs, and recalculated the lSEs and \(\delta\)ITIs involved. After these variables were standardised, we performed an augmented Dickey–Fuller (ADF) test to verify the stationarity of the data. The lag order of the ADF test was decided using the Akaike information criterion. When non-stationarity was detected in a series of a specific participant and condition, the data including that series were excluded from the multiple regression analysis. Therefore, the results of the regression analysis comprised only stationary data.
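The detrending and reconstruction steps can be sketched with NumPy as follows. This is one possible reading of the procedure, not the code used in the study: adding the mean ITI back after removing the cubic trend and rebuilding tap times by cumulative summation are assumptions, as are the function names `detrend_itis` and `rebuild_taps`.

```python
import numpy as np


def detrend_itis(itis, order=3):
    """Remove a polynomial trend from an ITI series, keeping its mean level."""
    itis = np.asarray(itis, dtype=float)
    # Rescale the tap index to [-1, 1] so the polynomial fit is well conditioned.
    x = np.linspace(-1.0, 1.0, itis.size)
    trend = np.polyval(np.polyfit(x, itis, order), x)
    return itis - trend + itis.mean()   # assumption: mean ITI is added back


def rebuild_taps(itis, first_tap=0.0):
    """Reconstruct tap times from ITIs by cumulative summation."""
    return np.concatenate(([first_tap], first_tap + np.cumsum(itis)))


# Made-up accelerating series: ITIs drift from 800 ms towards 700 ms plus jitter.
rng = np.random.default_rng(0)
raw = np.linspace(800, 700, 100) + rng.normal(0, 10, 100)
flat = detrend_itis(raw)
taps = rebuild_taps(flat)
print(round(raw.mean(), 1) == round(flat.mean(), 1))  # mean level is preserved
```

The lSEs and \(\delta\)ITIs would then be recomputed from the reconstructed tap times of both partners. A stationarity check such as the ADF test, for instance `adfuller` in statsmodels with `autolag="AIC"`, could be applied to each standardised series afterwards; it is not included in this sketch.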