Rhythmic Features of Movement Synchrony for Bonding Individuals in Dyadic Interactions

The physical world and our biological processes have their own cyclical and periodic rhythms, which can nonetheless be attuned to or synchronized with each other, as exemplified by the tuning of our biological processes to the earth’s rotation, which takes approximately 24 h (circadian rhythm; Winfree 1967). Similarly, face-to-face interactions have rhythmic characteristics, and conversation behaviors such as on–off vocal activity have a rhythm and tempo synchronized between interactants (Warner 1979, 1992a, b). These phenomena have been termed interpersonal coordination (Bernieri and Rosenthal 1991). Previous studies have illustrated that coordination represents and/or creates rapport (Bernieri et al. 1988; Tickle-Degnen and Rosenthal 1990; Vacharkulksemsuk and Fredrickson 2012), and recent meta-analyses have confirmed the robustness of coordination’s positive social outcomes, including bonding individuals during both verbal and nonverbal communication (Mogan et al. 2017; Vicaria and Dickens 2016).

Interpersonal Coordination, Mimicry, and Synchrony

Bernieri and Rosenthal (1991) conceptualized interpersonal coordination as “the degree to which the behaviors in an interaction are nonrandom, patterned, or synchronized in both timing and form.” They differentiated this phenomenon into two facets: behavior matching and synchrony. Early literature defined behavior matching as the similarity of body postures between interactants, and researchers focused on the interactants’ postural congruence, which was found to lead to rapport (LaFrance 1976, 1979). Recent literature has described behavioral matching as a form of mimicry characterized by an automatic tendency to imitate another’s behavior at a particular moment in time (e.g., Chartrand and Bargh 1999; Lakin 2013; Vicaria and Dickens 2016). The target of behavioral mimicry is broad; it includes posture, gestures, mannerisms, and other motor movements (for a review, see Chartrand and Lakin 2013; Lakin 2013). Behavioral mimicry is typically assessed by examining whether the same or similar behaviors co-occur at a given point in time or whether the presented behavior is mimicked by an interactional partner within a short time frame.

Bernieri and Rosenthal (1991) identified three components of synchrony: interaction rhythms, simultaneous movement, and behavioral meshing. These components loosely refer to the convergence of rhythm and timing. In general, synchrony research uses continuous or discrete time series data with constant sampling rates, which are usually obtained through video analysis, motion tracking, and psychophysiological and neurophysiological techniques (for a review, see Cornejo et al. 2017). Temporal coordination, or rhythmic similarity, has been investigated through the observation and analysis of rhythmic joint task situations such as swinging pendulums (Schmidt and O’Brien 1997), rocking in rocking chairs (Richardson et al. 2007), stepping (Miles et al. 2010), and arm curls (Miles et al. 2011).

Spectrum Analysis for Synchrony Research

Spectrum analysis is a powerful tool to capture rhythmic features of a phenomenon like communication behavior, as it deconstructs a complex time series into its periodic components. The Fourier transform is a well-known type of spectrum analysis used in psycholinguistic research; for instance, Warner employed this technique to extract rhythmic and cyclic characteristics from interactants’ on–off (non-content) vocal activity during face-to-face conversation (Warner 1979, 1992a, b). In addition to vocal activity, spectrum analysis can be used to evaluate movement synchrony from the perspective of interaction rhythms. Coherence measurement through cross-spectrum analysis examines the proportion of variance at each frequency component in one individual’s series that can be predicted by the partner’s series, forming an index of rhythmic similarity between interactants at each frequency component (Fujiwara and Daibo 2016; Schmidt et al. 2012; Schmidt and O’Brien 1997).
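As a rough illustration of cross-spectral coherence, the following Python sketch averages cross- and auto-spectra over non-overlapping Hann-windowed segments and reports the magnitude-squared coherence per frequency bin. It is a minimal Fourier-based sketch, not the wavelet procedure used later in this article; the function names (`dft`, `coherence`, `demo_signals`), the 8 Hz sampling rate, the noise level, and the shared 1 Hz rhythm are all illustrative assumptions.

```python
import cmath
import math
import random


def dft(segment):
    """Direct discrete Fourier transform, non-negative frequencies only."""
    n = len(segment)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(segment))
        for k in range(n // 2 + 1)
    ]


def coherence(x, y, seg_len):
    """Magnitude-squared coherence per frequency bin, averaged over segments."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / seg_len) for t in range(seg_len)]
    n_bins = seg_len // 2 + 1
    sxx, syy, sxy = [0.0] * n_bins, [0.0] * n_bins, [0j] * n_bins
    for start in range(0, len(x) - seg_len + 1, seg_len):
        fx = dft([w * v for w, v in zip(window, x[start:start + seg_len])])
        fy = dft([w * v for w, v in zip(window, y[start:start + seg_len])])
        for k in range(n_bins):
            sxx[k] += abs(fx[k]) ** 2          # auto-spectrum of x
            syy[k] += abs(fy[k]) ** 2          # auto-spectrum of y
            sxy[k] += fx[k] * fy[k].conjugate()  # cross-spectrum
    return [
        abs(sxy[k]) ** 2 / (sxx[k] * syy[k]) if sxx[k] * syy[k] > 0 else 0.0
        for k in range(n_bins)
    ]


def demo_signals(n=512, sampling_hz=8, shared_hz=1.0, seed=0):
    """Two noisy series sharing one rhythm, the second with a constant lag."""
    rng = random.Random(seed)
    x, y = [], []
    for t in range(n):
        angle = 2 * math.pi * shared_hz * t / sampling_hz
        x.append(math.sin(angle) + rng.gauss(0, 0.5))
        y.append(math.sin(angle + 1.0) + rng.gauss(0, 0.5))
    return x, y


x, y = demo_signals()
coh = coherence(x, y, seg_len=64)
# Bin k corresponds to k * 8 / 64 = k * 0.125 Hz, so the shared 1 Hz rhythm is bin 8.
print(round(coh[8], 2), round(coh[20], 2))
```

In this toy example the coherence is high at the shared rhythm even though the second series lags the first, which is why coherence indexes rhythmic similarity rather than simultaneity.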

In applying spectrum analyses to synchrony research, it is important to note that some linear analysis methods, such as the Fourier transform, assume a stable frequency or a repetitive pattern throughout the entire interaction (Issartel et al. 2006). Thus, the Fourier transform operates best during periodic or rhythmic joint task situations that generate stable rhythmic properties, i.e., a repetitive pattern of movements, and it may be less effective in capturing the frequencies of “unstructured” conversations, which often involve less stable rhythms. In such situations, the Fourier transform would not be the best means of quantifying rhythmic similarity.

To overcome this limitation, a modified linear analysis involving a short-time windowed technique can be adopted (see also Issartel et al. 2006). However, the results acquired using this technique depend greatly on the window size. Small windows have good time resolution and can extract local information in rhythmic fluctuations, but their frequency resolution is generally poor, especially because a slow tempo cannot be extracted from a short time series. By contrast, large windows have good frequency resolution and can extract global information from an interaction, but their time resolution is worse because local information in rhythmic fluctuations is generally not extracted. If no clear assumption can be made about the time-series signal, as when investigating an unstructured interaction, researchers cannot easily determine the window size.
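The trade-off can be made concrete with a little arithmetic: for a window of T seconds, adjacent Fourier bins are 1/T Hz apart, and 1/T is also the slowest rhythm that completes one full cycle inside the window. The sketch below assumes a hypothetical helper `window_resolution` and illustrative window lengths and sampling rate.

```python
def window_resolution(window_s, sampling_hz):
    """Return (samples per window, frequency resolution in Hz)."""
    n_samples = round(window_s * sampling_hz)
    freq_resolution_hz = 1.0 / window_s  # spacing between adjacent Fourier bins
    return n_samples, freq_resolution_hz


# Illustrative values only: 2 s, 10 s, and 40 s windows at a 30 Hz sampling rate.
for window_s in (2, 10, 40):
    n_samples, resolution = window_resolution(window_s, 30)
    print(f"{window_s:>3} s window: {n_samples} samples, "
          f"bins {resolution} Hz apart, slowest full cycle {resolution} Hz")
```

A 2-s window therefore cannot separate rhythms closer than 0.5 Hz, while a 40-s window resolves 0.025 Hz steps but smears any event shorter than the window.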

The wavelet transform does not require stationarity in each time series and, thus, can be a possible alternative to the aforementioned linear methods. This approach employs cross-wavelet analysis to determine the cross-wavelet coherence (WTC), which measures the similarity between two time series at each frequency component on a range from 0 to 1, whereby a WTC of 1 reflects perfect convergence between the two movements, while 0 reflects no congruence (Richardson et al. 2007; Schmidt and O’Brien 1997). This metric may be interpreted like a positive bivariate correlation, although it is a measurement in the frequency domain. In addition, the wavelet approach is superior to short-time windowed techniques because of its multi-scale properties, which help to precisely detect the properties of a complex signal (Issartel et al. 2006). Thus, we can assess rhythmic convergence at each frequency component at one time (without iterating windowed analysis by shortening/extending the time window). Several studies have applied the wavelet transform to identify indices of rhythmic convergence. For example, Washburn et al. (2014) collected body movement data in dance settings and found that trained dancers’ WTC was significantly higher than that of non-dancers, indicating that dancers achieved a higher level of rhythmic coordination with their partners. WTC has also been applied to extract rhythmic convergence in face-to-face, unstructured conversation (Fujiwara and Daibo 2016; Issartel et al. 2015; Schmidt et al. 2014). Fujiwara and Daibo (2016) observed significantly higher WTC values among genuine pairs who were engaged in actual conversation than among pseudo pairs that produced virtual data.

By using spectrum analyses, including the wavelet transform, the convergence of timing can also be quantified as a relative phase. A relative phase of 0° or an in-phase patterning indicates movement in the same part of the cycle at a given time. By contrast, a relative phase of 180° or an anti-phase patterning shows movement in the opposite part of the cycle at a given time. Intuitively, the in-phase patterning of two signals can be interpreted as a positive correlation (e.g., r = + 1.0) in their interaction timeline whereas the anti-phase patterning can be illustrated as a negative correlation (e.g., r = − 1.0). Although previous studies have demonstrated that synchronized movement tends to stabilize either in- or anti-phase patterning (e.g., Schmidt et al. 2012), the role of relative phase is unclear in face-to-face unstructured conversation (Fujiwara and Daibo 2016). To better understand synchrony using spectrum analysis, this study employs relative phase information, as well as coherence.
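The correlation analogy can be checked numerically. The following sketch, which assumes idealized noise-free sinusoids and hypothetical helper names (`sine`, `pearson`), correlates a 1 Hz reference with copies shifted by 0°, 90°, and 180° over ten full cycles.

```python
import math


def sine(freq_hz, phase_deg, n_samples, sampling_hz):
    """Sampled sinusoid with a fixed phase offset given in degrees."""
    phase = math.radians(phase_deg)
    return [
        math.sin(2 * math.pi * freq_hz * t / sampling_hz + phase)
        for t in range(n_samples)
    ]


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# 300 samples at 30 Hz cover exactly ten cycles of a 1 Hz movement.
reference = sine(1.0, 0, 300, 30)
for phase_deg in (0, 90, 180):
    r = pearson(reference, sine(1.0, phase_deg, 300, 30))
    print(phase_deg, round(r, 3))
```

In-phase and anti-phase copies correlate at approximately +1 and −1, whereas the 90° copy correlates at approximately 0 despite sharing exactly the same rhythm, which illustrates why coherence and relative phase carry different information.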

Rhythmic Features of Our Daily Interaction and Synchrony

The rhythmic features of human communication behavior have been investigated in various fields. Psycholinguistic research has associated on–off vocal activities in dyadic face-to-face interactions with slow-tempo rhythmic cycles (e.g., 150–300 s per cycle; Warner 1992a), whereby a 200-s cycle typically comprises about 100 s of speaking (long or frequent vocalizations), followed by about 100 s of listening (long or frequent pauses) (Warner 1979). Other studies have focused on the temporal characteristics of speech in examining speech rates (e.g., Apple et al. 1979), multi-scale clustering (Abney et al. 2015), and interval distribution (Abney et al. 2018). Recent studies have examined the ways that rhythmic and temporal structures of vocal communication converge to facilitate pro-social behavior (Manson et al. 2013).

In dyadic face-to-face interactions, both bodily movements and vocal activities have rhythmic characteristics that can be interpreted in terms of speech encoding processes (Davis 1982; Dittmann and Llewellyn 1969; Pelose 1987), and the rhythm of bodily movements is also synchronized (Bernieri and Rosenthal 1991). Previous research has demonstrated that bodily movement during unstructured interaction is more easily synchronized under 0.5 Hz (i.e., one cycle per 2 s) (Fujiwara and Daibo 2016). The same frequency was observed in another experiment employing a periodic rhythmic task (Schmidt et al. 2012); however, engaging in a specific task also makes it possible to achieve synchronization at a faster rhythm (e.g., at about 1.33 Hz; Schmidt et al. 2014). Bodily movements around 1.0 Hz presumably include hand gestures and nods, which may occur repetitively. By contrast, large movements such as postural sway and leg crossing are captured at a much slower rhythm. Although these studies provided some indication of the rhythmic properties of movement synchrony in human interactions, precisely which rhythmic features of movement synchrony contribute to bonding individuals remains unclear. To address this question, Study 1 explored which frequency band(s) of movement synchrony distinctively affect rapport in dyadic interactions between same-sex strangers.

Representing and Bonding: The Extent of Synchrony and Its Bonding Effect

The role of synchrony in interpersonal relationships may be considered from the perspectives of representing and bonding. On the one hand, Communication Accommodation Theory (CAT) proposes that conversational features, including vocal patterns or gestures, become convergent with increasing intimacy between participants (Giles et al. 1991), and several studies have found that relationship profile (e.g., closeness) precedes synchrony, and thus, the extent of synchrony seems to represent the rapport level or relationship quality between interactants. Bernieri et al. (1988) demonstrated that infants showed more coordinated movement toward adults with whom they were acquainted than with unacquainted adults, and friends have been found to exhibit more synchrony compared to strangers (Latif et al. 2014). Similarly, synchronous movement (such as stepping) has been observed to decrease in situations in which the relationship is not well-established (Miles et al. 2010). Karremans and Verwijmeren (2008) reported that individuals in a romantic relationship exhibit less mimicry to an attractive opposite-sex other, which represents their relational availability. The other type of coordinated movement, mimicry, toward opposite-sex partners can also be considered as a relationship status function.

In contrast, a number of other studies have demonstrated ways in which synchrony can work as a form of social glue that creates social bonds (e.g., Mogan et al. 2017; Vacharkulksemsuk and Fredrickson 2012; Vicaria and Dickens 2016) even after belongingness needs are threatened (Lakin et al. 2008). From the perspective of bonding, strangers, rather than friends, would benefit most from the bonding effect. In studies of children, the bonding effect of synchrony was salient for out-group members via the minimal group paradigm; however, its impact was significantly less among in-group members (Tunçgenç and Cohen 2016). Similarly, another study found that out-group members showed more synchrony in performing repetitive rhythmic actions than in-group members (Miles et al. 2011). Given that synchrony provides an effective means for individuals to develop relationships, such results can be considered from the perspective of motivational processes whereby friends (or in-group members) with already close relationships may have little need to further deepen their intimacy, while strangers are motivated to diminish minor interpersonal differences in order to accomplish a task. In their behavioral mimicry study, Lakin et al. (2008) illustrated that individuals excluded by an in-group member selectively exhibited increased coordinated behavior (i.e., mimicry) toward a subsequent interaction partner from the in-group. In a situation in which friendship or in-group membership has been established, synchrony will not work well as social glue because the goal of establishing affiliation has already been fulfilled.

To investigate these two non-exclusive possibilities regarding the representing and bonding effects of synchrony (i.e., rhythmic similarities), Study 2 manipulated interaction partners such that participants were alternately engaged with a same-sex stranger or a friend. Then, the extent of synchrony and its bonding effect was investigated from the perspectives of representing and bonding. From the perspective of representing, it was expected that interactions between friends would exhibit more synchrony, while from the perspective of bonding, the bonding effect of synchrony would be more salient for strangers than for friends. All procedures performed in both the studies were approved by the Ethics Committee of the Faculty of Human Sciences, Osaka University of Economics.

Study 1

Method

Participants

In exchange for extra course credit, 88 Japanese undergraduates volunteered to participate in the experiment. Each participant was randomly paired with a same-gender stranger. In two cases, the conversations were not captured due to a malfunction in the video equipment and were removed from subsequent analysis, such that a total of 42 dyads from 84 participants (46 males, 38 females, Mage = 18.77, SDage = 1.07) were analyzed.

Procedures

First, the participants completed a consent form while seated back-to-back; then they were seated opposite to each other at a distance of 80 cm and were instructed to become acquainted with each other by engaging in a 6-min conversation about any topic of their choice. The conversations were video-recorded using a camcorder placed 330 cm away from and to the right side of the participants. After the conversation, each participant completed a questionnaire on rapport.

Rapport Questionnaire

Perceived rapport was measured using three items selected from Bernieri et al. (1996): cooperative, involving, and awkward (reversed) (α = .83) on an 8-point scale ranging from 1 (not at all) to 8 (very much). This short version was created from the Japanese translation of Bernieri’s 18-item scale, and its reliability was sufficient (Kimura et al. 2005).

Generating Time Series Movement Data

To generate time series movement data, body movements were characterized using Motion Energy Analysis (MEA; Ramseyer and Tschacher 2011; see www.psync.ch for details). Rather than using manual observer ratings, this technique employed signal processing to quantitatively analyze body movements. Users selected a region of interest (ROI) in the video image, and the MEA software then automatically calculated the change of gray-scale pixels between consecutive video frames, which were encoded as movements. In this study, the whole body of each interactant was covered as an ROI, and sampling frequency was set to 30 Hz—equal to the video frame rate.
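The frame-differencing idea can be sketched as follows. This is a simplified illustration of the principle, not the MEA software itself: the function names and the change threshold of 10 gray levels are assumptions, and frames are represented as plain nested lists of gray-scale values covering the ROI.

```python
def motion_energy(prev_frame, next_frame, threshold=10):
    """Count ROI pixels whose gray-scale value changed by more than `threshold`.

    The threshold is an assumed noise floor; the actual MEA settings may differ.
    """
    return sum(
        1
        for prev_row, next_row in zip(prev_frame, next_frame)
        for p, q in zip(prev_row, next_row)
        if abs(q - p) > threshold
    )


def motion_series(frames, threshold=10):
    """One motion-energy value per pair of consecutive frames."""
    return [motion_energy(a, b, threshold) for a, b in zip(frames, frames[1:])]


# Three tiny 3x3 "frames": movement between the first two, stillness afterwards.
still = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
moved = [[0, 200, 0], [0, 200, 0], [0, 5, 0]]
print(motion_series([still, moved, moved]))
```

At a 30 Hz frame rate, a 6-min recording would yield roughly 10,800 such values per person, which form the time series submitted to the synchrony analysis.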

Quantification of Synchrony

The WTC was calculated for each dyadic time series using MATLAB (Mathworks) and the wavelet toolbox (Grinsted et al. 2004). Grinsted et al.’s (2004) default parameters were employed, except that following Issartel et al. (2006), the number of order was set to 8. Morlet was used as the mother wavelet. The cone of influence (COI) area was not included for subsequent analysis (Grinsted et al. 2004). The WTC values for frequencies below 4 Hz were averaged across the whole timeline because the unstructured conversation of strangers was not active or fast (Fujiwara and Daibo 2016). Each averaged WTC was standardized using Fisher’s Z transformation before conducting statistical analysis. Additionally, to investigate which frequency band of synchrony was distinctive in creating rapport, we calculated the average coherence of each frequency band (under 0.025 Hz, 0.025–0.05 Hz, 0.05–0.1 Hz, 0.1–0.2 Hz, 0.2–0.5 Hz, 0.5–1 Hz, 1–2 Hz, 2–3 Hz, and 3–4 Hz, respectively).
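The band averaging and the Fisher transformation can be sketched in a few lines. The band edges below come from the text; everything else is an assumption made for illustration, including the helper names, the rule that a frequency exactly on an edge falls into the higher band, and applying the transformation directly to the averaged coherence as z = atanh(WTC).

```python
import math

# Upper edges (Hz) of the nine bands listed in the text.
BAND_UPPER_EDGES_HZ = [0.025, 0.05, 0.1, 0.2, 0.5, 1, 2, 3, 4]


def band_index(freq_hz):
    """Index of the band containing `freq_hz`, or None at 4 Hz and above."""
    for index, upper in enumerate(BAND_UPPER_EDGES_HZ):
        if freq_hz < upper:
            return index
    return None


def mean_coherence_by_band(freqs_hz, coherences):
    """Average coherence within each band that contains at least one frequency."""
    grouped = {}
    for freq, value in zip(freqs_hz, coherences):
        index = band_index(freq)
        if index is not None:
            grouped.setdefault(index, []).append(value)
    return {index: sum(values) / len(values) for index, values in grouped.items()}


def fisher_z(value):
    """Fisher's Z transformation, z = atanh(value), for values in (-1, 1)."""
    return math.atanh(value)


# Toy coherence values at four frequencies; the 5 Hz value is discarded.
print(mean_coherence_by_band([0.01, 0.02, 0.7, 5.0], [0.4, 0.6, 0.2, 0.9]))
print(round(fisher_z(0.288), 3))
```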

The proportion of relative phase in nine 20° regions from 0° to 180° was also calculated. Following previous studies (e.g., Schmidt et al. 2012), the region of 0°–20° was defined as in-phase patterning and the region of 160°–180° was defined as anti-phase patterning, and we targeted the area where the cross-wavelet spectrum was significant (Issartel et al. 2015). The number of occurrences in each 20° region was counted, and the percentage distribution was calculated for each pair. The proportions of the in- and anti-phase regions were arcsine transformed for use in the subsequent analysis.
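A minimal sketch of the binning step is given below. Several details are assumptions rather than the procedure reported here: leads and lags are folded onto 0°–180°, the arcsine transformation is taken in its common form asin(√p), and the restriction to regions where the cross-wavelet spectrum is significant is omitted. All function names are hypothetical.

```python
import math


def fold_phase(phase_deg):
    """Fold any angle onto 0-180 degrees, treating lead and lag alike."""
    wrapped = abs(phase_deg) % 360
    return 360 - wrapped if wrapped > 180 else wrapped


def phase_proportions(phases_deg):
    """Proportion of observations in each of nine 20-degree regions."""
    counts = [0] * 9
    for phase in phases_deg:
        index = min(int(fold_phase(phase) // 20), 8)  # exactly 180 joins the last region
        counts[index] += 1
    return [count / len(phases_deg) for count in counts]


def arcsine_transform(proportion):
    """Variance-stabilizing transform asin(sqrt(p)); assumed form."""
    return math.asin(math.sqrt(proportion))


# Toy relative-phase values in degrees.
proportions = phase_proportions([5, -10, 175, 190, 90, 45, 350, 100])
in_phase, anti_phase = proportions[0], proportions[8]
print(in_phase, anti_phase)
print(round(arcsine_transform(in_phase), 3), round(arcsine_transform(anti_phase), 3))
```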

Commonality Analysis

The WTCs of the frequency bands are usually highly correlated, particularly when the bands are situated close to each other, which could bias the results of a linear model. Therefore, commonality analysis (Kraha et al. 2012; Nimon and Oswald 2013) was employed to address the risk of multicollinearity. Commonality analysis decomposes regression R² into its unique and common effects, such that unique effects indicate how much variance is uniquely accounted for by a single predictor, and common effects indicate how much variance is common to a predictor set. Commonality analysis was conducted for 42 dyads to explore which frequency band of synchrony affects rapport, whereby the score of perceived rapport was averaged within each dyad.
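For intuition, the decomposition can be written out for the simplest case of two predictors, where R² splits into two unique effects and one common effect. The sketch below uses made-up correlations, not values from this study, and hypothetical function names; with k predictors the decomposition has 2^k − 1 components, so the nine-band analysis is considerably larger than this example.

```python
def r_squared_two(r_y1, r_y2, r_12):
    """R-squared of a two-predictor regression, from pairwise correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def commonality_two(r_y1, r_y2, r_12):
    """Unique and common effects for two predictors of one outcome."""
    total = r_squared_two(r_y1, r_y2, r_12)
    return {
        "unique_1": total - r_y2 ** 2,              # gained by adding predictor 1 last
        "unique_2": total - r_y1 ** 2,              # gained by adding predictor 2 last
        "common": r_y1 ** 2 + r_y2 ** 2 - total,    # variance shared by both predictors
        "total": total,
    }


# Illustrative correlations: two strongly correlated predictors (r = .80)
# that each correlate moderately with the outcome.
effects = commonality_two(r_y1=0.40, r_y2=0.35, r_12=0.80)
for name, value in effects.items():
    print(name, round(value, 4))
```

In this toy case most of the explained variance is common to both predictors, which mirrors the situation in which adjacent frequency bands look non-significant in a multiple regression even though they jointly carry predictive information.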

Hierarchical Linear Modeling

The current study’s IV and DV were technically on different levels. The WTC and the proportion of in- and anti-phase patterning were measured on the dyadic level (level 2: Ndyad = 42), while perceived rapport was measured on the individual level (level 1: Nindividual = 84). Hierarchical linear modeling (HLM) was applied with the dyad designated as the random effect in order to investigate the impact of each frequency band of synchrony on perceived rapport. The predictor was the WTC of the specific frequency band confirmed by the commonality analysis, which was centered prior to analysis, and several models were compared to determine their fitness (i.e., AIC score).
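One way to write such a model, offered here as an assumed specification rather than the exact one fitted, is a random-intercept model in which individual i is nested in dyad j and the dyad-level WTC is grand-mean centered:

```latex
\begin{aligned}
\text{Level 1 (individual):}\quad & \mathrm{Rapport}_{ij} = \beta_{0j} + e_{ij},
  & e_{ij} &\sim N(0, \sigma^{2}) \\
\text{Level 2 (dyad):}\quad & \beta_{0j} = \gamma_{00}
  + \gamma_{01}\,\bigl(\mathrm{WTC}_{j} - \overline{\mathrm{WTC}}\bigr) + u_{0j},
  & u_{0j} &\sim N(0, \tau^{2})
\end{aligned}
```

Under this reading, the coefficient reported as b corresponds to γ01, the expected change in an individual’s rapport per unit change in the dyad’s WTC, and u0j is the dyad’s random effect. The two-predictor models would add a second centered WTC term at level 2.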

Pseudo Synchrony Testing

Additional analysis on random pairs of participants (Pseudo Synchrony Paradigm; Bernieri and Rosenthal 1991) was conducted to test the validity of the HLM analysis. This analysis tested whether the predictors of randomly shuffled pseudo pairs were not significantly associated with their reported rapport.
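The shuffling logic can be sketched as follows. The exact re-pairing procedure is not described here, so this is only one plausible reading: the first member of each dyad is matched with the second member of a different dyad, and the shuffle is repeated until no genuine pair remains. The function name and the use of a seeded generator are assumptions.

```python
import random


def pseudo_pairs(n_dyads, rng):
    """Pair member A of each dyad with member B of a different dyad."""
    if n_dyads < 2:
        raise ValueError("at least two dyads are needed to form pseudo pairs")
    partners = list(range(n_dyads))
    # Reshuffle until nobody is matched with their genuine partner.
    while any(a == b for a, b in enumerate(partners)):
        rng.shuffle(partners)
    return list(enumerate(partners))


pairs = pseudo_pairs(42, random.Random(1))
print(pairs[:5])
```

Each tuple (a, b) means that the movement series of member A from dyad a is analyzed together with that of member B from dyad b, so any coherence that remains reflects chance overlap rather than genuine interaction.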

Results

The coherence value of each frequency band (under 0.025 Hz, 0.025–0.05 Hz, 0.05–0.1 Hz, 0.1–0.2 Hz, 0.2–0.5 Hz, 0.5–1 Hz, 1–2 Hz, 2–3 Hz, and 3–4 Hz) was first submitted to a one-way ANOVA with a within-subject variable, and the result indicated that the main effect of frequency band was significant, F(8, 328) = 27.25, p < .001, ηp² = 0.40. Holm’s multiple comparison identified 0.5 Hz as the boundary of the extent of synchrony. The coherence value decreased as the frequency became higher, and there was a significant difference between each frequency band. Additionally, the WTC of each frequency band was significantly correlated with those adjacent to it (Table 1).

Table 1 The mean score and correlation of WTC in each frequency band

Commonality Analysis

A simple multiple regression demonstrated that some WTCs were highly correlated with each other, biasing the results such that the WTC of each frequency band had no significant effect on rapport (Ndyad = 42, M = 5.94, and SD = 1.23). On the contrary, the result of commonality analysis suggested that the WTC under 0.025 Hz, 0.5–1 Hz, and 1–2 Hz had a positive impact on rapport (Table 2) such that the WTC under 0.025 Hz had a relatively strong unique effect, while the WTCs of 0.5–1 and 1–2 Hz had an almost common effect, which suggested that the WTC around 1.0 Hz could have an impact on rapport.

Table 2 Summary of commonality analysis (Ndyad = 42)

Hierarchical Linear Modeling

First, the HLM confirmed that the WTC under 4 Hz (Ndyad = 42, M = .288, SD = .046) significantly predicted rapport (Nindividual = 84, M = 5.94, SD = 1.47) (b = 9.98, SE = 4.12, p = .020, AIC = 298.3). Second, based on the results of commonality analysis, the WTCs under 0.025 Hz, 0.5–1 Hz, and 1–2 Hz were considered as possible predictors. However, the WTCs of 0.5–1 Hz and 1–2 Hz were highly correlated, such that they should not be included in the same model; rather, the WTC of 0.5–1.5 Hz (i.e., ±0.5 Hz around the 1.0 Hz mark) was newly created (Ndyad = 42, M = .250, SD = .029). A total of seven models were compared on their AIC scores. Four single-predictor models had WTCs of under 0.025 Hz, 0.5–1 Hz, 0.5–1.5 Hz, and 1–2 Hz, respectively. Three models included two predictors, one of which was the WTC of under 0.025 Hz, while the other was one of the WTCs of 0.5–1 Hz, 0.5–1.5 Hz, and 1–2 Hz. The results showed that the WTC of 0.5–1.5 Hz yielded the lowest AIC score among the single-predictor models and had a significant positive impact on rapport. Among all the models, those with the lowest and second lowest AIC scores each consisted of two predictors, such that the WTCs of under 0.025 Hz and 0.5–1.0 Hz had a significantly positive impact on rapport (Table 3).

Table 3 Summary of HLM with a two-predictor model

Similar to the WTC, in HLM analysis, the predictor of rapport was the proportion of relative phase (arcsine transformed) under 4 Hz (Ndyad = 42, Min-phase = .116, SDin-phase = .112, Manti-phase = .102, SDanti-phase = .083) and in the four frequency bands (under 0.025 Hz, Ndyad = 31, Min-phase = .093, SDin-phase = .166, Manti-phase = .194, SDanti-phase = .387; 0.5–1 Hz, Ndyad = 42, Min-phase = .087, SDin-phase = .062, Manti-phase = .112, SDanti-phase = .072; 1–2 Hz, Ndyad = 42, Min-phase = .087, SDin-phase = .042, Manti-phase = .122, SDanti-phase = .055; and 0.5–1.5 Hz, Ndyad = 42, Min-phase = .081, SDin-phase = .048, Manti-phase = .121, SDanti-phase = .057, respectively). However, the results of HLM indicated that there was no significant influence of in- and anti-phase patterning in either the single- or two-predictor models (all ps > .135).

Pseudo Synchrony Testing

To test the validity of HLM analysis, Pseudo Synchrony Paradigm (Bernieri and Rosenthal 1991) was employed. Here the ratio of relative phase was not examined because only WTC could predict the reports of rapport in the original analysis. Results showed that WTC under 0.025 Hz (Ndyad = 42, M = .292, SD = .154), 0.5–1.5 Hz (M = .237, SD = .019), and under 4 Hz (M = .248, SD = .019) were significantly lower in the pseudo than the genuine pairs who engaged in their conversation, t(78.33) = 3.48, p < .001, d = 0.758; t(70.67) = 2.58, p = .012, d = 0.530; t(55.99) = 5.47, p < .001, d = 1.203; respectively. Furthermore, HLM results showed that WTC under 4 Hz did not predict the rapport (b = − 2.59, SE = 8.59, p = .764) of the pseudo pairs. In the two-predictor model, each of the pseudo pair’s WTC under 0.025 Hz (b = − 0.25, SE = 1.06, p = .816) and 0.5–1.5 Hz (b = − 1.21, SE = 8.70, p = .890) did not predict their rapport.

Discussion

Study 1 investigated which frequency bands of synchrony best predicted rapport between individuals. The results generally aligned with previous experiments in showing that the overall synchrony (i.e., the rhythmic similarity of under 4 Hz) was positively associated with rapport, which is consistent with the general idea that synchrony contributes to creating rapport between interactants (Bernieri and Rosenthal 1991; Tickle-Degnen and Rosenthal 1990; Vacharkulksemsuk and Fredrickson 2012). However, this study was conducted with the more precise goal of investigating rhythm or rhythmic congruence, and the results showed that the movement of similar rhythms contributes to creating rapport between interactants. This finding was validated by pseudo synchrony testing.

Among the novel results of this study is our finding that specific frequency bands of movement synchrony (i.e., the WTCs of under 0.025 Hz and around 1.0 Hz) distinctively predicted rapport. The WTC of each frequency band was significantly correlated, which resulted in biased results for multiple regression; however, by conducting commonality analysis to decompose regression R² into its unique and common effects, the current study was able to isolate the distinctive rhythm that creates rapport. The higher value of the WTC around 1.0 Hz, which represents a tempo of one cycle per second, reflects the synchronous repetitive movement of this middle tempo, which presumably includes hand gestures and/or nodding. In contrast, the higher value of the WTC of the much slower tempo under 0.025 Hz would reflect loosely-coupled features of movement. To measure the WTC of under 0.025 Hz, a wider time window of at least 40 s is required (0.025 Hz represents one cycle per 40 s). Such a wide time window can encompass various behaviors characterized as movements, such as postural sway and crossing legs while also performing hand gestures and nodding. These movements may occur slowly and/or frequently in the given time window. On–off vocal activity was associated with similar loosely-coupled rhythmic features with approximately 3- and 6-min cycles (0.0056 Hz and 0.0028 Hz, respectively; Warner 1979, 1992a). This type of rhythmic convergence can have a different meaning from that of the middle tempo of synchrony (i.e., around 1.0 Hz).

While the convergence of rhythm (i.e., WTC) significantly predicted rapport, our results diverged from those of previous studies in that the relative phase (i.e., the proportion of in- or anti-phase patterning) was not found to be such a predictor (Hove and Risen 2009; Miles et al. 2009). The difference in findings could be attributed to the conversational setting and the characteristics of movement synchrony. In an unstructured conversational situation, interactants can voluntarily take turns. Moreover, unlike on–off vocal activity, which requires anti-phase patterning of one interactant’s speech and the other interactant’s silence, our participants’ movement was relatively free, so the relative phase patterning would fluctuate over the course of each conversation (see also Fujiwara and Daibo 2016). Thus, the ratio of in- and anti-phase patterning might not be sufficiently salient to have an impact on the perceived rapport, or pair-level fluctuation might mask its influence on perceived rapport: the phase pattern would be self-organized in each pair, such that one pair interacts mainly by in-phase patterning, while another interacts by anti-phase patterning.

Study 2

Study 1 demonstrated that the convergence of rhythm (i.e., the WTC) under 0.025 Hz and around 1.0 Hz, as well as under 4.0 Hz, was positively associated with rapport. Study 2 manipulated the dyads by changing the interaction partners to investigate whether the same frequency bands of rhythmic convergence represented the quality of the relationship, and/or whether they could work as social glue.

Method

Participants

In exchange for extra course credit, 152 Japanese undergraduates participated in the experiment. In one case, the dyad’s conversation was not captured due to a malfunction in the video equipment. This pair was removed from subsequent analysis, resulting in the analysis of a total of 75 dyads from 150 participants (male = 50, female = 100, Mage = 19.09, SDage = 1.05). Thirty-three of the dyads comprised same-gender strangers (male dyad = 13, female dyad = 20), and 42 dyads comprised pairs of same-gender friends (male dyad = 12, female dyad = 30), whereby the length of the friendships ranged from 1 to 170 months (M = 20.43, SD = 28.88). For the stranger dyads, each participant was randomly paired with a same-sex partner, whereas friends adjusted their schedules in order to arrive at the laboratory together.

Procedures

After the participants completed the consent forms in their separate booths, they were seated opposite one another (approximately 100 cm apart), and the different types of dyads were instructed to either become acquainted with each other or deepen their acquaintance by engaging in a 5-min conversation regarding topic(s) of their choice. Their conversations were video-recorded using a camcorder placed 280 cm away from and to the right side of the participants. After the conversation, participants completed a questionnaire concerning the motivation to develop or deepen a friendly relationship with their partner.

Questionnaire of the Motivation to Develop a Friendly Relationship with the Partner

The motivation to develop or deepen a friendly relationship with the partner was measured using three items on a 7-point scale from 1 (not at all) to 7 (very much): “I would like to keep company with the partner”; “I would like to know more about the partner”; and “I would like to get closer to the partner” (α = .95).

Generating Time Series Movement Data and Quantification of Synchrony

The method of generating time series movement data and quantification of synchrony was identical to that used in Study 1. After generating time series movement data via MEA (Ramseyer and Tschacher 2011), the study calculated the WTC and the ratio of relative phase in the frequency bands of under 4 Hz, under 0.025 Hz, and 0.5–1.5 Hz (around 1.0 Hz).

Hierarchical Linear Modeling

As in Study 1, Study 2 employed HLM because the IV and DV were on different levels. The influence of the WTC and the ratio of relative phase in the dyadic level (level 2: Ndyad = 75) on the motivation in the individual level (level 1: Nindividual = 150) was examined. Dyad was included as the random effect.

Pseudo Synchrony Testing

In Study 2, validity testing was conducted to assess whether the predictors of randomly shuffled pseudo pairs could not be significantly associated with motivation.

Results

First, a separate t test indicated that the WTC under 4 Hz was significantly higher for the dyads with friends compared to those with strangers (Table 4). However, the WTC of each frequency band was not significantly different from that of the others.

Table 4 The mean score of the WTC

Second, the result of the HLM showed that the WTC under 4 Hz (Ndyad = 75, M = 5.49, SD = 1.45) significantly predicted the motivation to develop a friendly relationship. Further, and more importantly, the interaction effect of the WTC under 4 Hz and the pre-existing friendship was significant (Table 5), which indicated that synchrony was predictive of the motivation to develop a relationship if the participant interacted with a stranger (b = 11.20, SE = 3.16, p < .001), but did not do so during interactions between friends (b = 2.37, SE = 2.83, p = .405). The WTC of 0.5–1.5 Hz showed the same pattern as the WTC less than 4 Hz (Fig. 1), whereas the WTC under 0.025 Hz indicated a different pattern. This result might suggest the influence of a possible statistical artifact (i.e., the ceiling effect) because the motivation score was too high among friends; however, this interpretation is undermined because the significance of results did not change in cases when the highest scoring answers (i.e., 7 points) were removed (Table 5).

Table 5 The bonding effect of movement synchrony
Fig. 1 Moderating effect of pre-existing friendship on synchrony’s bonding effect

The proportion of in- and anti-phase patterning (arcsine transformed) was compared between friends (under 4 Hz, Ndyad = 42, Min-phase = .152, SDin-phase = .093, Manti-phase = .076, SDanti-phase = .029; under 0.025 Hz, Ndyad = 11, Min-phase = .483, SDin-phase = .532, Manti-phase < .000, SDanti-phase < .000; and 0.5–1.5 Hz, Ndyad = 42, Min-phase = .119, SDin-phase = .038, Manti-phase = .105, SDanti-phase = .033, respectively) and strangers (under 4 Hz, Ndyad = 33, Min-phase = .130, SDin-phase = .057, Manti-phase = .095, SDanti-phase = .053; under 0.025 Hz, Ndyad = 10, Min-phase = .088, SDin-phase = .147, Manti-phase = .089, SDanti-phase = .191; and 0.5–1.5 Hz, Ndyad = 33, Min-phase = .121, SDin-phase = .061, Manti-phase = .102, SDanti-phase = .054, respectively). Although the in-phase patterning under 0.025 Hz differed significantly between dyads of friends and those of strangers, t(15.68) = 2.64, p = .018, d = 0.95, the sample size for this band appeared too small to support firm conclusions, and none of the other differences reached significance (|t|s < 1.84, all ps > .072). Moreover, the HLM results indicated no significant influence of in- and anti-phase patterning (including its interaction with pre-existing friendship) on motivation (all ps > .070).
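
The comparison above can be illustrated with a short sketch. Two points are assumptions rather than statements from the text: the arcsine transformation is taken to be the common arcsine square-root form, and the fractional degrees of freedom are read as indicating Welch's unequal-variance t test. The function names and the input proportions are invented for the example.

```python
import math


def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion in [0, 1] (assumed form)."""
    return math.asin(math.sqrt(p))


def welch_t(x, y):
    """Return (t, df) for Welch's unequal-variance t test on two samples."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)  # unbiased sample variance
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    se2 = vx / nx + vy / ny  # squared standard error of the mean difference
    t = (mx - my) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df


# Made-up proportions for illustration only; these are not the study's data.
friends = [arcsine_sqrt(p) for p in (0.20, 0.35, 0.10, 0.50, 0.25)]
strangers = [arcsine_sqrt(p) for p in (0.05, 0.10, 0.08, 0.12)]
t_value, df_value = welch_t(friends, strangers)
```

Because the Welch degrees of freedom depend on the two sample variances, they are generally not whole numbers, which is consistent with values such as the 15.68 reported above.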

As in Study 1, the results of the pseudo synchrony testing showed that the WTC under 0.025 Hz (Ndyad = 75, M = .255, SD = .151), 0.5–1.5 Hz (M = .220, SD = .021), and under 4 Hz (M = .245, SD = .016) was significantly lower in the pseudo pairs than in the genuine pairs, t(140.87) = 7.21, p < .001, d = 1.183; t(133.72) = 4.76, p < .001, d = 0.750; t(93.79) = 10.94, p < .001, d = 1.782; respectively. Furthermore, HLM results showed that the WTC under 4 Hz did not predict the motivation of pseudo pairs (b = −3.06, SE = 7.48, p = .683). In the two-predictor model, neither the pseudo pairs’ WTC under 0.025 Hz (b = 0.17, SE = 0.80, p = .836) nor their WTC of 0.5–1.5 Hz (b = −8.24, SE = 5.85, p = .164) predicted motivation.

Discussion

Study 2 demonstrated that rhythmic convergence reflected the interactants’ pre-existing friendships, in that dyads of friends showed higher rhythmic similarity. Such convergence also had a bonding effect: strangers were more motivated to cultivate a friendly relationship the more their conversation behavior (i.e., movement) converged rhythmically. In the latter case, rhythmic convergence around 1.0 Hz in particular worked well as social glue.

Previous studies indicated that the extent of synchrony represents the level of rapport between interactants (Bernieri et al. 1988; Latif et al. 2014). CAT (Giles et al. 1991) considers the convergent process of conversational behavior based on the intimacy between interactants. The results of Study 2 provided additional empirical support for this framework from the perspective of rhythm, such that the more intimate the relationship between interactants, the more convergent their conversational rhythm will be. However, this result refers only to the whole rhythm and not to the specific frequency bands. While Study 1 identified distinctive rhythms for predicting rapport (i.e., under 0.025 Hz and around 1.0 Hz), these frequency bands did not differ significantly between stranger- and friend-pairs. This finding might be partially due to the brevity of the conversations, which may not have provided sufficient time to differentiate between friends and strangers. Alternatively, the similarity might be attributable to our instruction that participants get acquainted with each other, as coordination is a strategy to achieve the goal of affiliation (Lakin and Chartrand 2003). In other words, setting affiliation as a goal may have encouraged the participants to facilitate synchrony, which blurred the differences between friends and strangers.

Study 2 also found that synchrony had a bonding effect for interactants, which was moderated by pre-existing friendship. While numerous studies have demonstrated that synchrony works as social glue (Mogan et al. 2017; Vicaria and Dickens 2016), other studies have indicated that the saliency of the bonding effect could slightly differ depending on the relationship between the interactants (e.g., out-group members via the minimal group paradigm, Miles et al. 2011; Tunçgenç and Cohen 2016). The results of Study 2 support other studies in suggesting that since synchrony is an effective means for individuals to develop relationships, individuals who already share close friendships may not be subject to its benefits.

The influence of the relative phase remains unclear. Unlike the anti-phase patterning characterizing vocal activity (Warner 1979, 1992a, b), the phase pattern of interactants’ movements might inevitably remain undetermined in unstructured conversational situations.

General Discussion

This study explored which frequency band(s) of synchrony (i.e., rhythmic convergence) contributes to bonding between individuals in an unstructured conversation. The role of pre-existing friendship was examined as a moderator from the perspective of representation and bonding. Through the two studies, the results of employing the wavelet transform indicated that some rhythmic convergences (i.e., under 0.025 Hz and around 1.0 Hz) predicted rapport in a manner that distinguished them from other frequency bands (Study 1), and the rhythmic convergence of specific frequency bands was predictive of the motivation to develop a relationship (Study 2). Although Bernieri and Rosenthal (1991) suggested the importance of rhythmic properties in synchrony research, movement rhythm in unstructured conversation has not been a subject of focus previously. Thus, our findings represent the first empirical evidence illustrating the existence of a distinctive rhythm of movement synchrony for individuals bonding in unstructured interactions.

The rhythmic features of face-to-face conversations have been investigated in terms of on–off vocal activity or turn-taking (Warner 1979, 1992a, b), revealing rapid turn transitions of around 200 ms (Stivers et al. 2009) and rhythmic cycles with a slow tempo (Warner 1979, 1992a). The rhythmic features of movement synchrony for social bonding identified in this study are highly compatible with the general idea of co-existing dual (or multiple) middle- (around 1.0 Hz) and slow-tempo (under 0.025 Hz) frequency bands. Each frequency band appeared to be associated with a different synchronous behavior: the rhythmic convergence of movement around 1.0 Hz seemed to reflect synchronous conversation behaviors such as hand gestures and/or nodding, while slower temporal synchrony seemed to reflect loosely coupled features of movement such as postural sway and leg crossing, possibly combined with hand gestures and nodding, which may occur slowly and/or frequently. These findings may add new insight to synchrony research focusing on daily conversation. However, it must be noted that the rhythmic cyclicity of vocal activity can extend over longer lengths of time, such that multiple cycles of 3 or 6 min may be extracted from longer conversations of over 30 min (Warner 1979, 1992a). This study focused on 5- or 6-min conversations, which are simply too short to allow the extraction of multiple lengthy rhythmic cycles. In Study 2, the rhythmic convergence of slow tempo (under 0.025 Hz) failed to predict the motivation to deepen the friendship with the partner, which might be due to the brevity of the time allocated for conversation. Future studies should investigate the role of rhythmic convergence in slower tempo during longer conversation sessions.
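
A short worked calculation makes the time-scale argument concrete. It uses only the relation between frequency $f$ and period $T$; the 360-s session length corresponds to the 6-min conversations mentioned above, and the symbols $D$ and $n$ are introduced here for illustration.

```latex
\begin{align}
T &= \frac{1}{f}, \qquad n = \frac{D}{T} = D f \\
f = 1.0\ \text{Hz} &\;\Rightarrow\; T = 1\ \text{s}, \qquad
f = 0.025\ \text{Hz} \;\Rightarrow\; T = 40\ \text{s} \\
T = 3\ \text{min} = 180\ \text{s} &\;\Rightarrow\; f \approx 0.0056\ \text{Hz}, \qquad
T = 6\ \text{min} = 360\ \text{s} \;\Rightarrow\; f \approx 0.0028\ \text{Hz} \\
D = 360\ \text{s}: \quad n_{40\,\text{s}} &= 9, \qquad n_{3\,\text{min}} = 2, \qquad n_{6\,\text{min}} = 1
\end{align}
```

A 6-min conversation therefore contains at most nine cycles at the fastest edge of the slow band (under 0.025 Hz), but only one or two cycles of the 3- or 6-min rhythms reported for vocal activity.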

The bonding effect of synchrony is moderated by conditions of pre-existing friendship, which suggests that synchrony is the result of an unconscious decision by the interactants to achieve certain levels of rapport or closeness in their relationships. In the early stages of a relationship, individuals will be more motivated to diminish even minor interpersonal differences in order to enhance rapport, which is accomplished by achieving synchrony. In contrast, synchrony will not work as social glue among friends because they are already close and do not feel the need to diminish their differences. Another possible explanation for the absence of a bonding effect among friends derives from the concept of uncertainty reduction, which some researchers consider to play a significant role in social interaction (Berger and Bradac 1982). Rhythmic convergence can be described as the predictability of conversation behaviors (i.e., movement), which would contribute to reducing uncertainty between interactants. Thus, strangers prefer greater rhythmic convergence or predictability because it reduces uncertainty, whereas friends may prefer less predictability and prefer to be surprised. In Study 2, there was no significant influence of synchrony on the pairs of friends, which suggests that they did not benefit from less predictability. Thus, whether one attributes the findings to the desire to overcome differences or to reduce uncertainty, it is clear that greater rhythmic convergence or predictability is not necessarily required between friends. Although this study’s experimental design did not compare the validity of these theoretical perspectives (i.e., achieving a certain level of relationship quality vs. uncertainty reduction), future studies should consider both models in assessing the social functions of movement synchrony in face-to-face conversations.

Limitations and Future Research Directions

One possible limitation of the study is that our sample included only Japanese participants. Although turn-taking, a main component in establishing conversation rhythms, is a universal system characterized by a general avoidance of overlapping talk and minimization of silence between conversational turns (Stivers et al. 2009), the speed of response or average gap between turns is culturally divergent, and Japanese speakers have demonstrated more rapid turn transitions compared to English speakers (Stivers et al. 2009). Given the cultural difference in turn-taking speed, the distinctive rhythm of movement synchrony for predicting rapport (i.e., under 0.025 Hz and around 1.0 Hz in Study 1) may differ across cultures. Although the findings of this study (i.e., representing and bonding effects of synchrony) may be generalized to various cultural contexts, future studies might consider the impact of cultural variations in rhythmic features of movement synchrony, particularly in intercultural communication situations.

While this study identified pre-existing friendship as a moderator of the bonding effect of movement synchrony, future research should also scrutinize the role of rhythmic convergence in interpersonal relationships. For example, Tickle-Degnen and Rosenthal (1990) proposed that the components of rapport can change over the course of a developing relationship between individuals, such that positivity and attentiveness are more heavily weighted in early interactions, whereas the impact of positivity would decrease in later interactions. In our findings, the rhythmic convergence around 1.0 Hz contributed to creating a social bond between strangers, yet it was not effective in increasing intimacy between friends. The movement synchrony around 1.0 Hz may correspond to Tickle-Degnen and Rosenthal’s (1990) notion of positivity. To increase our understanding of the nature of synchrony and interpersonal relationships, it would be fruitful for future studies to investigate the conceptual positioning of the rhythmic convergence of movement during face-to-face conversations. For that purpose, Bernieri’s entire 18-item scale should be used. This study employed a 3-item version of the scale based on a previous study (Kimura et al. 2005), which reduced the burden on participants. However, these three items cannot be regarded as a representative measure of rapport; they do not cover the three main components theorized by Tickle-Degnen and Rosenthal (1990). Moreover, it would be worthwhile to investigate the relationships between the DVs used in this study (rapport in Study 1 and motivation to develop relationships in Study 2). Theoretical work with an objective measurement of synchrony would become increasingly important in future research.

Our conclusions primarily derive from findings related to rhythmic convergence. Although previous studies employing rhythmic joint tasks have found that interactants synchronized in phase (e.g., Schmidt et al. 2012) and proposed a positive relationship between in-phase synchronization and affiliation (Hove and Risen 2009), the proportion of in- and anti-phase patterning had no significant influence on perceived rapport or the motivation to develop relationships in our study, which focused on unstructured face-to-face conversations. These results might be interpreted as indicating that our contribution is limited; however, Bernieri and Rosenthal (1991) conceptualized synchrony as comprising several facets, namely rhythm (i.e., rhythmic convergence), simultaneous behavior (i.e., convergence of timing), and behavioral meshing. Though previous research did not provide clear distinctions among these elements, our results indicate that each synchrony index influences social bonding differently, and it is possible that each facet of synchrony is based on a different mechanism. Certainly, synchrony needs investigation from various perspectives, and this study may have opened the door to further investigation. On the basis of this perspective, future studies should explore which factors make it possible to achieve the in- and/or anti-phase patterning of interactants’ movement even in unstructured conversations. Some joint conversational tasks that manipulate the interactive nature of conversations (e.g., Tolston et al. 2014) would benefit future research.

Future research could also investigate the relationship between verbal and nonverbal signals from the perspective of synchrony. A recent study proposed the temporal heterogeneity hypothesis, which holds that the temporal distributions (burstiness) of verbal and nonverbal behaviors differ (Abney et al. 2018). Language has a hierarchical nested structure, consisting of phonemes, syllables, words, phrases, and sentences, which presents a certain temporal scale pattern (Abney et al. 2015). Further, a long temporal feature can be found in turn-taking lines (Warner 1979, 1992a). On the other hand, such a nested structure is unlikely to apply to nonverbal behavior, which would differentiate the temporal distributions of verbal and nonverbal behaviors. Synchronous nonverbal rhythm may therefore not have been sufficiently captured thus far, possibly because the relationship between rhythmic characteristics and explicit (or categorical) nonverbal behaviors has not been revealed. In this study, hand gestures and/or nods are suspected to be related to the rhythmic convergence of the middle tempo. By contrast, relatively large behaviors including postural sway and leg crossing are suspected to reflect synchrony in slow tempo. Future research should explore which behavior is associated with rhythmic convergence at each frequency component to reveal some nonverbal interaction structures.

Furthermore, the differences between behavior matching/mimicry and synchrony are still subject to debate (Burgoon et al. 2014; Chartrand and Lakin 2013; Lakin 2013), which is partly due to the lack of methodology to differentiate and integrate these elements. To this end, the wavelet transform can be a potent measuring tool. Using this method, coordination assessed in the time–frequency plane would indicate the extent of rhythmic convergence located in the timeline; by using the timeline, the boundary between behavioral mimicry and synchrony would become blurred, and the difference between them could be regarded as a difference in perspective toward coordination rather than a marker of different phenomena. Synchrony in the time–frequency plane reflects similarities in the rhythm or velocity between interactants throughout the timeline. In contrast, behavioral mimicry in the time domain represents the extent to which the behavior co-occurs or the degree of similarity in the amount of movement throughout the timeline. From this perspective, synchrony and behavioral mimicry are distinguished by their focus, with the former focusing on the velocity and the latter on the amount. However, the role of form may still be important to distinguish mimicry and synchrony. Behavioral mimicry always requires similar behaviors, whereas synchrony does not (Chartrand and Lakin 2013). It is possible to posit commonalities and differences between the concepts of synchrony and behavioral mimicry; both arguments of differentiating and integrating these two elements of interpersonal behavior could facilitate the development of coordination theory.