Frame-differencing methods for measuring bodily synchrony in conversation
The study of interpersonal synchrony examines how interacting individuals come to exhibit similar behavior, cognition, and emotion over time. Many established methods of analyzing interpersonal synchrony are costly and time-consuming; the study of bodily synchrony has been especially laborious, traditionally requiring researchers to hand-code movement frame by frame. Because of this, researchers have been searching for more efficient alternatives for decades. Recently, some researchers (e.g., Nagaoka & Komori, 2008, IEICE Transactions on Information and Systems, 91(6), 1634–1640; Ramseyer & Tschacher, 2008) have applied computer science and computer vision techniques to create frame-differencing methods (FDMs) that simplify analyses. In this article, we provide a detailed presentation of one such FDM, created by modifying and extending existing FDMs. The FDM that we present requires little programming experience or specialized equipment: Only a few lines of MATLAB code are needed to run an automated analysis of interpersonal synchrony. We provide sample code and demonstrate its use with an analysis of brief, friendly conversations. Using linear mixed-effects models, we found that interpersonal synchrony was significantly predicted by time lag (p < .001) and by the interaction between time lag and interpersonal liking (p < .001). This pattern of results fits with the existing literature on synchrony. We discuss the current limitations and future directions for FDMs, including their use as part of a larger methodology for capturing and analyzing multimodal interaction.
Keywords: Synchrony, Movement, Liking, Alignment, Frame-differencing methods, Image processing, MATLAB
Conversation is arguably one of the most common, and most important, modes of social interaction. Combining a variety of intrapersonal and interpersonal mechanisms, conversation presents a rich source of data for researchers in numerous areas, from linguistics and affect to posture and gesture. Interpersonal synchrony research lies at the intersection of many of these areas, seeking to characterize the way that interlocutors (individuals involved in conversation) come to exhibit similar behavior, cognition, and emotion over time. Many areas of research in the behavioral sciences have approached the issue of synchrony, resulting in a scattered terminology: accommodation (Giles & Smith, 1979), alignment (Pickering & Garrod, 2004), coordination (Richardson & Dale, 2005), coupling (Shockley, Santana, & Fowler, 2003), entrainment (Brennan & Clark, 1996), mimicry (Chartrand & Bargh, 1999), and social tuning (Valdesolo & DeSteno, 2011), among others.
The overwhelming growth of this research in recent years, including the diverse range of terms and concepts different researchers invoke, only adds to the importance of precise measurements and analytical methods. Research in this area has seen a recent push toward computer vision and computer-aided analysis techniques that can streamline objective measurement while significantly improving efficiency and cost effectiveness. Attempts to standardize methodology for this area of research may even lead to stricter definitions of terms, enriching the field while helping researchers communicate the precise nature of their work.
Issues with synchrony data collection and analysis
Interpersonal synchrony research faces problems not found in traditional research areas within experimental psychology. For example, synchrony research often centers on the dyad rather than the individual. This can lead to smaller sample sizes, since dyads are more costly and difficult to recruit. Conversations unfold over minutes, not milliseconds, and a single data collection session from one dyad may last several hours (e.g., Skuban, Shaw, Gardner, Supplee, & Nichols, 2006). Data collection for these studies, therefore, involves a significant investment of time by a researcher interested in interpersonal processes.
After collecting potentially dozens of hours of interaction for an experiment, researchers must spend even more time analyzing the data. Conversations must be transcribed for analyses of linguistic synchrony, but a single hour of dialogue may require over 10 h to transcribe (Kreuz & Riordan, 2011). Analysis of bodily synchrony, the primary focus of this article, has historically required researchers to meticulously hand-code limb movement in videotaped interactions frame by frame (e.g., Condon & Sander, 1974). Some postcoding automated techniques have been developed to detect repeated patterns of movement synchrony (e.g., THEME; Grammer, Kruck, & Magnusson, 1998); while inarguably helpful in identifying meaningful patterns of movement, these techniques do not reduce the labor of the hand-coding process itself. Bernieri, Davis, Rosenthal, and Knee (1994) have suggested that these demands may discourage researchers with limited funding or staffing from studying interpersonal synchrony.
Although significant, the challenges presented by data collection and analysis are not insurmountable. Researchers have been refining cost-effective and efficient methods of studying synchrony for decades, and these efforts have opened new ways of exploring conversation through streamlined analysis (e.g., Bernieri et al., 1994). Improved research methods may ease the constraints imposed by limited funding and can be combined with other methods to explore questions of cross-channel synchrony and interaction (e.g., affect and body movement; see the General Discussion section).
Existing alternatives to hand-coding for bodily synchrony
One of the most established alternatives to hand-coding involves holistic ratings by judges. The specific methods employed by each researcher vary, but all follow a broadly similar procedure. Judges may be completely naïve (e.g., Bernieri, Reznick, & Rosenthal, 1988) or strictly trained (e.g., Criss, Shaw, & Ingoldsby, 2003; Grammer, Honda, Jüette, & Schmitt, 1999), depending on the goals of the study. Judges are commonly instructed to watch videotaped interactions and rate them, typically on Likert scales of general interaction qualities (e.g., Bernieri et al., 1988; Criss et al., 2003). The interlocutors’ dialogue may be muted (e.g., Bernieri et al., 1988) or included in the raters’ materials (e.g., Criss et al., 2003); both approaches have been shown to be equally effective as measures of bodily synchrony (Bernieri et al., 1994). Each interaction may be rated by a single judge (e.g., Bernieri et al., 1988) or, to ensure high interrater reliability, by multiple judges (e.g., Criss et al., 2003).
Holistic ratings often require significantly less time than frame-by-frame analyses, but they are not without their own methodological problems. With the exception of event-based counting methods (e.g., Skuban et al., 2006), holistic ratings are almost entirely subjective. Intensive judge training and the use of multiple raters may decrease subjectivity, but they increase the time required for analysis (e.g., the 6-week training course used by Criss et al., 2003). Moreover, because synchrony ratings are often based on scales with fewer than a dozen items, these methods often provide less within-subjects power for statistical tests.
Researchers have improved holistic ratings, but these methods remain unable to objectively quantify bodily synchrony. Ratings may be effective for studying how individuals perceive synchrony, but their inherent subjectivity limits the degree to which researchers can parse apart the mechanisms behind synchrony. While these methods are significantly more efficient, judges’ holistic ratings lose the precision of Condon and Sander’s (1974) hand-coding methods.
Automated video analysis
Other researchers have begun attempts at automating analyses of bodily synchrony. Although these new methods are accompanied by new difficulties, they provide significant advantages over other methods proposed to date. Many of the methods blend computer vision techniques with psychological research to create rater-free, coding-free analytical techniques. Computer-driven techniques minimize researcher interaction with raw data, thereby removing the subjectivity of holistic ratings and the labor of hand-coding. These analyses are intended to be efficient, content-free methods for assessing bodily synchrony during interaction.
Motion-tracking systems appear to be ideal candidates for tracking interlocutors’ body movement over time. Other areas of research have already begun using these systems’ automated collection of data and computation of movement-related variables for the whole body and for individual body parts (e.g., Battersby, Lavelle, Healey, & McCabe, 2008; Lu & Huenerfauth, 2010). However, existing motion-tracking systems are, in their own way, almost as restrictive as hand-coding methods. Current systems are expensive and can present methodological concerns (Welch & Foxlin, 2002). For example, specialized motion-tracking suits are often tight-fitting; participants may feel discomfort, which can affect naturalistic movement. Once these systems become cheaper and less restrictive, motion tracking may become a standard tool for bodily synchrony research. Until then, body-suit motion capture poses significant challenges for researchers facing limited funding and for those whose questions are not compatible with its high-tech requirements.
Meservy et al. (2005) have pioneered another appealing method. Their methodology—similar to attempts at automated blob analysis (e.g., Lu, Tsechpenakis, Metaxas, Jensen, & Kruse, 2005)—is intended to automatically track patterns of head and hand movement in videos captured in moderately high quality. However, as presented in their article, the program is only partially automated; it currently requires a significant investment of time at the beginning of analysis and almost constant guidance throughout the process. It also poses restrictions for interaction researchers: Videos must be shot head-on with a single participant in the image, creating problems for applications in naturalistic conversation between two interlocutors. While interesting, Meservy et al.’s paradigm is not yet feasible for interaction research.
We believe that the most promising and effective techniques for studying interpersonal bodily synchrony to date are what we will call frame-differencing methods (FDMs). Rather than tracking specific body parts, FDMs are grounded in research showing that interlocutors synchronize in overall body movement in addition to posture and gesture (e.g., Shockley et al., 2003). Encompassing several existing named (e.g., motion energy analysis [MEA]; Grammer et al., 1999; Ramseyer & Tschacher, 2008, 2011) and unnamed (e.g., Nagaoka & Komori, 2008) methods, FDMs track the changes in pixels from one frame to the next. These methods require the background of an image to remain static, so that pixel changes from frame to frame are most likely caused by the interlocutors’ movement. FDMs generally analyze movement quantitatively by measuring pixel changes between frames (e.g., Nagaoka & Komori, 2008), but some FDMs use qualitative analyses of movement (e.g., judges’ ratings of FDM-derived movement visualizations; Grammer et al., 1999).
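The core FDM operation, comparing each frame with the one before it, can be sketched in a few lines. The following is a minimal, illustrative sketch in plain Python rather than the MATLAB used in this article. It assumes each frame is a grayscale image stored as a list of pixel rows. The names `frame_difference` and `movement_series`, as well as the optional `threshold` parameter (in the spirit of threshold-based FDMs), are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # one grayscale frame: rows of pixel intensities (0-255)


def frame_difference(prev: Frame, curr: Frame, threshold: int = 0) -> int:
    """Count pixels whose intensity changed by more than `threshold`.

    With a static camera and background, changed pixels are attributed
    to the interlocutors' movement.
    """
    changed = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            if abs(c - p) > threshold:
                changed += 1
    return changed


def movement_series(frames: List[Frame], threshold: int = 0) -> List[int]:
    """One movement value per pair of consecutive frames."""
    return [frame_difference(a, b, threshold) for a, b in zip(frames, frames[1:])]


still = [[10, 10, 10], [10, 10, 10]]
moved = [[10, 200, 10], [10, 10, 10]]  # a single pixel changes
print(movement_series([still, still, moved, moved]))  # [0, 1, 0]
```

A sequence of n frames yields n − 1 movement values, one time series per person when the image is split between the two interlocutors.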
FDM data collection setups (e.g., Nagaoka & Komori, 2008; Ramseyer & Tschacher, 2008, 2011) generally have similar requirements. They often require only one or two unmoving video cameras and stable ambient lighting, making FDMs highly cost effective. Prior to analysis, the video data are often transformed, manually or automatically, into grayscale images and normalized for brightness. Existing FDMs are indifferent to many movement characteristics (e.g., direction), and they have been used successfully in several studies to date, primarily in clinical (e.g., Kupper, Ramseyer, Hoffmann, Kalbermatten, & Tschacher, 2010; Nagaoka & Komori, 2008; Ramseyer & Tschacher, 2008, 2011) and ethological (e.g., Grammer et al., 1999) domains. Our goal in this article is to introduce FDMs to those interested in basic experimental research on conversation and to develop a simple FDM that can be run in MATLAB with minimal programming experience; we supply the code in Appendix 2.
By presenting a template for a very basic but nevertheless powerful FDM, we hope to provide experimental researchers with a tool that can be easily modified according to their research needs. We also point researchers to existing methods in nonexperimental fields to explore additional ways of implementing similar analyses (e.g., Grammer et al., 1999; Ramseyer & Tschacher, 2011).
A simple frame-differencing method
In this article, we showcase a highly simplified, MATLAB-based method of extracting the overall body movement of two people engaged in conversation. Interpersonal synchrony is a highly diverse research topic, attracting researchers from various fields and technical backgrounds. The FDM presented here is based on modifications to existing methods and may provide an affordable, efficient, yet robust source of data for exploring how bodily synchrony relates to conversation. A script only a few lines long provides the basic measures, and analyses can be performed quickly and with very little effort by the researcher. While some semiautomated analyses require researchers to specify individual areas to be analyzed (regions of interest [ROIs] for MEA; e.g., Ramseyer & Tschacher, 2011), many FDMs, including the one presented here, analyze overall body movement (e.g., Boker, Xu, Rotondo, & King, 2002). By combining existing FDM-based techniques and contributing some additions, the FDM offered here provides an analytical method for researchers equipped only with a moderate- to high-quality digital video recorder, standard analysis software (e.g., MATLAB), and very modest programming skills. After we detail an example FDM, we will demonstrate its use and effectiveness in a study of conversational interaction.
Data collection and preparation
Once recorded, the videos must be uploaded to a computer and segmented into image sequences. This can be done with a number of commercially available video-processing programs, including Apple’s QuickTime or iMovie. For researchers using Apple products, we have included a sample AppleScript that automates the image segmentation of videos in iMovie (see Appendix 1). Higher-quality image formats (e.g., PNG) are preferable, although not required. Assuming a high-quality recording, the images may remain in the camcorder’s native resolution and do not require rescaling. Researchers may also import video directly using MATLAB’s VideoReader, but we chose to use out-of-the-box software to extract image sequences in order to further minimize programming requirements.
The sampling rate may vary according to researcher needs and storage space. We have experimented with a number of sampling rates and have found that 8 Hz (one image every 125 ms) affords a great deal of detail without generating unwieldy amounts of data. In contrast with existing methods, the FDM we present can use full-color image sequences and does not require the images to be converted to grayscale or normalized for brightness. We calculate the frame differences using the RGB values in MATLAB’s image arrays. By analyzing images in full color, this FDM can detect the movement of an object of one color against a background of a different color with the same overall intensity (i.e., the same summed color values, distributed differently over the red, green, and blue channels). As a result, this approach captures any difference between a person’s clothing and the background in any of the three color channels.
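The advantage of full-color differencing can be illustrated with a single pixel. This hypothetical Python sketch compares per-channel RGB differencing with differencing on summed intensity. Summed intensity is used here as a stand-in for grayscale conversion, which in practice usually uses a weighted sum. The function names and pixel values are invented for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue), each 0-255
Image = List[List[Pixel]]      # rows of pixels


def rgb_difference(prev: Image, curr: Image) -> int:
    """Sum of absolute changes in each color channel across all pixels."""
    total = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += sum(abs(cc - pc) for pc, cc in zip(p, c))
    return total


def intensity_difference(prev: Image, curr: Image) -> int:
    """Same comparison after collapsing each pixel to its summed intensity."""
    total = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += abs(sum(c) - sum(p))
    return total


# A reddish background pixel is covered by a greenish sleeve of equal summed intensity.
background = [[(200, 50, 50)]]
sleeve = [[(50, 200, 50)]]
print(rgb_difference(background, sleeve))        # 300
print(intensity_difference(background, sleeve))  # 0
```

Intensity-based differencing registers no change in this case, while RGB differencing does. This is the situation the full-color approach is meant to handle.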
Data analysis with MATLAB
A second-order Butterworth low-pass filter is then applied to each sequence of half-images, in lieu of a threshold (e.g., Grammer et al., 1999) or a band-pass filter (e.g., Nagaoka & Komori, 2008). A powerful but relatively simple filter, the Butterworth filter is characterized by a normalized cutoff frequency, a maximally flat passband, and a stopband that rolls off toward zero. By standardizing the images and applying the low-pass filter, the script controls for slight fluctuations in light sources (e.g., the high-frequency flicker of fluorescent lighting) while remaining sensitive to slight movements (e.g., shifts in posture). Without a filter, fluctuations that co-occur across the images may lead to false detection of synchronized movement, since these fluctuations appear simultaneously in both image sequences. All of this is done with a few simple lines of MATLAB code.
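A second-order Butterworth low-pass filter has a closed-form design. The sketch below is a plain-Python stand-in for MATLAB’s `butter` (Signal Processing Toolbox) and `filter` functions, not the article’s script. It designs the filter with the bilinear transform and frequency prewarping and applies it to a one-dimensional movement time series. The 1-Hz cutoff in the usage example is a hypothetical value, not one reported here.

```python
import math
from typing import List, Sequence, Tuple


def butter2_lowpass(cutoff_hz: float, fs_hz: float) -> Tuple[List[float], List[float]]:
    """Coefficients (b, a) of a second-order Butterworth low-pass filter.

    Designed with the bilinear transform and frequency prewarping; a[0] == 1.
    """
    if not 0.0 < cutoff_hz < fs_hz / 2.0:
        raise ValueError("cutoff must lie strictly between 0 and the Nyquist frequency")
    k = math.tan(math.pi * cutoff_hz / fs_hz)  # prewarped analog cutoff
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b = [b0, 2.0 * b0, b0]
    a = [1.0,
         2.0 * (k * k - 1.0) * norm,
         (1.0 - math.sqrt(2.0) * k + k * k) * norm]
    return b, a


def apply_filter(b: Sequence[float], a: Sequence[float], x: Sequence[float]) -> List[float]:
    """Single forward pass of the difference equation (like MATLAB's filter)."""
    y: List[float] = []
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
        acc -= sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0)
        y.append(acc)  # a[0] == 1, so no final division is needed
    return y


# Usage: an 8-Hz movement series with a hypothetical 1-Hz cutoff.
b, a = butter2_lowpass(cutoff_hz=1.0, fs_hz=8.0)
flicker = [(-1.0) ** n for n in range(80)]  # fastest oscillation representable at 8 Hz
smoothed = apply_filter(b, a, flicker)
print(abs(smoothed[-1]))  # approximately 0: the flicker is removed
```

A single forward pass delays the signal. Because the same filter is applied to both interlocutors’ series, however, the delay is identical for both, and the lag at which their cross-correlation peaks is unchanged. MATLAB’s `filtfilt` is a zero-phase alternative.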
If two individuals’ movements are synchronized, the cross-correlation coefficient (r) between their movement time series will be highest near a lag of 0, reflecting that changes in their movement coincide in time. Unlike some other channels of communication (e.g., speech), body movement allows both interlocutors to move simultaneously without impeding the flow of the interaction. Individuals spontaneously synchronize in dyadic rhythmic movement tasks (e.g., Miles, Lumsden, Richardson, & Macrae, 2011; Richardson, Marsh, Isenhower, Goodman, & Schmidt, 2007; Schmidt, Carello, & Turvey, 1990). These findings suggest that interlocutors may exhibit some forms of synchronous, rather than time-lagged, body movement even in naturalistic contexts.
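The cross-correlation coefficient r at a range of lags can be computed as repeated Pearson correlations between one person’s movement series and a shifted copy of the other’s. This is a minimal plain-Python sketch with hypothetical function names; the article’s own implementation is in MATLAB, and other implementations may normalize differently. It assumes that a positive lag k pairs person A at time t with person B at time t + k, so B follows A.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (each must vary)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_r(a: Sequence[float], b: Sequence[float], max_lag: int) -> Dict[int, float]:
    """r between series a and b at every lag from -max_lag to +max_lag.

    Positive lag k pairs a[t] with b[t + k] (b follows a); negative k pairs
    a[t + |k|] with b[t] (a follows b). The overlap shrinks as |k| grows.
    """
    n = len(a)
    result: Dict[int, float] = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            x, y = a[: n - k], b[k:]
        else:
            x, y = a[-k:], b[: n + k]
        result[k] = pearson(x, y)
    return result


movement_a = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0, 2.0, 6.0, 1.0, 3.0, 5.0, 2.0]
movement_b = [0.0, 0.0] + movement_a[:-2]  # B repeats A two samples later
r = lagged_r(movement_a, movement_b, max_lag=3)
print(max(r, key=r.get))  # 2
```

For truly synchronous movement, the peak would fall at lag 0 instead of lag 2.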
In addition to providing an objective quantification of bodily synchrony, the cross-correlation coefficients across time lags allow researchers to explore patterns of leading and lagging in bodily synchrony (for a discussion, see Boker et al., 2002). Because we have no explicit hypotheses about leading or following in the “role-symmetrical” conversations of our sample study, we average the coefficients at each pair of symmetric lags (−1 and +1, −2 and +2, and so on). The MATLAB script then records all coefficients for analysis. The entire analysis of a 10-min dyadic interaction takes approximately 6 min on a 3.1-GHz Intel Core i5 Apple iMac with 4 GB of 1333-MHz DDR3 memory.
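Averaging symmetric lags can be written as a small helper. The plain-Python sketch below reads this step as averaging r at −k and +k for each k, with lag 0 kept unchanged. The helper name and example values are hypothetical.

```python
from typing import Dict


def fold_symmetric_lags(r_by_lag: Dict[int, float]) -> Dict[int, float]:
    """Map each absolute lag k to the mean of r at -k and +k (lag 0 kept as is)."""
    max_lag = max(abs(k) for k in r_by_lag)
    folded = {0: r_by_lag[0]}
    for k in range(1, max_lag + 1):
        folded[k] = (r_by_lag[-k] + r_by_lag[k]) / 2.0
    return folded


example = {-2: 0.0, -1: 0.25, 0: 1.0, 1: 0.75, 2: 0.5}
print(fold_symmetric_lags(example))  # {0: 1.0, 1: 0.5, 2: 0.25}
```

Folding discards the sign of the lag, so it suits role-symmetrical conversations. Studies of leading and following would keep the signed lags instead.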
After retrieving the cross-correlation coefficients, researchers may use them in a variety of statistical tests. Depending on their research questions and available statistical tools, researchers may use the entire time series, a portion of it, an average synchrony score, or the highest or lowest synchrony scores (see, e.g., Caucci, 2011, for a discussion of the use of interpersonal synchrony scores in various analyses). To confirm that this method measures synchrony of body movement rather than co-occurring artifacts, we ran two validation analyses, reported in Appendix 3. In the next section, we present a larger study that explores how synchrony is organized in naturalistic interaction.
Much research suggests a link between affiliation and synchrony (e.g., Bernieri et al., 1994; Chartrand & Bargh, 1999; Lakin & Chartrand, 2003; Ramseyer & Tschacher, 2008). However, this synchrony–affiliation link can be modulated by various group circumstances (e.g., Miles et al., 2011). Here, we illustrate our methods with a study that shows gross body movement synchrony during conversational interaction and tests whether this synchrony is related to liking between interlocutors.
As a proof of concept, we investigated whether individuals involved in naturalistic conversations with a broad affiliative prompt achieve bodily synchrony detectable by the FDM outlined above. The correlation coefficient should be higher closer to a lag of 0, because this correlation reflects the closest match in time. As lag increases, the time series are being correlated at a wider relative lag and are, therefore, further apart in time; synchrony would predict a drop in the correlation coefficients as time lag increases.
Existing literature suggests that synchrony should be positively correlated with liking. We therefore hypothesized that the model would predict an increase in r as levels of interpersonal liking increase. Rather than using a simple correlation, the present study tests this hypothesis with linear mixed-effects models.
Participants were 40 undergraduate students at the University of Memphis (mean age = 22.08 years; 32 female) and 22 undergraduate students at the University of California, Merced (mean age = 19.36 years; 17 female). All received extra credit for participating, and all were fluent in English. They participated in pairs, forming 31 conversational dyads (19 female, 1 male, 11 mixed-sex) according to individual availability in the participant pool’s online scheduler. Only two dyads (one mixed-sex, one female) reported knowing one another before the study; both were retained for all analyses. One female dyad was removed from all analyses due to experimenter error. Although this sample may seem small, it exceeds the sample sizes of established dyadic research by a moderate (e.g., 21 dyads; Ramseyer & Tschacher, 2008) or wide (e.g., 4 dyads; Boker et al., 2002; Nagaoka & Komori, 2008) margin.
Materials and procedure
After individually completing a brief questionnaire packet and signing informed consent forms, participants were brought into a private room. They were seated facing one another and were recorded in profile (see Fig. 1) to ensure that their movements were captured in a time-locked fashion. Interactions were recorded using a Canon Vixia HF M31 high-definition digital video camcorder mounted on a tripod to ensure stability. To ameliorate the potential awkwardness of interacting with a stranger, participants were allowed 3 min to introduce themselves and briefly get to know one another without the experimenter present. Following the introductory period, the experimenter prompted the participants to discuss entertainment media (e.g., movies, music) that they both enjoyed. They were instructed to talk for 10 min with the experimenter present but outside their line of sight. Experimenters ensured that all conversations lasted at least 8 min and issued additional brief prompts to keep participants on topic (mean additional prompts per conversation = 0.54). Frames during which prompts occurred were excluded from analysis. The participants were then brought to separate rooms and rated how much they liked their partner on a 1–6 Likert scale. After completing the measure, participants were brought together, debriefed, and thanked for their participation.
Data handling and analysis
The participant videos were processed and analyzed using the methods described in the preceding section. We extracted a time series of body movement at 8 Hz for each person, applied a second-order Butterworth filter to each time series, calculated the cross-correlation coefficients at each lag within a window of ±3 s (recommended by Richardson, Dale, & Tomlinson, 2009), and standardized the resulting coefficients. These standardized coefficients served as the dependent variables in the following analyses.
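The lag window corresponds to a fixed number of samples. This short Python sketch, with hypothetical function names, makes the arithmetic explicit: At 8 Hz each lag step is 125 ms, so a ±3-s window spans lags −24 to +24, or 49 lags in total.

```python
from typing import List


def lag_step_ms(sampling_hz: float) -> float:
    """Duration of one lag step in milliseconds."""
    return 1000.0 / sampling_hz


def lag_window(sampling_hz: float, window_s: float) -> List[int]:
    """All integer lags covering +/- window_s seconds at the given sampling rate."""
    max_lag = round(window_s * sampling_hz)
    return list(range(-max_lag, max_lag + 1))


lags = lag_window(sampling_hz=8.0, window_s=3.0)
print(lag_step_ms(8.0), lags[0], lags[-1], len(lags))  # 125.0 -24 24 49
```

If symmetric lags are averaged, this window leaves 25 absolute lags (0 through 24) per series.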
The standardized coefficients were modeled with a series of linear mixed-effects models to investigate basic questions of synchrony (as in Baayen, Davidson, & Bates, 2008; Boker et al., 2002). Using the standardized cross-correlation coefficients derived from the MATLAB script, we defined bodily synchrony as concurrent movement in time. Therefore, when absolute time lag is included as a predictor, r should decrease as lag increases (from a lag of 0, which reflects matching in time, to lags reflecting greater temporal disparity). In addition, we tested whether affiliation is related to r: We predicted that the more affiliation participants reported, the higher the standardized r would be overall. To test these questions, we included time lag and affiliation as fixed effects. All models included dyad and participant as random effects.
In the first model, we focused on synchrony as a function of time lag. This basic model tested whether individuals are more likely to move together in time. Time lag was a significant predictor, p < .001 [t(1842) = 27.6], and the model predicts a drop in the cross-correlation coefficient with each successive time lag (i.e., 125 ms) away from 0 (β = −.22). This indicates that interpersonal synchrony is highest toward a time lag of 0; that is, the magnitudes of the interlocutors’ movements tend to rise and fall together in time. Put simply, individuals synchronize their body movements during conversation.
We ran a second basic model to test whether reported levels of interpersonal liking would significantly predict the correlation coefficient. This model included all of the data (each r at each lag) and participants’ ratings of interpersonal liking. Liking was not a significant predictor, p = .84 [t(1842) = .102], suggesting that self-reported liking alone does not predict interpersonal bodily synchrony.
Finally, we combined these two fixed effects into a single model, using both lag and liking (centered) to predict the correlation coefficient at each time lag. In this model, the interaction term was significant, p < .001 [t(1840) = 9.37, β = −.07]. The significance of the interaction term implies that, although liking alone does not predict r, it moderates the effect of time lag. To illustrate this, we split our participants into two groups, high and low liking. As can be seen in Fig. 4, individuals who liked their partner more achieved higher r near lag 0 than those who liked their partner less.
To confirm that the full model fit best, we compared the Akaike information criterion (AIC) across models; lower values indicate a better fit. The first model (predicting synchrony as a function of lag) had an AIC of 1,401, and the second model (predicting synchrony as a function of liking) had an AIC of 2,140. The saturated model (predicting synchrony as a function of lag and liking) had the lowest AIC (1,355), showing it to be the best-fitting model.
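An AIC comparison reduces to each model’s difference from the lowest value. The sketch below uses the three AIC values reported above; the function name and dictionary labels are hypothetical. Such comparisons assume that all models are fit to the same observations. For mixed models that differ in their fixed effects, they also assume maximum likelihood rather than REML estimation.

```python
from typing import Dict


def delta_aic(aics: Dict[str, float]) -> Dict[str, float]:
    """Difference between each model's AIC and the lowest AIC (0 = best model)."""
    best = min(aics.values())
    return {name: aic - best for name, aic in aics.items()}


reported = {"lag": 1401, "liking": 2140, "lag x liking": 1355}
print(delta_aic(reported))  # {'lag': 46, 'liking': 785, 'lag x liking': 0}
```

Both simpler models trail the saturated model by well over 10 AIC units. By a commonly cited rule of thumb, a gap of that size leaves little support for the weaker models.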
In this brief study, we found that interlocutors synchronize with their partners during affiliative conversations. The results of this FDM analysis conform to patterns of results from previous FDM-analyzed research (e.g., Nagaoka & Komori, 2008; Ramseyer & Tschacher, 2008). In fact, our analyses extend these other naturalistic studies and offer novel insights into synchrony: We found that body movement tended to be synchronized in time, such that synchrony is greatest at a lag of 0. Thus, as a behavioral channel used during conversation, gross body movement may be patterned in-phase between two interlocutors. This means that body movement synchrony has properties that differ from synchrony in speech, which cannot be carried out in-phase during conversation because of turn-taking. Other forms of movement have been shown to exhibit in-phase synchrony between individuals (e.g., Miles et al., 2011; Richardson et al., 2007; Schmidt, Morr, Fitzpatrick, & Richardson, in press; Schmidt et al., 1990), and FDM analyses have revealed bodily alignment within brief windows of time (e.g., 1-min windows and 5-s time lags, Ramseyer & Tschacher, 2011; 10-min windows and 5-s time lags, Nagaoka & Komori, 2008). However, this is the first FDM-based analysis to demonstrate fine-grained, moment-to-moment synchrony (at 125-ms resolution) between interlocutors’ broad body movements.
Although no main effect of liking was found, liking moderated interpersonal bodily synchrony: The more participants liked one another, the more closely synchronized their movements tended to be. Despite the lack of a main effect, this interaction fits with previous research linking affiliation and body movement patterns (e.g., Chartrand & Bargh, 1999; Miles et al., 2011).
We have presented FDMs as a means of objectively quantifying interpersonal (bodily) synchrony, even in small labs with minimal funding. Although several studies of interpersonal interaction have used FDMs, little work has shown their direct relation to holistic ratings, and no detailed methodological presentation of them exists in the experimental literature. This article serves as an introduction for experimental researchers to FDMs generally and to one simplified version (see Appendix 2).
Using methods similar to those of existing FDMs (e.g., Nagaoka & Komori, 2008; Ramseyer & Tschacher, 2008), we have provided MATLAB code for a simple automated version that minimizes the amount of postrecording editing required (see Appendix 2). This simplified FDM gives researchers added flexibility in recording setups. In conjunction with AppleScripts that automate data preparation (see Appendix 1), it allows for an almost completely automated analysis of multiple interactions at a time. By broadening data collection conditions and automating data analysis, researchers can spend more time collecting dyads, leading to larger sample sizes. Using cross-correlation coefficients, rather than generalized rating scores, as an indicator of interpersonal bodily synchrony should also give statistical analyses greater power.
We hope to expand this method to include ways of parsing out the movement of individual body parts to promote more fine-grained analysis of interpersonal synchrony (e.g., posture, gesture). By combining these and other automated methods (e.g., blob analysis; Lu et al., 2005), researchers may continue to refine the flexibility and utility of FDM-based methodologies.
Of course, FDMs have their limitations. The FDM outlined here is intended to minimize cost and to automate as much of the data analysis as possible. In doing so, it loses detail afforded by other methods (e.g., movement direction and velocity, limb tracking). Other FDMs let researchers manually designate specific areas in which to track movement (e.g., ROIs covering limbs in MEA), but even these FDMs are generally blind to movement direction and velocity. Moreover, because they detect movement as pixel changes in a two-dimensional image, all FDMs tend to underestimate participants’ movements toward or away from the camera(s). Movement along the camera’s line of sight changes fewer pixels than lateral movement of the same extent, so FDMs are far more sensitive to lateral movement.
For researchers interested in finer-grained movement characteristics, hand-coding techniques and motion tracking may prove to be worth their respective investments. Hand-coding techniques have been widely used and broadly accepted in inter- and intrapersonal synchrony research. The significant time and training required to chart each movement frame by frame may therefore be worthwhile for researchers who need to track even participants’ smallest movements.
Motion tracking may be a viable alternative to both FDMs and hand-coding for those with ample funding and strong data management resources. Currently, few researchers employ these methods for synchrony research, but, as mentioned earlier, these systems have unique capabilities that could help investigate other movement-related questions. Motion-tracking systems would allow temporal movement dynamics to be investigated more precisely than hand-coding does. However, researchers should weigh the invasiveness of such a technique against its sensitivity to movement: Participants may be less likely to exhibit naturalistic movement patterns while wearing a tight-fitting motion-capture suit than while being filmed, which is relatively noninvasive in comparison.
Here, we have also not discussed the issue of stationarity. This is an important issue in any time series analysis using regression-based methods. Inspecting our data, we have mostly found evidence of relative stationarity (i.e., relatively unchanged mean and variance throughout each 10-min conversation). For further discussion of this issue and potential methods to manage it, see Boker et al. (2002) and Ramseyer and Tschacher (2008, 2011).
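A quick, informal version of this stationarity inspection is to split each movement series into equal segments and compare their means and variances. This Python sketch is illustrative only, with a hypothetical helper name and invented data. It is not a formal stationarity test; dedicated procedures are discussed in the sources cited above.

```python
from statistics import mean, pvariance
from typing import List, Sequence, Tuple


def segment_moments(series: Sequence[float], n_segments: int) -> List[Tuple[float, float]]:
    """Mean and population variance of consecutive, equal-length segments.

    Samples left over after the last full segment are ignored.
    """
    size = len(series) // n_segments
    if size < 2:
        raise ValueError("series too short for the requested number of segments")
    moments = []
    for i in range(n_segments):
        chunk = series[i * size:(i + 1) * size]
        moments.append((mean(chunk), pvariance(chunk)))
    return moments


steady = [1.0, 3.0] * 8                   # same mean and variance throughout
drifting = [float(v) for v in range(16)]  # mean rises over time
print(segment_moments(steady, 4))
print([m for m, _ in segment_moments(drifting, 4)])  # [1.5, 5.5, 9.5, 13.5]
```

A 10-min conversation sampled at 8 Hz yields roughly 4,800 samples, so ten segments would each cover about 1 min.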
Researchers are beginning to find evidence of interpersonal synchrony across a number of channels (Louwerse, Dale, Bard, & Jeuniaux, 2012). We believe that cross-channel questions—for example, the relation between body movement and verbal turn-taking—are an essential next step for this research area and will promote a deeper understanding of the general and channel-specific mechanisms of synchrony. Although our efforts are presently in the area of bodily synchrony, we plan to incorporate other methods for studying additional channels of interpersonal synchrony.
Using new and established methods, we have endeavored to assemble a cost-effective and efficient methodology that facilitates research into multimodal questions. All items used in the setup are commercially available and highly rated by reviewers on commercial Web sites (e.g., Amazon). As was noted above, conversations were recorded using a Canon Vixia HF M31 HD digital video camera mounted on a Sunpak PlatinumPlus 6000PG tripod. To facilitate linguistic analyses, each interlocutor's audio was recorded on a separate audio channel, using an Audio-Technica ATR3350 Omnidirectional Condenser Lavalier Microphone for each interlocutor connected to an Azden CAM-3 On-Camcorder Mini Audio Mixer. The setup described above costs less than $800; however, researchers may readily substitute less expensive items (e.g., a webcam for the camcorder) as needed.
We believe that this setup and methodology are flexible enough to capture a number of modes of communication. For example, researchers interested in questions of linguistic alignment (e.g., priming; Brennan & Clark, 1996; Cleland & Pickering, 2003; Kousidis & Dorran, 2009; Niederhoffer & Pennebaker, 2002; Reitter, Moore, & Keller, 2010) will find the two-channel recording method amenable to their research (e.g., for transcription; Kreuz & Riordan, 2011). Additionally, by combining the FDM with pre- or postinteraction questionnaires, researchers interested in affective synchrony (e.g., Chartrand & Bargh, 1999; Lakin & Chartrand, 2003; Miles et al., 2011; Sadler et al., 2009; Valdesolo & DeSteno, 2011) may begin to investigate affective alignment in conjunction with other channels of communication. By combining research into these and other channels, the field can better understand the functions of interpersonal synchrony. Further investigations into cross-channel questions will complement the findings of early efforts on these issues (e.g., Louwerse et al., 2012).
For the purpose of this article, we simply refer to these processes as synchrony, although additional research may help to determine relevant differences among these terms.
Although these methods are likely to capture movement effectively even from lower-quality recordings, lower resolutions may be less sensitive to smaller body movements (e.g., postural sway, facial expressions).
The AppleScript code is also available for download from the first author’s Web site: http://graduatestudents.ucmerced.edu/aloan/Miscellany_files/imovie_segmentation.scpt.
The MATLAB code is also available for download from the first author’s Web site: http://graduatestudents.ucmerced.edu/aloan/Miscellany_files/sample_FDM.m.
Both negative and positive raw correlations were used. The data in Fig. 3 reflect these raw correlations. We did not apply Fisher’s Z-transformation to these data because the correlations were too low to be affected (i.e., correlations of magnitude less than .5). As is discussed later, we standardized the correlations before using them in the linear mixed-effects model in order to obtain beta weights instead of raw change values.
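The claim that small correlations are barely affected by Fisher's Z-transformation, z = arctanh(r), is easy to check numerically. The Python/NumPy sketch below uses illustrative values, not the study's data, and prints each r next to its transform. For |r| ≤ .4 the two differ by less than .03, and at |r| = .5 by about .05, whereas at r = .8 the gap grows to about .30. The sketch then applies z-scoring (standardization), the step the note describes before model fitting, to a hypothetical set of lagged correlations.

```python
import numpy as np

# Illustrative correlation values (not the study's data).
r = np.array([-0.4, -0.1, 0.1, 0.25, 0.4, 0.5, 0.8])
fisher_z = np.arctanh(r)  # Fisher's Z: 0.5 * ln((1 + r) / (1 - r))
for ri, zi in zip(r, fisher_z):
    print(f"r = {ri:+.2f}   Fisher z = {zi:+.3f}   z - r = {zi - ri:+.3f}")

def standardize(x):
    """Convert values to z-scores: subtract the mean, divide by the sample SD."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

# Hypothetical correlations at successive time lags for one dyad.
lagged_r = np.array([0.12, 0.09, 0.05, 0.02, -0.01, -0.03])
print(np.round(standardize(lagged_r), 2))
```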
The research we present is part of a larger study we are conducting on differences in conversation types. Here, we focus on analysis of the conversations that involved the basic goal of affiliation.
Previous research has shown significant differences between the interaction styles of same-sex and mixed-sex dyads, and such composition may have important ethological implications (see Grammer et al., 1998). However, we exemplify our method by showing aggregate synchrony across dyad types and reserve an analysis of gender for a later study, since it is not an immediate goal of this methodological presentation.
Degrees of freedom are not easily defined for mixed models; p-values for mixed models, therefore, are often not included when reporting results (e.g., Boker et al., 2002). Some (e.g., Bates, 2006) have argued that reporting degrees of freedom can be misleading, given differences in the techniques for obtaining them. However, several sources are available for those who wish to report them (e.g., Baayen, 2008; Baayen et al., 2008). The degrees of freedom reported here were estimated using the LME function described therein. The p-values reported here are based on Markov chain Monte Carlo (MCMC) sampling using the "pvals.fnc" function in R, as described in Baayen et al. (2008), which also includes an excellent introduction to MCMC methods.
We did not explore synchrony relative to baseline, but methods are available to do so. For example, Ramseyer and Tschacher (2008) offer an elegant technique of window-wise shuffling. Shockley, Baker, Richardson, and Fowler (2007) recommend a "virtual pair" analysis, in which the researcher forms baseline dyads from individuals belonging to separate dyads in the experiment. These relatively straightforward time-series methods are outside the methodological scope of this article, but we point interested readers to these studies.
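The Python/NumPy sketch below illustrates the core idea of the virtual-pair approach under simple assumptions:

- Each hypothetical dyad is a pair of equal-length movement series.
- Real-pair synchrony is the zero-lag Pearson correlation between partners.
- The baseline is the same statistic computed for every pairing of individuals drawn from different dyads.

Lagged correlations, windowing, and the statistical comparison of real and virtual pairs are omitted. The function names and simulated data are illustrative only and do not come from Shockley et al. (2007).

```python
import itertools
import numpy as np

def synchrony(a, b):
    """Zero-lag Pearson correlation between two movement series."""
    return float(np.corrcoef(a, b)[0, 1])

def real_and_virtual(dyads):
    """dyads: list of (person_A_series, person_B_series) tuples.
    Returns (real-pair scores, virtual-pair scores), where virtual pairs
    combine individuals from two different dyads."""
    real = [synchrony(a, b) for a, b in dyads]
    virtual = []
    for i, j in itertools.combinations(range(len(dyads)), 2):
        for a in dyads[i]:
            for b in dyads[j]:
                virtual.append(synchrony(a, b))
    return real, virtual

# Hypothetical data: partners share a common movement component; dyads do not.
rng = np.random.default_rng(1)
dyads = []
for _ in range(6):
    shared = rng.normal(size=1000)
    dyads.append((shared + rng.normal(size=1000), shared + rng.normal(size=1000)))

real, virtual = real_and_virtual(dyads)
print(f"mean real-pair r = {np.mean(real):.2f}, mean virtual-pair r = {np.mean(virtual):.2f}")
```

With six dyads, this produces 6 real pairs and 60 virtual pairs. Any synchrony that real pairs show beyond the virtual-pair distribution is then attributable to the interaction rather than to chance similarity in movement.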
The authors wish to thank Sidney D’Mello for his advice on filters. We also wish to thank undergraduate research assistants Will Dunbar and John James for their help in hand-coding for the second validation study presented in Appendix 3. This work was supported in part by NSF Grant HSD-0826825.
- Bates, D. (2006). lmer, p-values, and all that. The R-help archives. Retrieved June 6, 2012, from https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html
- Battersby, S. A., Lavelle, M., Healey, P. G., & McCabe, R. (2008). Analysing interaction: A comparison of 2D and 3D techniques. Paper presented at the Multimodal Corpora Workshop in the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, Morocco.
- Caucci, G. (2011). When I move, you move: Coordination in conversation. Unpublished dissertation.
- Giles, H., & Smith, P. (1979). Accommodation theory: Optimal levels of convergence. In H. Giles & R. St. Clair (Eds.), Language and social psychology (pp. 45–65). Oxford: Blackwell.
- Kousidis, S., & Dorran, D. (2009). Monitoring convergence of temporal features in spontaneous dialogue speech. Paper presented at the First Young Researchers Workshop on Speech Technology, Dublin, Ireland.
- Kreuz, R. J., & Riordan, M. A. (2011). The transcription of face-to-face interaction. In W. Bublitz & N. R. Norrick (Eds.), Handbooks of pragmatics (Vol. 1, pp. 657–680). Berlin: De Gruyter Mouton.
- Louwerse, M. M., Dale, R., Bard, E. G., & Jeuniaux, P. (2012). Behavior matching in multimodal communication is synchronized. Cognitive Science.
- Lu, P., & Huenerfauth, M. (2010). Collecting a motion-capture corpus of American Sign Language for data-driven generation research. Proceedings of the NAACL HLT 2010 Workshop on Speech and Language Processing for Assistive Technologies, 89–97.
- Lu, S., Tsechpenakis, G., Metaxas, D. N., Jensen, M. L., & Kruse, J. (2005). Blob analysis of the head and hands: A method for deception detection. Paper presented at the annual Hawaii International Conference on System Sciences (HICSS '05), Hawaii.
- Reitter, D., Moore, J. D., & Keller, F. (2010). Priming of syntactic rules in task-oriented dialogue and spontaneous conversation. Paper presented at the annual conference of the Cognitive Science Society, Vancouver, BC.
- Sadler, P., Ethier, N., Gunn, G. R., Duong, D., & Woody, E. (2009). Are we on the same wavelength? Interpersonal complementarity as shared cyclical patterns during interactions. Journal of Personality and Social Psychology, 97(6), 1005–1020.
- Schmidt, R. C., Morr, S., Fitzpatrick, P., & Richardson, M. J. (in press). Measuring the dynamics of interactional synchrony. Journal of Nonverbal Behavior.