Participants
Ten volunteers (four females, six males, age range 22–48 years; mean age = 31.7 years) were recruited by word-of-mouth. They all reported normal hearing. Participants received no financial compensation.
Apparatus
Stimuli were either sounds played through headphones or vibrotactile stimulation applied to the lower back. Stimulus presentation was controlled using Cycling’74 Max MSP software running on a 2010 MacBook Pro with a 2.66 GHz Intel Core i7 processor and 4 GB of DDR3 RAM. An RME Fireface 400 FireWire audio interface was used to direct six channels of audio output from the MacBook. Two channels of audio output were directed to a Behringer MX602a analog mixing console and delivered to participants via Sennheiser HD518 over-ear headphones. Each ear received the same signal. Four additional channels of audio output were directed to four voice coils (each 1″ in diameter) embedded in the seat and back of a padded form-fitting chair (Fig. 1a, Emoti-Chair; Karam et al. 2009). Pink noise was delivered in each trial through the headphones to mask any air-conducted sound originating from the voice coils. In addition, pink noise was delivered through Tactaid VBW32 bone-conduction transducers placed on the left and right mastoids. This latter procedure was adopted to mask any residual sound originating from the voice coils (Fig. 1b; after Russo et al. 2012). This setup for vibrotactile stimuli and masking was modeled after the conditions used by Ammirante et al. (2016) that led to equivalent sensorimotor synchronization across vibrotactile and auditory rhythms. Notably, this prior study found modality equivalence under conditions where the rhythm was metronomic and the area of vibrotactile stimulation was relatively large, spanning the buttocks (2 channels) and lower back (2 channels).
Prior to experimental trials, stimulus levels were adjusted to equalize the perceived magnitude of the auditory and vibrotactile stimuli; all three authors corroborated the levels. Participants were asked if they could hear the chair vibrations during the experiment and all reported that they could not. The sound level of stimuli 12″ (30.5 cm) from the chair surface was approximately 90 dB SPL, as measured using a B&K 2250 sound-level meter with a B&K ZC-0032 pre-amp and a pre-polarized free-field ½″ type 4950 microphone.
Stimuli
Each target and context stimulus consisted of a pair of sinusoidal vibrations (200 and 300 Hz), presented through headphones in the auditory conditions, and via voice coils embedded in the Emoti-Chair in the vibrotactile conditions. Target stimuli were presented in one of three rhythmic contexts:
Regular rhythm (RR; Fig. 2a). Eight beats of 200/300 Hz pure tone pairs were played with an inter-beat interval of 500 ms (120 beats per minute, BPM). Beats 1–6 and 8 were the context stimuli, with the tones played in perfect synchrony. Beat 7 was the target stimulus, played either in synchrony or with one of the ten pre-selected stimulus onset asynchronies (SOAs).
No rhythm (NR; Fig. 2b). Target stimuli were presented with no context. This condition had the identical timing as the rhythm conditions, except that there were no context stimuli (beats 1–6 and 8).
Irregular rhythm (IR; Fig. 2c). Target stimuli were presented within the context of an irregularly occurring, unpredictable beat sequence. This was identical to the regular-rhythm condition except that the inter-beat interval varied pseudo-randomly between 400 and 667 ms (150 to 90 BPM) on each of the first six beats. Target stimuli occurred at the same time within each trial as they did in the other two conditions.
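The three contexts can be illustrated with a short Python sketch of the beat onset times. This is not the Max MSP script used in the experiment. It assumes that beat 1 starts at 0 ms, that the target always falls 3000 ms later (the regular position of beat 7), that irregular intervals are drawn uniformly from 400 to 667 ms and redrawn until the target lands on time, and that beat 8 follows the target by 500 ms. All function and constant names are hypothetical.

```python
import random

IBI_MS = 500.0                           # regular inter-beat interval (120 BPM)
N_BEATS = 8
TARGET_INDEX = 6                         # beat 7, counted from zero
TARGET_TIME_MS = TARGET_INDEX * IBI_MS   # 3000 ms after beat 1


def regular_onsets():
    """Onsets (ms) of all eight beats in the regular-rhythm context."""
    return [i * IBI_MS for i in range(N_BEATS)]


def no_rhythm_onsets():
    """Only the target is presented, at the same time as in the other contexts."""
    return [TARGET_TIME_MS]


def irregular_onsets(rng, lo_ms=400.0, hi_ms=667.0):
    """Onsets (ms) for the irregular context.

    Assumption: five intervals are drawn uniformly, and the sixth is whatever
    remains before the fixed target time; the draw is repeated until that
    sixth interval also lies within [lo_ms, hi_ms].
    """
    while True:
        intervals = [rng.uniform(lo_ms, hi_ms) for _ in range(TARGET_INDEX - 1)]
        last = TARGET_TIME_MS - sum(intervals)
        if lo_ms <= last <= hi_ms:
            break
    onsets = [0.0]
    for interval in intervals:
        onsets.append(onsets[-1] + interval)
    onsets.append(TARGET_TIME_MS)            # beat 7 (target)
    onsets.append(TARGET_TIME_MS + IBI_MS)   # beat 8 (assumed regular spacing)
    return onsets
```

Calling `irregular_onsets(random.Random(0))` returns eight onsets whose seventh element is always 3000.0, which is the property the three contexts share.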
In each trial, the target stimulus consisted of a pair of pure tones presented at 200 and 300 Hz. The resulting frequency ratio (2:3) is considered in Western harmony to be the most consonant interval after the unison (1:1) and octave (1:2). We avoided the unison because of potential amplitude variations resulting from phase differences between tones, and we avoided the octave as it has previously been shown to cause confusion in auditory temporal discrimination tasks (Hirsh 1959).
Each tone had an instantaneous attack and a 300 ms linear decay (see Fig. 2). The two tones were presented either in perfect synchrony or with one of ten stimulus onset asynchronies (SOAs). Pilot trials confirmed that sensitivity to asynchrony fell in different ranges for auditory and vibrotactile stimuli, so it was not possible to use the same range of SOAs for both modalities. Ranges for each modality were chosen by running the experimenters through pilot trials and adjusting the ranges so as to leave enough room at either end to avoid possible ceiling or floor effects. The SOA range was set at 5–23 ms in increments of 2 ms for auditory stimuli, and at 10–190 ms in increments of 10 ms for vibrotactile stimuli. Each modality had a total of ten discrete SOA values. Example trials and the Max MSP scripts used to run the experiment are available under “Online Resources”.
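As a rough illustration of the target stimulus, the following Python sketch synthesizes a 200/300 Hz tone pair with an instantaneous attack, a 300 ms linear decay, and a given SOA. It is a sketch under assumptions, not the authors' implementation: the 44.1 kHz sample rate, the sine starting phase, the choice to delay the 300 Hz tone, and the equal-weight mix are all assumed, and every name is hypothetical. Only the auditory SOA set is enumerated.

```python
import math

SAMPLE_RATE = 44100                       # assumed; not stated in the text
DECAY_MS = 300.0                          # linear decay duration
AUDITORY_SOAS_MS = list(range(5, 24, 2))  # 5, 7, ..., 23 ms


def decaying_tone(freq_hz, onset_ms, total_ms):
    """Sine tone that starts at full envelope at onset_ms and decays linearly."""
    n_samples = int(total_ms * SAMPLE_RATE / 1000)
    onset_s = onset_ms / 1000.0
    decay_s = DECAY_MS / 1000.0
    samples = []
    for i in range(n_samples):
        t = i / SAMPLE_RATE - onset_s     # time since this tone's onset
        if 0.0 <= t < decay_s:
            envelope = 1.0 - t / decay_s  # instantaneous attack, linear decay
            samples.append(envelope * math.sin(2.0 * math.pi * freq_hz * t))
        else:
            samples.append(0.0)           # silent before onset and after decay
    return samples


def tone_pair(soa_ms):
    """Mix a 200 Hz tone with a 300 Hz tone delayed by soa_ms (assumed order)."""
    total_ms = DECAY_MS + soa_ms
    low = decaying_tone(200.0, 0.0, total_ms)
    high = decaying_tone(300.0, soa_ms, total_ms)
    return [0.5 * (a + b) for a, b in zip(low, high)]
```

With `soa_ms=0` the two tones start together, which corresponds to the synchronous case.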
Procedure
Participants sat in the Emoti-Chair wearing the headphones and Tactaid mastoid stimulators. On each trial in each condition, participants were exposed to the context rhythm and target stimulus sequence twice: once with the asynchrony at beat seven and once with no asynchrony. Each of the ten SOA values was presented ten times, for a total of 100 trials per condition. The order of presentation was randomized via the Max MSP script and the stimulus trains were separated by a random interval ranging from 2 to 4 s. Pink noise commenced 1 s prior to the first stimulus presentation and continued until the end of the second stimulus presentation. Participants indicated in which sequence (first or second) the asynchrony occurred by entering either “1” (for first) or “2” (for second) on a computer keyboard. This two-alternative forced-choice (2AFC) method was chosen to avoid response bias. The next trial began once a response was entered. A block design was employed with each block consisting of either auditory or vibrotactile stimulation with one of the three rhythmic contexts. The order was counterbalanced between subjects.
Participants were given an orientation session prior to the experimental trials in which they were familiarized with the sound and feel of each pure tone played separately, synchronously, and asynchronously. They also completed approximately ten practice trials before each block, continuing until they reported feeling confident in the task.
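The trial structure of one block can be sketched in Python as follows. This is an illustration only: it assumes that the sequence containing the asynchrony (first or second) is chosen at random on each trial and that the gap between stimulus trains is drawn uniformly from 2 to 4 s, neither of which is specified beyond what is stated above. The function and field names are hypothetical.

```python
import random


def build_block(soas_ms, reps=10, rng=None):
    """Return a shuffled list of 2AFC trials: each SOA repeated `reps` times."""
    rng = rng or random.Random()
    trials = []
    for soa in soas_ms:
        for _ in range(reps):
            trials.append({
                "soa_ms": soa,
                # Assumption: the asynchronous sequence is equally likely
                # to be the first or the second presentation.
                "async_sequence": rng.choice((1, 2)),
                # Assumption: uniform gap between the two stimulus trains.
                "gap_s": rng.uniform(2.0, 4.0),
            })
    rng.shuffle(trials)   # randomize presentation order within the block
    return trials


def is_correct(trial, response):
    """A response of 1 or 2 is correct if it names the asynchronous sequence."""
    return response == trial["async_sequence"]
```

For example, `build_block(range(5, 24, 2))` yields the 100 trials of one auditory block.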
Data analysis
A percent correct score was calculated for each SOA, and a logistic curve between 50% (chance) and 100% correct was fitted to each participant’s data for each of the six conditions using Eq. 1. Curves were fitted using SigmaPlot, which uses a Marquardt–Levenberg algorithm.
$$ \text{Percent correct} = 0.50 + \frac{0.50}{1 + \exp\left( -\left( x - x_{0} \right)/b \right)}, \tag{1} $$
where x is the SOA, x0 is the 75% threshold (the SOA at which the fitted curve reaches 75% correct), and b is the standard deviation, which we take as our measure of variability.
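Equation 1 can be checked numerically with a small Python sketch. The fitting routine below is a plain least-squares grid search, substituted here for illustration; it is not the Marquardt–Levenberg procedure in SigmaPlot. The grids and the synthetic scores are hypothetical and are not data from the experiment.

```python
import math


def percent_correct(x, x0, b):
    """Eq. 1: logistic curve rising from 0.50 (chance) to 1.00."""
    return 0.50 + 0.50 / (1.0 + math.exp(-(x - x0) / b))


def fit_grid(soas, scores, x0_grid, b_grid):
    """Return the (x0, b) pair on the grid with the smallest squared error."""
    best = None
    for x0 in x0_grid:
        for b in b_grid:
            sse = sum((percent_correct(x, x0, b) - y) ** 2
                      for x, y in zip(soas, scores))
            if best is None or sse < best[0]:
                best = (sse, x0, b)
    return best[1], best[2]


# Synthetic, noise-free scores generated from known parameters (illustration only).
soas = list(range(5, 24, 2))
scores = [percent_correct(x, 13.0, 2.0) for x in soas]

x0_grid = [5.0 + 0.5 * i for i in range(37)]    # 5.0 to 23.0 ms
b_grid = [0.5 + 0.25 * i for i in range(39)]    # 0.5 to 10.0 ms
x0_hat, b_hat = fit_grid(soas, scores, x0_grid, b_grid)
```

At x = x0 the exponential term equals 1, so the function returns 0.50 + 0.50/2 = 0.75, which is why x0 is the 75% threshold.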
Data analyses were then conducted on the 75% threshold and standard deviation values.
Separate repeated-measures ANOVAs were performed within each modality to compare thresholds, and two-way repeated-measures ANOVAs were performed to compare overall mean thresholds and standard deviations between the two modalities. Pairwise comparisons used Bonferroni correction.
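For readers unfamiliar with the Bonferroni correction, the adjustment can be sketched in Python as follows. With three rhythmic contexts there are three pairwise comparisons per modality, so each p-value is multiplied by three and capped at 1. The p-values in the usage line are hypothetical and are not results from this study.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capping at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical p-values for RR vs. NR, RR vs. IR, and NR vs. IR.
adjusted = bonferroni([0.01, 0.04, 0.5])
```

An adjusted value is then compared against the usual significance level, which is equivalent to testing each raw p-value against that level divided by the number of comparisons.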