Integrated auditory and vibrotactile signals have been shown to activate a larger volume of the auditory cortex than the auditory stimulus alone (Auer, Bernstein, Sungkarat, & Singh, 2007). A similar effect was demonstrated in monkeys by Kayser, Petkov, Augath, and Logothetis (2005), who tested the integration of auditory broadband noise with a tactile stimulus. Using functional magnetic resonance imaging (fMRI), they found that audiotactile signals activated the posterior and lateral side of the auditory cortex of the animal. Given the continuous advances in information and communication technology, interest in studying audiotactile integration has increased, and several works demonstrate that the human auditory cortex is activated by vibrotactile excitation at the hand. Schürmann, Caetano, Jousmäki, and Hari (2004) established that audiotactile stimulation activates the auditory cortical area in normal-hearing participants. In their experiment, participants were asked to adjust the sound intensity to the same level as a fixed-intensity vibration. When the participants touched the vibration source, they perceived a higher intensity than the actual produced sound intensity. This supports the hypothesis that, under certain circumstances, vibration facilitates hearing. Further, using whole-scalp magnetoencephalography (MEG), Caetano and Jousmäki (2006) demonstrated that the human auditory cortex can be activated by sensing a fixed-intensity 200 Hz vibration at the fingertips, and thereby established that the auditory cortex can be activated by vibrotactile stimulation alone. Their experiments were conducted at a fixed vibration frequency of 200 Hz, without examining the effects of frequency or stimulation location.
In another work, researchers studied the perceptual integration of 50, 250, and 500 Hz vibrotactile and auditory tones in a detection experiment as a function of the relative phases of sound and vibration pulses (Ranjbar, Wilson, Reed, & Braida, 2016). The results did not show a significant effect of phase difference on sound-detection performance. However, the combination of 250 Hz and phase difference resulted in significantly higher sound-detection scores than the other fixed frequencies (50 Hz and 500 Hz). The work suggests that auditory and vibrotactile signals can be effectively integrated without regard to phase difference and fine structure regulation. Wilson, Reed, and Braida (2009) investigated the effect of stimulus phase difference and onset asynchrony on the integration of auditory and vibrotactile stimuli. The experiment examined the perceptual integration of 250 Hz, 500 ms sinusoidal auditory and tactile stimuli. The intensities of both auditory and vibrotactile signals were chosen to yield 63%–77% detection performance. The results indicated that detection performance is not affected by a phase difference between auditory and vibration signals. It has also been shown that the performance of participants improves when tactile and auditory signals are fully synchronized.

Wilson, Braida, and Reed (2010) extended their research by investigating perceptual integration in the loudness of combined auditory and tactile stimuli. Various combinations of auditory and tactile signals and purely auditory signals were equated in loudness, with a reference 200 Hz auditory signal at a fixed intensity of 25 dB. The frequencies of the auditory test signals were 200, 250, 300, and 547 Hz, while the tactile signal frequencies were 20, 200, and 400 Hz. The auditory signal was matched in loudness with the following combinations of auditory and tactile stimuli: (1) two auditory signals at frequencies of 250 and 300 Hz; (2) two auditory signals at frequencies of 250 and 547 Hz; (3) auditory and tactile signals of 250 Hz each, with variable auditory signal intensity; (4) auditory and tactile signals of 547 Hz and 250 Hz, respectively, with variable auditory signal intensity; (5) auditory and tactile signals of 250 and 400 Hz, respectively, with variable tactile signal intensity; and (6) auditory and tactile signals of 250 and 20 Hz, respectively, with variable tactile signal intensity. With the tactile signal level fixed, the results indicate a perceived increase of 5.2 dB for combination (3) and of 7 dB for combination (4). With the auditory signal level fixed, there was an increase of 7.3 dB for combination (5) and of 8 dB for combination (6).

Further work by Wilson et al. (2010) focused on studying the effect of frequency on the integration of auditory and vibrotactile stimuli. The experiment procedure was similar to that in their earlier investigations. In Experiment 1, the vibration frequency remained constant at 250 Hz, while the frequency of auditory stimulus was altered between 125 and 2000 Hz. In Experiment 2, the auditory stimulus frequency was 250 Hz, and the vibrotactile stimulus frequency changed between 50 and 400 Hz. In Experiment 3, the tactile stimulus and auditory stimulus were held equal and ranged between 50 and 400 Hz. Both stimulus levels were chosen as in their previous studies, yielding 63% to 77% unimodal performance. The results indicate that the rate of detection is higher when auditory and tactile stimuli frequencies are equal and within the Pacinian range.

From the up-to-date literature, it is understood that audiotactile integration is more notable at some frequencies than at others, specifically when these frequencies fall within the range in which the Pacinian corpuscle has been shown to have maximum sensitivity. For the effective design of vibrotactile interfaces, it is important to further substantiate previous research results and to establish the presence of the phenomenon when tactile and acoustic cues are simply integrated, as would be done in a simple interface.

In general, regarding sensitivity to vibrotactile stimuli, the fingertips and hand are known to have a greater density of mechanoreceptors and more sensitive regions than the rest of the body, and are therefore more appropriate for receiving tactile information (Bensmaïa, Hollins, & Yau, 2005; Verrillo, 1966). Tactile sensation can be triggered by mechanical vibration of the skin at frequencies between 10 and 500 Hz (Kaczmarek, Webster, Bach-y-Rita, & Tompkins, 1991). Regarding the human ability to discriminate frequency in vibrotactile stimuli, Mahns, Perkins, Sahai, Robinson, and Rowe (2006) have shown that at the fingertips the relative discriminative increment, or just noticeable difference (JND), for frequencies of 20, 50, 100, and 200 Hz is 0.32% ± 0.07%, 0.19% ± 0.07%, 0.21% ± 0.03%, and 0.14% ± 0.04%, respectively. However, another study suggests that the JND is constant across frequencies, with an absolute discriminative increment of 22% (Löfvenberg & Johansson, 1984). In the present work, as well as in our related previous work (Abdikadirova et al., 2018), JND-related information is employed in the experimental design, specifically for choosing the set of test frequencies shown in Table 1. The minimum frequency difference suggested by the literature is maintained throughout the chosen values. More specifically, for lower frequencies, values of 50, 100, and 200 Hz were used in agreement with the study by Mahns et al. (2006), while higher frequencies were incremented by 22% according to Löfvenberg and Johansson (1984).

Table 1 Hypothesis testing for sound-only versus sound-and-vibration tones (Abdikadirova, Praliyev & Xydas 2018)

In recent work, we investigated the effect of the frequency level on audiotactile integration (Abdikadirova et al., 2018). In that work, tests were performed for 13 different frequencies, as shown in Table 1. Participants were asked to identify sounds with or without the presence of vibrotactile excitation of equal frequency. Ten different auditory stimulus levels were chosen, of which nearly half were inaudible. The interval between auditory signal levels was 5 dB. The intensities of the vibrotactile signals were kept low so as to avoid acoustic artifacts coming from the vibration generator. A valid audiotactile integration instance was defined as an instance in which a sound tone of a certain intensity, which was inaudible on its own, was reported as audible when delivered together with vibrotactile stimulation at the index fingertip. Rates of successful audiotactile integration were particularly high within a specific frequency range, specifically for the frequencies 200, 230, 300, and 390 Hz. Further, a peak was observed at 300 Hz, where the maximum sensitivity to vibrotactile excitation is reported (Gescheider, Bolanowski, Pope, & Verrillo, 2002; Jones & Sarter, 2008; Verrillo, 1966). Statistical significance was established for these four frequencies, in contrast to the rest of the frequencies, where the presence of the vibrotactile signals did not significantly enhance sound identification success rates. The four identified frequencies fall within the maximum sensitivity range of the Pacinian corpuscle, which is present at high densities at the index fingertip. More specifically, the Pacinian corpuscle’s maximum sensitivity is located at around 250–300 Hz, and its sensitive range extends from 200 to 400 Hz. Outside this frequency range, the sensitivity reduces rapidly (Verrillo, 1966).

In the work by Abdikadirova et al. (2018), no environmental masking sound was employed. Furthermore, control experiments were performed for only two of the participants. In order to further substantiate the observations, the experiment was repeated in the current study, with a focus on the four prevalent frequency values in which audiotactile integration was observed. In addition, in the present work, environmental masking with white noise was incorporated to completely isolate the user acoustically from the vibration generator. Furthermore, with fewer frequencies to examine, it was possible to include control experiments for all users and all tests. In the control experiments, the users were presented with the exact same vibratory tones, but without touching the vibratory probe. In this way it was ensured that the users are acoustically isolated from the vibration generator. White noise was employed in previous work by other researchers as well (Caetano & Jousmäki, 2006; Wilson et al., 2010).

The aims of this work follow:

  1. To establish that audiotactile integration is present in the range of frequencies from 200 to 390 Hz.

  2. To investigate whether there is a specific frequency in which audiotactile integration level reaches a peak.

Method

Experimental setup

The testing apparatus consists of the following equipment: (1) PC; (2) external sound card; (3) headphones with active ambient noise and sound cancellation (Sony WH-1000XM2), including automatic performance optimization for the current environmental conditions; (4) a vibration generator with a vertical probe (Frederiksen 2185.00); (5) amplifier (L-Frank Audio PAA30USB); (6) custom-made sound insulation box; (7) BY-LM10 lavalier microphone; and (8) loudspeaker. The vibration generator was placed inside the insulation box with only the vibrating probe protruding, so that the sound generated by the movement of mechanical parts was isolated to the maximum possible extent. A cylindrical wooden interface with a 4 mm diameter and a flat end was inserted in the center tap as the probe endpoint (which the user touches), so that it matches the dimensions used in previous research (Kayser et al., 2005). The wooden probe, together with thermal insulation of the vibration generator, ensured that no significant amount of heat was transferred from the equipment to the participant’s finger. More specifically, the vibration generator was insulated at the top and sides so that heat dissipated downwards. Additionally, the room air-conditioning system was always activated in order to maintain a constant air temperature of 21 °C. The complete experimental setup is shown in Fig. 1.

Fig. 1 Experimental setup

Voltage and current measurements were taken at the two input terminals of the vibration generator using a National Instruments (NI) USB-6009 data acquisition card. For voltage measurements, an analog input of the NI USB-6009 was connected directly to the two input terminals of the vibration generator. For current measurements, a 0.22 Ω resistor was connected in series between the vibration generator and the amplifier. The voltage and current measurements are used to calculate the power of the vibration signal, as described in the Experimental Procedure section. The BY-LM10 lavalier microphone is employed to measure the intensity of the sound generated by the vibration generator, so that appropriate masking sound levels can be set in order to isolate the user acoustically.

Participants

Thirteen young adults participated in the experiments. Their ages ranged between 19 and 21 years (mean age = 19.9 years, SD = 0.60 years). Participants had no knowledge of the topic of the study and had not previously been involved in vibrotactile experiments. All of them signed an informed consent form and were compensated for participation. None of the participants reported hearing problems. Approval from the Nazarbayev University ethics committee was obtained prior to the experiments.

Experimental procedure

At the beginning of the experiment, the participant was seated in a relaxed position with the headphones on and the noise and ambient sound cancellation activated. The noise cancelling headphones were then optimized for the specific vibration frequency of the measurement using the integrated optimization feature. The lavalier microphone was placed near the headphones so that the sound coming from the vibration generator and reaching the user was measured and subsequently cancelled out using the masking sound. The participant wore the headphones throughout the duration of the experiment, which consisted of three stages:

(1) vibration intensity and masking sound calibration; (2) audiotactile sensitivity test; (3) control experiment (no touch), which was the same as stage (2) but with the difference that the user was not touching the probe. We employ the term vibration-only-no-touch (VONT) to refer to the vibration tones delivered in this stage. To exclude the possibility of false visual-based responses to vibratory tones, users were asked to keep their eyes shut throughout the testing.

1) Vibration intensity and masking-sound calibration:

In audio tests, participants can be completely isolated from the environment using special insulation and glazing so they can focus on the controlled sound tones. In vibrotactile experiments, though, participants cannot be physically isolated from the vibration source because they need to touch it during the experiments. Therefore, ear shields and a masking background sound are used instead. Further, the vibration intensity can be calibrated to ensure that the user cannot hear the vibration source. In this work, the method of limits has been employed for the vibration intensity calibration. According to this method, a stimulus having a high probability of a positive response is presented initially to the participant. If a positive response is obtained, the stimulus level for the next trial is reduced. If a positive response is again obtained, the stimulus level is again reduced by the same amount (the step size). This procedure is continued until a negative response is obtained (Levitt, 1971). The first stage of the study involved calibrating the vibration intensity in this way, so that the user could not hear the sound generated by the vibration apparatus and only controlled sounds delivered through the headphones would be heard. At this stage, participants do not touch the probe. Sinusoidal vibration signals are generated, and participants are asked to tap on the workbench with their free hand whenever they hear a tone. The vibration intensity is reduced after each tap until the user no longer responds to the tone, meaning that they cannot hear the vibrating elements. Additionally, the microphone is placed near the headphones, and the intensity of the sound produced by the vibration generator is measured. To fully isolate the participant from the vibration generator sound, the final measured intensity is masked with white noise at an intensity 10 dB higher than the measured one. 
For instance, if the final measured intensity of the sound generated by the vibration generator is 28 dB, then white noise is generated at 38 dB. The same calibration step is repeated for every frequency. The vibration intensity at the source (vibration generator) was measured independently at the input terminals of the vibration generator. From the measured voltage and current, the root mean square (RMS) values were computed and then converted into decibels using Equation 1.

$$ \mathrm{Power}_{dB}=10\log_{10}\left(V_{RMS}\,I_{RMS}/P_{ref}\right) $$
(1)

where P_ref is a reference power of 1 W, and V_RMS and I_RMS are the RMS voltage and current, respectively. Since the vibration intensity was calibrated for each user, the threshold value was different in each case. Hence, the power (in dB) of the generated vibration was averaged over the threshold intensities of all participants for each test frequency using linear interpolation. In effect, five separate V_RMS and I_RMS measurements were performed for each frequency at each user’s threshold intensity, and the average power (in dB) for each user was established. This was then used to calculate the overall average power (in dB) among all users for each of the frequencies. Column 2 of Table 2 shows the average power (in dB) calculated for each frequency at the threshold intensities. Column 3 shows the SPL of the sound produced by the vibration generator at 0.8 m, which is approximately the distance of the participant’s head from the vibration generator. The SPL was calculated using the average power (in dB) of the respective vibration signal. Column 4 shows the established SPL hearing thresholds (Sivian & White, 1933). The difference between the actual SPL (column 3) and the SPL thresholds (column 4) was compensated for by the headphones and the masking sound (+10 dB). Table 2 shows that the user can be effectively isolated acoustically given the masking sound and the active noise reduction headphones. Note that the efficiency of the vibration generator is not considered; the actual intensity at the probe would potentially be lower than the one estimated from the voltage and current measurements. The vibration intensity values shown in Table 2 are near the threshold values for tactile signals (Wilson et al., 2010). This supports the validity of the vibration power measurements. 
However, the dB values shown in Table 2 cannot be directly compared with the tactile threshold values established by Wilson et al. (2010), since that study did not report the gain provided by the amplifier.
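The power calculation of Equation 1 can be illustrated with a short Python sketch. This is not the authors' code: the function names are hypothetical, the current is assumed to be derived from the voltage across the 0.22 Ω series resistor by Ohm's law, and averaging the dB values directly (per user first, then across users) is an assumption about how the averages described above were formed.

```python
import math

SHUNT_OHMS = 0.22    # series resistor from the setup description
P_REF_WATTS = 1.0    # reference power used in Equation 1


def rms(samples):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def current_from_shunt(v_shunt_samples, r_shunt=SHUNT_OHMS):
    """Ohm's law: current through the series resistor (assumed method)."""
    return [v / r_shunt for v in v_shunt_samples]


def power_db(v_rms, i_rms, p_ref=P_REF_WATTS):
    """Equation 1: 10 * log10(V_RMS * I_RMS / P_ref)."""
    return 10.0 * math.log10(v_rms * i_rms / p_ref)


def mean(values):
    return sum(values) / len(values)


def average_power_db(per_user_measurements):
    """per_user_measurements maps a user id to a list of (v_rms, i_rms) pairs.

    Assumption: dB values are averaged per user, then across users.
    """
    per_user = [
        mean([power_db(v, i) for v, i in pairs])
        for pairs in per_user_measurements.values()
    ]
    return mean(per_user)
```

For example, `power_db(1.0, 1.0)` gives 0 dB relative to 1 W, and every tenfold reduction in the product V_RMS · I_RMS lowers the result by 10 dB.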

Table 2. Average vibration intensities at different frequencies
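The descending method-of-limits calibration and the +10 dB masking rule described in stage 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `hears` callback stands in for the participant's tap response, and the starting level, step size, and floor are hypothetical parameters that the text does not specify.

```python
def descending_limits(start_level_db, step_db, hears, floor_db=-120.0):
    """Lower the stimulus level by a fixed step after each positive response.

    Returns the first level at which `hears(level)` is False, or None if the
    (hypothetical) floor is reached without a negative response.
    """
    level = start_level_db
    while level >= floor_db:
        if not hears(level):
            return level
        level -= step_db
    return None


def masking_level_db(measured_db, margin_db=10.0):
    """White-noise level set a fixed margin above the measured sound level."""
    return measured_db + margin_db


# Simulated listener (hypothetical): responds to anything above -32 dB.
threshold_level = descending_limits(-10.0, 5.0, lambda level: level > -32.0)
noise_level = masking_level_db(28.0)  # the 28 dB example from the text
```

With the example from the text, a measured level of 28 dB yields a masking level of 38 dB.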

2) Audiotactile Sensitivity Test: In the second part of the experiment, the participant touched the probe with the index fingertip. A pillow was placed under the participant’s forearm to keep the wrist and arm relaxed. The masking sound was activated throughout the experiments. Three types of sinusoidal signals were generated at this stage: 1) Sound only (SO) (through the headphones); 2) Sound and vibration (SV); and 3) Vibration only (VO). The procedure was performed four times for each participant, once for each of the test frequencies shown in Table 2. Frequency steps were chosen by considering the JNDs suggested by the literature, as described in the introduction. As previously shown by Wilson et al. (2009), the phase difference and asynchrony between auditory and tactile stimuli have no effect on the performance of participants. Hence, the auditory and vibrotactile signals were generated with zero phase difference and full synchronization.

A total of 25 tones were delivered to the user for each frequency: 10 sound-only tones, 10 sound-and-vibration tones (sound through the headphones and vibration at the fingertip), and 5 vibration tones (without sound). All 25 tones were generated in random order. The intensity of vibratory stimulation remained the same in all 15 stimuli (5 vibration and 10 sound and vibration). The auditory stimuli were provided using the method of constants, in which several stimulus levels are chosen beforehand, groups of observations are placed at each of these levels, and the order of the observations is randomized (Levitt, 1971). Accordingly, the auditory stimuli had 10 different intensities and contained both normally audible and nonaudible tones. The frequencies of the auditory and vibrotactile stimuli were set to be the same for each set of experiments, as the integration of vibrotactile and auditory signals was shown to be maximum when the frequencies of both stimuli are equal (Levitt, 1971). The sound intensities were chosen for each frequency based on calibration experiments with two young adults, so that there were nearly 5 audible and 5 nonaudible sound intensities. The difference between subsequent intensities was 5 dBFS (dB full scale). Actual sound level measurements with a microphone showed that a step of 5 dBFS corresponds to roughly 7 dB. Sound signals were generated in MATLAB based on Equations 2 and 3. The duration of both tactile and auditory signals was 1.2 s.
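The composition of one block of 25 tones can be sketched in Python as follows. This is an illustrative reconstruction rather than the authors' procedure: the function name, the `top_dbfs` starting level, and the use of a seeded shuffle are assumptions, whereas the counts (10 SO, 10 SV, 5 VO), the 10 levels, and the 5 dBFS step come from the description above.

```python
import random


def build_trials(top_dbfs, step_dbfs=5.0, n_levels=10, n_vo=5, seed=None):
    """Return a shuffled list of (kind, level_dbfs) trials for one frequency.

    SO and SV trials share the same set of auditory levels; VO trials carry
    no auditory level (None). `top_dbfs` is a hypothetical loudest level.
    """
    levels = [top_dbfs - k * step_dbfs for k in range(n_levels)]
    trials = [("SO", level) for level in levels]
    trials += [("SV", level) for level in levels]
    trials += [("VO", None)] * n_vo
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    rng.shuffle(trials)
    return trials


block = build_trials(top_dbfs=-40.0, seed=1)
```

Each call produces one randomized block; in the experiment described here, one such block would be run for each of the four test frequencies.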

$$ A=A_0\cdot 10^{dBFS/20}, $$
(2)
$$ y=A\sin\left(2\pi f t\right), $$
(3)

where dBFS is the full-scale dB level of the sound, A0 is the reference amplitude, f is the frequency, and t is time. The reference amplitude is 1, as this is the maximum possible value that the sound function can handle. As in the previous parts of the experiment, participants were asked to tap whenever they heard a sound. The current study investigates whether a participant can incorporate a tactile vibration stimulus in a way that enhances otherwise inaudible sounds, which in turn are generated by the headphones. Even though tactile vibration does activate the auditory cortex, it is possible that the person is not able to subjectively perceive it as sound. Hence, the yes/no method of recording the responses can provide more accurate results regarding human perception.
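Equations 2 and 3 can be illustrated with a short Python sketch. This is not the authors' MATLAB code; the function name and the 44.1 kHz sample rate are assumptions, while the 1.2 s duration and the reference amplitude of 1 come from the text.

```python
import math


def tone(freq_hz, dbfs, duration_s=1.2, sample_rate=44100, a0=1.0):
    """Sinusoidal tone as a list of samples in the range [-a0, a0].

    Equation 2 sets the amplitude from the dBFS level; Equation 3 gives the
    waveform. The sample rate is an assumed value, not stated in the text.
    """
    amplitude = a0 * 10 ** (dbfs / 20.0)       # Equation 2
    n_samples = int(round(duration_s * sample_rate))
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * k / sample_rate)  # Equation 3
        for k in range(n_samples)
    ]


# A 200 Hz tone at -20 dBFS has a peak amplitude of 0.1 (one tenth of full scale).
y = tone(200.0, -20.0)
```

Because the amplitude scales as 10^(dBFS/20), each 5 dBFS step used in the experiment changes the amplitude by a factor of about 1.78.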

Results

Figure 2 presents the median values of positive responses to SO tones (left) and SV tones (right) calculated for all users. Each median, represented by a horizontal line within the boxplot, is calculated from the positive responses of all participants for the specific frequency. The boxes contain 50% of the cases, the crosses represent the outliers, and the dotted lines span the rest of the results. Note that the maximum possible number of positive responses to both SO and SV tones is 10 for each participant and, consequently, for the median as well. Figure 3 shows the medians of positive responses for all users to VONT tones, where the user was not touching the probe. Again, the maximum possible median value is 10. Figure 4 presents the medians of the differences between the SV and SO positive responses of all users. Vibration-only (VO) responses are not considered at this stage, because they were included as distractors and because VO excitation of the auditory cortex requires a specifically focused investigation.
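The quantity plotted in Figure 4 can be computed as in the following Python sketch. The response counts below are made-up placeholders used only to show the calculation; they are not data from this study, and the function name is hypothetical.

```python
from statistics import median


def median_difference(so_counts, sv_counts):
    """Median over participants of (SV positives - SO positives).

    Each list holds one count (0 to 10) per participant, in the same order.
    """
    diffs = [sv - so for so, sv in zip(so_counts, sv_counts)]
    return median(diffs)


# Placeholder counts for five hypothetical participants at one frequency.
so_example = [4, 5, 5, 6, 4]
sv_example = [6, 7, 7, 8, 7]
result = median_difference(so_example, sv_example)  # differences: 2, 2, 2, 2, 3
```

A positive median indicates that, for a typical participant, more sound tones were detected when vibration accompanied the sound.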

Fig. 2 Medians of positive responses to sound-only (SO) tones (left) and sound-and-vibration (SV) tones (right)

Fig. 3 Medians of positive responses in vibration-only-no-touch (VONT) tones

Fig. 4 Medians of the differences in positive responses between sound-only (SO) and sound-and-vibration (SV) tests

Table 3 shows the results of Wilcoxon rank-sum significance testing (significance level α = 0.05) for SV and SO tones. A true value of H1 indicates that the null hypothesis (“medians are equal”) can be rejected at the p value shown in the respective column, signifying a statistically significant difference between SV and SO responses. Note that the Wilcoxon rank-sum test is nonparametric; thus, no normality test is required.
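For readers unfamiliar with the test, a two-sided Wilcoxon rank-sum test can be sketched in plain Python as follows. This is a simplified illustration, not the routine used to produce Table 3: it relies on the large-sample normal approximation, assigns midranks to tied values, and applies neither a tie correction to the variance nor a continuity correction, so its p values may differ slightly from those of statistical packages that use exact methods for small samples.

```python
import math


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (z, two-sided p) for the rank sum of sample x against sample y."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])                        # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2.0          # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2.0))     # two-sided normal tail area
    return z, p
```

In the setting of Table 3, `x` and `y` would hold the per-participant counts of positive responses to SO and SV tones at one frequency, and the null hypothesis would be rejected when `p` falls below 0.05.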

Table 3 Hypothesis testing between sound and vibration and sound only for valid frequencies

Discussion

In Fig. 2 it is observed that the median of positive responses fluctuates in both the SO and SV experiments. Nevertheless, no trend or peak is observed, in contrast to the study of Abdikadirova et al. (2018), where a peak was observed at 300 Hz. The lack of an ascending or descending trend is further substantiated by considering the difference between SO and SV tone detection. To this end, Fig. 4 shows that the median of the differences between SO and SV responses is constant for all frequencies. This is in agreement with the experimental results of Wilson et al. (2010), where detectability rates were approximately equal at frequencies of 250 Hz and 400 Hz. The median value of 2 in Fig. 4 indicates that when vibrotactile excitation accompanies sound tones (SV), users can detect sound down to two steps lower in intensity than when sound is delivered on its own (SO). In parallel, the results of the control experiments shown in Fig. 3 indicate that the median of positive responses when the user is not touching the probe is almost zero, which establishes that the user was effectively isolated acoustically from the vibration source to an adequate extent. “False alarms” have also been observed: the user sometimes tapped when neither a vibratory nor an acoustic signal was delivered, but ultimately these responses do not affect the final result statistically. The findings further substantiate previous results which suggest that, potentially, the auditory cortex is activated by touch and enhances sound perception (Abdikadirova et al., 2018). In addition, this work establishes that audiotactile integration is present across the whole range of frequencies that fall within the Pacinian corpuscle’s maximum sensitivity range (200–400 Hz). These results can be employed in the design of vibrotactile interfaces.

The present study takes into account the JND of tactile vibration, whereas Wilson et al. (2010) chose frequencies of 50 Hz, 125 Hz, 250 Hz, and 400 Hz. Smaller spacing between frequencies can yield more accurate results when analyzing the integration of the two modalities throughout the frequency range.

Also, in our studies, the vibration intensity was set just low enough to eliminate the audible acoustic artifacts coming from the vibration generator. Therefore, it can be claimed that the level of the vibration stimuli is higher than the tactile threshold values. This was done to ensure that the vibrotactile stimuli are fully perceived and to decrease the number of false alarms. In the experiments of Wilson et al. (2010), however, both tactile and auditory stimulus levels are near the threshold values; thus, it is not yet clear whether the tactile stimulus enhanced the perception of the auditory signals or the auditory stimulus helped to detect the tactile vibration.

Conclusion

This work shows that audiotactile integration is present throughout the maximum sensitivity range of the Pacinian corpuscle. The level of audiotactile integration is constant given constant vibrotactile excitation intensity. The absence of a peak sensitivity value, in contrast to what was observed in previous work (Abdikadirova et al., 2018), is noted. It is not known how variations in vibrotactile signal intensities and amplitudes affect audiotactile integration. Further experiments can potentially investigate these issues, along with further establishing that audiotactile integration drops rapidly below 200 Hz and above 400 Hz, as was observed in previous studies. Given the results, further experiments can attempt to quantify the involvement of the Pacinian corpuscle in audiotactile integration.