Introduction

We perceive the external world via interactions of multisensory information derived from different sensory organs and utilize the perceived information to update motor outputs that produce the desired movements. In the real world, our experiences have multisensory features. Integration of multisensory information (i.e. multisensory integration) shapes accurate perception when each modality of sensory information represents congruent features1. For example, simultaneous presentation of auditory and somatosensory stimuli enhances reaction time between motor response and sensory stimuli2. By contrast, in some cases such as noisy environments, attending to sensory signals of a particular modality increases signal detectability3,4,5, which accompanies ignoring task-irrelevant sensory information (i.e., selective attention). Previous studies demonstrated that these multisensory interaction functions are acquired at a late stage of development and are adapted with age6,7 and experiences8. This suggests that multisensory interaction functions have the potential for plastic adaptation through a course of abundant multisensory experiences.

Plastic changes in unimodal sensory functions have been investigated in previous studies of sensory and motor learning9,10. Unimodal sensory functions can be sharpened with repeated exposure to sensory stimuli of a single modality or with motor training11,12,13,14,15. Furthermore, previous studies demonstrated the plastic adaptation of multisensory interaction functions, for example, audio-visual and audio-tactile integration, in trained individuals15,16,17,18. A common approach among these studies is the manipulation of congruency of sensory information between multiple modalities. For example, expert musicians can more accurately perceive incongruent stimuli between auditory and visual sensory information, like musical score and sounds, compared with musically untrained individuals18,19,20. However, it remains unclear whether the above-mentioned multisensory interaction functions (i.e., multisensory integration and selective attention) also benefit from extensive motor training. These multisensory interaction functions involve volitional control of information derived from the multisensory sensory organs (i.e., whether integrate or suppress), which is different from the incongruent stimuli detection that identifies differences in information encoded in sensory signals of different modalities. Recent studies demonstrated that multisensory integration involves combining multisensory sensory information by summing prior knowledge and the independent stimulus estimates from each modality21,22,23,24,25. On the other hand, the selective attention in multisensory situations is defined as guiding attention toward modalities that represent task-relevant information and away from modalities that provide task-irrelevant information and/or noise. Previous studies demonstrated that short-term playing of an action video game improves visual selective attention that is characterized as guiding attention toward task-relevant visual stimuli and away from irrelevant visual disturbances26,27. However, to the best of our knowledge, none of the studies examined plasticity of selective attention in multisensory situations, especially in the auditory and somatosensory modalities.

The expert pianist is a unique population to address these issues because playing the piano provides extensive auditory and somatosensory experiences. In piano performance, sensory inputs from the auditory and somatosensory modalities provide abundant information such as loudness and timbre of notes. Indeed, previous studies have demonstrated that such multisensory experiences provide pianists with superior perceptual abilities in both modalities over untrained healthy individuals13,28,29. On the other hand, previous studies demonstrated interactions of sensory perception between these two modalities30,31,32. Such auditory-somatosensory integration plays important roles in successful piano performance and thus would be plastically adapted through long-term piano training. One of the important requirements for expressive piano performance is fine control of the intensity of sounds. The sound intensity is perceived mainly from the auditory modality, but also from the somatosensory modality that encodes information on the sound intensity via the keystroke (e.g., pressure sense at fingertips). In addition, accurate perception of the sound intensity necessitates both the integration and selective attention of information derived from these two modalities. Thus, expert pianists would have undergone unique multisensory experiences from the auditory and somatosensory modalities since childhood. This raises the hypothesis that expert pianists have superior multisensory interaction functions in the intensity domain between the auditory and somatosensory over the musically untrained individuals. Here, we designed a series of psychophysical experiments to probe the plasticity of the multisensory interaction functions in the auditory and somatosensory modalities, especially with respect to the multisensory integration and selective attention, through a comparison between expert pianists and nonmusicians.

Materials and methods

Participants

Fifteen pianists and 15 musically untrained individuals (22.87 ± 4.53 years old [mean ± SD] and 25.93 ± 3.90 years old for pianists and nonmusicians, respectively (t28 =  − 1.99, p = 0.06); 11 and 6 females for pianists and nonmusicians, respectively) participated in the present study. All pianists majored in piano performance in a musical conservatory and/or had extensive and continuous private piano training under the supervision of a professional pianist/piano professor. By contrast, all nonmusicians have absolutely no experience in training of musical instruments except for mandatory musical education programs during the elementary school period. All participants gave their written informed consent before participating in the experiments. All experimental procedures were carried out in accordance with the Declaration of Helsinki and were approved by the ethics committee of Sony Corporation. This study was not preregistered.

Auditory stimulus

Each auditory stimulus consisted of 2000 Hz pure tone and lasted for 0.2 s. We used this sound frequency because human auditory perception is highly sensitive at this frequency33. In order to eliminate pop noise, the amplitude of the auditory stimulus linearly decreased to silent during the last 100 ms. The sound signal was produced by a custom-made LabVIEW software and was delivered through a D/A converter (National Instruments Inc., US). Each stimulus was delivered via a speaker (MSP3, YAMAHA, inc. Japan) which put 1.0 m in front of the participants. To maintain the distance between the participant’s head and the speaker, we instructed the participants to maintain their head location once we measured the distance.

Somatosensory stimulus

Vibrotactile stimuli were delivered to the fingertip of the right index finger via a piezoelectric actuator (Murata Electronics, 7BB-20-6L0). Each stimulus consisted of 200 Hz sinusoidal vibration with a duration of 0.2 s. We used this frequency because the detection threshold of the vibration stimulus was the lowest in this frequency, which we confirmed in a pilot experiment. The actuator was controlled via a custom-made LabVIEW software through the D/A converter. Each participant was instructed to put the pad of the right index fingertip on the surface of the actuator without volitionally pushing down or manipulating the actuator. Before starting the experiment, we confirmed that the auditory signals arising from the vibrotactile stimuli with these properties were under the auditory detection threshold in all participants.

Intensity discrimination task

This study used a two-alternative forced choice procedure in an intensity discrimination task. Participants were seated in a piano chair with their right hand on a table. Two sensory stimuli with varying stimulus intensities were delivered to participants every trial. The intensity of the first stimulus in a single trial, termed a standard stimulus, was always set to 20 dB for the auditory stimuli and 10 dB for the somatosensory stimuli above the sensory detection threshold that was assessed prior to the testing for each participant (the detailed method is written in the subsequent section). We used different intensities for the standard stimulus between the two modalities in order to minimize the difference in the intensity discrimination threshold in response to each single sensory stimulus. The Bayesian integration or maximum likelihood models posit that multisensory interactions depend on the accuracy of inference of each sensory stimulus21. For example, if the accuracy of inference of sensory stimulus from one sensory modality is much worse than the other, multisensory interactions are less likely to occur. In the discrimination task, the discrimination threshold in response to each single-modal sensory stimulus corresponds to the accuracy of each sensory inference, and the discrimination threshold depends on the intensity of the standard stimulus34. Thus we adjusted the intensity of the standard stimulus to match the discrimination threshold between the auditory and somatosensory modalities. In our pilot experiment, we confirmed that the discrimination threshold did not differ between the two modalities when we used the abovementioned intensities of the standard stimulus. The second stimulus, termed a comparing stimulus, was delivered at the interstimulus interval of 500 ms with varying stimulus intensities. The differences in the intensity between the standard and comparing stimuli ranged from 0.25 to 4 dB with a step of 0.25 dB. The positive and negative value of the intensity of the comparing stimulus means the larger and smaller intensity of the comparing stimulus than the standard stimulus, respectively. Whether the intensity of the comparing stimulus was a positive or negative value was randomly defined in every trial. Participants were asked to answer whether they perceive the intensity of the comparing stimulus was larger or smaller than that of the standard stimulus. The intensity of the comparing stimulus was determined adaptively using a weighted up-down staircase method35. The experiment required each participant to perform the intensity discrimination task 5 times under different conditions. The adaptive procedure was used because it requires fewer trials to detect the psychometric function of the intensity discrimination and thus prevents attentional fatigue due to the repetition of the task. Each correct response decreases the difference in the intensity between the two stimuli by 0.25 dB in the next trial, each incorrect response increases it by 0.75 dB. The intensity discrimination test consisted of 100 trials.

Procedure

We firstly assessed the detection threshold for each of the somatosensory and auditory stimuli through delivering the sensory stimulus by decreasing/increasing the stimulus intensity at a random interval (2–4 s). Participants were asked to press a key as fast as possible when they perceived a sensory stimulus. Because the simple reaction time is less than 500 ms36, we interpreted that the participants failed to detect the stimulus if the reaction time between the sensory stimulus and the keypress was over 500 ms. The stimulus intensity was decreased/increased by 2 dB if the participants succeeded/failed to make the response in the 500 ms window. The experimenter stopped the measurement if the fluctuation of the stimulus intensity over trials reached a plateau (this was subjectively confirmed by the experimenter). The detection threshold was defined as the averaged intensity across the last 10 trials in which the fluctuation of the stimulus intensity reached a plateau. The detection threshold did not differ between the two groups (Fig. 1, auditory: pianists: − 40.62 ± 4.88 dB (0 dB corresponds to the sound volume of 70 dB speaker level (dBSPL) just in front of the speaker); nonmusicians: − 40.41 ± 7.10 dB; two-sample t-test: t26 = − 0.09, p = 0.93; somatosensory: pianists: − 17.19 ± 3.38 dB (The vibration intensity when an AC voltage of 10 V is applied to the piezoelectric actuator is 0 dB); nonmusicians: − 16.04 ± 2.78 dB; two-sample t-test: t26 =  − 0.87, p = 0.40).

Figure 1
figure 1

The detection threshold to the sensory stimulation in both the somatosensory and auditory modalities in each individual.

After assessing the detection thresholds of the two modalities, participants performed the intensity discrimination test 5 times with different conditions; (1) auditory condition (A), (2) somatosensory condition (S), (3) multisensory integration condition (A + S: auditory + somatosensory), (4) auditory selective attention condition (Aatt), and (5) somatosensory selective attention condition (Satt). The order of the conditions was randomized across participants. In the A + S condition, we delivered the auditory and somatosensory stimuli simultaneously. The intensity of the comparing stimulus differed from that of the standard stimulus by the same level between the two modalities. In the Aatt condition, we delivered the somatosensory and auditory stimuli simultaneously and manipulated the intensity of the comparing stimulus in the auditory modality without changing the intensity of the comparing stimulus in the somatosensory modality as that of the standard stimulus. Thus, the somatosensory stimuli represented no information on the discrimination task (i.e. task-irrelevant information). The procedure of the Satt condition was opposite from that of the Aatt, indicating that the auditory stimuli had no information on the task. Both the auditory and somatosensory stimuli were delivered via different channels of the same D/A converter, which allowed us to precisely synchronize the two stimuli. This synchronization was validated in our pilot experiment.

Data analysis

To obtain a psychometric function of the intensity discrimination performance, we fitted the data obtained from the intensity discrimination task into the cumulative Gaussian distribution function, which is defined as follows;

$$ F\left( {\mu , \sigma } \right) = \frac{1}{\sigma \surd 2\pi }\int_{ - \infty }^{x} {EXP\left( { - \frac{{\left( {x - \mu } \right)^{2} }}{{2\sigma^{2} }}} \right)dt,} $$

where μ represents how much the center of the function leaves from 0, and σ means the standard deviation of the Gaussian distribution, which is identical to the slope of the function. Because the intensity discrimination task we used was an adaptive procedure, the number of trials differed between the intensities of the comparing stimuli. Therefore, we fitted the data by using a weighted non-linear least square algorithm weighted by a square of the number of trials in each intensity. This means that the impact of data with the larger number of trials on the fitting is stronger. The sigma value represents the sensitivity of perceiving differences in two stimulus intensities and is defined as a threshold. Thus, the present study focused on the sigma value in each condition. We excluded one pianist and one nonmusician from data analyses because they always made the same answer if they missed discriminating the intensities of two stimuli. Such a biased answer shifts the center of the psychometric curve abnormally and then impairs accurate estimation of the sigma value.

The sample size was determined so as to fulfill 90% power and a two-tailed 5% level in detecting an interaction between the group and condition factors with a moderate effect size (i.e., partial eta squared value = 0.1), based on our pilot experiment (Supplementary Fig. S1). The pilot experiment was designed to determine a sample size that would, at a minimum, replicate the differences in the auditory and somatosensory integration function between musicians and non-musicians reported in the previous study37. In the pilot experiment, 4 pianists and 4 nonmusicians who did not participate in the main experiment performed the intensity discrimination task in the conditions A, S, and A + S. A two-way repeated mixed ANOVA yielded a moderate effect size of interactions between the group and condition factors (partial eta squared value = 0.1) on the sigma value between the A + S condition and the smaller one of the A and S conditions. Thus, we used this value to calculate the sample size using the power analysis (G*Power ver.3.1.9.4.)38. This analysis determined that the sample size requires 28 across both groups. On the other hand, in the selective attention test, we collected the same number of samples as with the testing of the auditory and somatosensory integration function, because we did not perform a sample size analysis to test the selective attention. Statistical analyses were performed using JASP software (JASP Team 2020) and R. A two-way mixed-design analysis of variance (ANOVA) was used to evaluate the sigma and mu values (group × condition). Mauchly’s test was used to assess sphericity before running each ANOVA. The Greenhouse–Geisser correction was performed for nonspherical data. A partial eta squared value (ηp2) was calculated as the effect size for ANOVA. In addition to these analyses, we calculated a Bayes factor (BF10) for each analysis to quantify the ratio of the likelihood of an alternative hypothesis to the likelihood of a null hypothesis. This is because the Bayesian integration statistics do not need assumptions and corrections for data distribution, and thus can provide a robust model that explains the data. A BF10 value of 5 indicates that the alternative hypothesis is 5 times more likely than the null hypothesis, whereas a BF10 value of 0.2 indicates that the null model is 5 times more likely than the alternative hypothesis. BF10 values above 1 indicate anecdotal, above 3 moderate, and above 10 strong evidence for the alternative hypothesis39,40.

Results

Discrimination perception

Figure 2A and B shows typical results obtained from the intensity discrimination tasks for one representative pianist and one nonmusician, respectively. We fitted the data into the cumulative Gaussian distribution function and obtained the sigma and the mu values that represent the slope and the center of the function, respectively. The dashed line indicates the curve fitted by the psychometric function and the size of the circle indicates the number of trials.

Figure 2
figure 2

Intensity discrimination performance in unimodal and multisensory conditions. (A) and (B) The panels show the psychometric curves obtained from the intensity discrimination task with the five conditions in a representative pianist (A) and a nonmusician (B), respectively. The dashed line indicates the fitted cumulative Gaussian distribution function. The size of each circle represents the number of trials. The vertical axis means the proportion of answering “the intensity of comparing stimulus was larger than that of standard stimulus” at each stimulus intensity defined as a difference in the intensity between the two stimuli.

Multisensory integration

To examine a group difference in the multisensory integration, we first compared the sigma value obtained from the A, S, and A + S conditions (Fig. 3). Two-way mixed ANOVA yielded significant main effects of group (F1,26 = 6.61, p = 0.01, ηp2 = 0.20) and condition (F1.41,36.54 = 4.60, p = 0.03, ηp2 = 0.15) factors, but not the interactive effect between the two factors (F1.41,36.54 = 0.16, p = 0.78, ηp2 = 0.01) on the sigma values. Post-hoc comparison with Shaffer's correction detected that the sigma value in the A + S condition was lower than that obtained from both the A (t26 = 2.06, p < 0.05) and S conditions (t26 = 4.36, p < 0.01). A Bayesian two-way mixed ANOVA also indicated strong evidence for the group + condition model (Table 1, BF10 = 12.16). However, it is possible that, in the A + S condition, the participants performed the intensity discrimination test using only more sensitive modality by paying selective attention to their own superior modalities, rather they integrated the sensory information from the two modalities. Therefore, it still remains unclear whether the superior perceptual performance in the A + S condition resulted from a superior auditory-somatosensory integration function or merely reflected the perception of the more sensitive one of the two modalities by selectively directing attention to one of the two modalities during the A + S condition. To examine this, we further compared the sigma value obtained from the A + S condition and the minimum sigma value of those obtained from the A and S conditions. Eight pianists and 7 nonmusicians showed better performance (i.e. lower sigma value) in the auditory condition than in the somatosensory condition. For the sigma value, a two-way mixed ANOVA yielded a significant interaction between the group and condition factors on the sigma value (Fig. 4A, F1,26 = 11.29, p < 0.01, ηp2 = 0.30). A Bayesian two-way mixed ANOVA indicated moderate evidence for the alternative hypothesis (Table 2, BF10 = 4.98 for group + condition + group × condition model). A simple effects test for the interaction revealed a significant difference in the sigma value obtained from the A + S condition between the two groups (F1,26 = 6.65, p = 0.02, ηp2 = 0.20, BF10 = 3.62). Furthermore, the simple effects test further revealed significant differences in the sigma value between the A + S condition and the minimum sigma value of those obtained from the A and S conditions in both groups. In the pianists, the sigma value of the A + S condition was lower than that of the minimum sigma value (F1,13 = 11.29, p < 0.01, ηp2 = 0.46, BF10 = 9.87). By contrast, there was no significant difference in the sigma value between the A + S condition and the minimum one of those obtained from the A and S conditions in the nonmusicians (F1,13 = 3.47, p = 0.09, ηp2 = 0.21, BF10 = 1.05). These results indicate that the simultaneous presentation of auditory and somatosensory stimuli improved the discrimination perception by about 20% in the pianists, but not in the nonmusicians. For the mu value in the A + S condition and in either the A or S conditions corresponding to the minimum sigma value (supplementary Fig. S2A), a two-way mixed ANOVA yielded no significant main effects of the group (F1,26 = 2.34, p = 0.14, ηp2 = 0.08) and condition (F1,26 = 0.33, p = 0.57, ηp2 = 0.01) factors nor the interaction between the two factors (F1,26 = 0.35, p = 0.56, ηp2 = 0.01). A Bayesian two-way mixed ANOVA also supports the null hypothesis (supplementary Table S1).

Figure 3
figure 3

Group means of the sigma value obtained from the A, S, and A + S conditions in the pianists and nonmusicians. Each circle represents individual data.

Table 1 Bayesian Repeated Measures ANOVA for the sigma values in the A, S, and A + S conditions.
Figure 4
figure 4

Group means of the sigma value obtained from each condition in the pianists and nonmusicians. (A) The multisensory integration (A + S) vs the minimum sigma value of those obtained from the A and S conditions (minimum(A,S)). (B) The unimodal auditory (A) versus auditory selective attention (Aatt) conditions. (C) The unimodal somatosensory (S) versus somatosensory selective attention (Satt) conditions. Each circle represents individual data. *,**: p < 0.05, 0.01.

Table 2 Bayesian repeated measures ANOVA for the sigma values in the A + S condition and the minimum one of the A and S conditions.

Selective attention

Auditory modality

To examine a group difference in the effect of the task-irrelevant somatosensory stimuli on the auditory discrimination perception, we compared the mu and sigma values obtained from the A and Aatt conditions between the pianists and nonmusicians. Two-way mixed ANOVA revealed no significant effects of the group (F1,26 = 0.69, p = 0.42, ηp2 = 0.03) and condition (F1,26 = 0.52, p = 0.48, ηp2 = 0.02) factors nor the interaction between the two factors (F1,26 = 2.65, p = 0.12, ηp2 = 0.09) on the mu values (supplementary Fig. S2B). A Bayesian two-way mixed ANOVA also supports the null hypothesis (supplementary Table S2). By contrast, a two-way mixed ANOVA yielded a significant interaction between the group and condition factors on the sigma value (Fig. 4B; F1,26 = 5.25, p = 0.03, ηp2 = 0.17). The post-hoc power analysis for this data with an α error of 5% suggested the power was above 0.99, which confirmed that the sample size was sufficient to detect the interaction. Bayesian two-way mixed ANOVA also provided very strong evidence for the group + condition + group × condition model on the sigma values (BF10 = 123.75, Table 3). A simple effects test for the interaction further revealed a significant difference in the sigma value in the A condition between the two groups (F1,26 = 4.54, p = 0.04, ηp2 = 0.15, BF10 = 1.80). In addition, the simple effects test revealed a significant difference in the sigma value between the A and Aatt conditions in the pianists (F1,13 = 25.27, p < 0.01, ηp2 = 0.66, BF10 = 138.29), but not the nonmusicians (F1,13 = 2.18, p = 0.16, ηp2 = 0.14, BF10 = 0.66). These results indicate that the unimodal auditory discrimination perception and the effect of the task-irrelevant somatosensory stimuli on the auditory discrimination perception differed between the two groups.

Table 3 Bayesian repeated measures ANOVA for the sigma values in auditory selective attention.

Somatosensory modality

To examine a group difference in the effect of the task-irrelevant auditory stimuli on the somatosensory discrimination perception, we compared the mu and sigma values obtained from the V and Vatt conditions between the pianists and nonmusicians. Two-way mixed ANOVA revealed no significant effects of the group (F1,26 = 0.35, p = 0.56, ηp2 = 0.01) and condition (F1,26 = 1.83, p = 0.19, ηp2 = 0.07) factors nor the interaction between the two factors (F1,26 = 2.21, p = 0.15, ηp2 = 0.08) on the mu values (supplementary Fig. S2C). A Bayesian two-way mixed ANOVA also supports the null hypothesis (supplementary Table S3). Similarly, a two-way mixed ANOVA yielded no significant main effects of the group (F1,26 = 1.97, p = 0.17, ηp2 = 0.07) and condition factors (F1,26 = 1.30, p = 0.27, ηp2 = 0.05) and their interaction (F1,26 < 0.01, p = 0.95, ηp2 < 0.01) on the sigma value (Fig. 4C). A Bayesian two-way mixed ANOVA also supports the null hypothesis (Table 4). These results indicate that the somatosensory discrimination perception did not differ between the two groups and the task-irrelevant auditory stimuli did not affect the somatosensory discrimination perception in both groups.

Table 4 Bayesian Repeated Measures ANOVA for the sigma values in somatosensory selective attention.

Single modality

To examine the differences in the sigma and mu values between the auditory and somatosensory modalities, we performed a one-way ANOVA on the two values. For the sigma values, we found no significant main effect of the modality (F1,27 = 0.61, p = 0.44, ηp2 = 0.02). A Bayesian one-way ANOVA also supports the null hypothesis for the modality effect (BF10 = 0.36) on the sigma values. For the mu values, there were no significant main effect of the modality (F1,27 = 0.22, p = 0.65, ηp2 < 0.01, BF10 = 0.30). These results indicate that the discrimination perception in response to single sensory stimuli did not differ between the two modalities in this experimental setting. However, we found that the absolute difference in the sigma value between the A and S conditions were larger in the nonmusicians compared to that of the pianists (Supplementary Fig. S3, pianists: 0.43 ± 0.40, nonmusicians: 1.42 ± 0.94, mean ± SD, one-way repeated masures ANOVA, main effect of group: F1,26 = 13.18, p < 0.01, ηp2 = 0.34).

Correlation between the detection threshold and multisensory interaction functions

To examine relationships between the multisensory interaction functions and detection thresholds, we performed correlation analyses. Pearson’s correlation analyses yielded no significant correlation between the auditory detection threshold and the multisensory integration function (i.e., the difference between the sigma value obtained from the A + S condition and the minimum sigma value of those obtained from the A and S conditions) (pianist: r = 0.18, p = 0.53; nonmusician: r = 0.39, p = 0.17), and between the somatosensory detection threshold and the multisensory integration function in both groups (pianist: r =  − 0.14, p = 0.64; nonmusician: r =  − 0.41, p = 0.15). In addition, we also found no significant correlation between the auditory selective attention and detection threshold of each modality (Auditory detection threshold, pianist: r = 0.18, p = 0.53; nonmusician: r = 0.22, p = 0.46; Somatosensory detection threshold, pianist: r =  − 0.01, p = 0.98; nonmusician: r = 0.23, p = 0.44), nor between the somatosensory selective attention and detection threshold of each modality (Auditory detection threshold, pianist: r = 0.07, p = 0.81; nonmusician: r =  − 0.24, p = 0.41; Somatosensory detection threshold, pianist: r = 0.06, p = 0.84; nonmusician: r = 0.05, p = 0.87) in both groups. These results suggest that the individual differences in the detection threshold of the two modalities were not associated with the multisensory interaction functions observed in this study.

Discussion

The present study examined multisensory sensory interaction functions at the auditory and somatosensory modalities in pianists and musically untrained individuals. The psychophysical experiments revealed differences in the unimodal and multisensory functions of these modalities between the two groups. The perceptual functions of the unimodal intensity discrimination task were superior in the pianists to the nonmusicians with respect to the auditory modality but not to the somatosensory modality while the detection threshold did not differ between the two groups in both the auditory and somatosensory modalities. For selective attention, the task-irrelevant somatosensory stimuli interfered with the auditory discrimination perception in the pianists, but not the nonmusicians. By contrast, the discrimination perception was improved only in the pianists when the task-relevant auditory and somatosensory stimuli were simultaneously presented. These findings provide the first evidence of unique multisensory interaction functions in highly trained individuals.

Multisensory integration function

A novel finding of the present study was superior intensity discrimination in the pianists to the nonmusicians when the multisensory stimulus was presented. Several previous studies also demonstrated that musicians could react faster41 and perceive the frequency more accurately37 to multisensory stimulus than nonmusicians, suggesting superior multisensory integration function in musicians to nonmusicians. While these studies compared the reaction time and the frequency discrimination perception between the groups when multisensory stimuli were given, none of them compared those behavioral measures between the multisensory condition and the better one of the individual unimodal conditions. Thus, it remained unclear whether the superior behavioral performance in the multisensory condition in musicians resulted from a superior multisensory integration function or merely reflected the superior perception of each unimodal condition. To resolve this critical issue, the present study investigated both unimodal and multisensory functions and supported the former case. Only in the pianists, the sigma value was lower (i.e. more sensitive) in the A + S condition than in the better one of the two unimodal conditions. In piano performance, a weaker keystroke induces smaller intensity of sound and somatosensory feedback, and vice versa. Thus, pianists have abundant experiences of receiving such “matched” multisensory stimuli since childhood, which may develop the specialized multisensory integration function in the intensity domain between the auditory and somatosensory modalities. In the nonmusicians, by contrast, no significant difference in the discrimination perception between the A + S condition and the better one of the two unimodal conditions. One may wonder why there was no auditory and somatosensory integration effect in the non-musicians. There are at least two possibilities. First, we speculate that the difference in the spatial location where each sensory stimulus occurs may prevent integrating the two sensory stimuli in the nonmusicians. The somatosensory and auditory stimuli were generated by a piezo actuator and a speaker, respectively. Although both devices were put in front of the participants, the exact spatial locations differed between the two devices. The nonmusicians may be unfamiliar with this situation, which reduces the auditory-somatosensory integration in the nonmusicians. By contrast, the pianists may be familiar with such a situation because piano sounds are not generated from the piano-key, but rather are generated from strings located in the rear of the piano (i.e. soundboard). Thus, the difference in the spatial locations between these sensory stimuli may prevent the auditory-somatosensory integration specifically in the nonmusicians, which then augmented the group difference in the integration effect between the two modalities. Second, we found that a larger absolute difference in the sigma value between the A and S conditions in the nonmusicians than the pianists. Since the Bayesian integration or maximum likelihood models posit that multisensory interactions depend on the accuracy of inference of each sensory stimulus21, the large difference in the sigma value between the two unisensory conditions in the nonmusicians may reduce the auditory-somatosensory integration in the nonmusicians. In the present study, we challenged to reduce the difference in the sigma value between the two unisensory conditions by adjusting the stimulus intensity of the standard stimulus used in the discrimination test. While the averaged sigma value across participants did not differ between the two unisensory conditions in both groups, it was difficult to adjust the sigma value between the two unisensory conditions for each individual. Thus, this may augment the group difference in the auditory-somatosensory integration effect seen in the present study.

Selective attention in pianists and nonmusicians

We also found that the task-irrelevant auditory stimuli did not affect the somatosensory discrimination perception in both groups. In contrast, the task-irrelevant somatosensory stimuli worsened the auditory discrimination perception more in the pianists than nonmusicians, indicating that the function of selectively ignoring the somatosensory information during auditory processing deteriorated specifically in expert pianists. In the present study, the participants were instructed to ignore the task-irrelevant modality information, which is likely to involve top-down selective attention (suppression of task-irrelevant information and strengthening of task-relevant information)42,43,44. Our results, therefore, suggest the degradation of a function responsible for the suppression of somatosensory information through long-term piano training. Previous studies demonstrated that auditory perturbation applied during performing finger or singing movements less affected control of those movements in musicians than nonmusicians19,20,45, suggesting the improvement of suppressing auditory information during movements in musicians. One of these studies19 further reported that the pianists struck strongly immediately after the auditory perturbation, in order to volitionally facilitate somatosensory information for compensating the disturbed auditory feedback. In fact, musicians do not always play instruments in the same acoustic environment but play music using instruments with different acoustic properties at various concert halls. Somatosensory information would thus play essential roles in successful performance because of its robustness relative to the auditory information that varies largely depending on the environment. Supporting evidence was that disruption of somatosensory but not auditory information declined the singing performance of expert singers46,47, suggesting that expert singers rely more on somatosensory information than on auditory information during singing. It is thus possible that the somatosensory system in musicians has been reorganized through long-term musical training in order to efficiently process somatosensory information. Such plastic adaptation would shape the robust somatosensory processing but instead compromise inhibition of somatosensory information during auditory processing (i.e., auditory selective attention). This view is comparable to speech, in which somatosensory information also plays an important role in both production and perception of speech48,49.

Intensity discrimination in unimodal modalities

In line with several previous studies that demonstrated the superior auditory functions in musicians to nonmusicians50,51,52,53,54, we found that, in the intensity discrimination at the auditory modality, the sigma value of the psychometric curve was lower in the pianists than the nonmusicians. Fine perception of the sound intensity during listening and playing musical pieces is an essential skill for crystalizing piano performance. Furthermore, pianists have been exposed to abundant sound from the piano, which suggests that everyday practice induced training-dependent and/or use-dependent plasticity in the auditory functions, and thereby shaped the accurate perception of discriminating sound intensities55,56. By contrast, the intensity discrimination in the somatosensory modality did not differ between the two groups. Contrary to this, previous studies demonstrated superior somatosensory perceptions assessed by a two-point discrimination task and a tactile frequency discrimination task in musicians compared with nonmusicians28,37,57,58. These contrasting results suggest that neural mechanisms underlying the somatosensory discrimination perception differ between the frequency and intensity domains59. A discrimination task corresponding to the former domain uses static stimulation to a fingertip to which the slowly adapting mechanoreceptors may respond, whereas one corresponding to the latter domain uses the vibrotactile stimulation same as the present study, which activates fast adapting mechanoreceptors. Although the intensity and frequency discrimination tasks use vibrotactile stimulation with similar properties, different neural processes in the nervous system mediate these two types of stimuli60. Thus, it is possible that the effects of extensive piano training on somatosensory discrimination perception differ between the two domains. Indeed, a previous study demonstrated no difference in heaviness discrimination perception between pianists and nonmusicians57.

By contrast, the detection thresholds in the two modalities were not associated with the multisensory integration function, the auditory nor the somatosensory selective attention. In the present study, we adjusted the intensity of sensory stimuli used in the intensity discrimination task based on the detection threshold in each participant to reduce the difference in the intensity discrimination threshold of the unisensory condition between the two modalities. This procedure would remove the effects of interindividual differences in the detection threshold on the amount of multisensory interaction functions. Furthermore, the detection threshold did not differ between the two groups in both modalities, which suggests that, at least in this experimental design, the long-term piano training did not modulate sensory functions associated with detecting unimodal sensory stimuli.

Limitations

There are two limitations to this study. First, this study used a pure tone for the auditory stimulus, not a complex piano sound. Previous studies demonstrated that auditory-tactile interaction is more pronounced when the auditory stimulus consists of a complex tone than of a pure tone32,61. This suggests that the differences in the auditory-somatosensory interactions between the pianists and non-musicians found in this study are more emphasized if the piano tone is used as the auditory stimulus. However, the observed differences in the auditory-somatosensory interactions between the two groups, even with pure tone stimuli, indicate the remarkable differences in the interactions between the two groups. Second, the present study was conducted as a cross-sectional experiment, which thus prohibits any conclusions regarding the causality of whether the unique auditory-somatosensory interaction functions in pianists were acquired through musical training. Acquiring musical expertise requires long-term intensive training since childhood. It is thus difficult to conduct a longitudinal study to examine the effects of musical expertise on perceptual functions. But, it would be worthwhile to examine the effects of a short period of limited musical training on auditory-somatosensory interactions.

Conclusion

The present study examined the multisensory interaction functions in trained individuals. The results from the psychophysical experiments firstly demonstrated that highly trained pianists were superior in the multisensory integration function and inferior in the robustness of the auditory processing against task-irrelevant somatosensory stimuli compared with those of nonmusicians. The extensive auditory-somatosensory experiences through daily piano practicing would shape the unique multisensory interaction functions, which enables pianists to meaningfully integrate the auditory and somatosensory information but instead exacerbates the top-down selective inhibition of somatosensory information during auditory processing.