Introduction

In an iconic scene of the classic James Bond movie Dr. No (1962), the spy is shown taking a well-deserved rest in his bed when he suddenly feels a surprising touch on his upper arm. He slowly turns his head toward the source of this unexpected sensation and sees a tarantula crawling onto his shoulder, and as the fear sets in, beads of sweat form on his forehead as the spider crawls towards his head. This typical fear response arises, because Bond perceives that the spider poses a bodily threat to him. How does he infer that what he sees, the spider, is also stimulating what he feels on his arm? This is a classic example of multisensory integration, i.e., stimuli registered by distinct modalities (e.g., vision and touch) are inferred to be caused by the same source, i.e., the spider on the skin1,2,3. These inference and integration processes are highly plastic, and research on body ownership has explored how even body representations can be influenced and adjusted depending on incoming (multi-)sensory information4,5 using the Rubber Hand Illusion (RHI)6,7. The RHI is induced when the rubber hand and the participant’s real hand (hidden from sight) are stroked in a temporally and spatially congruent manner8,9. The visual and tactile information are integrated in the brain, while the incoming proprioceptive information from the real hand is seemingly down-weighted. At the same time, participants may experience the illusion of perceiving the rubber hand as part of their body. Functional neuroimaging studies using the RHI have revealed a brain network10,11, comprising the body-selective extrastriate body area (EBA), posterior parietal cortex (PPC), and ventral premotor cortex (PMv), which is thought to integrate sensory information in order to recalibrate peripersonal space12,13,14, to support action15,16. The RHI paradigm has also been used to investigate how emotion processing, related to threat, interacts with the illusionary self-attribution of the fake hand. Ehrsson et al.17 showed that the anterior insula (aIns) and the anterior cingulate cortex (ACC), which are part of an interoceptive network implicated in physiological and emotional processing18,19, were activated when the rubber hand was threatened with a needle while participants experienced the RHI, and that activation in these regions was correlated with participants’ subjective ratings of ownership. The researchers posited that the involvement of interoceptive brain regions during the RHI may add to the vividness of the body ownership experience by including homeostatic emotional components (e.g., pain anticipation, temperature)17. However, it is yet not well understood how the multisensory integration underlying the sense of body ownership may mediate emotion—especially at an early phase of this integration, before the full onset of the actual illusion is experienced8. Another key brain region involved in emotional and sensory processing is the amygdala, known for its role in processing fear and emotional salience of external stimuli20. Peelen et al.21 reported that the EBA showed greater activation when participants viewed body movements representing basic emotions (e.g., bodily expressions of fear or anger) compared to neutral body movements (e.g., walking, jumping), and that the activation of the amygdala was correlated with this modulation of EBA activity, implicating the amygdala in body-related emotional processing. However, whether activation of the amygdala is modulated by the self-relevance of body parts—i.e., their self-attribution or “embodiment”—in the presence of an aversive stimulus remains an open question. And more generally, it is unclear how the brain processes emotionally loaded visuo-tactile stimuli that are related to a self-attributed body part, as compared to a non-self-attributed one.

Recently, virtual reality (VR) has emerged as a new tool to advance the investigation of both body ownership and emotion processing. The brain’s flexibility in representing body ownership has been emphasized by studies using the RHI paradigm in VR9, as well as studies investigating a whole-body transfer illusion22,23. VR also provides greater ecological validity24,25; researchers are able to create more contextualized and realistic experiences, while still maintaining experimental control. Indeed, experiments using VR have shown that the elicitation of emotions is stronger when participants were more immersed in virtual environments26,27,28. Following this, stereoscopic rendering via MRI-compatible VR goggles has been shown to be more immersive than the presentation of 2D stimuli29 and a recent study on emotion regulation that combined VR and functional neuroimaging found activation of the amygdala when participants were immersed in a virtual environment combined with music designed to elicit anguish30. Altogether, VR gives a unique opportunity to investigate the early phase of the multisensory integration underlying the RHI in the presence of emotional stimuli, while being able to record brain activity with fMRI.

Here we intended to investigate the relevance of emotional processing in the context of bodily threat by focusing on the neuronal correlates of processing aversive vs neutral stimuli on embodied (artificial) limbs. Specifically, we aimed with our VR paradigm to extend the available methods of investigating multimodal integration in a controlled manner9,22,23 while also adding an affective component to the stimulus: aversiveness31. We designed an experiment in which we manipulated the congruency of visuo-tactile stimulation (congruent vs. incongruent) on a participant’s arm and the aversiveness of the stimuli (aversive versus neutral). Based on the RHI literature, we hypothesized that a similar network of regions, comprising the EBA, PCC, and PMv, will be activated for the integration of visual and tactile information. Secondly, considering the literature on emotion processing, we speculated that the aIns, ACC, and amygdala would show increased activation during the presentation of an aversive stimulus, as compared to a neutral one. Finally, we postulated that the amygdala, the aIns and the ACC would show an interaction effect, such that congruent-aversive stimulation will elicit higher activations than other stimulus pairings.

Methods

Participants underwent an fMRI scanning session followed by a retrospective questionnaire on their subjective body ownership experience and their emotional response to the stimuli. During the scan, participants were presented with visual objects moving either in the proximal–distal direction (i.e., from elbow to wrist) or distal–proximal direction (i.e., from wrist to elbow) on a virtual 3D-rendered forearm in the same position as their real arm. At the same time, tactile electric stimulation was applied moving in either the same direction as the visual stimulus or in the opposite direction, resulting in visuo-tactile congruent and incongruent trials, respectively. The moving objects were either aversive stimuli (spider) or neutral objects (toy car). Thus, the experimental design comprised a 2 \(\times\) 2 \(\times\) 2 factorial design with congruency, aversiveness, and direction of visual motion (left and right) as independent factors.

Participants

Thirty-three healthy participants (age range: 19–36 years; 20 females; all right-handed (mean Laterality Quotient = 90 as assessed with the Edinburgh Handedness Inventory32); normal or corrected-to-normal vision; no self-reported arachnophobia) participated in the experiment. Three participants’ datasets were excluded due to inattentiveness during the control task (see below), resulting in N = 30 datasets used for the analysis. All participants gave written informed consent before the experiment. The study was approved by the local Ethical Committee of Freie Universität Berlin and conducted in accordance with this approval and the relevant guidelines and regulations.

Experimental setup

The participant’s right arm was placed horizontally across the chest using pillows for support, in a position corresponding to the presentation of the virtual arm. To ensure that the location of visual stimuli in eye-centered coordinates remained the same, participants were instructed to fixate a small red dot in the middle of the virtual forearm (and center of the virtual field of view) throughout the whole experiment. For full, direct vision of the virtual arm, the participant’s head was slightly tilted down towards the chest within the head coil (approx. 20°–30°), and the head and neck were supported with foam padding. Stereoscopic goggles were attached both to the participant’s forehead and to the head coil with Velcro strips to minimize motion during the experiment. The participant’s real arm was completely occluded from view by the goggles. A fiber optic response button box (fORP, Current Designs, Philadelphia, PA) was placed in the left hand to collect responses to an attention task (see below).

Paradigm

There was a total of eight trial types (congruent vs. incongruent × aversive vs. neutral × proximal → distal vs. distal → proximal visual motion). Within each of the six runs, each condition was presented six times (i.e., 48 trials per run). Additionally, one attentional control trial, where the fixation dot briefly blinked (50 ms on/off period), was presented per condition and run (i.e., 8 trials per run); participants were instructed to respond to the blinking fixation dot with a button press. This resulted in a total of 56 trials per run and 336 trials overall. Trials were randomized within each run. Each trial lasted for two seconds, followed by a jittered inter-trial interval of two to six seconds (approx. 7 min per run). After the scan, participants were asked to fill out a questionnaire assessing participants’ subjective ratings of the congruency and aversiveness of the stimuli.

Virtual reality

Digital stereoscopic goggles (VisuaSTIM, 800 × 600 pixels, 30° eye field) and PsychToolbox 3.0.1433 with MATLAB 2016a 64bit (Mathworks, Massachusetts) were used to present a photorealistic 3D-rendered virtual arm in a plausible posture with respect to the real arm (i.e., an anatomically plausible configuration and location in space), with the hand palm down and in a fist (see Fig. 1b).

Figure 1
figure 1

(a) Image of the virtual environment seen by the participants. Only the right-eye view is shown. Stimuli used for aversive (spider) and neutral (car) conditions. The red fixation dot at the center of the forearm blinked briefly during attentional control trials. (b) Details of the set-up inside the MRI room. Five pairs of surface-adhesive electrodes were positioned on the lateral side of the right forearm, from the wrist to the elbow, to enable five stimulation sites. The right arm was placed horizontally on top of the participant’s chest using pillows, fist closed. The left hand was holding the response button box with the arm along the body. The head was slightly tilted in the direction of the chest within the head coil (approx. 20°–30°). Stereoscopic goggles were attached to the participant’s forehead and the head coil (not shown) with Velcro strips to minimize motion.

The stimulus presentation computer was equipped with a NVidia GeForce GTX 750Ti graphic card with two display outputs (one for each eye). For each condition, stereoscopic videos were created with the Unity3D 2017 software package (Unity Technologies, California) and 3D assets available on the Unity Store: Spider: https://assetstore.unity.com/packages/3d/characters/creatures/free-fantasy-spider-10104; Car: https://assetstore.unity.com/packages/3d/vehicles/land/retro-cartoon-cars-cicada-96158; Arm: https://assetstore.unity.com/packages/3d/characters/humanoids/vr-hands-and-fp-arms-pack-77815; Room: https://assetstore.unity.com/packages/3d/environments/morgue-room-pbr-65817. The aversive and neutral stimuli were designed to be as similar as possible, e.g. with the same size, moving speed and starting/end point. However, the legs of the spider were moving to simulate crawling, while the shape of the car was not changing. The background of the videos consisted of a neutral room with a bench, a cabinet and a ceiling light, as well as the virtual right arm with a red dot in the middle (see Fig. 1a). In Unity3D, the distance between the two recording cameras simulating both eyes (and generating the stereoscopic videos) was set to the mean adult interpupillary distance of 63 mm34 to create a 3D effect.

Electrostimulation

Prior to the scanning, five pairs of surface-adhesive electrodes were positioned on the lateral side of the right forearm, from the wrist to the elbow (see Fig. 1b). A constant current neurostimulator (DS7A, Digitimer, Hertfordshire, United Kingdom) was used to deliver electrical pulses (square wave, 0.2 ms duration, mean intensity = 5.94 ± 1.46 mA) to the five stimulation sites. During the experiment, the same stimulus intensity was used for all sites. Electrode positions were adjusted individually so that the pulses from each electrode had comparable intensity and could be spatially discriminated, without producing discomfort, radial stimulation, or muscle contractions. An 8-channel relay card (RX08-LPT, GWR Elektronik) was used to control the administration of pulses. The relay card was operated with MATLAB via the parallel port (LPT) of the computer. Five pulses were always delivered sequentially (500 ms delay) and could start at either the left (wrist) or right (elbow) electrode. The intensity was adjusted to each participant such that they reported a tingling sensation that resembled an insect crawling on their arm.

fMRI data acquisition

The experiment was conducted using a 3 Tesla scanner (Tim Trio, Siemens, Germany) equipped with a 12-channel head coil. T2*-weighted images were acquired using a gradient echo-planar imaging sequence (3 × 3 × 3 mm3 voxels, 20% gap, matrix size = 64 × 64, TR = 2000 ms, TE = 30 ms, flip angle = 70°). Six runs with 176 functional volumes each were recorded for each participant. After the functional runs, a gradient-echo field map (3 × 3 × 3 mm3 voxels, TR = 488 ms, TE1 = 4.92 ms, TE2 = 7.38 ms, 20% gap, flip angle = 60°), and a high-resolution T1-weighted structural image was acquired for each participant (3D MPRAGE, voxel size = 1 × 1 × 1 mm3, FOV = 256 × 256 mm2, 176 slices, TR = 1900 ms, TE = 2.52 ms, flip angle = 9°).

Data preprocessing and analysis

Data were processed and analyzed using SPM12 (Welcome Department of Cognitive Neurology, London, UK: www.fil.ion.ucl.ac.uk/spm/). Images were realigned to the first image of each run to correct for head motion. Each participant’s structural image was co-registered with the realigned functional images, and segmented into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Functional images were spatially normalized to the MNI space using DARTEL35 and spatially smoothed by an isotropic Gaussian kernel of 8 mm full width at half maximum. Data were detrended using a linear mean global signal removal script36. To reduce physiological and systemic noise in the functional data, the first five principal components accounting for the most variance in the CSF and WM signal time-courses, respectively, and the six realignment parameters, were added to the first-level general linear models (GLMs) as regressors of no interest37. Each trial type was modeled as a regressor with a boxcar function (2-s duration) and convoluted with the standard hemodynamic response function from SPM. Attentional control trials were not included.

Behavioral data

During the scan, button presses were recorded and d’ was calculated as an index of maintained attention. Hits were defined as a button response during an attentional control trial; false alarms as a button response during test trials. Perfect rates (phits = 1 or pfalse alarms = 0) were corrected according to the 1/2 N rule38,39.

The post-scan questionnaire consisted in 15-items assessing participants’ subjective ratings of the congruency and aversiveness of the stimuli. To validate that participants perceived the visual and tactile stimuli as synchronous, they were asked “Was the visual moving object synchronized with the tactile stimulation?” The questionnaire also inquired whether participants were aware there was congruent and incongruent visual-tactile stimulation (“Was the tactile stimulation for some trials going the same/opposite direction as the visual moving object?”; one question per congruency condition). To assess the degree to which participants might have experienced the “ownership illusion”, they were asked to rate the following statements (based on Refs.6,7), for congruent and incongruent visual-tactile stimulation separately: “I felt as if I was looking at my own arm and hand”; “I felt as if the virtual arm and hand was part of my body”; “I felt as if the virtual arm and hand were my arm and hand. Body Ownership scores for congruent and incongruent trials were then calculated separately by averaging the ratings. Finally, aversiveness of the spider and car were assessed by asking “Did the moving object make you feel uncomfortable/scared/pleased?” Each item was rated on a seven-point numerical rating scale ranging from “not at all” (0) to “definitely yes” (6).

Statistical analyses

For the fMRI data, on the first level, four trial conditions (congruency × aversiveness; left and right motion direction pooled together) were modeled as regressors, as well as four first-order time-modulated regressors. These latter regressors modeled a linear change over time of the height of the stick functions that were convolved with the canonical HRF40. In addition, five CSF/WM components and the six motion parameters were added, resulting in 19 regressors. Four t-contrasts corresponding to the trial conditions were computed for each participant. On the second level, the first-level images of contrast estimates were used to perform a 2 × 2 ANOVA with Congruency (Congruent, Incongruent) and Aversiveness (Aversive, Neutral) as factors.

Predefined regions of interest (ROI) masks for the bilateral amygdala, insula, and ACC were created using the SPM Anatomy toolbox v3.041. Because no atlas includes a map specifically for EBA and/or PMv, a 10 mm radius spherical ROI was created, centered on coordinates reported in an independent study11 (left EBA, x =  − 50, y =  − 74, z = 6; right EBA, x = 54, y =  − 68, z = 2; left PMv, x =  − 52, y = 8, z = 28; right PMv, x = 52, y = 10, z = 32). This study was chosen because it was one of the first showing the involvement of EBA in the RHI and the reported coordinates are also consistent with later studies (e.g.42).

We then performed a whole-brain analysis with family-wise error (FWE) correction at the cluster level (p < 0.05) using an initial voxel-wise threshold of p < 0.001, uncorrected. Then, following our a priori hypotheses, we additionally report results at p < 0.001, uncorrected within the predefined ROIs, i.e., left/right EBA, left/right PMv, left/right aIns, left/right ACC and left/right amygdala.

For post-hoc tests, we extracted the contrast estimates at the peak activation voxel for each pair of conditions and each subject. Pairwise t-tests were then performed, correcting for multiple comparisons using Bonferroni correction.

To check for potential effect of habituation of amygdala activity in response to aversive stimuli, we carried out an additional analysis, a 2 × 2 ANOVA with this time the first-order time-modulated regressors at the first level. We computed a negative contrast on the aversive congruent and incongruent parametric regressors, in order to investigate a time parametric modulation in the aversive trials.

Regarding the attention task, individual d’s at each run, reflecting the performance of participants in their responses in the attention task, were entered into a one-way ANOVA with Run as a factor (six levels), to test for potential fluctuations in attentiveness to the task.

All ratings collected with the post-scan questionnaire were tested for normality with Shapiro–Wilk tests. As they did not pass these tests for normality, they were analyzed using non-parametric Wilcoxon’s signed-rank tests with α = 0.05. The p-values concerning the stimulus affective ratings (“uncomfortable”, “scared”, and “pleased”) were corrected for false discovery rate (FDR).

Control analyses

After the data collection, we aimed to investigate the potential effects of different individual levels of fear of spiders in our sample. The Fear of Spiders Questionnaire (FSQ)43 was administered retrospectively via email. A subset of 25 participants responded (FSQ scores mean 17.4 ± 18.91). We carried out two additional control analyses on this subset of data. First, we performed a 2 × 2 ANOVA with congruency (congruent, incongruent) and aversiveness (aversive, neutral) as factors, in which the individual FSQ scores were added as a covariate of no interest. Second, we split participants into two groups (based on Ref.44): low (FSQ scores < 15; n = 15) and high (n = 10) fear. We then performed a 2 × 2 ANOVA with Aversive-Congruency (aversive-congruent, aversive-incongruent) and Fear Group (low, high) as factors.

Results

Behavioral results

Participants’ attention, as indexed by d′, ranged from 0 to 3.07 per run. Across participants, d′ did not significantly differ between runs, F(5,179) = 0.18, p = 0.27. Each participant’s mean d′ across runs was calculated and an exclusion criterion of mean d’ = 1.66 was set. Three participants were excluded due to poor performance on the attention task.

Participants reported that they were able to identify that there were congruent trials (mean = 5.70, STD = 0.99), Z = 5.12, p < 0.001, and incongruent trials (mean = 5.73, STD = 0.74), Z = 5.15, p < 0.001. Tactile stimulation subjectively synchronized with the movement of the objects (mean = 5.33, STD = 1.18), Z = 4.73, p < 0.001. Body ownership ratings were higher for congruent stimulation (mean = 3.12, STD = 1.62) than incongruent stimulation (mean = 2.40, STD = 1.69), Z = 3.27, p < 0.001. The p-values of the stimulus aversiveness ratings were FDR corrected; ratings for “uncomfortable” were significantly higher for spider (mean = 2.30, STD = 2.31) than for car (mean = 0.60, STD = 1.07), Z = 3.72, p < 0.001. Ratings for “scared” were higher for spider (mean = 1.8, STD = 1.91) than for car (mean = 0.3, STD = 0.79), Z = 3.51, p < 0.001. Ratings for “pleased” were significantly lower for spider (mean = 0.8, STD = 1.26) than for car (mean = 1.6, STD = 1.81), Z = 2.56, p = 0.009 (see also Fig. S1, Table S2 in Supplementary Material).

fMRI results

Congruence vs. incongruence

Contrasting congruent versus incongruent stimulation revealed higher activation within the PPC for congruent compared to incongruent trials. The left superior parietal lobule (SPL, area 7A; see Fig. 2a, Table 1) showed higher activation in the whole-brain analysis with FWE-correction. When testing in the a priori defined ROIs, the PMv and EBA showed no significant difference in activation between trials at a significance threshold of p < 0.001.

Figure 2
figure 2

(a) Congruent versus incongruent visual-tactile stimulation produced significant activation differences in the left SPL (area 7A; p < 0.05, FWE corrected on the cluster level). (b) Aversive versus neutral stimuli showed significant activation differences in left aIns (area Id7; p < 0.001, uncorrected), and left and right middle temporal area (LOC/hMT+/V5), left and right V1, and right fusiform gyrus (p < 0.05, FWE corrected on the cluster level). (c) Interaction congruency × aversiveness revealed activations in right amygdala (area SF; p < 0.001, uncorrected). Here, activations within anatomical masks of the bilateral amygdala and insula (SPM Anatomy Toolbox41) are shown. Mean contrast estimates of peak activations for both regions are plotted; error bars represent standard error.

Table 1 Significant activation differences obtained from contrasting congruent versus incongruent visual-tactile stimulation, aversive versus neutral conditions, and the interaction between congruency and aversiveness.

Aversive vs. neutral

Contrasting aversive (spider) versus neutral (car) conditions revealed significantly higher activation in left and right middle temporal area, comprising Lateral Occipital Complex (LOC) and hMT+/V5, as well as left and right V1, and right fusiform gyrus (FWE corrected; see Fig. 2b, Table 1).

When testing in the a priori defined ROIs, activation differences were present in the left dorsal aIns (area Id7; p < 0.001, uncorrected; see Fig. 2b, Table 1). No activation differences were found in the amygdala and ACC at a significance level of p < 0.001.

Interaction congruency × aversiveness

The interaction effect of congruency × aversiveness did not reveal any significant clusters of activation in the whole brain analysis. When testing in the a priori defined ROIs, activation differences were seen in the right amygdala (superficial area [SF]; p < 0.001, uncorrected; see Fig. 2c, Table 1). This interaction effect had a Cohen’s d of 1.08 (large effect size). In the congruent condition, amygdala activity was higher in the presence of an aversive stimulus (vs. neutral), but in the incongruent condition, activity was lower in the presence of an aversive stimulus. Moreover, the difference due to congruency was greater for the aversive condition than the neutral condition. The aIns did not show differences in activation at a significance level of p  < 0.001.

Post-hoc pairwise comparisons revealed higher amygdala activation for aversive-congruent than aversive-incongruent condition (p = 0.002 Bonferroni corrected; t(29) = 4.12), as well as for aversive-congruent compared to neutral-congruent (p = 0.029, Bonferroni corrected; t(29) = 3.05). Finally, the amygdala showed higher activity for the incongruent-neutral compared to incongruent-aversive conditions (p = 0.015, Bonferroni corrected; t(29) =  − 3.30). No significant differences in amygdala activity between aversive-congruent and neutral-incongruent were found (p = 1, Bonferroni corrected, t(29) = 0.85).

When checking for habituation of amygdala activity in response to aversive stimuli, we found modulations of activation in the left (p < . 001 within ROI, peak t = 3.49; peak z = 3.36; kE = 3; peak x =  − 18, y =  − 6, z =  − 14) and right (p < . 001 within ROI, peak t = 4.16; peak z = 3.96; kE = 8; peak x = 24, y =  − 4, z =  − 12) amygdala.

Control analyses

Finally, we conducted control analyses to investigate potential effects of individual differences in fear of spiders. Notably, adding individual FSQ scores as a covariate to the 2 × 2 ANOVA with factors congruency and aversiveness did not change the results. Particularly, in the interaction congruency × aversiveness, a significant activation in the right amygdala was still present with the same activation profile (i.e., contrast estimates) for each pair of conditions, though with a slightly smaller t-value compared to our main analysis (p = 0.001, uncorrected within ROI; peak t = 3.19; peak z = 3.07; kE = 3; peak x = 14, y =  − 6, z =  − 14). Furthermore, in the 2 × 2 ANOVA with factors Aversive-Congruency (aversive-congruent, aversive-incongruent) and Fear Group (high, low), no interaction effects were seen in the amygdala (p > 0.05, uncorrected within ROI). However, left V5/hMT+ showed an activation (p = 0.001 uncorrected; peak t = 3.48; peak z = 3.09, kE = 4, peak x =  − 58, y =  − 66, z = 12).

Discussion

In this study, we investigated brain activity of participants experiencing visual stimulation on a VR arm synchronized with tactile stimulation of the real arm. The aversiveness of the visual stimuli was manipulated, as well as the congruency of the visual and tactile stimuli. The goal was to explore the interplay of emotional processes and self-related multisensory integration.

In response to the retrospective questions regarding body ownership, participants reported experiencing more body ownership during congruent versus incongruent trials. The aversive stimulus (spider) was rated as significantly more “uncomfortable” and “scary”, and less “pleasant” than the neutral stimulus (car). Participants could also clearly discriminate between congruent and incongruent stimulation, and the tactile stimulation was experienced as temporally synchronized with the movement of the visual stimuli. In addition, the results of the control task also revealed that participants were able to focus their attention—as indexed by d’, consistently throughout all runs.

The fMRI results showed higher PPC activity during congruent (vs. incongruent) visuo-tactile stimulation, but neither in the EBA nor the PMv at a significance threshold of p < 0.001. Additionally, the aIns and visual areas showed higher activation during aversive (vs. neutral) visual stimulation, though neither the amygdala nor the ACC at a significance threshold of p < 0.001. Finally, testing for interaction effects of congruency × aversiveness revealed higher amygdala activity during congruent and aversive trials, suggesting that the activation of the amygdala while viewing aversive stimuli depended on the success of the multisensory integration—and embodiment of the artificial limb.

Concerning the manipulation of congruency, the area 7A, corresponding to the posterior SPL, showed significantly higher activation when the direction of the tactile stimulation was congruent (vs. incongruent) with the direction of the visual movement. This result is in line with previous research showing that the posterior SPL is associated with visuo-tactile integration and encoding the internal representation of the body45,46,47. This suggest that the virtual arm was more integrated into the participant’s own body representation during congruent visuo-tactile stimulation. However, we did not find a significant difference of activation in the EBA between congruent and incongruent conditions. This could be due to the different set-ups between our experiment and previous RHI studies. In particular, in classical RHI paradigms, the fake arm is displaced from the real arm (e.g.48), whereas in the current experiment, the VR arm was at the same visual location as the participant’s real arm. In the former, this displacement creates a visuo-proprioceptive conflict, which has been linked to the activation of the EBA. The EBA is thought to be involved in the integration of the fake arm in the brain’s internal visual body representation and its activity might largely reflect the process of minimizing the prediction error related to conflicting sensory (visual and proprioceptive) signals11,48. In the latter, there was potentially less visuo-proprioceptive conflict, thus no strong involvement of the EBA. Earlier studies have also revealed activation of the PMv, related to multisensory integration and preparation for action15,16. In our study, we did not find significant differences in PMv activity between the congruent and incongruent conditions. This could be due to the relatively short trial duration (2 s), the visuo-tactile stimulation ending before the full onset of the RHI. Indeed, previous studies investigating the RHI typically applied stimulation for longer period of time (i.e., 30–35 s), and participants reported the start of the illusion 6 to 10 s after beginning stimulation48,49. In that context, Ehrsson et al.10 found that the PMv activity was associated with the after-onset period of the RHI (i.e., approx. 11 s after the start of the stroking). Taken together, the questionnaire and these fMRI and results indicate that during the congruent condition, visual and tactile stimuli were more integrated than during the incongruent condition, consistent with previous studies of multisensory integration in the context of body ownership.

Concerning the manipulation of aversiveness, the aIns, which was previously linked to emotional processing19,50, showed higher activation during aversive (vs. neutral) visual stimulation. The aIns is thought to be a hub where the multiple sensory inputs, affective/motivational signals, and visceral information converge and are integrated in order to detect salient stimuli51. More specifically, the area Id7 belongs to the dorsal part of the aIns52. This area is thought to be involved in various functions, including processing of logic (i.e., negation)52, integration of sensory, emotional and cognitive information53, interoception54. Together with the dorsal ACC and amygdala, the dorsal aIns is also part of the salience network55. While ACC appeared in our aversiveness contrast at a more liberal threshold, there was no significant difference of activation in the amygdala between aversive and neutral conditions. The activation of dorsal aIns could be due to a difference of saliency between the aversive and neutral stimuli: the aversive stimulus could have been detected as salient, triggering an attentional reorienting in order to facilitate its processing56. This interpretation would also be in line with the activations found in visual areas. The bilateral middle temporal area, comprising LOC and hMT+/V5, the fusiform gyrus and V1 showed significantly higher activations for the aversive conditions than for the neutral conditions. It has been shown that higher visual areas comprising LOC and V5 are more activated for aversive than neutral visual stimuli57, even when controlling for non-emotional potential confounds58 (i.e., colors, visual complexity) or accounting for basic visual perception effects59 (i.e., face and scene perception). Moreover, the middle temporal and fusiform gyri were more activated in spider-phobic participants than in controls when viewing pictures of spider60,61. However, although the two visual stimuli were designed to have identical movement characteristics (i.e., starting and finishing points, distance, speed), we cannot exclude that this higher activation during aversive trials may have resulted from the difference in quality of movement of the stimuli (see Limitations below for an additional control experiment to address this point). Particularly, hMT+/V5 is thought to process visual and tactile motion direction62,63,64, and whereas the car moved along the arm without changing its shape, the spider’s legs moved to simulate crawling. In addition to that, the aversive and neutral stimuli were not perfectly matched in terms of low-visual features (i.e. colors, shapes). Therefore, one possibility is that the activation in early visual regions (V1) found in the current study could be rather due to a difference in visual features between the two stimuli, and the activation in higher visual areas (fusiform gyrus, LOC, hMT+/V5) could be purely due to the difference of aversiveness. Finally, our control analyses did not show significant effects of FSQ score in the amygdala, though effects were seen in V5/hMT+. It has been shown previously that amygdala activation, in response to viewing pictures of spiders, was correlated to FSQ scores in phobic patients65. In our non-phobic sample, it seems that, while the activity in V5/hMT+ was modulated by the different levels of spider fear, the activity in the amygdala was not.

The main aim of the study was to investigate the interplay between emotional processing and multisensory integration, thus the congruency × aversiveness effect. The choice of the stimuli in the present experiment was based on a previous study, which found activation of the amygdala when using a video of a spider as a (phylogenetic) threat stimuli with non-phobic participants66. Contrary to what we expected, the amygdala showed no higher activation during aversive trials (versus neutral). Amygdala activity has been shown to be modulated by novelty of the emotional stimuli67 and to decrease when participants are repeatedly exposed to spiders66,68. The results from our control analysis pointed towards a habituation effect during aversive trials, but further studies designed specifically to investigate this phenomenon are needed. Importantly however, the contrast for the interaction between congruency and aversiveness revealed a significant effect on amygdala activity. In the congruent condition, there was higher activation for the aversive compared to the neutral stimuli, but in the incongruent condition this was reversed. The interaction appeared mostly driven by the difference between congruent and incongruent trials in the aversive condition. Surprisingly, the amygdala was less responsive to aversive incongruent than to neutral conditions. The pattern of the interaction suggests that the effect of aversiveness depended on the strength of visual-tactile integration. This pattern could be linked to threat detection and selective attention69. The modulation of attention and the increased response in the visual regions due to emotional stimuli is thought to be modulated by the amygdala70. Previous research has shown that evolutionarily fear-relevant stimuli (including spiders) were detected more quickly (vs. neutral) among distractor stimuli71, and that the amygdala might mediate the capturing of attention when a threat is detected72. Indeed, in healthy participants, attentional blink (i.e., an impairment in the detection of a target if another stimulus precedes it too closely in time) is reduced in the presence of aversive stimuli (vs. neutral), but not in patients with bilateral damage to the amygdala73, indicating that the amygdala plays an important role in the affective modulation of perceptual sensitivity. Therefore, in the context of our study, one possible interpretation could be that, during visuo-tactile congruent trials, the spider may have captured participants’ attention and enhanced perception of the aversive stimulus when the VR arm was perceived more strongly as part of their body, that is, when the stimulus represented a more relevant threat to the bodily self74. The finding of a reduced amygdala activity for the aversive compared to the neutral stimulus in the incongruent condition is puzzling, and more research is needed to understand this effect. Moreover, given the absence of evidence for a difference of amygdala activity between aversive-congruent and neutral-incongruent conditions, one possibility is that incongruent visuo-tactile stimulation could be considered unpleasant in neutral situations. Finally, as the ratings for body ownership were not assessed separately for each aversiveness condition, it is possible that the strength of body ownership was either higher or lower during aversive trials than neutral trials. That is, amygdala activation might have been due to the aversiveness that the spider elicited by threatening the arm which was believed to be part of the participant’s body. Alternatively, it is possible that the virtual arm was less incorporated when the spider was seen on it (compared to the toy car), in an attempt to distance the aversive stimulus from the body. In this case, amygdala activation may be rather due to a higher aversive reaction to incorporating the arm into one’s body schema when there is a stimulus threatening it. In summary, we found that amygdala activity in response to an affective stimulus is influenced by the strength of multisensory integration underlying body ownership. This may suggest enhanced perception of aversive stimuli when they represent a bodily threat.

Limitations

One limitation of our study was that the aversive and neutral stimuli were not perfectly matched in terms of low-level visual properties (e.g., colors, shapes), motion type (e.g., biomotion vs. rigid), realism (e.g., possible vs. not possible), familiarity (e.g., more vs. less), visual complexity (e.g. high vs. low), visual and tactile movement consistency (e.g., discrete vibration and discrete footsteps vs. discrete vibration on constant pressure from wheels), and animacy (e.g., agent vs. non agent). This limitation makes it difficult to distinguish between stimulus—versus emotion-driven differences in brain responses. However, in more ecologically valid settings, identical stimuli only differing in emotional loading are almost never found. Our stimuli were matched in terms of size, moving speed, and starting/end point. The stimulus material could nevertheless be improved by generating for example a variety of aversive and neutral stimuli with balanced low-level, non-emotional features. To address some of these possible confounds, we created new aversive and neutral stimuli and ran a visual control experiment (N = 23). The new aversive stimuli comprised four spiders only varying in color (brown, red, black, gray), and the neutral stimuli consisted of four non-aversive insects (eight-legged “ladybugs”) in the same colors as the spiders. To perfectly match the amount of motion in the stimuli, both types of stimuli had the same leg movements and number of legs, as well as similar body-shapes. It should be noted, however, that some participants mentioned perceiving the ladybugs as “unnatural” because of the number of legs. Contrasting aversive vs. neutral conditions in this control experiment revealed similar results as found in the current analyses, although voxel clusters were more spatially constrained. Namely, we found significantly higher activation in left and right middle temporal area (comprising LOC and hMT+/V5), as well as left and right V1 and right V3 (p < 0.05 FWE corrected; see Supplementary Materials for more details). These results suggest that the current findings are not driven solely by visual differences, but indeed by different emotional processing of aversive and neutral stimuli. Nevertheless, the interpretation of the aversiveness contrast remains somehow inconclusive until the effect is replicated with additional stimuli which can control for other potential visual differences between conditions. Further investigation should also seek to elucidate the mechanisms underlying the interaction between aversiveness and body ownership.

A further limitation was that behavioral ratings (e.g., congruency, body ownership, aversiveness) were assessed only retrospectively and could therefore have been influenced by expectations and memory biases. Future studies could avoid this problem by collecting ratings after each individual trial.

Finally, future studies could make the VR experience even more immersive by improving the tactile stimulation and incorporating electrical pulses that more closely mimic the movement of the visual stimuli, therefore potentially increasing both the emotional arousal27 and the body ownership illusion.

Conclusion

Using a novel, fully automated VR-fMRI setup, the interaction between emotion and multisensory integration underlying body ownership was investigated. The findings from this study add to the evidence that the PPC is recruited during visuo-tactile integration and that the aIns is related to aversiveness. More importantly, we found an interaction effect of congruency × aversiveness in the amygdala. This new finding points towards an enhanced processing of aversive stimuli when body-related information is more strongly integrated into the bodily representation of the self. Overall, the results show that, with the help of sophisticated VR-fMRI paradigms, important scientific questions can be addressed in a novel way, but that the complexity of the setup also poses new challenges in the interpretation of these findings.