Navigation through three-dimensional space requires keeping track of self-motion relative to an external reference frame (DeAngelis & Angelaki, 2012). Self-motion perception relies on information provided by the vestibular end organs, which are sensitive to both rotation (semicircular canals, SCCs) and translation (otolith organs). Everyday motion typically produces combined rotational and translational input. Indeed, otolith and SCC signals interact strongly to yield a correct estimate of self-motion, and vestibular processing combines both inputs already at the earliest processing stages (Carriot, Jamali, Brooks, & Cullen, 2015; Cullen, 2012). For example, otolith information is ambiguous with respect to the physical motion stimulus (tilt-translation ambiguity), and combined otolith and SCC signals are necessary to disambiguate the sensory input (Angelaki, McHenry, Dickman, Newlands, & Hess, 1999; Merfeld, Zupan, & Gifford, 2001). Yet scant attention has been paid to the perception of vestibular stimuli, and to the underlying perceptual decision-making process, when such stimuli occur in succession. Vestibular thresholds are commonly determined for a defined motion stimulus presented in isolation (e.g., Grabherr, Nicoucar, Mast, & Merfeld, 2008). Two previous studies stressed the need to examine the influence of intravestibular interaction (that is, the interplay of otolith and SCC signals) on vestibular direction detection thresholds in order to better understand real-world situations (Crane, 2016; MacNeilage, Turner, & Angelaki, 2010). So far, a small number of studies have used nulling and staircase paradigms to investigate motion sequences of vestibular or visuo-vestibular stimuli, and they have found self-motion aftereffects: greater motion intensities were required to cancel out the effect of the adaptor stimulus (Crane, 2012a, 2012b; Cuturi & MacNeilage, 2014).
However, in these studies, the adaptor and target stimuli always consisted of either linear translations alone or rotations alone.

In the visual domain, a motion aftereffect (MAE) results after prolonged exposure to coherent visual motion: A subsequently presented pattern of stationary dots appears to move in the opposite direction (the waterfall illusion; Adams, 1834). This perceptual phenomenon provides a window into the neural and computational mechanisms that underlie visual motion perception (Anstis, Verstraten, & Mather, 1998; Cuturi & MacNeilage, 2014; Konkle, Wang, Hayward, & Moore, 2009). The dominant explanation for this static MAE posits selective adaptation of motion-sensitive populations of neurons in the primary visual cortex (Huk, Ress, & Heeger, 2001), implying that MAEs result from altered stimulus processing (sensitivity). According to such models, prolonged exposure to a specific motion subsequently reduces the firing rate and responsiveness of those neurons (Huk et al., 2001). Assuming that distinct populations of neurons code different directions and that their outputs are constantly compared, prolonged stimulation with coherent motion in one direction decreases the firing rate of the neurons tuned to that direction. Consequently, when a static pattern is viewed afterward, the comparison is biased toward the unadapted opposite direction, and the pattern appears to move in that direction (Anstis et al., 1998). More recent studies have used dynamic instead of static probes. The dynamic MAE differs from the static MAE, suggesting that different neural populations are involved (see Mather, Pavan, Campana, & Casco, 2008, for a review). A simplified model associates the static MAE with early areas of visual cortex, whereas higher-level aspects of the dynamic MAE are associated with the middle temporal (MT) and medial superior temporal (MST) areas. Whereas the dorsal part of area MST is involved in processing self-motion, the role of area MT seems to be limited to motion perception in general (Chowdhury, Takahashi, DeAngelis, & Angelaki, 2009; Ilg, 2008).

However, effects of sequential motion may alternatively manifest as a bias in perceptual decision-making, such that a leftward cue stimulus increases the tendency to give a leftward response, and vice versa. This is in line with the spatial-attention account (Posner, Snyder, & Davidson, 1980), according to which an abstract, higher-level spatial (directional) cue influences the perception of the second motion stimulus. Spatial congruence of the cue and target speeds up responses to the target at short interstimulus intervals (ISIs). At long ISIs, however, responses to spatially congruent targets are slowed, a phenomenon known as inhibition of return (IOR). Importantly, these effects are not caused by low-level sensory properties of the cue, but rather by abstract spatial information. In fact, spatial attention can also be allocated cross-modally (for reviews, see Driver & Spence, 1998; McDonald, Green, Störmer, & Hillyard, 2012; Spence, 2010).

To date, research on spatial cueing in the vestibular system has been scarce (see Figliozzi, Guariglia, Silvetti, Siegler, & Doricchi, 2005, for an exception). This is striking, since vestibular stimuli are essential to the spatial processing of self-motion. Moreover, it has long been hypothesized that vestibular processing is closely related to spatial attention (Ferrè, Longo, Fiori, & Haggard, 2013; Figliozzi et al., 2005; Shuren, Hartley, & Heilman, 1998; Silberpfennig, 1941; Vallar, Sterzi, Bottini, Cappa, & Rusconi, 1990).

Here we present three experiments with motion sequences designed to examine intravestibular interaction and its influence on the perceptual decision-making process. A translation or a yaw rotation (cue) preceded a spatially congruent or incongruent rotation or translation (target); that is, trials consisted either of a translation followed by a rotation or of a rotation followed by a translation. Participants performed a self-motion direction discrimination task on the second motion stimulus, and we also measured their reaction times.

Taken together, the self-motion aftereffect and spatial-attention accounts predict different patterns in participants’ choices and reaction times. On the one hand, by analogy with the MAE literature, a self-motion aftereffect predicts a diminished ability to process a motion stimulus whose direction is congruent with that of the preceding cue stimulus. This effect would be reflected mainly in the participants’ responses (correct or incorrect). On the other hand, a spatial-attention account predicts either facilitation in terms of faster responses (at short ISIs) or IOR (at long ISIs) for congruent motion directions. Facilitation could imply a bias toward the motion direction congruent with that of the preceding stimulus, resulting in shorter reaction times (RTs), whereas IOR would imply interference leading to longer RTs. Signal detection theory distinguishes between a response bias (decision criterion/intercept) and stimulus processing (d-prime/slope). However, the classical signal detection approach does not take RTs into account, and it is not appropriate for dealing with uncertainty in the decision-making process (Clark, Yi, Galvan-Garza, Bermúdez Rey, & Merfeld, 2018). When combined with choices, the time taken to respond contains important information about the cognitive process leading to a perceptual decision (Shadlen & Kiani, 2013). Here we therefore apply the drift diffusion model (DDM; Ratcliff & McKoon, 2008) to better exploit the information available in the data. The DDM, which jointly analyzes participants’ choices and RTs, has become a widely used cognitive model (Shadlen & Kiani, 2013). It decomposes participants’ performance into distinct subprocesses, represented by the parameters of the DDM. According to the model, noisy sensory evidence is accumulated over time until a decision boundary is reached (Ratcliff & McKoon, 2008). The speed of evidence accumulation is called the drift rate.
Since the DDM applies to two-alternative forced-choice tasks, evidence accumulated for one response option counts as evidence against the other. The point at which evidence accumulation starts is called the starting point; a starting point shifted toward one boundary reflects an a priori bias toward that response. Two other relevant parameters are the boundary separation, which defines the amount of evidence required for a response, and the nondecision time, which captures processes outside the decision itself, such as stimulus encoding and motor execution. Hence, DDMs can help to separate the processes underlying decision-making in the perception of successive motion stimuli: a response bias induced by the cue should shift the starting point, whereas altered sensory processing should change the drift rate.
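The roles of the four DDM parameters can be made concrete with a small simulation. The sketch below is illustrative only and is not the brms/Stan Wiener likelihood used for model fitting: it assumes a unit diffusion coefficient, an Euler–Maruyama discretization, and our own parameter labels `v` (drift rate), `a` (boundary separation), `z` (relative starting point), and `t0` (nondecision time).

```python
import numpy as np

def simulate_ddm(v, a, z=0.5, t0=0.3, dt=0.001, max_t=5.0, rng=None):
    """Simulate one trial of a two-boundary drift diffusion process.

    v  : drift rate; positive values push evidence toward the upper boundary
    a  : boundary separation (lower boundary at 0, upper boundary at a)
    z  : relative starting point in (0, 1); 0.5 is unbiased
    t0 : nondecision time in seconds, added to the decision time
    Returns (choice, rt): choice is 1 (upper) or 0 (lower), or
    (None, None) if neither boundary is reached within max_t seconds.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = z * a                      # absolute starting point of the evidence
    t = 0.0
    sqrt_dt = np.sqrt(dt)
    while t < max_t:
        # Euler-Maruyama step: deterministic drift plus unit-variance Gaussian noise
        x += v * dt + sqrt_dt * rng.standard_normal()
        t += dt
        if x >= a:
            return 1, t + t0
        if x <= 0.0:
            return 0, t + t0
    return None, None


rng = np.random.default_rng(1)
# Unbiased start, positive drift: most trials end at the upper boundary.
drift_trials = [simulate_ddm(2.0, 1.5, z=0.5, rng=rng) for _ in range(500)]
# Zero drift, shifted start: choices still favour the upper boundary.
bias_trials = [simulate_ddm(0.0, 1.0, z=0.7, t0=0.2, rng=rng) for _ in range(500)]
```

The two simulated conditions illustrate the distinction the models exploit: a nonzero drift rate produces systematic choices through the evidence itself, whereas with zero drift a starting point of `z = 0.7` alone yields roughly 70% upper-boundary responses.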

Experiment 1

Method

Experiment 1 used a modified version of the spatial-cueing paradigm of Posner, Snyder, and Davidson (1980), designed to examine potential influences of self-motion stimuli on the perception of subsequent motion stimuli. As in visual spatial cueing, a cue stimulus (first motion) was expected to direct spatial attention to one side of space, leading to faster correct responses when the cue and target were congruent at short ISIs, and slower correct responses when they were congruent at long ISIs. The spatial-cueing effect is usually found for briefly presented stimuli and short ISIs (Ruz & Lupiáñez, 2002). Therefore, each of the two motion stimuli lasted 200 ms, and the ISIs ranged from 50 to 600 ms.

Participants were first translated horizontally (interaural y-axis, cue) and then rotated about the earth-vertical axis (yaw rotation, target). The first motion was stronger than the second motion, to ensure that participants would correctly perceive the cue and shift their attention accordingly. A trial was considered congruent when both motions had the same spatial direction (e.g., leftward interaural translation followed by leftward rotation). A trial was defined as incongruent when the second motion was directed in the opposite spatial direction (e.g., leftward interaural translation followed by rightward rotation). Performance in neutral trials served as a baseline; in these trials, the yaw rotation was preceded by a horizontal forward or backward translation (x-axis). On the basis of the spatial-cueing paradigm, neutral cues were not expected to influence the perception of the subsequent motion. For both the leftward/rightward and forward/backward translations, the first motion was 3.6 times the threshold established in a preliminary study, which was similar to published thresholds (Valko, Lewis, Priesol, & Merfeld, 2012); the second motion (yaw rotation) was 2.6 times the threshold reported by Grabherr, Nicoucar, Mast, and Merfeld (2008).

Participants

Twelve healthy participants took part in the first experiment (eight male, four female; mean age 27, range 24–30 years). All participants were right-handed according to a German version of the handedness questionnaire by Chapman and Chapman (1987). None of the participants reported a history of neurological, vestibular, or attentional disorders. The study was approved by the ethics committee of the University of Bern, and all participants gave written informed consent prior to the experiment in accordance with the Declaration of Helsinki.

Motion stimuli

A six-degree-of-freedom motion platform (6DOF2000E; Moog Inc., East Aurora, NY) and in-house software were used to generate the motion stimuli. The cue stimuli consisted of translations with a single-cycle sinusoidal acceleration profile at a frequency f of 5 Hz [a(t) = A sin(2πft), 0 ≤ t ≤ T, T = 1/f] along both the y-axis (left/right) and the x-axis (forward/backward), as used in previous studies (see, e.g., Crane, 2012a; Grabherr et al., 2008). The acceleration amplitude (A) was set to 0.25 m/s², resulting in a peak velocity of 0.016 m/s (vmax = AT/π) and a displacement of 0.0016 m (Δp = AT²/(2π)). For the target stimuli, we used the same acceleration profile as for the cue stimuli, but with yaw rotations about an earth-vertical axis (left/right). The acceleration amplitude was set to 24 deg/s² (vmax = 1.53 deg/s, Δp = 0.153 deg).
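The peak velocity and displacement follow from integrating the acceleration profile: v(t) = (AT/2π)(1 − cos(2πt/T)) peaks at t = T/2 with vmax = AT/π, and integrating v(t) over one cycle gives Δp = AT²/(2π). The sketch below, a hypothetical helper of our own rather than the authors' in-house software, computes both values analytically and cross-checks them with a simple numerical integration.

```python
import math

def single_cycle_kinematics(A, f):
    """Analytic peak velocity and net displacement for a(t) = A sin(2*pi*f*t), 0 <= t <= T."""
    T = 1.0 / f
    v_max = A * T / math.pi               # reached at t = T/2
    dp = A * T ** 2 / (2.0 * math.pi)     # net displacement after one full cycle
    return v_max, dp

def numeric_kinematics(A, f, steps=100_000):
    """Forward-Euler integration of the same profile, as a sanity check."""
    T = 1.0 / f
    dt = T / steps
    v = p = v_max = 0.0
    for k in range(steps):
        v += A * math.sin(2.0 * math.pi * f * k * dt) * dt   # integrate acceleration
        p += v * dt                                          # integrate velocity
        v_max = max(v_max, v)
    return v_max, p
```

For the 0.25 m/s² translation at 5 Hz, `single_cycle_kinematics(0.25, 5.0)` gives about 0.016 m/s and 0.0016 m; for the 24 deg/s² yaw rotation it gives about 1.53 deg/s and 0.153 deg, matching the values reported above.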

Experimental design

Cue congruence (three levels: congruent, incongruent, and neutral) and ISI (four levels: 50, 100, 200, and 600 ms) were varied within participants. In congruent trials, interaural y-axis translations were followed by yaw rotations to the same side. In incongruent trials, y-axis translations preceded yaw rotations in the opposite direction. Neutral trials consisted of a translation along the naso-occipital x-axis followed by a yaw rotation. In-house software based on LabVIEW (National Instruments, Austin, Texas) was used to record participants’ responses and RTs.

Experimental procedure

Participants were seated in a car seat with a five-point harness, which was mounted on the motion platform. Their heads were stabilized with a helmet. The experiments were conducted in darkness, and participants were blindfolded to prevent them from perceiving visual cues from the surroundings. White noise at approximately 60 dB was delivered through in-ear headphones to mask sounds from the motion platform’s engine. Participants were instructed to indicate the perceived direction of the second motion as fast as possible by pressing the corresponding button with either their left or right hand. The first motion served as a temporal warning cue: it signaled that the second motion was about to begin but did not predict its direction. Practice trials were administered until participants understood the task. Once a participant was comfortable with the task, the trials were presented in three blocks, with sufficient time to rest between blocks. In total, 192 trials were presented in random order (48 congruent, 48 incongruent, and 96 neutral). A trial consisted of a cue stimulus (200 ms), a variable ISI (50–600 ms), a target stimulus (200 ms), a response window (max. 2,500 ms), the return of the platform to its origin (1,100 ms), and an intertrial interval of about 1,000 ms.
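The trial structure can be illustrated with a hypothetical trial-list generator. The text specifies only the totals (48 congruent, 48 incongruent, 96 neutral), the four ISIs, and three blocks; the full crossing with six repetitions per cell and equal blocks of 64 trials are our own assumptions, chosen because they reproduce those totals.

```python
import random

ISIS_MS = [50, 100, 200, 600]

def build_trial_list(reps=6, n_blocks=3, seed=None):
    """Hypothetical balanced trial list for Experiment 1 (assumed design, see text).

    Each ISI x target direction cell contains `reps` congruent trials,
    `reps` incongruent trials, and 2 * `reps` neutral trials
    (forward and backward cues), giving 48 / 48 / 96 trials for reps=6.
    """
    trials = []
    for isi in ISIS_MS:
        for target in ("left", "right"):
            opposite = "right" if target == "left" else "left"
            for _ in range(reps):
                trials.append({"congruence": "congruent", "cue": target,
                               "target": target, "isi": isi})
                trials.append({"congruence": "incongruent", "cue": opposite,
                               "target": target, "isi": isi})
                for cue in ("forward", "backward"):
                    trials.append({"congruence": "neutral", "cue": cue,
                                   "target": target, "isi": isi})
    rng = random.Random(seed)
    rng.shuffle(trials)                      # random presentation order
    size = len(trials) // n_blocks           # 64 trials per block for 192 trials
    return [trials[i * size:(i + 1) * size] for i in range(n_blocks)]
```

Shuffling the full list before splitting it into blocks means that congruence and ISI are randomized across the whole session rather than balanced within each block; a per-block balance would need a different scheme.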

Data analysis

All analyses were computed in R (R Core Team, 2013) using the brms (Bürkner, 2017) and rstan (Guo et al., 2017) packages, which implement Bayesian inference procedures. These procedures provide posterior probability distributions for the estimated parameters. For all statistical models, samples from each parameter’s posterior distribution were drawn with the Hamiltonian Monte Carlo sampling algorithm implemented in Stan. Samples were generated by four independent Markov chains, each with 1,000 warm-up iterations followed by another 1,000 samples drawn from the posterior distribution. These 1,000 samples per chain (4,000 in total) were retained for statistical inference. To confirm that all chains converged to the same posterior distribution, we used the R-hat statistic, along with visual inspection of the chains and of the shape of the posterior distributions (Gelman et al., 2014). For all models, the R-hat values were below 1.05. Together with the visual inspection of the chains, this indicated that all Markov chains converged to the same posterior distribution of the estimated parameters. These posterior distributions were summarized by Bayesian credible intervals. Calculations were performed on UBELIX (http://www.id.unibe.ch/hpc), the HPC cluster at the University of Bern. All analyses, models, and data are freely accessible on the Open Science Framework (OSF; https://osf.io/46nqw/).
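The R-hat criterion compares the variance between chains with the variance within chains; values near 1 indicate that the chains sample the same distribution. The sketch below implements the classic (non-split) Gelman–Rubin form for illustration. Stan and rstan report a split-chain variant (and, in newer versions, a rank-normalized one), so their numbers will differ slightly from this function.

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Classic (non-split) potential scale reduction factor R-hat.

    chains: array of shape (m, n) with n post-warm-up draws of one
    parameter from each of m chains.
    """
    chains = np.asarray(chains, dtype=float)
    _, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    B = n * chain_means.var(ddof=1)          # between-chain variance
    var_plus = (n - 1) / n * W + B / n       # pooled estimate of the posterior variance
    return float(np.sqrt(var_plus / W))


rng = np.random.default_rng(0)
good = rng.standard_normal((4, 1000))   # 4 chains x 1,000 draws, same target
bad = good.copy()
bad[0] += 1.0                           # one chain stuck in a shifted region
```

With four well-mixed chains of 1,000 draws, `gelman_rubin_rhat(good)` stays close to 1, whereas shifting a single chain by one posterior standard deviation pushes `gelman_rubin_rhat(bad)` to about 1.12, above the 1.05 criterion.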

To assess whether participants correctly perceived the direction of the yaw rotation, we analyzed their performance with a multilevel Bayesian logistic regression model implemented in brms, including the factors cue congruence (congruent, incongruent, neutral) and ISI (50, 100, 200, 600 ms). To analyze RTs in more detail, we fitted multilevel Bayesian regression models with a lognormal likelihood for Experiments 1 and 2, including the factors correctness (correct, wrong), cue congruence (congruent, incongruent, neutral), and ISI (50, 100, 200, 600 ms).
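Both regression models estimate parameters on a transformed scale, so estimates must be back-transformed for reporting, as in Figs. 1 and 2. A minimal sketch of these transforms follows: the inverse logit maps logistic-regression estimates to probabilities, and for a lognormal likelihood exp(μ + σ²/2) is the mean RT while exp(μ) is the median. The text does not state which of the two lognormal transforms underlies Fig. 2, so both are shown; the helper names are our own.

```python
import numpy as np

def inv_logit(eta):
    """Logit scale -> probability (e.g., of a correct response)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(eta, dtype=float)))

def lognormal_mean(mu, sigma):
    """Mean of a lognormal distribution: exp(mu + sigma**2 / 2)."""
    return np.exp(np.asarray(mu, dtype=float) + np.asarray(sigma, dtype=float) ** 2 / 2.0)

def lognormal_median(mu):
    """Median of a lognormal distribution: exp(mu)."""
    return np.exp(np.asarray(mu, dtype=float))

def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval computed from posterior draws."""
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(np.asarray(draws, dtype=float), [tail, 1.0 - tail])
    return float(lo), float(hi)
```

Transforms should be applied to each posterior draw before summarizing, for example `credible_interval(inv_logit(draws))`. Quantiles are unaffected by the order of operations for these monotone transforms, but posterior means are not.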

To jointly analyze participants’ responses and RTs, and to study the bias introduced by the first motion, we modeled the data using two different Bayesian multilevel DDMs (Vandekerckhove, Tuerlinckx, & Lee, 2011) estimated in brms (Bürkner, 2017). This approach yields population-level (fixed-effect) parameter estimates while accounting for variation between participants. For the first experiment, the neutral conditions were not included in these models, since they are uninformative about the processes underlying congruent and incongruent trials. Moreover, the congruent and incongruent conditions were split into their specific directions to obtain separate factors for the cue and target motions. This allowed us to fit a model in which the cue motion influenced the starting point of evidence accumulation for the direction of the target motion.

To investigate the decision-making process of self-motion perception in more detail, we fitted two different DDMs for Experiments 1 and 2, with either a flexible drift rate or a flexible starting point. In the flexible-drift-rate model, we estimated for the drift rate the fixed effects of the cue motion, the target motion, and the ISI, together with by-participant random effects; for the starting point, we estimated only by-participant random intercepts, without covariates. In the flexible-starting-point model, we estimated for the drift rate the fixed effects of the target motion and the ISI, together with by-participant random effects; for the starting point, we estimated the fixed effects of the cue motion and the ISI, along with by-participant random effects. The models for Experiments 1 and 2 were compared using leave-one-out cross-validation (LOO; Vehtari, Gelman, & Gabry, 2017), a measure of a model’s predictive accuracy. In the following comparisons, we report each model’s LOO information criterion (LOOIC) and the LOOIC difference between the two models, together with their standard errors (see also Wallis et al., 2017, for a similar procedure). LOO estimates the expected log pointwise predictive density; multiplying this estimate by −2 converts it to the deviance scale, yielding the LOOIC. Hence, lower LOOIC values indicate a better model fit. For all DDMs, we used weakly informative prior distributions for the fixed-effect parameters (see the supplementary online material). All other priors were set to the defaults implemented in brms and can be checked in the analysis file available on the OSF.
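The LOOIC comparison is simple arithmetic once the pointwise LOO log predictive densities are available. In practice those values come from Pareto-smoothed importance sampling in the R loo package. The sketch below reproduces only the aggregation step: the LOOIC is −2 times the summed pointwise values, and the standard error of a model difference is computed from the paired pointwise differences. The input arrays are made-up toy numbers.

```python
import numpy as np

def looic(elpd_pointwise):
    """LOOIC and its standard error from pointwise LOO log predictive densities."""
    ic_i = -2.0 * np.asarray(elpd_pointwise, dtype=float)   # deviance scale
    n = ic_i.size
    return float(ic_i.sum()), float(np.sqrt(n * ic_i.var(ddof=1)))

def looic_difference(elpd_a, elpd_b):
    """LOOIC(a) - LOOIC(b) and the standard error of this paired difference."""
    diff_i = -2.0 * (np.asarray(elpd_a, dtype=float) - np.asarray(elpd_b, dtype=float))
    n = diff_i.size
    return float(diff_i.sum()), float(np.sqrt(n * diff_i.var(ddof=1)))


# Toy pointwise values for four observations (hypothetical numbers).
elpd_a = [-1.0, -1.0, -1.0, -1.0]
elpd_b = [-2.0, -1.0, -1.0, -1.0]
```

Here `looic_difference(elpd_a, elpd_b)` returns (−2.0, 2.0): model a has the lower LOOIC, but the difference is only one standard error. Using the paired differences rather than the two separate standard errors accounts for the fact that both models are evaluated on the same observations.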

To assess whether the DDMs fitted for Experiments 1 and 2 described the data well, we generated 500 complete datasets (responses and RTs) from each model’s posterior predictive distribution. First, within each dataset, we calculated the mean response probabilities and the median RTs for upper- and lower-boundary responses for each participant in each experimental condition. Second, these statistics were summarized across the 500 datasets by their .025, .1, .5, .9, and .975 quantiles, separately for the predicted response probabilities and the median RTs for lower and upper responses, for each participant in each experimental condition. The predicted medians of the response probabilities and RTs were then compared with the observed responses and RTs. To quantify the discrepancy between the models’ predictions and the observed data, we computed a distance measure for each model as the sum of the squared differences between the observations and the median predictions over all participants and experimental conditions. The distance measures for the different models are presented in Table 1. Lower values correspond to a closer match between the generated and the observed datasets.

$$ \mathrm{Distance} = \sum_{i=1}^{n} \left(\mathrm{observation}_i - \mathrm{prediction}_i\right)^2 $$
Table 1 Distance measures for the different models in Experiments 1 and 2

The medians and quantiles of the response probabilities and RTs were then averaged across all participants. Comparisons of the averaged predicted medians with the observed values are presented in Supplementary Tables 1–4, and visual comparisons with the observed data are shown in Figs. 3, 4, and 7 below. These model checks follow standard procedures for assessing model fit (Singmann, 2017, 2018).
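The two-step posterior predictive summary and the distance measure can be sketched as follows. The array layout is our own assumption: choices and RTs are stored per condition cell as (datasets × trials) arrays, and a per-cell statistic across datasets as a (datasets × cells) array. The actual checks were run in R.

```python
import numpy as np

QUANTILES = [0.025, 0.1, 0.5, 0.9, 0.975]

def cell_statistics(choices, rts):
    """Step 1, for one participant-by-condition cell.

    choices, rts: arrays of shape (n_datasets, n_trials); choice 1 = upper boundary.
    Returns, per simulated dataset, the upper-response probability and the
    median RTs of upper and lower responses (NaN if a response type is absent).
    """
    choices = np.asarray(choices)
    rts = np.asarray(rts, dtype=float)
    p_upper = choices.mean(axis=1)
    rt_upper = np.nanmedian(np.where(choices == 1, rts, np.nan), axis=1)
    rt_lower = np.nanmedian(np.where(choices == 0, rts, np.nan), axis=1)
    return p_upper, rt_upper, rt_lower

def summarize_predictions(pred):
    """Step 2: quantiles across datasets; pred has shape (n_datasets, n_cells)."""
    return np.quantile(pred, QUANTILES, axis=0)      # shape (5, n_cells)

def distance(observed, pred):
    """Sum of squared differences between observations and median predictions."""
    median_pred = np.quantile(pred, 0.5, axis=0)
    return float(np.sum((np.asarray(observed, dtype=float) - median_pred) ** 2))
```

For example, if every simulated dataset predicts response probabilities of .2 and .8 in two cells and the observed values are .3 and .7, the distance is .01 + .01 = .02.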

Results

Accuracy

The logistic regression showed that direction discrimination performance was below chance level when the cue and target motions were congruent, suggesting that participants misperceived the direction of the target motion. Performance was above chance level when the yaw rotation was preceded by an incongruent translation or by a neutral forward–backward translation (Fig. 1, Exp. 1), suggesting that participants correctly perceived the direction of the yaw rotation. Performance did not differ between the incongruent and neutral conditions, since the 95% credible intervals (CIs) of the posterior distributions of all differences included zero. Overall, the response patterns indicate that the direction of the translation led to a misperception of the direction of the rotation when the two directions were congruent (that is, a self-motion aftereffect), except at an ISI of 600 ms. When the RTs presented in Fig. 2 were also taken into account, interpretation of the combined measures (accuracy and RTs) remained difficult and inconclusive. We therefore fitted DDMs to gain more insight into the processes underlying the self-motion aftereffect.

Fig. 1

Participants’ performance in all three experiments. Dots, triangles, and squares represent the inverse logit-transformed parameter estimates of the different conditions in the logistic regressions. The lower and upper ends of the lines surrounding them represent the 95% credible intervals. The black lines at .5 illustrate chance level

Fig. 2

Participants’ RT data in Experiments 1 and 2. Dots, triangles, and squares represent the transformed parameter estimates of the means of the different conditions in the lognormal regressions. The lower and upper ends of the lines surrounding them represent the 95% credible intervals

Reaction times

The analysis of the RTs indicated no difference between correct (estimate of the mean = 1.52 s, 95% CI = [1.36, 1.71]) and wrong (estimate = 1.51 s, 95% CI = [1.40, 1.62]) responses when the cue and target motions were congruent at an ISI of 50 ms. For the same ISI, participants were slightly faster for correct (estimate = 1.50 s, 95% CI = [1.40, 1.61]) than for wrong (estimate = 1.73 s, 95% CI = [1.54, 1.95]) responses when the cue and target stimuli were incongruent. This difference was even more pronounced in the neutral condition (correct responses: estimate = 1.31 s, 95% CI = [1.25, 1.38]; wrong responses: estimate = 1.66 s, 95% CI = [1.54, 1.80]). RTs in wrong trials for an ISI of 50 ms did not differ between the congruent, incongruent, and neutral conditions. The slowest RTs for wrong responses were observed at an ISI of 600 ms over all conditions (estimate = 2.41 s, 95% CI = [2.16, 2.69]). Parameter estimates of the means and 95% CIs for all conditions are depicted in Fig. 2.

Drift diffusion models

Table 2 (Exp. 1 columns) presents the LOOICs for the model in which responses were coded as the perceived direction and the drift rate depended on the direction of the first motion (no-bias model), for the corresponding model in which the starting point depended on the direction of the first motion (bias model), and the LOOIC difference between them. The model in which the drift rate depended on the direction of the first motion (no-bias model) fitted better, as reflected in its lower LOOIC.

Table 2 LOOIC values for the models in Experiments 1 and 2

Fig. 3 compares the posterior predictive distributions with the observed data for both models in Experiment 1. The left panels show the model with cue motion, target motion, and ISI as predictors of the drift rate (no-bias model); the right panels show the model with target motion and ISI as predictors of the drift rate and with the starting point depending on the cue motion in combination with ISI (bias model). The first row of Fig. 3 compares the predicted (circles) and observed (crosses) probabilities of responding “right” across all participants for both models. The predicted and observed probabilities are closer in the no-bias than in the bias model, indicating that the no-bias model describes the observed response probabilities better. Moreover, the 80% (bold gray bars) and 95% (thin gray bars) credible intervals, constructed from the quantiles averaged across participants, are narrower for the no-bias model. This reflects less uncertainty about the model’s posterior parameter estimates, meaning that the model makes more precise predictions.

Fig. 3

Comparison of the posterior predictive distributions and the observed data from Experiment 1, for the model with the drift rate depending on cue and target motion but no shift in the starting point (on the left; no-bias model) and for the model with the drift rate depending on target motion and ISI and with the starting point depending on the cue motion in combination with ISI (on the right; bias model). The median predictions for response probabilities and median RTs averaged across all participants are plotted as black circles. The bold gray bars represent 80% credible intervals, computed from the .1 and .9 quantiles. The thinner gray bars represent 95% intervals, computed from the .025 and .975 quantiles. Together, these indicate the uncertainty of the models’ predictions, with larger intervals indicating more uncertainty. The means of the observed response probabilities and the medians of the observed RT data, averaged across all participants, are shown as crosses. The different ISIs are presented on the x-axis

The second and third rows in Fig. 3 show the predicted (black circles with gray bars) and observed (colored crosses) median RTs for “left” and “right” responses. The intervals of the predicted median RTs are equally wide for both models, and the distance between observed and predicted median RTs across all conditions is also similar for the two models. Overall, the model in which the drift rate depends on the cue and target motions as well as the ISI, with no predictor for the starting point (no-bias model), describes the observed data better, especially the response probabilities. This conclusion is also supported by the model’s lower LOOIC. We therefore preferred the model without a shifted starting point.

To obtain more trials per condition, and therefore more precise parameter estimates, we did not distinguish between the explicit directions of the cue and target motions but combined them into the factor cue congruence. The response was then coded in terms of accuracy. We thus estimated an additional model with fixed effects of congruence and ISI, together with by-participant random effects, on the drift rate (the no-bias model, but with correctness as the response variable). For the starting point, we estimated a by-participant random-intercept model.
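The recoding from direction-coded trials to congruence and accuracy can be expressed as a small function. This is a hypothetical helper of our own, with direction labels assumed to be the strings "left" and "right" and x-axis cues assumed to be "forward"/"backward". With accuracy coding, the upper DDM boundary corresponds to a correct response and the lower boundary to an error.

```python
def code_trial(cue, target, response):
    """Recode one direction-coded trial into (congruence, correct).

    cue, target, response: 'left' or 'right'; x-axis cues
    ('forward' / 'backward') are coded as neutral.
    correct is 1 if the reported direction matches the target, else 0.
    """
    if cue in ("forward", "backward"):
        congruence = "neutral"
    elif cue == target:
        congruence = "congruent"
    else:
        congruence = "incongruent"
    correct = int(response == target)
    return congruence, correct
```

For example, a leftward cue followed by a leftward target and a "right" response is coded as ("congruent", 0), the error pattern that characterizes the self-motion aftereffect.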

The goodness of fit of this model, based on posterior predictive checks, is presented in Fig. 4, left panels. The model predicts participants’ responses well: the observed response probabilities (crosses) are close to the predicted response probabilities (circles) and fall within both the 80% and 95% intervals of the posterior predictive distribution. The predicted median RTs for correct responses also match the observed median RTs. However, the model has difficulty describing the RTs for wrong responses, especially in the incongruent conditions, where the observed median RTs lie at the upper end of the 80% and 95% intervals of the posterior predictive distributions.

Fig. 4

Comparison of the posterior predictive distributions and the observed data from Experiment 2, for the model with the drift rate depending on cue and target motion but no shift in the starting point (on the left; no-bias model) and for the model with the drift rate depending on target motion and ISI and with the starting point depending on the cue motion in combination with ISI (on the right; bias model). The median predictions for response probabilities and median RTs averaged across all participants are plotted as black circles. The bold gray bars represent 80% credible intervals, computed from the .1 and .9 quantiles. The thinner gray bars represent 95% intervals, computed from the .025 and .975 quantiles. Together, these indicate the uncertainty of the models’ predictions, with larger intervals indicating more uncertainty. The means of the observed response probabilities and the medians of the observed RT data, averaged across all participants, are shown as crosses. The different ISIs are presented on the x-axis

The DDM with correctness as a response variable revealed that evidence for wrong responses was accumulated when the cue and target motions were spatially congruent for ISIs of 50, 100, and 200 ms. The parameter estimates for both incongruent cues and congruent cues at an ISI of 600 ms indicate sensory evidence accumulation for the correct response (Fig. 5).

Fig. 5

Comparison of the posterior predictive distributions and the observed data for Experiments 1 (left panels) and 2 (right panels), with the drift depending on the congruence of the directions of cue and target motion and the ISI, no shift in the starting point, and correctness as the response variable. The median predictions for response probabilities and median RTs averaged across all participants are plotted as black circles. The bold gray bars represent 80% credible intervals, computed from the .1 and .9 quantiles. The thinner gray bars represent 95% intervals, computed from the .025 and .975 quantiles. Together, these indicate the uncertainty of the models’ predictions, with larger intervals indicating more uncertainty. The means of the observed response probabilities and the medians of the observed RT data averaged across all participants are shown as crosses. The different ISIs are presented on the x-axis

Discussion

In Experiment 1, the logistic regression revealed compelling intravestibular self-motion aftereffects in direction discrimination performance when a yaw rotation was preceded by a translation in the same spatial direction. Thus, stimulation of the otoliths influences a sensory decision-making process that is based on information provided by the semicircular canals, so that congruent motions are systematically misperceived. We conclude that the perceptual decision about the direction of the target motion uses an estimate of spatial direction obtained not only from the semicircular canals, but also from the preceding stimulation of the otoliths. Importantly, self-motion aftereffects were found within a specific time window, at ISIs of 50–200 ms, but not at 600 ms. This time window is much shorter than those reported in the vestibular MAE literature (Crane, 2012a). A graphical summary of these misperceptions for congruent trials and the underlying decision-making processes is illustrated in Fig. 6. The RT data in Experiment 1 show that participants generally responded faster when their responses were correct. This was not the case in the congruent conditions at ISIs of 50, 100, and 200 ms, where RTs for wrong responses were even slightly faster; these are the ISIs at which self-motion aftereffects were observed. This might suggest that participants were confident that they were responding correctly but had in fact accumulated evidence for the wrong direction.

Fig. 6

Estimates for the effects of congruence and ISI on the drift rate of the drift diffusion models in Experiments 1 and 2. The estimates for Experiment 1 are represented by squares, and the estimates for Experiment 2 by triangles. Estimates for congruent conditions are in a cool color, whereas estimates for incongruent conditions are in a warm color. Vertical lines around the estimates show the 95% credible interval for each estimate

Experiment 2

Method

The results of Experiment 1 raise the question of whether the consistent directional misperception of the second motion (self-motion aftereffect) would also occur if the order of rotation and translation were reversed. Specifically, we investigated the influence of a rotation (cue) on a subsequent translation (target). The motion trajectories remained unchanged. Only congruent and incongruent cues were presented, since no rotation could serve as a “neutral” cue for lateral translations in the horizontal plane: In an upright participant, both pitch and roll rotations would tilt the head relative to gravity and would therefore inevitably stimulate the otoliths.

Participants

Twelve new participants were recruited for Experiment 2. Two of the participants had to abort the experiment, and we included the remaining ten participants in the analysis (six male, four female; mean age 29, range 23–56 years). All participants were right-handed according to a German version of the handedness questionnaire by Chapman and Chapman (1987). None of the participants reported a history of relevant neurological, vestibular, or attentional disorders. The study was approved by the ethics committee of the University of Bern, and all participants gave written informed consent prior to the experiment in accordance with the Declaration of Helsinki.

Motion stimuli

The cue stimuli were yaw rotations about an earth-vertical axis (right/left), each consisting of a single cycle of sinusoidal acceleration with a frequency of 5 Hz [a(t) = A sin(2πft) = A sin(2πt/T), with T = 1/f = 0.2 s]. The acceleration amplitude was set to 33 deg/s² (v_max = 2.10 deg/s, Δp = 0.21 deg). The target stimuli consisted of translations along the participants’ interaural (y) axis (right/left) with the same acceleration profile as the cue stimuli. The acceleration amplitude was set to 0.18 m/s² (v_max = 0.011 m/s, Δp = 0.0011 m).
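
The peak velocity and displacement values follow from integrating the single-cycle acceleration profile. The sketch below assumes that the platform starts at rest and that the profile lasts exactly one period T = 1/f; the function name is illustrative only. For the 33 deg/s² rotation it yields v_max ≈ 2.10 deg/s and Δp ≈ 0.21 deg, and for the 0.18 m/s² translation v_max ≈ 0.011 m/s and Δp ≈ 0.0011 m.

```python
import math


def single_cycle_kinematics(amplitude, freq_hz):
    """Return (v_max, delta_p) for a(t) = A*sin(2*pi*t/T) over one period T = 1/f.

    Assumes the motion starts at rest. Integrating once gives
    v(t) = (A*T / (2*pi)) * (1 - cos(2*pi*t/T)), which peaks at t = T/2 with
    v_max = A*T/pi and returns to zero at t = T. Integrating again over the
    full cycle gives the net displacement delta_p = A*T**2 / (2*pi).
    """
    period = 1.0 / freq_hz
    v_max = amplitude * period / math.pi
    delta_p = amplitude * period ** 2 / (2.0 * math.pi)
    return v_max, delta_p


# Experiment 2 stimuli at 5 Hz: yaw cue in deg/s^2, translational target in m/s^2.
for label, amplitude, unit in [("yaw rotation", 33.0, "deg"), ("translation", 0.18, "m")]:
    v_max, delta_p = single_cycle_kinematics(amplitude, 5.0)
    print(f"{label}: v_max = {v_max:.3g} {unit}/s, delta_p = {delta_p:.3g} {unit}")
```

Because velocity returns to zero at the end of the cycle, each stimulus is a brief, self-contained displacement rather than a sustained motion.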

Experimental design

Within each participant, cue congruence (two levels: congruent, incongruent) and ISI (four levels: 50, 100, 200, and 600 ms) were manipulated. In congruent trials a yaw rotation was followed by a translation along the interaural y-axis to the corresponding side. In incongruent trials a yaw rotation was followed by an interaural y-axis translation to the opposite side. Again, participants’ responses and RTs were recorded.

Experimental procedure

The experimental procedure was similar to that of Experiment 1, differing only in the type of motion sequences presented. In this experiment, 192 trials were presented in random order (96 congruent, 96 incongruent).
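
The trial count can be reconstructed as a fully crossed design. The sketch below assumes, which the text does not state, that trials were balanced across cue direction (left/right) as well as congruence and ISI, giving 2 × 2 × 4 cells with 12 repetitions each; the function and field names are hypothetical.

```python
import itertools
import random

ISIS_MS = (50, 100, 200, 600)
CUE_DIRECTIONS = ("left", "right")
CONGRUENCE = ("congruent", "incongruent")


def build_trial_list(n_reps=12, seed=None):
    """Return a shuffled list of trial dicts for a hypothetical fully balanced design."""
    rng = random.Random(seed)
    trials = []
    for cue_dir, congruence, isi in itertools.product(CUE_DIRECTIONS, CONGRUENCE, ISIS_MS):
        # A congruent target moves to the same side as the cue; an incongruent one to the opposite side.
        if congruence == "congruent":
            target_dir = cue_dir
        else:
            target_dir = "right" if cue_dir == "left" else "left"
        trials.extend(
            {"cue": cue_dir, "target": target_dir, "congruence": congruence, "isi_ms": isi}
            for _ in range(n_reps)
        )
    rng.shuffle(trials)  # random presentation order
    return trials


trials = build_trial_list(seed=1)
print(len(trials), sum(t["congruence"] == "congruent" for t in trials))
```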

Data analysis

Participants’ response accuracy was analyzed with a multilevel Bayesian logistic regression similar to that in Experiment 1, but with the factor cue congruence consisting of only two levels (congruent, incongruent). The details of the joint analysis of accuracy and RTs (DDM) can be found in the Method section of Experiment 1.

Results

Accuracy

The logistic regression model (Fig. 1, Exp. 2 panel) revealed that motion discrimination performance was below chance level in the congruent conditions for ISIs of 50, 100, and 200 ms, indicating a misperception of the spatial direction of the translation. For an ISI of 600 ms, discrimination performance was around chance level. Direction discrimination performance was above chance level when a yaw rotation preceded an incongruent translation, meaning that the spatial direction of the target stimulus was correctly perceived. Just as in Experiment 1, the response patterns indicate that, except at an ISI of 600 ms, the direction of the rotation led to a misperception of the direction of the translation when the two directions were congruent, that is, a self-motion aftereffect.
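
Statements such as “below chance level” rest on the posterior distribution of each condition’s accuracy. A minimal sketch of one plausible decision rule, assuming that posterior draws for a condition are available on the log-odds scale and that “below chance” means the whole 95% credible interval of the accuracy lies below .5; the function names are hypothetical, and the draws in the example are synthetic, not estimates from this study.

```python
import numpy as np


def inv_logit(x):
    """Map log-odds to probabilities."""
    return 1.0 / (1.0 + np.exp(-x))


def compare_to_chance(logit_draws, level=0.95, chance=0.5):
    """Classify a condition's accuracy relative to chance from posterior draws."""
    p = inv_logit(np.asarray(logit_draws, dtype=float))
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(p, [tail, 1.0 - tail])  # equal-tailed credible interval
    if upper < chance:
        label = "below chance"
    elif lower > chance:
        label = "above chance"
    else:
        label = "around chance"
    return label, (float(lower), float(upper))


rng = np.random.default_rng(0)
for center in (-1.0, 0.0, 1.5):  # synthetic posterior centers on the log-odds scale
    draws = rng.normal(center, 0.2, size=4000)
    print(center, compare_to_chance(draws))
```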

Reaction times

The analysis of RTs indicated that, when the cue and target motions were congruent at an ISI of 50 ms, participants were slightly faster on correct (estimate = 1.21 s, 95% CI = [1.15, 1.27]) than on wrong (estimate = 1.31 s, 95% CI = [1.24, 1.40]) responses. At the same ISI, participants were considerably faster on correct (estimate = 1.18 s, 95% CI = [1.13, 1.23]) than on wrong (estimate = 1.60 s, 95% CI = [1.47, 1.74]) responses when the cue and target stimuli were incongruent. Additionally, RTs for wrong responses were clearly faster in the congruent than in the incongruent condition, whereas no such difference was found for correct responses. Across all conditions, the slowest RTs for wrong responses were observed at an ISI of 600 ms (estimate = 2.03 s, 95% CI = [1.85, 2.23]). Parameter estimates of the means and the 95% CIs for all conditions are depicted in Fig. 2.

Drift diffusion model

The LOOIC values for the model with the perceived direction as the response and the drift rate depending on the direction of the first motion (no-bias model), for the model with the starting point depending on the direction of the first motion (bias model), and their LOOIC difference are presented in Table 2 (Exp. 2). As in Experiment 1, the no-bias model had the lower LOOIC value, indicating a better fit.
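
Model selection here relies on LOOIC = −2 × elpd_loo, where lower values indicate better expected out-of-sample fit. The sketch below, assuming a draws × observations matrix of pointwise log-likelihoods from each fitted model, shows how the LOOIC values and the standard error of their difference are formed. It uses plain importance sampling rather than the Pareto-smoothed variant (PSIS-LOO) implemented in tools such as the R package loo, so it is illustrative only and less stable when the importance weights are heavy-tailed; all names are hypothetical.

```python
import numpy as np


def _logsumexp(x, axis=0):
    """Numerically stable log(sum(exp(x))) along an axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def pointwise_elpd_is(log_lik):
    """Raw importance-sampling LOO elpd per observation; log_lik has shape (n_draws, n_obs).

    elpd_i = -log(mean_s 1 / p(y_i | theta_s)), computed on the log scale.
    Unlike PSIS-LOO, the importance weights are not smoothed.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    return np.log(n_draws) - _logsumexp(-log_lik, axis=0)


def compare_looic(log_lik_a, log_lik_b):
    """Return (LOOIC_a, LOOIC_b, LOOIC_a - LOOIC_b, SE of that difference)."""
    elpd_a = pointwise_elpd_is(log_lik_a)
    elpd_b = pointwise_elpd_is(log_lik_b)
    diff = -2.0 * (elpd_a - elpd_b)  # pointwise LOOIC difference
    se_diff = np.sqrt(diff.size * np.var(diff, ddof=1))
    return -2.0 * elpd_a.sum(), -2.0 * elpd_b.sum(), diff.sum(), se_diff


rng = np.random.default_rng(0)
ll_a = rng.normal(-0.6, 0.05, size=(1000, 50))  # synthetic log-likelihoods, model A
ll_b = rng.normal(-0.8, 0.05, size=(1000, 50))  # synthetic log-likelihoods, model B
print(compare_looic(ll_a, ll_b))
```

A negative difference favors the first model; comparing it with its standard error indicates how clearly the data separate the two models.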

A comparison of the posterior predictive distributions and the observed data for both models in Experiment 2 is presented in Fig. 7. The left panels show the results for the model that included cue and target motion as well as ISI as predictors of the drift rate (no-bias model); the right panels show the results for the model that included target motion and ISI as predictors of the drift rate, with the starting point depending on the cue motion in combination with ISI (bias model). As in Fig. 3, the first row in Fig. 7 compares the predicted and observed median probabilities of responding “right” for the no-bias and bias models across all participants. The predicted and observed response probabilities are much closer in the no-bias than in the bias model. Moreover, the 80% (bold gray bars) and 95% (thin gray lines) credible intervals of the posterior predictive distributions are narrower for the no-bias model, indicating more precise predictions. Together, these observations indicate that the no-bias model describes the observed response probabilities substantially better than the bias model.

Fig. 7

An illustration of the self-motion aftereffects and the underlying perceptual decision-making processes in Experiment 1. In this example, left/right translations are used as the cues and yaw rotations as the targets. The reverse pairing yielded the same results (Exp. 2). Thought bubbles indicate the perceived self-motion direction. Checkmarks represent correct perceptions of self-motion direction, and Xs represent misperceptions. The accumulation of sensory evidence is depicted in the right graph for each cue–target combination. A decision is made when the evidence accumulation process reaches a threshold (left or right). The sensory evidence accumulation always starts at the same starting point, equidistant from both decision thresholds—that is, there is no response bias. In contrast, the sign of the drift rate is switched, depending on the congruency of cue and target motions. In the upper half, the cue and target motions are incongruent. For those conditions, evidence for the actual physical stimulus is accumulated, leading to a correct perceptual decision. In the lower half, the cue and target motions are spatially congruent. In these conditions, sensory evidence is accumulated for the opposite target direction, leading to an incorrect decision

The second and third rows in Fig. 7 again show the predicted and observed median RTs for “left” and “right” responses. The intervals of the predicted median RTs are equally wide for both models. Across all conditions, the observed and predicted median RTs are slightly closer in the bias model (right panels). However, the no-bias model also describes the observed median RTs adequately.

Overall, the model in which the drift rate depends on the cue and target motions as well as the ISI (no-bias model) describes the observed data better, especially the response probabilities. This conclusion is again supported by that model’s lower LOOIC. Therefore, the model without a shifted starting point was preferred.

As in Experiment 1, we decided not to distinguish between the explicit directions of the cue and target motions and instead collapsed them into the factor cue congruence. Responses were coded in terms of accuracy. We therefore estimated an additional model with fixed effects of congruence and ISI, together with by-participant random effects, on the drift rate. For the starting point, we estimated a model with a by-participant random intercept.
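
The recoding from explicit directions to congruence and accuracy can be sketched as follows, assuming each trial record stores the cue direction, target direction, and the participant’s direction response as "left"/"right" strings together with the ISI; the function and field names are hypothetical, and the example records are invented for illustration.

```python
from collections import defaultdict


def code_trial(cue_dir, target_dir, response):
    """Collapse explicit directions into a congruence label and an accuracy code (1 = correct)."""
    congruence = "congruent" if cue_dir == target_dir else "incongruent"
    correct = 1 if response == target_dir else 0
    return congruence, correct


def accuracy_by_cell(trials):
    """Proportion correct per (congruence, ISI) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_correct, n_total]
    for t in trials:
        congruence, correct = code_trial(t["cue"], t["target"], t["response"])
        cell = (congruence, t["isi_ms"])
        counts[cell][0] += correct
        counts[cell][1] += 1
    return {cell: n_correct / n_total for cell, (n_correct, n_total) in counts.items()}


# Hypothetical records, not study data.
example = [
    {"cue": "left", "target": "left", "response": "right", "isi_ms": 50},
    {"cue": "right", "target": "right", "response": "left", "isi_ms": 50},
    {"cue": "left", "target": "right", "response": "right", "isi_ms": 50},
    {"cue": "right", "target": "left", "response": "right", "isi_ms": 50},
]
print(accuracy_by_cell(example))
```

After this recoding, left and right trials within a cell are pooled, so a below-chance proportion directly reflects responses in the direction of the cue.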

The goodness of fit of this model, based on posterior predictive checks, is presented in Fig. 4 (right panels). The model predicts participants’ responses well, even better than in Experiment 1: The observed response probabilities (crosses) are close to the predicted response probabilities (black circles) and lie within the 80% and 95% intervals of the posterior predictive distribution. These intervals are very narrow, even narrower than in Experiment 1. The predicted median RTs for correct responses also match the observed median RTs. Again, the model has difficulty describing the RTs for wrong responses, especially in the incongruent conditions, where the observed median RTs lie at the upper end of the 80% and 95% intervals of the posterior predictive distributions. This is probably because incorrect responses were rare in these conditions.

The accuracy-coded DDM revealed that evidence for the wrong response was accumulated when the cue and target motions were spatially congruent at ISIs of 50, 100, and 200 ms. The parameter estimate for congruent cues at an ISI of 600 ms indicates no systematic evidence accumulation toward either response (Fig. 5), consistent with participants’ chance-level accuracy in this condition (Fig. 1).
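
In accuracy coding, the upper boundary corresponds to a correct response, so a negative drift rate means that evidence accumulates toward the error boundary. A minimal sketch of the resulting choice probability, assuming a basic Wiener diffusion model with diffusion coefficient s = 1 and no across-trial parameter variability; the function name and the parameter values in the example are hypothetical and are not the study’s estimates.

```python
import math


def p_upper(drift, boundary_sep, start_frac=0.5, s=1.0):
    """Probability that a Wiener diffusion process is absorbed at the upper boundary.

    drift: drift rate v (positive values point toward the upper boundary)
    boundary_sep: boundary separation a
    start_frac: starting point z as a fraction of a (0.5 = unbiased)
    s: diffusion coefficient (scaling convention; 1.0 assumed here)
    """
    z = start_frac * boundary_sep
    if drift == 0.0:
        return start_frac  # pure diffusion: P(upper) = z / a
    k = -2.0 * drift / s ** 2
    # P(upper) = (1 - exp(k*z)) / (1 - exp(k*a)); expm1 keeps this accurate for small |v|.
    return math.expm1(k * z) / math.expm1(k * boundary_sep)


a = 1.5  # hypothetical boundary separation
for v in (-1.0, 0.0, 1.5):  # negative, zero, and positive drift toward "correct"
    print(f"drift {v:+.1f}: P(correct) = {p_upper(v, a):.3f}")
# Zero drift with a start point shifted toward the error boundary also lowers accuracy.
print(f"drift +0.0, z = 0.3a: P(correct) = {p_upper(0.0, a, start_frac=0.3):.3f}")
```

With an unbiased start, a negative drift yields below-chance accuracy and zero drift yields exactly chance, matching the pattern described above. Because a shifted starting point can also lower accuracy, choice proportions alone cannot separate the two accounts; in general, they make different predictions for the RT distributions, which is why choices and RTs are fitted jointly.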

Discussion

The results of the logistic regression clearly indicate self-motion aftereffects at short ISIs, with participants misperceiving the direction of translation in congruent trials. Experiment 2 thus demonstrated that intravestibular self-motion aftereffects are also present when a yaw rotation is followed by a translation in the same spatial direction. Hence, intravestibular self-motion aftereffects are bidirectional: The processing of otolith stimulation is altered depending on the direction of a previous stimulation of the semicircular canals. Although otolith and canal receptors function differently, we found a striking similarity between the two combinations (otolith stimulation preceded by canal stimulation in Exp. 2, and the opposite order in Exp. 1). The RT data in Experiment 2 also show that participants generally responded faster on correct trials. However, this advantage was smaller in the congruent conditions with ISIs of 50, 100, and 200 ms, that is, when self-motion aftereffects were more pronounced. In contrast to Experiment 1, RTs in these conditions were still slightly faster for correct than for incorrect trials.

Experiment 3

Method

In the third experiment, we tested whether self-motion aftereffects depend on the intensity of the cue and target motions. For rotations about the earth-vertical axis and for horizontal y-axis translations, we chose the same intensities as in Experiment 1; however, the order of motion types was the same as in Experiment 2. Thus, relative to the respective detection thresholds, the cue stimulus was weaker than the target stimulus, in contrast to Experiments 1 and 2.

Participants

For Experiment 3, 12 new participants were recruited. One participant pressed the left button on 92% of all trials and was excluded from the study. The remaining 11 participants (nine female, two male; mean age 25, range 21–38 years) were included in the further analysis. All participants were right-handed according to a German version of the handedness questionnaire by Chapman and Chapman (1987). None of the participants reported a history of relevant neurological, vestibular, or attentional disorders. The study was approved by the ethics committee of the University of Bern, and all participants gave written informed consent prior to the experiment in accordance with the Declaration of Helsinki.

Motion stimuli

The cue stimuli consisted of yaw rotations about an earth-vertical axis (left/right), each a single cycle of sinusoidal acceleration at a frequency of 5 Hz. The acceleration amplitude was 24 deg/s² (v_max = 1.53 deg/s, Δp = 0.153 deg). The target stimuli were translations along the y-axis (left/right) with the same acceleration profile as the cue stimuli (again 5 Hz) and an acceleration amplitude of 0.25 m/s² (v_max = 0.016 m/s, Δp = 0.0016 m).

Experimental design and procedure

The design and experimental procedure were identical to those of Experiment 2, but using the motion intensities of Experiment 1.

Data analysis

The statistical model for the accuracy data in Experiment 3 was identical to that used in Experiment 2.

Results

Accuracy

The logistic regression model showed that motion discrimination performance was around chance level (for ISIs of 50 and 100 ms) or even above chance level (for ISIs of 200 and 600 ms) when the preceding stimulus was spatially congruent with the subsequent motion (e.g., a leftward yaw rotation followed by a leftward translation). Direction discrimination performance for the second motion was higher when a yaw rotation was followed by an incongruent translation. In this experiment, there were no clear self-motion aftereffects (see also Fig. 1, Exp. 3 panel).

Discussion

In Experiment 3, we investigated whether the same self-motion aftereffect would be observed if the relative strengths of the cue and target motions were reversed, so that the cue was relatively weaker than the target. Although accuracy in congruent trials was impaired, with performance at chance level at ISIs of 50 and 100 ms, this manipulation did not result in self-motion aftereffects.

General discussion

In Experiment 1, we showed that translations influence the ability to discriminate the direction of yaw rotations, depending on the spatial congruency of the translations and rotations. The same response patterns were observed in Experiment 2 when a yaw rotation preceded a translation. These results are striking, in that participants systematically misjudged strong motion stimuli well above the detection threshold as a function of the preceding motion stimulus. The RT data from Experiment 1 further suggest that in the congruent conditions with short ISIs, participants were confident that they had correctly detected the direction of the target stimulus, since they responded slightly faster on incorrect trials. Usually, incorrect responses are accompanied by slower RTs, as was shown for the 600-ms ISI and in the incongruent conditions in Experiments 1 and 2, as well as in the neutral condition of Experiment 1 for all ISIs. Applying a DDM to the results of these two experiments revealed that erroneous self-motion perception is explained better by an altered accumulation of vestibular sensory evidence than by a biased starting point. This implies that the altered perceptual decision-making process and the resulting misperceptions are likely due to altered processing of vestibular information. It is possible that decision-making circuits dynamically accumulate evidence for the wrong direction because they receive reduced or no input signaling the correct motion direction, the sensitivity to which has been altered.

These results contradict the predictions of a mechanism that relies on attentional shifts by means of cueing. On the basis of previous studies in other modalities (see Spence, 2010, for a review), one could have expected facilitation for spatially congruent trials with short ISIs and an IOR for spatially congruent trials with long ISIs. In the present study, however, participants showed a consistent response pattern that indicates an actual misperception, rather than merely the RT effects predicted within the framework of the spatial cueing paradigm. The comparison of the results from Experiments 1 and 3 also argues against a spatial-cueing account: Reversing the relative magnitudes of cue and target intensity should not alter the results, because the directional information given by the cue remains unchanged. However, the results of Experiment 3 differ markedly from those of Experiment 1, so a mere shift of spatial attention can be ruled out. This also suggests that sensory evidence accumulation depends on the intensity of the first motion stimulus.

The present study has demonstrated intravestibular self-motion aftereffects (from the otoliths to the SCCs, and vice versa) for the first time. This self-motion aftereffect is interesting for several reasons. The cue and target stimuli in the present experiments relied on different sensory inputs, namely, the otoliths and the SCCs. Previous research has shown that otolith and SCC signals converge at an early stage of processing in the vestibular nuclei (Carriot et al., 2015; Dickman & Angelaki, 2002). Although linear (translational) and angular movements are fundamentally different, we speculate that, at this stage, a common spatial representation of the directional information implied by rotations and translations might be obtained by integrating rotational and translational sensory afferents. The application of DDMs to the perceptual decision-making process in self-motion aftereffects provided supporting evidence: The drift rate changing as a function of the congruence of cue and target suggests that the cue motion alters sensory evidence accumulation during real-time vestibular processing. Because the cue and target stimuli relied on different sensory systems, the evidence accumulation process in self-motion aftereffects can be pinpointed to a stage at which the SCC and otolith signals have already converged. A conceivable alternative would have been a strategic bias operating prior to the processing of the sensory evidence. However, this explanation can be ruled out by the results of the DDM.

The phenomenon observed in Experiments 1 and 2 resembles the description of MAEs in the vestibular modality (Coniglio & Crane, 2014; Crane, 2012a, 2012b). Yet, the paradigm used in this study clearly differs from those normally used to investigate (vestibular) MAEs. Classically, the intensity of the target motion is adjusted according to participants’ previous responses, and the adaptor is presented for a longer duration. In contrast, the present experiments used strong target stimuli. Although participants were capable of correctly perceiving the direction of self-motion when it was preceded by a neutral or incongruent motion, they consistently misperceived the direction when the preceding motion was congruent, despite the fact that the motion intensity was well above the direction detection threshold (Grabherr et al., 2008; Valko et al., 2012).

The temporal interval between the two motions is a crucial factor for self-motion aftereffects. In the present study, the ISIs were selected on the basis of the spatial-cueing literature and ranged from 50 to 600 ms. As noted above, these ISIs are rather short compared with those in the literature on vestibular MAEs. For example, Crane (2012a) used ISIs ranging from 500 to 3,000 ms and found consistent MAEs only from 1,000 ms onward. Similarly, Coniglio and Crane (2014) observed weaker MAEs with increasing ISIs. In contrast to those studies, here the self-motion aftereffects were observed only at ISIs of 50, 100, and 200 ms. The short duration of the cue stimuli in the present experiments marks the most crucial difference from classical MAE studies: Whereas MAEs possibly arise from adaptation at the neural level, it is rather unlikely that cue stimuli lasting 200 ms could lead to adaptation. Hence, although the outcomes of the present experiments may resemble MAEs, the mechanism underlying this self-motion aftereffect is most likely not neural adaptation.

The present data suggest that self-motion aftereffects arise from the interaction of stimulus duration, stimulus intensity, and ISI. Important support for this claim comes from Experiment 3, in which the intensity relative to threshold was weaker for the cue than for the target motion. Thus, in comparison to Experiment 2, the relative intensities of cue and target were reversed. This subtle change resulted in virtually no self-motion aftereffect in Experiment 3, although direction discrimination performance was still worse in the congruent than in the incongruent condition. Notably, previous studies of perceptual aftereffects following prolonged, asymmetric passive vestibular stimulation, in which rotations in the two directions had different velocities, have reported asymmetric perceptual responses (Panichi et al., 2011; Pettorossi et al., 2013): Perceptual responses were enhanced for fast rotations and reduced for slow rotations. These results underline the importance of the intensity of a cue stimulus for the perception of a subsequent vestibular stimulus.

The intravestibular self-motion aftereffects suggest that a representation of spatial direction is obtained by combining information from both otoliths and SCCs. The activation of this representation through translations or rotations could lead to an altered sensitivity in self-motion perception. This is in accordance with a study by Nooij, Nesti, Bülthoff, and Pretto (2016), who found that the linear addition of translational and rotational components of the sensory input does not suffice to explain the perceived motion; instead, the components must be combined in a more complex manner. Interestingly, two other studies of the intravestibular interaction between otoliths and SCCs found that translation detection thresholds increased with concurrent rotation (Crane, 2016; MacNeilage et al., 2010), whereas yaw rotation detection thresholds were barely influenced by simultaneous translations. This seemingly unidirectional relationship underlines the rather complex intravestibular interaction of the otoliths and SCCs. These results are in line with our finding that yaw rotations produced stronger self-motion aftereffects when used as cue stimuli.

Our results suggest that altered sensitivity produces self-motion aftereffects at very short ISIs, whereas the effect disappears at an ISI of 600 ms. The results of Experiment 3 also imply that the degree of sensitivity alteration depends on the intensity of the vestibular stimulation. Importantly, altered sensitivity is consistent with altered evidence accumulation. Future studies aiming to better understand the mechanisms underlying self-motion aftereffects should systematically vary the ISI, duration, and intensity of cue and target motions and examine their interaction, in order to delineate the limits of the self-motion aftereffect. In particular, the notion of attention in the context of passive self-motion needs to be refined (cf. Figliozzi et al., 2005). Attention can be viewed as a gain control mechanism that modulates sensory processing (Hillyard, Vogel, & Luck, 1998). Brown, Friston, and Bestmann (2011) discussed this mechanism in the context of motor planning and active inference, and it would be worthwhile to investigate the potential role of attentional modulation as a gain control mechanism in the context of evidence accumulation and self-motion aftereffects.

Another important aspect is the influence of proprioception. Participants in typical self-motion aftereffect studies are seated and secured on a motion platform. Accelerations and decelerations therefore not only stimulate the vestibular system, but also generate proprioceptive cues. The processing of these proprioceptive signals could interfere with the vestibular afferent signals of the second motion and lead to incorrect information processing, which would result in altered sensory evidence accumulation. The reported self-motion aftereffects might thus occur at a processing stage at which vestibular information has already been combined with proprioceptive information. The misperception of direction in spatially congruent conditions could also foster exploratory behavior similar to spontaneous alternation behavior (Vecera, Rothbart, & Posner, 1991), a memory-driven tendency to avoid previously selected spatial locations in order to explore the environment. We speculate that the observed altered evidence accumulation process might lead to a similar behavior. However, it remains to be investigated whether and how these two phenomena interact.

In conclusion, we have demonstrated intravestibular self-motion aftereffects and suggest that this effect may be rooted in altered evidence accumulation in the perceptual decision-making process, at the level of a common representation of spatial direction derived from the SCCs and otoliths. Higher-level decision-making processes have long been rather neglected in vestibular psychophysics. In the last few years, however, several studies have acknowledged the importance of such higher-level aspects in self-motion perception and have started to investigate the underlying processes in vestibular (Clark et al., 2018; Ellis, Klaus, & Mast, 2017; Merfeld, Clark, Lu, & Karmali, 2016; Wertheim, Mesland, & Bles, 2001) and multisensory (Drugowitsch, DeAngelis, Angelaki, & Pouget, 2015; Drugowitsch, DeAngelis, Klier, Angelaki, & Pouget, 2014; Lim, Wang, & Merfeld, 2017) perceptual decision-making. These processes have important theoretical and practical implications regarding the complex nature of biological self-motion perception. In daily life, we are constantly exposed to stimuli targeting the otoliths and semicircular canals, both simultaneously and in rapid succession. The present results show the need to investigate such motion sequences further, since they can alter perceptual decision-making processes.

Author note

Authors G.M. and A.W.E. were supported by the Swiss National Science Foundation (SNSF, grant numbers 162480 and 147164, PI: F.W.M.).