Ethics statement
The experiment was performed in accordance with the Declaration of Helsinki. Participants provided written informed consent before participation in the study. The experiment protocol was approved by the ethical committee of the Eberhard Karls University in Tübingen, Germany (reference: 355/2019BO1).
Participants
Seven participants (mean age 33.3 years, SD 14.0; 4 female) were recruited for the study. Three of them were employees of the Max Planck Institute for Biological Cybernetics; the remaining four were recruited from the institute participant pool. Fitness to participate in a simulator study was assessed by questionnaire. Participants were informed of the experimental goals and procedures in compliance with the notion of informed consent. External participants were compensated for their time at a rate of €8/h.
Setup
Stimuli were presented using an eMotion 1500 hexapod motion system (Bosch Rexroth AG, Lohr am Main, Germany) available in our laboratory (Nesti et al. 2017; de Winkel et al. 2017, 2018). The platform was controlled using Simulink software (The MathWorks, Inc., Natick, MA, USA). Participants were seated in an automotive-style bucket seat (RECARO GmbH, Stuttgart, Germany) that was mounted on top of the platform. Participants were secured in the seat with a five-point safety harness (SCHROTH Safety Products GmbH, Arnsberg, Germany). To minimize head movements, participants wore a Philadelphia-type cervical collar. Actuator noise was masked by having participants wear earplugs with a single number rating (SNR) of 37 dB (UVEX Arbeitsschutz GmbH, Fürth, Germany) as well as a wireless headset (Plantronics, Santa Cruz, CA, USA) that provided active noise cancellation and played white noise during stimulus presentation.
Stimuli and tasks
To determine the relative contribution of acceleration and jerk to perceived motion intensity, we created 25 motions. These motions were 1 s forward translations (surge motions), consisting of an acceleration phase, a constant velocity phase, and a deceleration phase. The acceleration phase was defined as \(A_t = A_{\text {max}}\sin ^2(\pi t /t_1)\). \(t_1\) was varied to achieve different combinations of acceleration and jerk within individual motions: there were five levels of maximum acceleration (\(A_\text {max}=[0.5, 1.0, 1.5, 2.0, 2.5]\text {m/s}^2\)) and five levels of jerk (\(J_\text {max}=[20, 30, 40, 50, 60]\text {m/s}^3\)), resulting in 25 different motion profiles. As an illustration, the five motion profiles for the highest acceleration level (\(2.5\;\text {m/s}^2\)) and each jerk level are shown in Fig. 1.
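The link between \(t_1\) and peak jerk follows from differentiating the acceleration profile: \(\mathrm{d}A_t/\mathrm{d}t = A_{\text {max}}(\pi /t_1)\sin (2\pi t/t_1)\), which peaks at \(t = t_1/4\) with value \(J_{\text {max}} = \pi A_{\text {max}}/t_1\). The sketch below is illustrative only. It assumes that the acceleration phase spans \(0 \le t \le t_1\) and that \(t_1\) was chosen to match the target jerk exactly, neither of which is stated explicitly in the text. All function and variable names are hypothetical.

```python
import math

A_MAX_LEVELS = [0.5, 1.0, 1.5, 2.0, 2.5]  # peak acceleration, m/s^2
J_MAX_LEVELS = [20, 30, 40, 50, 60]       # peak jerk, m/s^3


def phase_duration(a_max, j_max):
    """t1 (s) such that a_max * sin^2(pi * t / t1) has peak jerk j_max."""
    return math.pi * a_max / j_max


def acceleration(t, a_max, t1):
    """Acceleration profile A_t = A_max * sin^2(pi * t / t1)."""
    return a_max * math.sin(math.pi * t / t1) ** 2


def numeric_peak_jerk(a_max, t1, n=2000):
    """Central-difference estimate of the peak |dA/dt| on [0, t1]."""
    dt = t1 / n
    return max(
        abs(acceleration((i + 1) * dt, a_max, t1)
            - acceleration((i - 1) * dt, a_max, t1)) / (2 * dt)
        for i in range(1, n)
    )


# One t1 value per (A_max, J_max) combination: 25 profiles in total.
profiles = {
    (a, j): phase_duration(a, j)
    for a in A_MAX_LEVELS
    for j in J_MAX_LEVELS
}

if __name__ == "__main__":
    for (a, j), t1 in sorted(profiles.items()):
        print(f"A_max={a:.1f} m/s^2, J_max={j} m/s^3 -> t1={t1:.4f} s")
```

The numerical check in `numeric_peak_jerk` is included only to confirm that the closed-form expression reproduces the requested jerk level.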
The ability of the platform to accurately reproduce the motion profiles was tested by comparing the commanded motion to actual accelerations, recorded using an accelerometer. The recordings indicated that the platform reproduced the motion profiles accurately (see Fig. 2).
The motions were used in two tasks: a magnitude estimation (ME) task and a two-interval forced choice task (2IFC). To block visual cues, which would provide additional information on velocity (Howard 1982; Pretto et al. 2009), participants performed both tasks with their eyes closed.
ME task
In the ME task, participants were asked to attribute intensity ratings to the motions, using an interval measurement scale (Stevens et al. 1946). Participants were presented with a motion, after which they provided a response and were then moved back to the initial position. Participants first completed three practice trials. On these trials, the motion with the middle acceleration and jerk levels (i.e., \(1.5\;\text {m/s}^2, 40\;\text {m/s}^3\)) was presented, to which they were told to attribute the value ‘100’. This value served as a reference for subsequent motions: for instance, a motion feeling twice as strong should be attributed twice the reference value. After the practice trials were completed, participants were presented with each of the 25 motions in a random order. The ME task took about 5 min to complete.
The ME task was always performed after the 2IFC task. This was done to be sure participants were familiar with the range of motions, without explicitly informing them of the range of motions and thereby potentially truncating their responses.
2IFC task
In the 2IFC task, participants performed pairwise comparisons on 300 experimental trials. To generate the trials, we first formed all pairs of the 25 motion stimuli. These pairs were generated using the MATLAB nchoosek function, which returns all possible combinations of 2 items drawn from 25 items (\(25!/(2!\,(25-2)!)=300\)).
Second, we randomized the order of the motions within the pairs. This was necessary because the MATLAB function returned the pairs as an ordered list in which the first motion tended to have larger acceleration/jerk values than the second. This would be problematic because predominantly presenting motions with larger peak accelerations and/or jerks first would bias the responses toward stating that the first motion of a pair was more intense.
From the five values that peak acceleration could take on and the five for jerk, theoretically nine difference values \(\Delta A_\text {max} = A_\text {max}(\text {motion} 2) - A_\text {max}(\text {motion} 1)\), and nine \(\Delta J_\text {max} = J_\text {max}(\text {motion} 2) - J_\text {max}(\text {motion} 1)\) can be obtained, ranging from the smallest minus the largest to the largest minus the smallest (e.g., \(0.5-2.5=-2\); \(2.5-0.5=2\)). Consequently, the \(\Delta\) values for the 300 trials are distributed over a \(9\times 9\) grid. This distribution is not uniform but peaks around the smaller \(\Delta\), because there are relatively more combinations that lead to smaller \(\Delta\) values (e.g., \(1-0.5=0.5\); \(1.5-1=0.5\); \(2-1.5=0.5\); \(2.5-2=0.5\); but only \(2.5-0.5=2\)). Because of the randomization of the order of motions within the trials, the distribution of \(\Delta A_\text {max}\) and \(\Delta J_\text {max}\) also differed slightly between participants, and not all points on the grid were presented. This is illustrated in Fig. 3. Finally, we randomized the order of the trials.
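The trial-generation procedure described above can be sketched as follows. This is not the original MATLAB code: Python's `itertools.combinations` stands in for `nchoosek`, and the coin-flip used to randomize the order within each pair is an assumption, since the text does not say how that randomization was implemented. The function names and the seed are hypothetical.

```python
import itertools
import random
from collections import Counter

A_MAX_LEVELS = [0.5, 1.0, 1.5, 2.0, 2.5]  # m/s^2
J_MAX_LEVELS = [20, 30, 40, 50, 60]       # m/s^3


def make_trials(seed=None):
    """Return 300 (motion 1, motion 2) pairs in randomized order."""
    rng = random.Random(seed)
    motions = list(itertools.product(A_MAX_LEVELS, J_MAX_LEVELS))  # 25 motions
    trials = []
    # All unordered pairs of distinct motions: C(25, 2) = 300.
    for first, second in itertools.combinations(motions, 2):
        # Assumed mechanism: swap the order within the pair with p = 0.5.
        if rng.random() < 0.5:
            first, second = second, first
        trials.append((first, second))
    rng.shuffle(trials)  # randomize the order of the trials
    return trials


def delta_counts(trials):
    """Count trials per (delta A_max, delta J_max) cell of the 9 x 9 grid."""
    return Counter(
        (m2[0] - m1[0], m2[1] - m1[1]) for m1, m2 in trials
    )


trials = make_trials(seed=1)
counts = delta_counts(trials)

if __name__ == "__main__":
    print(len(trials), "trials;", len(counts), "of 81 grid cells occupied")
```

Because each unordered pair contributes to only one of two mirrored cells, cells that can be reached by a single pair (for example, the largest differences in both variables) are presented in only one direction. This is consistent with the remark that not all grid points were presented.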
Participants initiated each trial themselves by means of a button press. The first motion of a pair was presented 1 s after the trial was initiated; the second motion was presented 2 s after completion of the first. At the end of the trial, participants indicated which motion of the pair was more intense by means of a button press (i.e., ‘first’, or ‘second’). After the response was received, the simulator was moved back to its initial position over 3 s. Including breaks, the 2IFC task took approximately 1.5 h to complete.
Instructions
The instructions given to participants were formulated to reflect the idea that responses on the tasks are based on intensity percepts, which result from a combination of information on acceleration and jerk. For the ME task, the (written) instructions were: “your task is to provide subjective ratings on the perceived intensity of motion by attributing numbers to stimuli verbally”. For the 2IFC task, the instructions were: “you will be presented with sequences of two motions, and asked to rate which of the two is more intense”. In addition, we provided the following instruction: “try to perform the tasks intuitively: when the motions have stopped, attribute a number or make a judgment on which motion was more intense based on your first impression”. These instructions were deemed sufficient, as the participants did not ask for additional explanations.
Perception model
We hypothesized that a percept of motion intensity \(\psi\) is constructed from observations \(A_{\text {max}}^*, J_{\text {max}}^*\) (i.e., internal representations) of the maximum acceleration \(A_{\text {max}}\) and jerk \(J_{\text {max}}\) for a given motion, and that the contribution of each variable may depend on the value of the other. The latter means that, for instance, the effect of acceleration may be larger for small jerks than it is for large jerks.
We model the percept as a combination of \(A_{\text {max}}^*, J_{\text {max}}^*\), and their interaction \(A_{\text {max}}^* \times J_{\text {max}}^*\). We also assume that the observations are unbiased and have normally distributed errors. Consequently, the percept is modeled as a normally distributed random variable, with mean \(\mu _\psi\) as
$$\begin{aligned} \mu _\psi = \omega _{A} A_{\text {max}} + \omega _{J} J_{\text {max}} + \omega _{AJ} (A_{\text {max}}\times J_{\text {max}}) \end{aligned}$$
(1)
and variance \(\sigma _\psi ^2\). In Eq. (1) above, \(\omega _{A}, \omega _{J}, \omega _{AJ}\) are the weights for the observations on acceleration, jerk and their interaction, respectively. In the remainder of the text, we will use the same symbols to refer to coefficients for these effects. It should be noted that stimulus-dependent noise (i.e., Weber’s law) is not included in this model. We tried to fit a version of the model that also included stimulus-dependent noise, but in this case the model fitting routine could not find a unique solution. In the following two sections, we describe how the percept may be transformed into responses on the different tasks.
ME task
In the ME task, participants have to express their percept verbally, as a number. We assume that this process involves a linear transformation from the perceptual domain to a numerical domain. Using the expressions for the mean and variance of the percept, the probability of a response on the ME task \(r_\text {ME}\) is given by
$$\begin{aligned} \Pr (r_\text {ME}) = \phi (r_\text {ME};\mu _\text {ME}, \sigma _\text {ME}) \ , \end{aligned}$$
(2)
where \(\phi (\cdot )\) is the normal probability density function, and
$$\begin{aligned} \mu _\text {ME}&= K \mu _\psi + r_0 \end{aligned}$$
(3)
$$\begin{aligned} \sigma _\text {ME}&= \sqrt{K^2 \sigma _\psi ^2} \ . \end{aligned}$$
(4)
K is the scaling factor from the perceptual domain to the numerical domain; \(r_0\) is an intercept. The factor \(K^2\) is included in the equation for the standard deviation because when a variable (\(\psi\)) is scaled by a factor K, its variance increases by the square of that factor (Freund 1962).
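As a minimal sketch, Eqs. (1) to (4) can be combined into a single likelihood function for an ME response. The sketch reads Eq. (2) as a normal density evaluated at the response, which is an interpretation rather than something the text states outright. All parameter values are hypothetical and were chosen only so that the reference motion (\(1.5\;\text {m/s}^2, 40\;\text {m/s}^3\)) maps onto a mean response of 100; they are not fitted estimates.

```python
import math


def percept_mean(a_max, j_max, w_a, w_j, w_aj):
    """Eq. (1): mean of the intensity percept."""
    return w_a * a_max + w_j * j_max + w_aj * a_max * j_max


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and SD sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def me_mean_sd(a_max, j_max, w_a, w_j, w_aj, sigma_psi, k, r0):
    """Eqs. (3) and (4): mean and SD of the ME response."""
    mu_me = k * percept_mean(a_max, j_max, w_a, w_j, w_aj) + r0
    sigma_me = math.sqrt(k ** 2 * sigma_psi ** 2)
    return mu_me, sigma_me


def me_likelihood(r_me, a_max, j_max, **params):
    """Eq. (2), read as a density: likelihood of ME response r_me."""
    mu_me, sigma_me = me_mean_sd(a_max, j_max, **params)
    return normal_pdf(r_me, mu_me, sigma_me)


# Hypothetical parameter values, for illustration only.
PARAMS = dict(w_a=1.0, w_j=0.02, w_aj=0.0, sigma_psi=0.5, k=40.0, r0=8.0)

if __name__ == "__main__":
    print(me_mean_sd(1.5, 40, **PARAMS))
    print(me_likelihood(100.0, 1.5, 40, **PARAMS))
```

Note that \(\sqrt{K^2 \sigma _\psi ^2}\) equals \(|K|\sigma _\psi\), so the response noise stays positive even if a fitted \(K\) were negative.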
2IFC task
In the 2IFC task, the response is the binary outcome of a comparison between the magnitudes of two intensity percepts \(\psi _a, \psi _b\). We assume that these percepts are independent, and that participants respond positively, namely that the second motion b of a pair was more intense than the first a, if \(\psi _b > \psi _a\). This particular response is coded as \(r_\text {2IFC} = 1\); the opposite response is coded \(r_\text {2IFC} = 0\).
Consequently, responses reflect the difference \(\psi _b-\psi _a\). When the difference is positive, b is perceived as more intense than a; when it is negative, a is perceived as more intense than b. Given that the percepts are normally distributed random variables with mean as in Eq. (1) and variance \(\sigma _\psi ^2\), their difference is also a normally distributed random variable with mean
$$\begin{aligned} \mu _{\text {2IFC}} = \mu _{\psi _b}-\mu _{\psi _a} \end{aligned}$$
(5)
and variance and standard deviation as
$$\begin{aligned} \sigma _{\text {2IFC}}^2&= \sigma _{\psi _a}^2 + \sigma _{\psi _b}^2 \end{aligned}$$
(6)
$$\begin{aligned} \sigma _{\text {2IFC}}&= \sqrt{2}\sigma _\psi . \end{aligned}$$
(7)
For a given pair of stimuli a, b, the probability of a positive response is 1 minus the integral of this distribution over the interval \((-\infty ,0]\). This is equivalent to
$$\begin{aligned} \Pr (r_{\text {2IFC}} = 1) = \Phi \left( \frac{\mu _{\psi _b}-\mu _{\psi _a}}{\sqrt{2}\sigma _\psi } \right) , \end{aligned}$$
(8)
where \(\Phi\) here denotes the standard normal cumulative distribution function, and \(\sigma _\psi\) the common noise parameter. Note that this is effectively a probit model (Bliss 1934), which will be used in separate analyses of the data collected in the 2IFC task.
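The step from Eqs. (5) to (7) to Eq. (8) can be checked numerically. The sketch below evaluates the normal cumulative distribution function at the scaled difference of means and compares the result with a Monte Carlo simulation of two independent percepts. The means, noise level, number of simulated trials and seed are hypothetical values chosen for illustration.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_second_more_intense(mu_a, mu_b, sigma_psi):
    """Eq. (8): probability of responding that motion b was more intense."""
    return normal_cdf((mu_b - mu_a) / (math.sqrt(2.0) * sigma_psi))


def simulate(mu_a, mu_b, sigma_psi, n_trials=20000, seed=0):
    """Monte Carlo estimate of Pr(psi_b > psi_a) for independent percepts."""
    rng = random.Random(seed)
    hits = sum(
        rng.gauss(mu_b, sigma_psi) > rng.gauss(mu_a, sigma_psi)
        for _ in range(n_trials)
    )
    return hits / n_trials


if __name__ == "__main__":
    # Hypothetical percept means and noise level.
    print(p_second_more_intense(2.0, 2.3, 0.5))
    print(simulate(2.0, 2.3, 0.5))
```

The two printed values should agree up to sampling error, and equal means give a probability of 0.5, as expected for a comparison of two identical motions.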
Model comparisons
To evaluate the performance of the model, we compared its overall fit (referred to as ‘full’) to a number of partial models. These partial models either account for subsets of the data, or include a subset of the effects. Comparing the fit of the partial models to the full model allows us to assess whether it is indeed likely that responses on both tasks result from the same perceptual process, and what this process is. Three comparisons were made. As a first alternative model, we combined individual model fits for the two tasks (referred to as ‘add’). This comparison allows us to evaluate whether participants used the same information/strategy in both tasks. We also compared the fit of the full model to different versions of the perception model: one omitting the interaction term, where perceived intensity is a linear combination of acceleration and jerk, named ‘main’; and one that additionally omits the term for jerk, named ‘acc’, where perception depends on acceleration only. Based on these comparisons, we chose the model that provides the most parsimonious description of participants’ behavior. We used the Bayesian information criterion (BIC) score as the basis for these comparisons (Schwarz 1978).
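For reference, the BIC of a fitted model with \(k\) free parameters, \(n\) observations and maximized log-likelihood \(\ln \hat{L}\) is \(k\ln n - 2\ln \hat{L}\), with lower scores preferred. The sketch below applies this standard formula to made-up numbers. The log-likelihoods and parameter counts are hypothetical placeholders, and taking \(n = 325\) (300 2IFC trials plus 25 ME trials per participant) is an assumption about how observations were counted.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical fits: (maximized log-likelihood, number of free parameters).
# These values are placeholders, not results from the study.
HYPOTHETICAL_FITS = {
    "full": (-400.0, 6),
    "main": (-402.0, 5),
    "acc": (-450.0, 4),
}
N_OBS = 325  # assumed: 300 2IFC trials + 25 ME trials per participant

scores = {
    name: bic(log_l, k, N_OBS)
    for name, (log_l, k) in HYPOTHETICAL_FITS.items()
}
best = min(scores, key=scores.get)  # lowest BIC is preferred

if __name__ == "__main__":
    for name, score in sorted(scores.items(), key=lambda item: item[1]):
        print(f"{name}: BIC = {score:.1f}")
    print("preferred model:", best)
```

In this made-up example the small gain in log-likelihood from the interaction term does not offset the penalty of \(\ln n\) for the extra parameter, which is the sense in which the BIC favors parsimonious models.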
Linear time-invariant systems model
As noted in the introduction, much research on the vestibular system has been performed from an (aerospace) engineering perspective, modeling motion perception based on otolith stimulation as a linear time-invariant (LTI) system (Walsh 1961; Fernandez and Goldberg 1976; Benson et al. 1986; Soyka et al. 2009, 2011; Heerspink et al. 2005; Grant and Haycock 2008; Mayne 1974; Hosman and Van der Vaart 1978; Hosman and Stassen 1999). Although an in-depth treatment of LTI systems is beyond the scope of the present paper (for an introduction, see, for instance, Soyka et al. 2011), we do include an analysis using this method. The purpose of this analysis is to provide a benchmark for comparison between methodologies typical for psychology and engineering. Moreover, the parameters of available LTI models have all been determined from absolute (direction) detection thresholds. Inclusion of this analysis for above-threshold motion thus also serves to validate these models for a novel range of motions.
The LTI model is a transfer function (Eq. 9), which is based on a simplified model of how acceleration inputs bend the sensory hair cells (cilia) of the otoliths, leading to an output that can be interpreted as proportional to a neural firing rate:
$$\begin{aligned} H(s) = K \times \frac{(1+\tau _N s)}{(1+\tau _1 s)(1+\tau _2 s)}, \end{aligned}$$
(9)
where s is the complex frequency variable, K is a gain that scales the output, and \(\tau _N, \tau _1\) and \(\tau _2\) are time constants that determine how the output signal changes relative to the input signal in terms of frequency content. The behavior of this transfer function using parameters found by Soyka et al. (2011; see Footnote 1) is illustrated in Fig. 4. The figure shows the model output for one of the presently used motion profiles according to this transfer function.
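To illustrate how the parameters of Eq. (9) shape the frequency content of the output, the transfer function can be evaluated along the imaginary axis, \(s = j\omega\), which gives the gain and phase shift for a sinusoidal acceleration input of angular frequency \(\omega\). The sketch below does this with the Python standard library. The parameter values are arbitrary placeholders and are not the values reported by Soyka et al. (2011).

```python
import cmath
import math


def transfer_function(s, k, tau_n, tau_1, tau_2):
    """Eq. (9): H(s) = K (1 + tau_N s) / ((1 + tau_1 s)(1 + tau_2 s))."""
    return k * (1 + tau_n * s) / ((1 + tau_1 * s) * (1 + tau_2 * s))


def gain_and_phase(freq_hz, k, tau_n, tau_1, tau_2):
    """Gain and phase (degrees) of H at s = j * 2 * pi * f."""
    s = 1j * 2.0 * math.pi * freq_hz
    h = transfer_function(s, k, tau_n, tau_1, tau_2)
    return abs(h), math.degrees(cmath.phase(h))


# Placeholder parameters for illustration only; NOT the published values.
PLACEHOLDER = dict(k=2.0, tau_n=1.0, tau_1=0.5, tau_2=0.05)

if __name__ == "__main__":
    for f in (0.01, 0.1, 1.0, 10.0, 100.0):
        gain, phase = gain_and_phase(f, **PLACEHOLDER)
        print(f"f = {f:7.2f} Hz: gain = {gain:.4f}, phase = {phase:7.2f} deg")
```

At very low frequencies the gain approaches \(K\), whereas at high frequencies it falls off as \(K\tau _N/(\tau _1\tau _2\omega )\), because the two poles outweigh the single zero.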