The role of vision and proprioception in self-motion encoding: An immersive virtual reality study

Past research on the advantages of multisensory input for remembering spatial information has mainly focused on memory for objects or surrounding environments. Less is known about the role of cue combination in memory for own body location in space. In a previous study, we investigated participants’ accuracy in reproducing a rotation angle in a self-rotation task. Here, we focus on the memory aspect of the task. Participants had to rotate themselves back to a specified starting position in three different sensory conditions: a blind condition, a condition with disrupted proprioception, and a condition where both vision and proprioception were reliably available. To investigate the difference between encoding and storage phases of remembering proprioceptive information, rotation amplitude and recall delay were manipulated. The task was completed in a real testing room and in immersive virtual reality (IVR) simulations of the same environment. We found that proprioceptive accuracy is lower when vision is not available and that performance is generally less accurate in IVR. In reality conditions, the degree of rotation affected accuracy only in the blind condition, whereas in IVR, it caused more errors in both the blind condition and to a lesser degree when proprioception was disrupted. These results indicate an improvement in encoding own body location when vision and proprioception are optimally integrated. No reliable effect of delay was found. Electronic supplementary material The online version of this article (10.3758/s13414-021-02344-8) contains supplementary material, which is available to authorized users.


Introduction
In this report the statistical analyses of the article "Integration of Vision and Proprioception Facilitates Encoding but not Storage: an Immersive Virtual Reality Study" are presented. The aim of the study was to investigate whether integration of vision and proprioception is beneficial for encoding the movement (i.e. the rotation) and if an advantage of this integration is also evident during the storage phase. To this end, sensory conditions, environment conditions, movement amplitude, and recall delay were manipulated.
Here, we focus only on the statistical analysis. For theoretical aspects, aims and hypotheses, experimental design, and interpretation of the results, the reader can refer directly to the manuscript.

Study summary
Participants completed a self-turning task. Using a customized swivel chair, participants were rotated by one of the experimenters and were then asked to rotate themselves back to their starting position.
The angle reproduction error produced by the participant, computed as the absolute rotation difference between the starting position and the final position, was considered as the outcome variable to evaluate participants performance. Each participant performed the task in different sensory and environment conditions. In particular, participans completed the task in a real environment (R) and in Immersive Virtual Reality (IVR). In each of these two environments, one blind condition removed all visual information such that only proprioceptive information could be used (P condition). Another condition limited the access to visual landmarks (removing visual information about the body and corners of the room while retaining the overall use of vision) in order to disrupt proprioception (V condition). Finally, there was a condition where reliable visual and proprioceptive information were available (VP condition).
Moreover, to control whether the amplitude and recall delay would affect performance, each condition was completed with two rotation amplitudes (approximately 90 and 180 degrees) and with three delays (0, 3, and 6 seconds) respectively.

Supplemental material
The Supplemental material is divided into different sections: • Section 2: the statistical approach and the plan of analysis are presented.
• Section 3: the dataset is presented with a brief description of each variable.
• Section 4: the descriptive statistics are presented.
• Section 5: the model selection procedure is presented.
• Section 6: the selected model is presented.

Statistical Approach
In this section, the statistical approach and the plan of the analysis are presented. First, the Bayesian model selection approach is described. Subsequently, we discuss the reasons why Bayesian Generalized Mixed-Effects Models are used. Finally, the plan of the analysis is summarized.

Bayesian model selection
Bayesian model selection approach is used to explore which variables and interactions influence accuracy in the self-turning task.
• Bayesian refers to the fact that analysis is performed in a Bayesian framework. Bayesian statistical approach is presented in the next section (see Section 2.2).
• Model selection is a widely used exploratory approach that allows to identify the set of variables related to the outcome of interest. Starting from the most general model that encompasses all the variables and interactions of interest, at each step predictors are removed according to some criteria until a "preferred" model is obtained. In the present analysis, models are evaluated according to information criteria (Wagenmakers & Farrell, 2004;Yamashita et al., 2007). At each step, predictors are removed if the resulting model is better than the previous model according to information criteria.
• Information criteria provide an estimate of the average deviance (i.e. error) of the model's ability to predict new data, penalizing for model complexity (i.e., number of parameters; McElreath, 2016). Therefore, information criteria allow to evaluate models considering the trade-off between parsimony and goodness-of-fit (Vandekerckhove et al., 2015;Wagenmakers & Farrell, 2004): as the complexity of the model increases (i.e. more parameters) the fit to the data increases as well but generalizability (i.e. ability to predict new data) decreases. The researcher's aim is to find the right balance between fit and generalizability in order to describe, with a statistical model, the important feature of the studied phenomenon and not the random noise of the observed data. In the present analysis, the Watanabe-Akaike Information Criterion (WAIC; Gelman et al., 2014;Piironen & Vehtari, 2017;Vehtari et al., 2017) was used as the information criterion to evaluate the models. WAIC is the corresponding Bayesian version of the commonly used Akaike information criterion (AIC; Akaike, 1973). In line with the Bayesian approach (see Section 2.2), WAIC has the desirable property of averaging over the posterior distribution rather than conditioning on a point estimate. Lower WAIC values are interpreted as indicators of a better model (as with all the information criteria).
In this exploratory analysis, Bayesian model selection approach allows us to identify the "preferred" model that offers a parsimonious and "good enough" fit for the data.

Bayesian generalized mixed-effects models
Given the complex structure of the data, Bayesian generalized mixed-effects models are used. Specifically, data are characterized by: (1) a continuous non-normally distributed dependent variable (i.e. rotation error); (2) a within-subject factor (i.e. delay); (3) within-subject factors (i.e. perception and environment); (4) a quantitative independent variable (i.e. rotation amplitude).
• Mixed-effects models allow to take into account the repeated measures design of the experiment (i.e. observations nested within subjects). Thus, participants are treated as random effects, with random intercepts that account for interpersonal variability, whereas other variables were considered as fixed effects.
• Generalized linear models are used considering the Gamma distribution, with logarithmic link function, as the probability distribution of the dependent variable. Generalized linear models allow to model non-normally distributed data using appropriate probability distributions that reflect the characteristics of the data (Fox, 2016). Selecting an appropriate probability distribution provides better fit to the data and more reliable results (Lo & Andrews, 2015). Gamma distribution is advised in the case of positively skewed, non-negative data, when the variances are expected to be proportional to the square of the means (Ng & Cribbie, 2017). These conditions are respected by our dependent variable (see Section 4.5): only positive values were possible, with a positive skewed distribution, and we expected a greater variability of the possible results as the model predicted mean increases (i.e. a greater dispersion of participants' scores when greater mean values are predicted by the model).
• Bayesian approach is a valid alternative to the traditional frequentist approach (Gelman et al., 2013;Kruschke & Liddell, 2018) and in the case of mixed-effects models it allows for accurate estimation of complex models that would otherwise fail to converge (i.e. unreliable results) in a traditional frequentist approach (Bolker et al., 2009;Fong et al., 2010). Without going into philosophical reasons, which are beyond the scope of the present work (if interested consider Romeijn & van de Schoot, 2008), Bayesian inference has some unique elements that make the meaning and interpretation of the results different from the classical frequentist approach (Etz & Vandekerckhove, 2018). In particular, in the Bayesian approach, parameters are estimated using probability distributions (i.e. a range of possible values) and not a single point estimate (i.e. a single value). Bayesian inference has three main ingredients (van de Schoot et al., 2014): (1) Priors, the probability distributions of possible parameter values considering the information available before conducting the experiment; (2) Likelihood, the information given by the observed data about the probability distributions of possible parameter values; (3) Posteriors, the resulting probability distributions of possible parameter values, obtained by combining Priors and Likelihood through Bayes' Theorem. As a result, the Bayesian approach assesses the variability (i.e. uncertainty) of parameter estimates and provides associated inferences via 95% Bayesian Credible Intervals (BCIs): the range of most credible parameter values given the prior distribution and the observed data. Thus, the Bayesian approach allows for the description of the phenomenon of interest through probabilistic statements, rather than a series of simplified reject/not-reject dichotomous decisions typically used in the null hypothesis significance testing approach (McElreath, 2016).

Plan of the analysis
Analysis are conducted with the R software version 4.0.2 (R Core Team, 2018). First, descriptive statistics are presented in Section 4 to observe the main characteristics of the collected data. Number of observations, summary statistics, and graphical representations of the variables of interest are reported according to experimental conditions. Moreover, the correlation between amplitude of the passive rotation and duration of the passive rotation is evaluated to assess consistency across trials (see Section 4.6). After, the model selection procedure is conducted to evaluate the possible interactions between Self-turn error, environment, perception, delay, and amplitude (see Section 5).
Models are estimated using the R package brms (Bürkner, 2017) which is based on STAN programming language (Stan Development Team, 2017a, 2017b) and employs the No-U-Turn Sampler (NUTS; Hoffman , an extension of Hamiltonian Monte Carlo (Neal, 2012). The basic idea of estimating parameters through a sampling procedure is to iteratively poll possible parameter values from pre-specified prior distributions. Parameter values that support the data are rewarded, whereas those values that do not support the data are penalized. After a large number of iterations, convergence upon the model parameters that optimally represent the data is obtained. Iterations of the estimation procedure are split among independent chains (i.e. independent sampling processes) to ensure that the model reliably converges on the same parameter values. Since each chain has a random starting point, it requires a certain number of iterations before reaching an optimal convergence. Thus, for each chain the initial samples, (i.e., "warm-up"), are discarded and only the subsequent samples are used for inference purposes.
The "preferred" model is presented in Section 6. Posterior distributions of the model parameters are reported. Model fit (i.e. the model's ability to explain the data) is evaluated using the alternative Bayesian definition of R 2 proposed by Gelman et al. (2018) (see Section 6.2) and qualitative visual inspection of the posterior predictive check (Kruschke, 2013). Moreover, the plausibility of effects is evaluated according to the evidence ratio between the selected model and the model without the effect considered (see Section 6.3).
To interpret the effects, posterior predicted means are computed and represented graphically, together with Bayesian pairwise comparisons (i.e. predicted score differences between experimental conditions). Finally, to quantify the differences between experimental conditions, the effects are expressed as the ratio between the conditions of interest.

Data Presentation
The dataset was formed by 1723 observations on 14 variables.
1. ID. Categorical variable with the unique ID (from 1 to 48) of each participant.
2. Age. Numerical variable representing participants' age in years.
3. Environment. Categorical variable for the environmental conditions of the experiment. Levels of the variable are "R" for Reality and "IVR" for Immersive Virtual Reality.

4.
Perception. Categorical variable for the sensory conditions of the experiment. Levels of the variable are "P" for Proprioception, "V" for Vision, and "VP" for Vision + Proprioception.

5.
Condition. Categorical variable between 1 and 6 to specify the experimental condition.
• 1 ) Reality and Proprioception • 2 ) Reality and Vision • 3 ) Reality and Vision + Proprioception 14. Error. Numerical variable representing the absolute error committed by the participant in the Selfturn task, computed as the absolute rotation difference between Start_Position and Return_Position.
Below we report the structure of the dataset.

Delay as categorical variable
Note that the variable Delay is considered as a categorical variable and not as an ordinal variable or a continuous variable. No definite hypothesis regarding the relationship between the delay and the error committed by the participant in the Self-turn task are available. Greater errors might be expected with the increasing of the delay but the type of interaction (i.e., linear or non linear) and possible thresholds above which it would be possible or not to observe the effect are unknown. Therefore, also given the limited number of the delay conditions, Delay is considered as a categorical variable. Participants ranged from 18 to 26 years of age. The mean age of the participants is 21.6 years with a standard deviation of 2.4 The distribution of the age of participants is reported in Figure 1.

Number of observations
Out of the 48 participants, 43 participants completed the task in all 36 conditions and 5 participants completed 35 conditions (Table 1). Thus, the final dataset consists of 1723 observations nested in 48 participants. Note: n participants = 48; n observations = 1723.
The number of observations according to experimental conditions is reported in Table 2.

Passive duration
In Figure 4  To obtain interpretable results in the analyses, Passive duration values are standardized (i.e., Z-scores are obtained).

Self-turn error
Given that observations are not independent but nested within participants, to compute descriptive statistics in each condition first the mean Self-turn error is obtained for each participant and then mean and standard deviation are computed.
In the present sample, the mean Self-turn error is 11.85 degrees (SD = 3.26).

## [1] 3.261765
Means and standard deviations according to the different experimental conditions are reported in Table   3. Note: IVR = Immersive Virtual Reality. n participants = 48; n observations = 1723 The frequency of the observed values is reported in Figure 5. Finally, we represent in Figure 6 the distribution of the observed Self-turn error according to the different experimental conditions.

Correlation beetween amplitude and duration
Tp evaluate the consistency across trials, the correlation between amplitude and duration of thee passive rotation is evaluated.

Model Selection
In this section, the model selection evaluating the possible relations between Self-turn error, Environment, Perception, Delay, and Amplitude is presented. First, the initial model is described considering model definition and prior settings. Subsequently, the different steps of the model selection are reported.
5.1 Initial model

Generalized Linear Model
We specify the Gamma distribution with logarithmic link function as the family distribution for our Bayesian Generalized Mixed-Effects Models. Given that there are some 0 values in the Self-turn error, in order to use the logarithmic link function we need first to add 1 to all the errors. This is done to avoid indefinite values (i.e. log(0) = −∞).

Model formula
The initial model considers the Self-turn error (Error1, see Section 5.1.1) as the dependent variable and as independent variables the environment of the experiment (Environment), the perception condition In the model all the interactions between Environment, Perception, Delay, and Amplitude_st are considered up to the 4-way interaction. The initial model is presented below using R syntax where "∼" stands for "is predicted by", "+" stands for additive effect,":" stands for interaction effect, "*" stands for factor crossing (i.e., all the interactions between variables are considered), and "(1|variable name)" stands for random effect on the intercept. Model formula: To illustrate all the interactions considered in the model, all the effects are reported in Table 4. 4-way interaction Environment:Perception:Delay:Amplitude_st Note: ":" stands for interaction effect. "(1|variable name)" stands for random intercept

Models Priors
All models use default prior specification of the R package brms (Bürkner, 2017) that are considered to be only weakly informative in order to influence the results as little as possible, while providing at least some regularization to considerably improve convergence and sampling efficiency. This allows posterior distributions to be mostly influenced by the observed data rather than by prior information.
In particular, for the population-level effects (i.e. fixed effects) brms uses an improper flat prior over the reals. Group-level effects (i.e. random effects) are restricted to be non-negative and, by default, they are set to student_t(3, 0, 2.5), that is a half student-t prior with 3 degrees of freedom and a scale parameter of 2.5. Intercept prior is set to student_t(3, 2.1, 2.5), that is a student-t with 3 degrees of freedom, location parameter of 2.1 and scale parameter of 2.5. Finally the prior for the shape parameter for the gamma distribution is set to gamma(0.01, 0.01). In Figure 7 the different prior distributions are presented.
Remember that by default, the population-level intercept is estimated separately and not as part of population-level parameter vector b. Moreover, to increase sampling efficiency, the population-level design matrix X is centered around its column means X_means. Thus, intercept prior is defined on the centered design matrix and not on the real intercept.

Models Estimation
Each model is estimated using 6 independent chains of 4,000 iterations with a "warm-up" period of 2,000 iterations, resulting in 12,000 usable samples. Convergence is evaluated considering R-hat diagnostic criteria, where values close to 1 indicate convergence and 1.100 is the proposed threshold for convergence (Gelman & Rubin, 1992

Model selection steps
Starting from the initial model m_0, predictors are removed to obtain a better model according to the WAIC (i.e., lower WAIC values means better model). Each step in the model selection represents a new model which is better than the model in the previous step. The R code for the model selection procedure is reported below and model selection steps with relative information for all models are presented in Table 5.
Relative WAIC weights are computed to assess the probability of each model to make the best prediction on new data, conditional on the set of models included in the model selection procedure (McElreath, 2016). In Figure 8   Step 1 Step 2 Step 3 Step 4 Step 5 Step 6 Step 7

First Model Presentation
Model m_9 is selected as the "preferred" model by the model selection procedure at step 7. The selected model has the probability of being the best model of 0.64 that is superior to all the other models (m_8 has a probability of 0.2 and all the others models have probabilities lower than 0.09).

selected_model<-m_9
In this section, first, the estimated parameters of the selected model are reported. Then, the model fit is

Selected model
The selected model includes the 3-way interaction between Environment, Perception, and Amplitude_st, the 2-way interaction between Perception and Delay, and the random intercept (1|ID). The formula of the selected model is: To illustrate all the interactions considered in the model, all the effects are reported in Table 6. 3-way interaction Environment:Perception:Amplitude_st Note: ":" stands for interaction effect. "(1|variable name)" stands for random intercept

Parameter posterior distributions
The 95% Bayesian Credible Intervals (BCIs) of the parameter posterior distributions are reported in Table   7. The 95% BCIs represent the range of the 95% most credible parameter values given the prior distribution and the observed data. Posterior parameter distribution and trace plots are presented in Figure 9 Note: Baseline category for Environment is reality (textttR). Baseline category for Perception is proprioception (textttP). Baseline category for Delay is texttt0 seconds. IVR = Immersive Virtual Reality. V = Vision. VP = Vision + Proprioception. n participants = 48; n observations = 1723

Model Fit
Commonly, the model fit (i.e. the model's ability to explain the data) is assessed using statistical indexes as R 2 (Xu, 2003). However, the usual definition of R 2 (i.e. the variance by the model divided by the total variance of the data) is problematic within the Bayesian framework since the numerator can be larger than the denominator (Gelman et al., 2018). To overcome this issue, Gelman et al. (2018) proposed a Bayesian alternative definition of R 2 : where r n = y n −ŷ n are the residuals of the fitted model. Given this definition, Bayesian R 2 is always between 0 and 1 by construction, and thus it follows the same interpretation as the common R 2 : an estimate of the proportion of variance explained.
The estimated value of Bayesian R 2 for the selected model is 0.35 (95%BCI = 0.3; 0.4), which means that the model explains 35% of the variability of the data (Table 8). Posterior predictive check is an alternative way to evaluate the goodness-of-fit of a model (Kruschke, 2013). Qualitative visual inspection of the posterior predictive check allows to assess whether data predicted by the model resemble the observed data or if the residuals between actual and simulated data follow certain patterns.
In Figure 10

Effects evidence ratio
To evaluate the relative plausibility of the 3-way interaction between Environment, Perception, and Amplitude_st and the 2-way interaction between Perception and Delay, the evidence ratios between the selected model and the models without the effects of interest are considered.

Posterior effect distributions
In order to interpret the results, the posterior parameter distributions are used to compute the predicted distributions of Self-turn error means considering the experimental conditions of interest and marginalizing for the other variables (i.e., averaging over the other variables). To obtain results on the original scale, the value 1 is subtracted from the predicted values (see section5.1.1).
The predicted values are computed using the R function fitted() that does not include the residual (observation-level) variance and re_formula = NA is declared to not include the random effect of participants ID.

Interaction Environment:Perception:Amplitude_st
The predicted Self-turn error mean according to Environment, Perception, and Amplitude of the passive rotation is presented in Figure 11. The predicted values at 90 degrees and 180 degrees of Amplitude are reported in Table 10.  Figure 11: Distributions of the predicted Self-turn error mean according to environment, perception, and amplitude (n participants = 48; n observations = 1723). To quantify the relative increase in the different conditions, the ratios of the Self-turn errors between 180-degrees and 90-degrees Amplitude are reported in Table 11.
In the reality environment, participants are expected to make greater Self-turn errors with increasing Amplitude only in the Proprioception condition. On the contrary, in the Vision and Vision+Proprioception conditions the Self-turn error remains the same. In the IVR environment, greater Self-turn errors with increasing Amplitude are expected in all conditions. However, the increase is smaller in the Vision+Proprioception condition and similar in the Proprioception and the Vision conditions. Below, Bayesian pairwise comparisons of interest (i.e. predicted score differences between conditions) are presented: • IVR_P_180-IVR_V_180. In the IVR environment considering 180 degrees rotation amplitude, participants are expected to make greater Self-turn errors in the Proprioception condition than in the Vision condition (Mean = 7.73 degrees; 95% BCI=2.63; 13.02).
• IVR_P_180-IVR_VP_180. In the IVR environment considering 180 degrees rotation amplitude, participants are expected to make greater Self-turn errors in the Proprioception condition than in the Vision+Proprioception condition (Mean = 17.87 degrees; 95% BCI=13.5; 22.82).
• IVR_V_180-IVR_VP_180. In the IVR environment considering 180 degrees rotation amplitude, participants are expected to make greater Self-turn errors in the Vision condition than in the Vi-sion+Proprioception condition (Mean = 10.14 degrees; 95% BCI=6.81; 13.73) • (IVR_VP_180-IVR_VP_90) -(R_VP_180-R_VP_90). The expected difference between the difference considering 180 and 90 degrees rotation in the IVR environment Vision+Proprioception condition and the difference considering 180 and 90 degrees rotation in the reality environment Vi-sion+Proprioception condition is 4.18 degrees (95% BCI=1.9; 6.56).

Interaction Perception:Delay
The predicted Self-turn error mean according to Perception and Delay is presented in Figure 12 and values are reported in Table 12.
# Marginalize over other conditions Interaction_2way <-post_data%>% group_by(Perception,Delay,iter)%>% summarize(value=mean(value)) Proprioception Vision Vision + Proprioception 10 20 30 Self-turn error Perception Delay 0 sec 3 sec 6 sec Figure 12: Distributions of the predicted means of Self-turn error according to perception and delay (n participants = 48; n observations = 1723).  Table 13. Only in Vision+Proprioception, participants are expected to make less Self-turn errors in the "0 sec" delay condition compared to the "3 sec" and "6 sec" delay conditions. In all the other conditions, the differences are less plausible as the range of most plausible values includes zero values. Note: n participants = 48; n observations = 1723 To quantify the relative difference between conditions, the ratios of the Self-turn errors are reported in Table 11.