Modeling inter-trial variability of pointing movements during visuomotor adaptation

Trial-to-trial variability during visuomotor adaptation is usually explained as the result of two different sources, planning noise and execution noise. The estimation of the underlying variance parameters from observations involving varying feedback conditions cannot be achieved by standard techniques (Kalman filter) because they do not account for recursive noise propagation in a closed-loop system. We therefore developed a method to compute the exact likelihood of the output of a time-discrete and linear adaptation system as has been used to model visuomotor adaptation (Smith et al. in PLoS Biol 4(6):e179, 2006), observed under closed-loop and error-clamp conditions. We identified the variance parameters by maximizing this likelihood and compared the model prediction of the time course of variance and autocovariance with empiric data. The observed increase in variability during the early training phase could not be explained by planning noise and execution noise with constant variances. Extending the model by signal-dependent components of either execution noise or planning noise showed that the observed temporal changes of the trial-to-trial variability can be modeled by signal-dependent planning noise rather than signal-dependent execution noise. Comparing the variance time course between different training schedules showed that the signal-dependent increase of planning variance was specific for the fast adapting mechanism, whereas the assumption of constant planning variance was sufficient for the slow adapting mechanisms.


Introduction
Programming visually guided movements requires associating visual errors with the appropriate motor corrections. For example, the visual representations of the distance and direction of a target from the hand can guide a pointing movement to the target. Random and unexpected errors of such movements can be corrected by visual feedback used either for driving a series of visually guided corrective movements or by online corrections integrated in the ongoing primary movement. In contrast, repeated occurrence of systematic errors gradually changes the relation between the visual stimulus and the feedforward component of the motor command. This gradual distortion of the mapping of the visual space onto the motor space is called motor adaptation and represents a basic form of motor learning. Motor adaptation was investigated in different motor modalities and error types, such as prism adaptation (Held and Schlank 1959), saccade adaptation (McLaughlin 1967), force field adaptation (Lackner and Dizio 1994;Shadmehr and Mussa-Ivaldi 1994), and adaptation to visuomotor rotation (Cunningham 1989).

Present state of modeling adaptation dynamics
Previous studies modeled force-field adaptation (Smith et al. 2006), saccade adaptation (Chen- Harris et al. 2008; Communicated by Benjamin Lindner. Ethier et al. 2008), and visuomotor adaptation (McDougle et al. 2015) by the linear dynamics of one or two adaptive memory states driven by the experienced errors. The two essential features of linear adaptation dynamics are the following. First, small errors induce small adaptive changes and larger errors induce larger adaptive changes. The ratio between the adaptive change and the error is called error sensitivity. Second, acquired adaptive changes decrease in the absence of errors, faster for large than for small errors. The fraction of the adaptation that is retained after a time step is called the retention rate. In the simplest case, with only a single memory state, these dynamics predict exponential time courses for learning and forgetting. The studies mentioned above showed that motor adaptation involves two components with different dynamics, a fast component with a large error sensitivity and a small retention rate, and a slow component with a small error sensitivity and a large retention rate (Fig. 1). This two-rate model (described in detail in "Appendix A1") is very successful in explaining an effect that occurs during error-clamp trials (i.e., when errors are clamped to zero by the experimenter) after a reversal training period. Under this condition, a spontaneous recovery of the adaptive changes toward the initial adaptation direction is observed. This so-called rebound effect was observed in force-field adaptation (Smith et al. 2006), in saccade adaptation , and in adaptations to visuomotor rotations (McDougle et al. 2015) and is adequately predicted by the two-rate model. This classical model of sensorimotor adaptation has two different sources of trial-to-trial variability: The first source is called execution noise, subsuming all motor effects on the initial pointing direction that are unrelated to the visually guided motor plan. The second source of trial-to-trial variability is additive noise on the internal memory states reflecting the variability of the fast and slowly adapting components of the visually guided motor plan. Therefore, we call Fig. 1 Structure of a classical multi-rate adaptive system, modeled as a time-discrete linear filter. The visuomotor distortion (u) is the input, and the pointing direction (d) is the output of this system. The two components of the vector x represent the slow and fast adapted memory states. These are equally weighted by the scalar product with the vector c = [1; 1] and added to the motor goal (g) to form the adapted visually guided pointing direction (vd). This internal motor plan is superimposed with other components (v) contributing to the executed pointing direction (d). These non-visual components include planned exploration, peripheral motor noise, and potential external mechani-cal perturbations. In the current experiment, they are not determined by the experimental stimulus of the experiment and are therefore subsumed under the term "execution noise" (v). The visual error (e) is the input to the adaptation dynamics modeled as a 2D state-space model with error sensitivities b = b s ;b f and error sensitivities A = a s 0 0 a f . The model is identical with the one proposed by (Smith et al. 2006) except the additional optional feature (for q > 0) that the execution noise v can partly be accounted for by the expected visual feedback this second noise component planning noise. The distinction was made originally by van Beers (2009) who investigated the trial-to-trial variability of repeated visually guided pointing movements. In his study, the autocorrelation (lag 1) of repeated pointing movements was close to zero and differed significantly from the negative autocorrelation expected if, at positive error sensitivities ( b in Fig. 1), the trial-to-trial noise were exclusively due to peripheral execution noise ( v in Fig. 1). In contrast, planning noise entering at the level of the internal memory states (noise vector w in Fig. 1) induces a positive autocorrelation. Qualitatively, these predictions follow directly from the fact that execution noise transfers with a negative sign to the next trial (because of the adaptive error correction), whereas the planning noise transfers with a positive sign to the next trial (since the retention factors A are positive). Therefore, van Beers (2009) could explain the missing negative autocorrelation by the superposition of execution noise and planning noise. These experiments did not involve adaptation to artificially altered visual feedback (u in Fig. 1), but the distinction between execution noise and planning noise was also made in modeling trial-to-trial variability in visuomotor adaptation (Zarahn et al. 2008;Albert and Shadmehr 2017). Under the assumption of linear adaptation dynamics, this seems a natural generalization since in linear systems the noise superimposes additively with the deterministic inputs and is therefore not expected to depend on additional input such as an externally applied visuomotor distortion. The model shown in Fig. 1 is most generally described by the following formulation: The detailed definitions of this system and of the involved variables are provided in "Appendix A1" (Eq. A16a-f). **Here we only briefly mention the most important ones: The system output y n denotes the relative pointing direction with respect to baseline and z n denotes the deterministic input dependent on either the visuomotor distortion ( u n ) in closed-loop trials or on the direction of the cursor movement ( h n ) in error-clamp trials. A linear combination of two noise signals ( v n , w n ) contaminates the update of the system states x n (Eq. 1a), whereas the output equation (Eq. 1b) is only affected by v n . The two noise signals were assumed to be independent: The variable n denotes the gain by which the execution noise v n is transferred to the error signal driving adaptation.
(1a) x n+1 = A n x n + b n z n + γ n v n + D n w n (1b) y n = c T x n + v n .
This gain differs between closed-loop trials and error-clamp trials and is directly related to the noise-compensation gain (0 ≤ q ≤ 1), i.e., the fraction of the execution noise that is accounted for in the expected feedback signal ( ĥ n ).

Research questions
The current study focusses on the identification of the variance parameters in visuomotor adaptation, a problem that is of major interest also for the analysis of movement pathologies involving increased motor variability (e.g., cerebellar diseases). Before addressing such advanced questions, it is essential to develop a model suitable to describe motor variability observed under varying feedback conditions in healthy subjects. Therefore, we investigated here whether trial-to-trial variability of the pointing direction during complex adaptation experiments involving error-clamp and closed-loop conditions can be explained similarly as during simple repetitive pointing by a superposition of constant components of execution noise and planning noise or whether more complex mechanisms are involved. This is known to occur in motor skill acquisition where motor variability increases during the early training phase (Cohen and Sternad 2009). Also in motor adaptation, it has been suggested that, similar to reinforcement learning, trial-to-trial motor variability may also reflect explorative behavior that supports learning (Dhawale et al. 2017). This notion has been supported by the observation that motor variability improves error-based force-field adaptation (Wu et al. 2014). From that point of view, one could also expect an increase of motor variability during the early phase of visuomotor adaptation. A related question is whether increased motor variability affects visuomotor adaptation. To our knowledge, these hypotheses have not been tested explicitly so far. Therefore, we developed two different models that could explain such an early increase of motor variability and test their explanatory power when fitted to experimental data. Both models extend the assumption of noise sources with constant variances by additional signal-dependent noise, i.e., white Gaussian noise with unity variance multiplied by an internally accessible signal and scaled by a constant coefficient of variation ( ). The first model implements a signal-dependent planning noise, which increases with the magnitude of the adaptive change. We assumed that such a strategic increase of explorative planning is most plausible for the fast adaptive process of the classical two-rate model since the fast process contributes mostly during the early training phase. Signal-dependent fast planning noise would therefore predict increased variability during the early training phase. The second model implements an execution noise that increases proportionally to the feedback error in the previous trial. This model would also predict an increase of the noise during the early adaptation phase when the feedback error is large. Both models differ in their prediction of how temporally increased planning or execution variance affects the inter-trial autocorrelation. Increased planning variance causes increased autocorrelation since it propagates positively to the next trial, whereas increased execution variability predicts the opposite. We will test these predictions here.
To compare these different noise models with experimental data, it is essential to correctly estimate not only the parameters determining the adaptation dynamics, i.e., the expected temporal evolution of the mean pointing direction for a given visuomotor distortion, but also the involved variance parameters. Our goal is to develop a maximum likelihood estimation method for all parameters of the system depicted in Fig. 1 observed under an arbitrary sequence of trial types involving closed-loop conditions and errorclamp conditions when the visual feedback is manipulated by the experimenter to become independent of the pointing direction.
The application of this method will not only allow the modeling of the variance/covariance structure of inter-trial noise in visuomotor adaptation to be refined, but it will also permit estimates of adaptation dynamics and variance parameters in individual subjects to be obtained with optimized precision. The best of the considered variance models will then be used to assess baseline data of the distribution of the estimated parameters across a healthy population.

Previous approaches of model identification and their problems in estimating the variance/ covariance structure of sensorimotor adaptation
In the past, methods of system identification were applied to identify the learning parameters (error sensitivities and retention rates) of linear adaptation models under closedloop conditions. Especially the so-called direct approach to closed-loop system identification (Ljung 1997) has been applied for model identification in sensorimotor learning (Cheng and Sabes 2006;Tanaka et al. 2012;Albert and Shadmehr 2017). This approach treats the sequence of measured feedback errors ( {e} N−2 0 ) as a deterministic input to the linear adaptation dynamics and the sequence of relative pointing directions {y} N−1 0 as its output: These equations express the input/output relation of the adaptive system and ignore the fact that, under closed-loop conditions, the error is directly related to the system states (3a) x n+1 = A x n + be n + w n (3b) y n = c T x n + v n .
(Eq. A10). Under the assumption that both the input e n and output y n are available to the observer, the direct approach identifies the parameters of Eq. (3a/b) rather than those of Eq. (1a/b). This is much easier because the noise terms of the update (Eq. 3a) and output (Eq. 3b) are independent of each other (Eq. 2), whereas this is not the case in the closed-loop formulation (Eq. 1a/b). This independence is a necessary precondition for the use of the standard Kalman observer (Kalman 1960). The details of the standard Kalman filter are described in "Appendix A2." For the direct approach, it is of major interest that the Kalman filter provides a highly efficient algorithm to compute the sequence of likelihoods L y n | | {y} n−1 0 , conditioned on the previous output observations {y} n−1 0 and the system parameters = x 0 , A, b, 2 , W, 0 . By multiplying these conditional likelihoods (see Eq. A24), one obtains the likelihood of the entire output sequence (0 ≤ n < N).
This likelihood can then be maximized for optimal estimation of the system's filter coefficients A and b . Notably, the likelihood maximized with this approach is conditioned on the observed sequence of feedback errors ( {e} N−2 0 ) and does not reflect the unconditional likelihood of the observed response with respect to the set of all possible responses for a given visuomotor distortion {u} N−2 0 and fixed system parameters ( ). For estimating the filter coefficients ( A, b ), this is appropriate because maximizing the likelihood (Eq. 4) provides the minimum variance estimate for A, b without bias.
In contrast, for estimating the variance parameters ( 2 , W ), the direct approach to closed-loop system identification induces systematic biases. This becomes obvious when considering the system in the absence of planning noise ( W = 0 ). In this case, the residual error is white Gaussian noise: r n ∼ N 0, 2 . In contrast, in the closed-loop system, the absence of planning noise will not lead to a white residual error, because the execution noise v n will not only contaminate the output y n but will also transfer with a negative gain to the state x n+1 and thereby also to the output y n+1 (Eq. 1a). Consequently, the residuals r n and r n+1 will negatively correlate with each other (van Beers 2009). Estimating the variance parameters ( 2 , W ) with the direct approach to closed-loop system identification causes the systematic error that the execution noise transferred to the internal memory states via feedback will be misattributed to planning noise. A second fundamental drawback of this method is that it does not account for the fact that the distribution of the observed residual ( {r} N−1 0 ) depends on whether the control loop is opened or closed, even with constant variance parameters ( 2 , W ). This is because the recursive transfer of both the execution noise and the planning noise onto the observed output y n differs between the error-clamp and closed-loop conditions as shown in Eq. A16a-f. The direct approach (Eq. 3) cannot correctly identify the variance parameters in the closedloop condition because it treats the input/output relation of the systems inside of the control loop as if this relation had been observed under open-loop conditions. Therefore, it will attribute all differences of the output and state variances between the error-clamp and closed-loop conditions to differences of the variance parameters ( 2 , W ). This is not correct because differences of the output variance occur between open-and closed-loop conditions also at constant 2 and W . Therefore, the classical direct approach to closedloop system identification is not suitable for estimating the variance parameters in experimental designs in which errorclamp and closed-loop conditions alternate such as in the classical experiments demonstrating the rebound effect.

Approach of the current study
Since the direct approach to closed-loop system identification, as described in the previous paragraph, does not allow the variance parameters in complex paradigms to be estimated correctly, we used the closed-loop formulation (Eq. 1a/b) throughout this study for model simulation, analytical computation of the expected model output and its variance/covariance structure, as well as for maximum-likelihood estimates of the model parameters. Even when the noise signals ( v n , w n ) were modeled as stationary processes, the model predicts that the variance and the autocovariance of the model output are not stationary as a consequence of opening the loop during the error-clamp trials. To test these predictions, we evaluated the empiric within-subject, trialto-trial variance/autocovariance of the relative pointing direction ( y n ) within a window moving along the trial sequence. The model prediction for these measures and their changes across the adaptation experiment was computed based on the full covariance matrix Y of the output vector of y = {y} N−1 0 of the system defined in Eq. (1a/b). The model predictions of the mean output y , the covariance matrix Y , and their dependence on the model parameters = x 0 , A, b, 2 , W, 0 are derived in "Appendix A3." In principle, y and Y could be used to compute the likelihood L y | | | of the observation for a given parameter set and thereby for implementing a maximum-likelihood estimator for the parameters (Eq. A30). However, this method is computationally costly because it involves numerical inversion of the large matrix Y , which has the square dimension of the numbers of trials (220 in our experiment). This is practically important because numerical maximization of the likelihood requires L y | | | to be evaluated very often for each parameter fit. We also wanted to obtain an approximation of the expected within-subject distribution of the fitted parameters by parametric bootstrapping. For that purpose, it is necessary to first simulate many model responses for a given parameter set and then to repeat the fitting procedure on each of those. In such a procedure, the numerical efficiency in computing the likelihood L y | | | is highly important.
To provide an efficient method for computing the likelihood L y | | | according to Eq. 4, we developed a generalized version of the standard Kalman observer, which allows the series of conditional likelihoods (the right side of Eq. 4) for the general closed-loop formulation of the system (Eq. 1a/b) to be computed and does not rely on the assumption of the independence of variance components of the update equation and the output equation as the standard Kalman observer does. The details of this generalized Kalman observer are presented in "Appendix A4." A further advantage of the method we propose here is the possibility to correctly deal with missing observations that cause a problem in the traditional "direct approach to closed-loop system identification" because the Kalman filter can deal with missing observations only if its deterministic input is known. Consequently, treating the error signal as deterministic input causes the problem that the error depends on the missing observations. In contrast, in our approach, the deterministic input to the system is not the error but the visual distortion, which is always known, even for trials with missing observations. The technical details of how our algorithm deals with missing observations are explained in "Appendices A3 and A4." We applied this new parameter estimation method to experimental data acquired with healthy volunteers performing two different variants of a manual reach adaptation experiment with visuomotor rotation. To achieve a more solid empirical basis, we included subjects in a wide age range to test for potential age effects on adaptation dynamics and on inter-trial variability. Pointing to a visual target was performed without vision of the hand under visual feedback of a cursor movement. The training protocol involved a block-wise sequence of baseline-, training-, washout-, and error-clamp trials where the first three blocks were performed under closed-loop conditions (cursor direction controlled by the subject). This training protocol was applied because it involves the rebound effect, which depends on the difference in the retention rates between fast and slow processes. Therefore, this protocol is most suitable to estimate the adaptation dynamics of both processes simultaneously in individual subjects. In the final error-clamp trials, the cursor movement was directed straight to the target (independent of the movement direction of the hand). To quantify the adaptive change, we evaluated for each trial the relative initial pointing direction (see "Appendix A1," definition of d n and y n ) expressed with respect to its (target-specific) value during the non-adapted state (baseline). The two variants of the adaptation tasks differed in the perturbation schedule. The first introduced the visuomotor distortion gradually and did not expose the subjects to very large errors. In the second experiment, a 45 deg rotation was introduced abruptly and provoked large errors during the early adaptation phase. The comparison between abrupt and gradual training protocols was done because differences in the adaptation dynamics (error sensitivities and retention rates) between these protocols would point either to nonlinear effects of error size (Criscimagna-Hemminger et al. 2010) or to a non-stationarity of the system (Turnham et al. 2012;Herzfeld et al. 2014).

Subjects
Forty-nine healthy subjects were examined partly at the Centre for Vision Research, York University, Toronto, CA, and partly at the University Hospital LMU, Munich, GE. The experimental setup, the protocol, and the task were identical at both locations. The used hardware differed in only minor details. The age distribution involved two age groups with 23 subjects below 30 years (mean ± sd = 20.8 ± 1.4 years) and 20 above 50 years (62.5 ± 8.4 years). Only 6 subjects were between 30 and 50 years old. All subjects performed the task with the dominant hand (Oldfield 1971). All except two subjects were right dominant. None of them had any history of movement disorders or neurological disease.

Apparatus and setup
The experiment setup was in a semi-dark room, with a height-adjustable chair so that the subjects could sit comfortably while facing the apparatus. Subjects performed pointing movements without direct visual feedback of the hand. Hand movements were recorded with a writing tablet (WACOM Cintiq 21UX, width × height = 43.2 cm × 32 .4 cm). Vision of the hand and of the arm was prevented by a reflective surface mounted horizontally and vertically centered between the surface of the writing tablet and an LCD-screen (HPL2245wg, 22″, 60 Hz) oriented downward. Subjects viewed the image on the monitor by viewing from above to a reflective surface. The reflecting surface was parallel to both monitor and tablet, so that the virtual images of the targets appeared on the plane of the writing tablet. The starting position of the movements was indicated by a green circle (diameter: 1 cm) horizontally centered and at about 15 cm from the subject (Fig. 2). The target was indicated by a blue circle (diameter: 1 cm) and was located at ± 25 or ± 35 deg to the right or to the left of the midline and at a distance of 12 cm from the starting position. Visual feedback of the hand movement was provided only by a cursor (yellow circle, diameter: 1 cm) the distance of which from the (virtual) starting point was always the same as that of the pen. Artificial distortions of the visual feedback were induced by rotating the cursor around the start position. The visuomotor rotation angle ( r ) was specified with respect to the pointing direction (defined as the interconnecting line between pen and the start position). Under closed-loop conditions, the visual feedback of the hand direction was directly controlled by the subject's action since the direction of the cursor was at any time identical to the sum of the visuomotor rotation and the direction of the hand. In contrast, under error-clamp conditions, the cursor always moved on the straight line between the starting point and the target. The distances of the hand and of the cursor from the starting point were still identical. Thus, no visual feedback of pointing direction was available. The cursor movement direction became independent of the subject's action. Therefore, in the context of the Visuomotor reach training task with rotated cursor (yellow circle). Reach targets (blue circles) were located at ± 25 and ± 35 deg from the midline. Subjects point toward the target without visual feedback of the hand. In each training block, only the two targets on the one side of the midline were shown. The distance of the cursor from the start position (green circle) was always identical to the distance of the pen from the starting point. Visuomotor distortions were induced by rotating the yellow cursor against the pointing direction of the hand around the start position (color figure online) model ( Fig. 1), the feedback loop is opened during errorclamp condition.
The pen position was acquired by a custom C-program that communicated with the tablet driver, which provided event-based position signals with variable sampling intervals. The acquisition process transferred these data online into a shared memory buffer. In this way, the MATLAB process controlling the graphics and running synchronized with the 60-Hz frame rate of the graphics card could access the actual pen position from the shared memory even though the recording of the tablet signals and the graphics output were running asynchronously. During the hand motions, the average sampling rate of the pen position signal was 136 Hz.

Task and procedure
At the beginning of each trial, subjects moved the cursor onto the starting point and waited until the target appeared. They were instructed to move the cursor immediately after target onset toward the target. During the practice trials, they learned by verbal instruction from the experimenter to achieve three features of the cursor movements: (1) the cursor should move in a straight line to the target, (2) the movement duration should be about one second, and (3) the movements should be initiated with a clear rapid acceleration. Care was taken that these three movement features were learned in practice trials under closed-loop conditions with zero distortion before the start of the experiment. Participants were also informed that at some point during the experiment the cursor might move in an unexpected way and that their task was to maintain or restore these three features of the cursor movements as well as possible. These instructions were crucial to provide the subjects with a clear adaptation goal and to ensure that the initial movement direction could always be clearly evaluated.
The trial started with the appearance of the blue target and the simultaneous disappearance of the green starting point, which provided the go-signal for the movement. The target disappeared when the cursor stopped at the target. Immediately afterward, the green starting point reappeared, and the subject had to move the cursor back to the starting position. The next trial started when the cursor stopped at the starting point. The stopping criterion for the cursor was that it reached a distance of less than 0.5 cm from the target or the starting point at a velocity of less than 0.5 cm/s. During closed-loop trials, the cursor was permanently visible with the same visuomotor rotation. During error-clamp trials, the cursor disappeared at the end of the outward movement simultaneously with the target. The return to the starting point was performed without visual feedback until the cursor reappeared when the y-component of the distance between the pen and the start position became smaller than 2 cm.

Experimental design
After the practice trials, the experiment started with the recording of the baseline pointing directions for each target. Subjects performed 40 trials under closed-loop conditions with zero visuomotor rotation and 40 trials under errorclamp conditions. During these baseline recordings, all four target positions were presented in pseudorandom order with balanced counts (10 presentations per target and condition).
After the baseline recordings, two blocks with 220 trials each followed. Each of these blocks was composed of an additional 40 closed-loop trials with zero visuomotor rotation, followed by a training section with 120 trials under closed-loop conditions with nonzero visuomotor rotation, 20 closed-loop trials with zero rotation (washout) and finally 40 error-clamp trials. During these 220 trials, only two of the four target positions, either on the right or on the left, were presented in randomized and balanced order. The two targets directions differed by 10 deg and occurred both with the same frequency. The randomization implied that the frequencies of the inter-presentation intervals were similar for each target and that the probability of increasing intervals decreased in an exponential fashion. Intervals 1 and 2 accounted for about 75% of all cases. The two blocks were separated by a short pause of a few minutes.
The two blocks differed only with respect to the learning sections. In the abrupt training, a visuomotor rotation of 45 deg was applied for all 120 trials of the learning section. In the gradual training, the visuomotor rotation increased during the first 60 trials of the learning section linearly from 0 to 45 deg (rate: 0.75 deg per trial) and remained constant at 45 deg for the remaining 60 trials of the learning section. Each subject performed one of four possible experimental protocols in which the first block comprised either an abrupt or a gradual training section with visuomotor rotation either to the right or to the left. Only the two targets on the left side (target directions − 25, − 35 deg) were presented during blocks with clockwise visuomotor rotation, and only the targets on the right side (+ 25, + 35 deg) during blocks with counterclockwise visuomotor rotation.

Data analysis
For each trial, the outward movement was automatically detected based on the largest peak of the tangential hand velocity. The start and the end of the movement were defined as the time when the hand tangential velocity increased above or fell below 10% of the peak velocity. The initial movement direction was defined as the direction of the line connecting the start point of the movement with the intersection of the movement trajectory and a circle with a radius of 3 cm around the start point. In our data set, the first 3 cm of the trajectory was completed after 224 ± 53 ms, which is shorter than the reaction time (275 ms) of manual pointing movements toward (non-cued) visual targets (Barthélémy and Boulinguez 2002). Therefore, this initial movement direction was taken as an estimate of the movement direction of the feedforward component of the motor command and is called the pointing direction ( d n ) hereafter for brevity. The pointing direction was defined with respect to the straight-ahead direction. Positive values indicate rightward directions. The movement was marked as invalid if the peak tangential velocity stayed below 8 cm/s, if the distance between the green start marker and the start point of the movement was larger than 2.5 cm, or if the distance between the green start marker and the movement end point was smaller than 6 cm. From the pointing direction of each movement, the relative pointing direction ( y n ) was computed by subtracting the mean pointing direction averaged across the 10 error-clamp trials of the baseline recording with the same target position. The movement n was also marked as invalid if the relative pointing direction y n was classified as an outlier within the sample y| n+4 n−4 . The outlier criterion was fulfilled if the distance of y n from the median was larger than 4 times the median-quartile distance. (For normal distributions, this criterion corresponds to an outlier fraction of 3%.) On average, 9% of the movements were marked as invalid.
The parameters of the two-rate model of Smith et al. (2006) were fitted to the observed relative pointing directions by maximizing the likelihood of the observed relative pointing directions. These fits were performed separately for each subject and for each training condition (gradual/ abrupt). The details of this procedure are described in "Appendices 4/5/6." In all models, the two components of the planning noise were assumed to be independent The investigated models included not only constant variances of execution noise ( 2 ) and planning noise ( W ) but also additional signal-dependent noise components characterized by their coefficients of variation (planning noise: = s ; f , execution noise: e ). The mathematical details of these signal-dependent noise components are described in "Appendix A5." Model extensions with signal-dependent execution noise were previously proposed (Albert and Shadmehr 2017;Cheng and Sabes 2006;Wolpert 1998, 2006) but not modeled quantitatively to explain trialto-trial variability in visuomotor adaptation. A possible reason is that signal-dependent execution noise was primarily introduced to describe the increase of force variability with increasing mean muscle force, whereas visuomotor adaptation does not adapt muscle force but movement direction.
Fitting the adaptation time courses of many individuals raises the question to what extent the residuals of theses fits were due to systematic or random errors of the model. To address this question, we submitted the residuals to a standard repeated measures ANOVA with one factor (trial number) and computed the variance components of the two random effects (subject and the interaction subject*trial), as well as the variance of the fixed effect (trial number). This analysis was performed with the MATLAB-function "anovan.m" (The MathWorks, Inc. Version 2017b).
In our study, we compared four different models of intertrial noise. The first model (M1) included only constant execution noise and no planning noise. The second model (M2) included constant execution noise and constant planning noise and is identical to the one used by Albert and Shadmehr (2017). Furthermore, we extended previous variance models by two different types of noise increase, which could possibly account for the increase of motor variability that is generally observed in early motor-skill acquisition and was quantified by the so-called N-Costs (Cohen and Sternad 2009). One possibility to explain such an increase is an increase of execution noise with increasing feedback error as modeled by Eq. (A46). The strength of this errordependent execution noise is characterized by the additional variance parameter e (M3). An alternative explanation of N-Costs is an increase of planning variance during early training stages, which is proportional to the size of the adaptive change (M4). Such a signal-dependent planning noise is characterized by two coefficients of variation = [ s ; f ] , one for each planning state (see Eq. A49). Such a mechanism implies that the precision of visuomotor planning decreases with the magnitude of the adaptive change. The "Appendix A5" shows that both types of signal-dependent noise can smoothly be embedded in our approach of a maximumlikelihood estimation of the closed-loop system. The tested variance models differed in the constraints imposed on these parameters as shown in Table 1.
Model comparisons were performed using the Akaike information criterion (Akaike 1974) evaluated as where N p denotes the number of fitted model parameters. The difference ΔAIC i,k = AIC i − AIC k between two models k and i fitted to the same data set is an unbiased estimator of how much larger the expected likelihood of the observed data is under the assumptions of model k than under that of model i. Since ΔAIC i,k measures the relative likelihood on a logarithmic scale, positive ΔAIC i,k indicates that model k is preferable to model i.
To characterize the distribution of fitted model parameters or AIC-differences across the population, we report here in general median [interquartile range (iqr)] because none of these parameters was normally distributed. This applies in particular to the slow retention rates, which were close to the upper limit one and to the variances of the slow planning noise, which were close to zero. AIC-differences frequently showed skew and long-tailed distributions. Therefore, group comparisons between younger and elderly subjects were performed using a multivariate generalization of the two-sample Wilcoxon-Mann-Whitney test of Oja and Randles (2004).
The exact version of this multivariate rank-sum test is based on the distribution of its test statistic across all possible N! permutations of the N subjects. We approximated this test by evaluating the fraction of false positives from a sample of 500,000 random permutations. Multivariate paired comparisons of the adaptation dynamics (represented by the error sensitivities and retention factors) between training conditions were performed with the Oja and Randles multivariate nonparametric sign test. Univariate group comparisons were performed with the Mann-Whitney test and paired univariate comparisons (e.g., on ΔAIC ) with the Wilcoxon signedrank test.

Results
In the following, we will first compare the different models in their ability to mimic the adaptation dynamics and the inter-trial variance/covariance structure. After identifying the best variance model, in the last three sections of the Results, it will be used to assess the distribution of the estimated parameters across our subject group. Finally, evaluation of the within-subjects precision of our estimates will show how strongly it affects the between-subjects distribution.
The red line in Fig. 3 shows the mean expected relative pointing direction ( y n ) as predicted by the model with signal-dependent planning noise (M4). The line shows the average of y n across all individual fits. At the end of the training phase, subjects compensated for 90% (gradual: 40.52 ± 5.15 deg; abrupt: 40.29 ± 4.21 deg) of the adaptation requirement (45 deg). The difference in this final adaptive state between the training conditions (gradual/abrupt) was not significant (paired t-test: T(48) = 0.29; p = 0.77). As in previous studies, the model successfully explained the rebound effect in the final error-clamp trials. We also evaluated the maximum differences of the population mean of y n predicted by model M4 and those predicted by the other models across the entire time course (0n < N). All of these three differences stayed below 1.5 deg for both gradual and abrupt training and are hardly resolvable in Fig. 3. Thus, the mean time course of the adaptation predicted by the four different models did not differ systematically. This is not surprising since the expected relative pointing directions ( y n ) were identical across all models for any given values of their learning parameters ( x 0 , A, b ) and did not depend on their variance parameters ( q, 2 , W, 0 , s , f , e ).
T h e i n d i v i d u a l m e a n s q u a r e d r e s i d u a l s � 1 N ∑ N−1 n=0 � y n − y n � 2 � differed systematically between models (Friedman ANOVA: ch 2 (3) = 26.95; p < 0.0001). The meansquared residuals (Table 2, last column) were smaller for the model with constant execution noise only (M1: median = 29.36 deg 2 ) than for the three models with more complex variance models (median = 31.85 deg 2 ). The underlying reason is that the model with execution noise only predicts a sequence of residuals that resembles white noise more closely than the residuals explained by the more complex variance models. Consequently, minimizing the likelihood under the assumptions of model M1 leads to smaller residuals than under the assumptions of the other models. 1 Figure 3 shows also that the model predicted the expected adaptation dynamics very well. To investigate to what extent the residuals of this fit were due to systematic or random errors of the model, we applied a standard repeated measures ANOVA with one factor (trial) to the residuals of all subjects. The estimated variance component of the random 1 Minimizing the mean-squared residual and minimizing the likelihood are identical for a variance model predicting white and Gaussian residuals.
1 3 interaction (subject*trial: 32.2 deg 2 ) was much larger than the variance of the fixed effect (trial: 0.43 deg 2 ). The variance component of the random factor subject was also small (0.68 deg 2 ). Thus, the great majority of the residual errors between model and data reflected random errors due to intertrial noise, not systematic errors. This is important because it further confirms that likelihood differences between the different models do not reflect differences of the predicted mean adaptation time course but differences in the predicted noise distributions.

Comparison between different variance models
Out of all the investigated four models (see Table 1), the model with signal-dependent planning noise (M4) showed the best performance as indicated by the smallest AICvalues (Fig. 4). The median of the AIC-differences of both ΔAIC 1,4 (gradual: 10.3; abrupt: 7.4) and ΔAIC 2,4 (gradual: 4.7; abrupt: 3.7) was positive (> 2), showing that the models with constant noise sources (M1 & M2) were not or considerably less (Burnham and Anderson 2002) supported by the data when compared with the model with signal-dependent planning noise (M4). The comparison with the model with error-dependent execution noise was less clear since the median ΔAIC 3,4 (gradual: 2.3; abrupt: 2.2) was smaller but still positive. These results show that, during adaptation to visuomotor rotation, the assumption of constant execution noise and constant planning noise was clearly rejected by the data. The AIC analysis did not allow a clear distinction between error-dependent execution noise (M3) and signaldependent planning noise. We also evaluated the AIC-difference ΔAIC 1,2 the median of which was also clearly positive (gradual: 2.8; abrupt: 2.3). The fitted variance parameters are shown in Table 2. Compared to the model with pure constant execution noise (M1), the execution variance 2 decreased with increasing influence of other noise sources (M2, M3, M4). The constant component of the planning variance w 2 ss ( = w 2 ff = 1.77 , M2) did not change when the model with constant planning and constant execution noise (M2) was extended by the errordependent execution noise (M3). In contrast, when model 2 was extended by signal-dependent fast planning noise (M4) the constant components of planning noise w 2 ss ( = w 2 ff = 1.03 , M4) decreased in favor of a signal-dependent fast planning noise with a coefficient of variation of f =0.29. Consequently, in the best model (M4), the mean ( w 2 ff = 7.09 deg 2 ) of the fast planning variance averaged across all trials was almost 7 times larger than the slow (constant) planning variance ( w 2 ss ). This shows how the relative contributions of execution noise, fast planning noise, and slow planning noise to the entire structure of inter-trial variance depend critically on the assumptions of the underlying variance model: Considering constant planning noise only (M2), previous studies (Albert and Shadmehr 2017) did not observe evidence for differences between slow and fast planning noise. In contrast, including signal-dependent planning noise, we showed that fast planning noise was much larger, but this difference was specific to the signal-dependent components and not the constant components of planning noise. This difference seems to be implied by the fact that all our three models including planning noise (M2, M3, M4) were fitted under the constraints w 2 ss = w 2 ff and s = 0 . However, these constraints were not arbitrary but resulted from the analysis of two further models: The first one (M5) was identical to the model with constant planning noise (M2) but allowed the (constant) variances for the fast and for the slow state to differ ( w 2 ss ≠ w 2 ff , N p = 9 ). We found that in that case w 2 ff (median [iqr] = 6.50 [7.48] deg 2 ) was larger than w 2 ss (1.23 [ 2.26] deg 2 ). However, the AIC-difference between this less constrained model and  Albert and Shadmehr (2017), suggesting that the constant components of planning noise did not differ strongly between fast and slow planning processes. It also provides a first justification for our constraint w 2 ss = w 2 ff . The second additional model (M6) was identical with our model M4 but included unequal constant planning noise components ( w 2 ss ≠ w 2 ff ) and slow signal-dependent planning noise s > 0 ( N p = 11 ) . Therefore, omitting the two constraints w 2 ss = w 2 ff and s = 0 did not improve the perfor-mance. These results suggest that a signal-dependent variance of the fast planning process is the most efficient of the considered extensions of previous variance models.

Changes of the inter-trial variance and autocovariance of the pointing direction during adaptation
The model comparison based on AIC-differences revealed that the inter-trial variability of the pointing direction was best described by constant execution variance, signaldependent fast planning variance, and constant slow planning variance (M4). Even though each of these three noise sources was modeled as white noise, their recursive processing in the closed-loop system (Fig. 1) results in a non-white and time-dependent noise of the observed pointing direction.
To understand better why model M4 provided the best match to the probability distribution of the observed inter-trial noise, we estimated the time course of its inter-trial variance ( var_r n ) and autocovariance ( acov_r n (Δ = 1) ) directly from the observed residuals. These two parameters were estimated in a moving window of length 21 centered around the trial n and compared with the expectation predicted by the different models (see "Appendix A3"). The results are shown in Fig. 5. During the closed-loop condition, the model with constant planning noise (M2, magenta) predicts small positive autocovariance (Fig. 5c, d) and fits the observed data in that respect better than the negative autocovariance predicted by the model without planning noise (M1, cyan). But both of these models with constant noise sources only do not explain the prominent temporal changes of the noise during the training phase (Fig. 5 a, b, d). The measured data show a slow increase of the variance with progressing gradual training (Fig. 5a) and a fast increase of the variance and the autocovariance immediately at the beginning of the abrupt training (at trial 40, Fig. 5b, d). In approximate agreement with the data, both model M4 and M3 predicted a gradual variance increase of the gradual training (Fig. 5a) and an abrupt variance increase of the abrupt training (Fig. 5b). The later was overestimated by model M3 (blue line in Fig. 5b,  trial40). Both models differed in their predictions of the change of the autocovariance at the onset of the abrupt training onset. The model with signal-dependent planning noise (M4) correctly explains the sudden increase of the autocovariance (Fig. 5d, trial40), whereas the model with errordependent execution noise (M3) makes here the opposite, incorrect prediction. This is because planning noise ( w n ) at trial n enters into the pointing direction y n+1 of the following trial with a positive sign and execution noise ( v n ) with a negative sign (Fig. 1). Model M4 also outperforms model M3 in that it does not predict a strong variance increase at the beginning of the washout phase (at trial 160, Fig. 5a, b). This difference between the models reflect the fact that the increase of the feedback error is larger than that of the fast state. This can be seen, for example, in the gradual training (Fig. 3a) where the absolute feedback error changed by 130% from 5.24 deg at trial 160-12.05 deg at trial 165, whereas the fast planning state changed only by 57% (from 6.98 to 10.98 deg) in the same interval.
Opening the loop (at trial 180) induced an increase of the observed autocovariance in the gradual training (Fig. 5c) that was less pronounced as predicted by all models. The differences of the observed time courses (black) between the training conditions (gradual/abrupt) were explained by the signal-dependent planning noise of model M4, even though its variance parameters ( q , 2 , w 2 ss , f ) did not show such a difference (see above). Because model M4 was the best one considered, all population statistics of the fitted model parameters reported in the following will refer to the parameters of model M4.

The distribution of the noise parameters across subjects
The distribution of the noise parameters (M4) is shown in Fig. 6. The execution noise correlated negatively with the mean fast planning variance (Spearman's ρ( 2 , w 2 ff ) = − 0.42; p < 0.01) and with the coefficient of variation of the fast planning noise (ρ( 2 , f ) = − 0.36; p < 0.02). Execution noise and constant planning noise did not correlate significantly with each other. The fitted noise parameters ( q , 2 , w 2 ss , f ) of model M4 did not differ between the training conditions (gradual/abrupt) (Oja & Randles multivariate sign test: chi2(4) = 7.71; p = 0.10). None of the four univariate comparisons of these variance parameters between gradual and abrupt training was significant (Wilcoxon signed-rank: p > 0.09). To test for potential effects of age on the trial-to-trial variability, we compared these four variance parameters between subjects younger than 30 (N = 23) and older than 50 (N = 20). Differences of q, 2 , w 2 ss , f between the age groups did not reach significance (Oja & Randles multivariate rank sum test: p > 0.1) for any training condition (gradual/abrupt).  5 Temporal evolution of the trial-to-trial variability of the residual relative pointing direction. Lines show the variance (a, b) and the autocovariance (lag one) (c, d) of the residual evaluated in a running window of length N = 21 centered around the trial number (x-axis) for the gradual (a, c) and the abrupt (b, d) training blocks. The measured data (black) and the model predictions (colored) were evaluated separately in individuals (Eq. A25) and then averaged across subjects. The gray shaded area shows the 95% confidence interval of the observed mean (black). The models with constant noise sources only (model M1, M2) do not explain the increase of the variance during the training period. Signal-dependent planning noise (model M4) matches the observed data better than error-dependent execution noise (model M3) Table 3 shows the median and interquartile ranges of the distribution of error sensitivities and retention rates across all subjects (M4). The distribution of a s , a f , b s , b f differed significantly between the training conditions (Oja & Randles multivariate sign test: chi2(4) = 15.06; p = 0.0046). This effect was due to a highly significant (Wilcoxon signed-rank test: p = 0.0001) decrease of the fast error sensitivity in the abrupt ( b f = 0.17) compared to the gradual training condition ( b f = 0.26). The fitted learning parameters of model M4 were also compared between subjects younger than 30 (N = 23) and older than 50 (N = 20). Differences of a s , a f , b s , b f between these age groups did not reach significance (Oja & Randles multivariate rank sum test: p > 0.15) for any training condition (gradual/abrupt). Together with the absence of age effects on the variance parameters (see above), this result shows that, in our subject group, age did not systematically affect adaptation dynamics or trial-to-trial variability. Therefore, pooling across all subjects for the purposes of model selection was justified. The decrease of the fast error sensitivity in the abrupt training compared to the gradual training manifests in the downward step of the fast state at the beginning of the washout period (trial 160), which was larger in the gradual (Fig. 3a, magenta) than in the stepwise training (Fig. 3b, magenta). As a consequence of this effect, the average speed of the deadaptation between trials 160 and 163 was faster in the gradual ( Across the population, the total adaptive change at the end of the training period ( y n | |n=159 ) did not depend on the mean-squared residual error (Spearman's rank correlation coefficient: gradual: = 0.13; p = 0.38; abrupt: − =0.17; p = 0.25), the average of the planning variance across trials ( w 2 ff + w 2 ss , | |<0.12; p > 0.3), or the average of fast planning variance ( w 2 ff , | |<0.4; p > 0.7). Thus, the overall adaptation performance was not related to the trial-to-trial variability.

Observability of the learning dynamics and variance parameters
The between-subjects variance is the sum of the variance due to real differences between individuals and the variance due to the trial-to-trial noise, which limits the precision of the parameter estimation in each subject. Therefore, we asked for the relative contribution of the within-subject variance to the between-subject variance. Based on the parameters individually fitted with the model, we simulated for each subject and for each gradual and abrupt training experiments 600 pointing sequences. The population medians and iqrs of the used learning parameters are shown in Table 3 and the variance parameters in Table 2 (model M4). Model M4 was then fitted to each of these 49 × 2 × 600 = 58,800 simulated data. The variance of the resulting model parameters across these fits reflects the estimation noise of our method. These within-subject variances of the learning parameters were pooled across all subjects and then compared with the experimentally observed between-subjects variance ( Table 4). The relative contribution of estimation noise to the total between-subjects variability was considerable (between 20% for a f and 46% a s ). The estimation noise was also larger Fig. 6 The distribution of the variance parameters of the model with signal-dependent fast planning noise (M4) across the 49 subjects (crosses). Data show the average between gradual and abrupt training conditions. According to this model, the trial-to-trial variability of each subject is characterized by three variances. The execution variance ( 2 ) and the average of the signal-dependent fast planning noise across all trials ( w 2 ff ) have similar sizes, whereas the constant slow planning variance ( w 2 ss ) is clearly smaller. The heat map shows a smoothed density estimation Table 3 Median [iqr] of the retention rates ( a s , a f ) and error sensitivities b s , b f across all 49 subjects (model M4) for the gradual and the abrupt training conditions. The difference of the fast error sensitivity between the conditions (third row, column b f ) was negative (Wilcoxon signed-rank test; row four), indicating that the fast error sensitivity was larger in the gradual than in the abrupt condition in the parameters of the fast adapting process ( a f , b f ) than in the slow adapting process ( a s , b s ). Nevertheless, the difference of the population variance between the fast and the slow processes is not explained by the inferior observability of the fast process only because the relative contribution of the estimation noise to the total between-subjects noise was even smaller in the fast than in the slow process (Table 4, rows within/between). Thus, the large differences of the fast learning parameters between individuals reflect behavioral differences. The same model fits to simulated data were also used to investigate how the estimation of the variance parameters of the two-rate adaptation model of Smith et al. (2006) depends on the choice of the variance model. Of practical interest is the absolute size of the estimation errors induced by choosing variance models that do not correctly fit to the statistics of the observations. We included in this analysis also the standard regression model, which assumes white Gaussian noise on the observed pointing directions, and for which the method of least squares (LSQ) is identical with maximizing the likelihood of the observation. The standard regression is a special case of the general formulation of Eq. (A16a-f), characterized by the absence of planning noise ( w n = _0 ), and no noise transfer from the motor output to the error driving adaptation ( = 0 ) in closed-loop and error-clamp trials. Table 5 shows that the errors were smallest for the model M4 that fully accounted for the variance structure of the simulated data. The two models ignoring planning noise (LSQ, M1) cause errors in the estimation of the variance of the execution noise ( 2 ) of up to 10 deg 2 . The model assuming constant variance of execution and planning noise (M2) showed errors that were larger ( q : 115%, 2 : 61%, w 2 ss : 147%, w 2 ff : 144%) than the errors of model M4. This demonstrates that the estimation of the variance parameters depends critically on the completeness of the model. Making a wrong assumption about only one parameter such as the assumed constancy of the fast planning noise (M2) affects all other variance estimates.
The median absolute errors shown in Table 5 include both systematic and random estimation errors. To evaluate the systematic errors, we computed the medians of the parameter estimates across the 600 simulations separately for each of the 2 × 49 data sets and compared the median of these estimates (Fig. 7, bars) with the median of the "true" variance parameters (Fig. 7, dashed horizontal lines) used to create these 49 × 2 data sets. The noise-compensation gain q and the execution variance 2 were overestimated by model M1, M2 (Fig. 7a, b). Similar overestimation of 2 was also obtained by the LSQ-fit. Both models including constant planning variance ( w 2 ss ) showed systematic differences between the estimated and the "true" value: M2 overestimated and M4 underestimated w ss (Fig. 7c). A stronger discrepancy between the models M2 and M4 occurred in the mean fast planning variance ( w 2 ff ), which was more underestimated by the model with constant planning variance (M2) than by model M4 (Fig. 7d). This reflects the fact that model M4 could appropriately mimic the time dependency of the fast planning noise with its coefficient of variation f of 0.29 (Fig. 7e).
The uncertainty of the parameter estimation due to the trial-to-trial variability was quantified by the median-quartile distance of the estimate distribution within the 600 fits to the simulations with the same "true" parameter set (whiskers in Fig. 7). Compared to the other models, M4 showed a smaller interquartile range of this distribution for the variance of the slow planning noise ( w ss , Fig. 7c) only. For the variance of the execution noise, the estimation noise was similar for all Table 4 The variances of the retention rates ( a s , a f ) and error sensitivities ( b s , b f ), fitted with model M4. The variance across the population (between variance) is compared with the variance and across simulated pointing sequences (within variance) for both gradual and abrupt training conditions. The row labeled "within/between (%)" shows the relative contribution of the estimation noise to the total observed variance across the population in percent. The variances of the parameters of the fast process are larger than that of the slow process. In contrast, relative contributions of the within-subject noise were smaller in the fast than in the slow process   Fig. 8b). This shows that smaller systematic errors rather than smaller random errors of model M4 were mainly responsible for the better performance of model M4 concerning the absolute errors (Table 5).
Fitting model M4 many times to data sets simulated with the same underlying parameter set also allowed the statistic dependence between the different variance parameters to be investigated on the level of the estimation noise. Correlating the execution noise with the mean fast planning variance across the 600 simulations within each of the 49 × 2 data sets showed a Spearman's rank correlation coefficient with a median of ρ( 2 , w 2 ff ) = − 0.57 [0.27] (N = 98). The median and the interquartile range of the Spearman's correlation between execution noise and the coefficient of variation were ρ( 2 , f ) = − 0.33 [0.17]). That these correlations did not differ from the ones observed across the population as reported above (ρ( 2 , w 2 ff ) = − 0.42; ρ( 2 , f ) = − 0.36) suggests that the negative correlation between execution noise and fast planning noise reflects a feature of the estimation noise rather than a feature of the parameter distribution across subjects.

Discussion
We developed and applied a maximum-likelihood estimation method for a closed-loop linear adaptation model including two different types of signal-dependent noise. The model comparison confirmed previous results (van Beers 2009) that a combination of execution noise and planning noise accounted better for the nonnegative autocovariance (lag one) of the pointing direction ( ΔAIC 1,2 > 0 ) than execution noise alone. Extending previous studies, we showed that the assumption of constant execution noise and constant planning noise is not fulfilled during visuomotor adaptation.
A model with constant execution variance, equal constant planning variance for both slow and fast adapting planning states, and additional signal-dependent fast planning noise (M4) showed smaller AIC-values (Fig. 4) and matched the time course of the variance and of the autocovariance of the within-subject (trial-to-trial) noise better than all other models (Fig. 5). Even though the model with error-dependent execution noise (M3) could not clearly be rejected in favor of signal-dependent planning noise (M4) based on AIC alone, M4 was strongly supported by the increase of the autocovariance (lag 1) that occurred simultaneously with the fast increase of the output variance (Fig. 5b, d, trial 40). Increased execution noise would have had just the opposite effect on the autocovariance.
Throughout this study, we simplified modeling by assuming that errors occurring at a particular movement direction contribute equally to the adaptation of movements of the other direction. This is an oversimplification since it is known that visuomotor adaptation of a single target direction generalizes by only about 70% to target directions that differ by 22.5 deg (Krakauer et al. 2000). In the present study, two target directions, separated by 10 deg, were presented randomly at equal probability. In this case, according to the modeling approach of Tanaka et al. (2012), ignoring the lack of generalization when the error sensitivity for the neighbored target equals only a fraction (f < 1) of the sensitivity for the trained target would lead to an underestimate of the error sensitivities (on the trained direction) in range of (1 + f )∕2 . The estimates of the retention rates are not expected to be systematically affected by the simplification. However, our new method to compute the maximum likelihood of the observed pointing direction could easily be applied to the multi-target adaptation model of Tanaka et al. (2012). This model is a time-variant discrete linear filter with a state space, which is extended to a dimension 2M , . 7 Variance parameters fitted to simulated data. The execution noise-compensation gain ( q ) and the coefficient of variation of the fast planning noise ( f ) are dimensionless, the variances of execution noise ( 2 ) slow ( w 2 ss ) and fast ( w 2 ff ) planning states are shown in units of deg 2 . Dashed horizontal lines: median of the parameters used for simulation (see Table 2, row M4). Bars: Median (across 2 × 49 parameter sets) of the medians of the parameters fitted to the 600 simulated data sets per individual and condition (gradual/abrupt). Whiskers: the length indicates the upper and the lower median-quartile distance within the 600 model simulations with the same underlying parameter set (uncertainty of the parameter estimation due to trial-totrial noise) 1 3 where M denotes the number of trained target directions. Therefore, the extended state-space model for multi-target adaptation is a special case that is formally covered by the time-variant formulation (Eq. A16a/b) and all further derivations of the appendices.

Differences between gradual and abrupt training
Overall, we did not observe strong differences of the adaptation dynamics between gradual and abrupt training. The retention rates of both fast and slow mechanisms, as well as the slow error sensitivity, were identical for both training conditions. Only the fast error sensitivity was significantly smaller in abrupt than in gradual training. This effect confirms that the adaptation dynamics are not independent of the history of the presented errors as shown by previous studies. Zarahn et al. (2008) argued that such effects of metalearning can be modeled by changes of the learning parameters across the phases of an adaptation experiment. Huang and Shadmehr (2009) found in force field adaptation that retention rates measured during error-clamp trials were larger after gradual than after abrupt training. Turnham et al. (2012) observed in adaptation to visuomotor rotation that the fast adapting process showed larger error sensitivity and, to a lesser degree, a larger retention rate after gradual or random training than under control conditions. These effects were interpreted in the context of a model that predicts that the error sensitivity decreases when errors change inconsistently (Herzfeld et al. 2014).

Maximum-likelihood estimation of variance parameters of a closed-loop system
The maximum-likelihood estimator is asymptotically biasfree and has the best efficiency of all normally distributed estimators (Fisher 1925), i.e., the smallest variance for a given number of observations. Applied to linear models with additive Gaussian noise (like the two-rate adaptation model), it has two major additional benefits that make it the generally preferred estimator: First, the estimates of the filter coefficients (i.e., the parameters determining the expectation of the model output) are bias-free not only asymptotically but even for small samples. Second, even using wrong variance models does not induce biases for the estimates of the filter coefficients. For that reason, the linear regression model (LSQ) is widely used even if its strong assumption of white noise is not strictly justified. Unfortunately, this robustness only applies for the filter coefficients, not for the variance parameters (i.e., the parameters determining the noise distribution). The best-known example is the maximum likelihood estimate of the variance in a simple constant model with additive Gaussian noise x i = m + r i for 1iN with r i ∼ N(0, 2 ) . The maximum-likelihood estimator of the variance is , which is not bias free but has the expectation E ̂ 2 = N−1 N 2 . Similarly, our maximum likelihood estimator with the variance model that accounted fully for the variance structure of the simulated data (M4) is also not completely bias-free. Figure 7 shows that the planning noise ( w ss , w ff ) was underestimated by M4. However, Fig. 7 also shows that using a wrong variance model for estimating the variance has more severe consequences on the estimates of the variance parameters. Ignoring the signal dependency of the fast planning noise (M2) causes the noise-compensation gain q , the execution variance 2 , and the slow planning variance w ss to be overestimated. The fast mean planning noise w ff is underestimated. Overall, Fig. 7 demonstrates that our maximum likelihood estimator is able to identify the complex variance structure of these data most accurately when it is based on the correct model (M4). This is important because the observed variance and autocovariance of the behavioral data (Fig. 5, black) could not be modeled appropriately by constant execution and planning noise (M1, M2). Thus, the complexity of model M4 is required to describe the pointing variability in our experiment and must therefore also be considered in a maximum likelihood estimator to identify its variance structure. The algorithm for computing the exact likelihood of the closed-loop system, which is described in "Appendices A3/4/5," is therefore a necessary tool for the purpose of this study.
An interesting new aspect of our variance model is that it allows an empirical estimate for the noise-compensation gain ( q ) to be obtained. Previous studies estimating planning noise (Albert and Shadmehr 2017;Zarahn et al. 2008) in visuomotor adaptation ignored the possible transfer of execution noise into the error signal ( = 0 ) and could therefore apply the standard Kalman filter for computing the likelihood of the output (see "Appendix A2"). One of the main problems with this assumption is that in closed-loop trials this corresponds to a compensation gain of q = 1 , whereas in error-clamp trials = 0 corresponds to q = 0 . The difficulty to justify this assumption in experiments involving both closed-loop and error-clamp trials was one of the main motivations to develop the generalized Kalman filter for the closed-loop system ("Appendix A4"). The median of q was 0.39 (Table 2, line 4), suggesting that the expected visual feedback accounts for 39% (population median) of the execution noise contaminating the actual visual feedback in the closed-loop trials.
In a previous study, Albert and Shadmehr (2017) already evaluated the estimation errors of the learning parameters ( a s , a f , b s , b f ) in individuals and obtained similar simulation results as the ones shown in Table 4 (within variance). Thus, our extension of the underlying variance model had only marginal benefit for the estimation of the learning parameters. However, substantial benefit was not expected since in linear models, the efficiency of the coefficient estimates is relatively robust with respect to small errors of the variance model. One benefit of our method is that it allows analysis of the precision of the variance estimates obtained in a single subject (whiskers in Fig. 7, M4). Of primary interest is here not only the interquartile range of the within-subject distribution per se, but also its ratio to the interquartile range observed across the population (Table 2, line 4, [iqr]). Values of this ratio close to zero indicate that the corresponding variance estimate can be used as an individual characteristic, whereas values close to one indicate the opposite. These ratios were q : 1.02; 2 : 0.38 deg 2 ; w 2 ss : 0.16 deg 2 ; w 2 ff : 0.52 deg 2 ; f : 0.43. They show that, similar to the learning parameters (Table 4), a considerable part of the dispersion across subjects was due to the limited precision of the estimation in a single subject. Especially the noise-compensation gain q proved to be not suitable for characterizing an individual, despite its small estimation bias (Fig. 8a, M4). In contrast, estimates of w 2 ss , 2 and, to a lesser extent also f , allow inter-individual comparisons.

Time-variant planning noise during visuomotor adaptation
On the one hand, our observation that trial-to-trial variability increased during the early part of visuomotor adaptation resembles the increase of execution variability during early motor skill learning. But on the other hand, we did not find any hint that the overall performance of visuomotor adaptation improved with increasing variance of the residual, the total planning variance, or the fast planning variance. Thus, in contrast to the reward-based motor learning task of Wu et al. (2014), we did not find any benefit of execution noise on task performance. This indicates that the execution variability may be more important in the reward processing task of Wu et al. (2014) that resembled a reinforcement learning task more closely than the visuomotor adaptation of the current study. Our finding confirms the classical two-rate model in that its expected adaptation time course does not depend on the noise level. At the same time, the current study demonstrates that the variance of the fast adapting memory state is modulated in time even though this noise was not beneficial for the adaptation task (absence of a correlation between w 2 ff and the total adaptive change). This is not contradictory, as increased variability also did not impair adaptation in our study even though such an impairment might be expected based on studies investigating the effect of error consistency on error sensitivity (Herzfeld et al. 2014) or studies suggesting a direct influence of the noise level on the error sensitivity due to a statistic evaluation of error relevance (Berniker and Kording 2011;Wei and Körding 2009). The absence of effects of execution or planning variance on the total adaptive change suggests that potential nonlinear contributions of trial-to-trial noise did not have any effect, either positive or negative. It seems therefore that the fast adapting planning process is linked with a mechanism of active variance control, possibly because the same planning process is also involved in reinforcement learning tasks where it is beneficial. The participation of this planning process in visuomotor adaptation does not require modification of its associated variance control because it is irrelevant in the adaptation task.
Our interpretation that the temporal modulation of the observed trial-to-trial variance was due to signal-dependent fast planning noise is supported by the finding that the increase of the trial-to-trial variance in the early adaptation phase closely followed the time course of the fast adapting planning state which increased more slowly and reached its peak later in the gradual than in the abrupt training (Fig. 3, magenta). The same variance parameters of model M4 could explain the two different time courses of variance and autocovariance in both conditions (gradual/abrupt; Fig. 5). Thus, the observed variance increase was specifically linked to the contribution of the fast adapting process. This specificity supports the notion that fast and slow components of visuomotor adaptation represent distinct and separable mechanisms. Originally, the two-rate model did not implicate a clear separability of the underlying processes (Smith et al. 2006). Later studies (McDougle et al. 2015) suggested that fast and slow adaptation mechanisms are associated with explicit and implicit motor learning. The present study contributes to this discussion in that it shows that the fast mechanism but not the slow one is involved in the control of planning variability. This is a functional distinction between the two mechanisms even though it remains unclear whether control of planning variability is an explicit or implicit process.
The current study tested the alternatives regarding whether the observed variance increase in the early phase of visuomotor adaptation was due to non-stationary (errordependent) execution noise or non-stationary (signaldependent) planning noise. The sudden increase of the autocovariance at the beginning of the training phase suggested that the non-stationary planning noise was responsible for the observed variance increase. We showed here that signal-dependent planning noise can explain the presented results, but it has to be noted that other mechanisms of nonstationary planning noise may provide possible alternatives. For example, planning variability may be conceived as a combination of genuine (constant) planning noise and exploration. Exploration is not needed when performance is fine and increases only with larger errors when exploration is beneficial. Such a mechanism, which can be characterized as error-dependent planning noise, was previously discussed 1 3 mainly in the context of reinforcement learning. The current study cannot distinguish between such error-dependent planning noise and signal-dependent fast planning noise because both the error and the fast adaptive component show large values in the early training phase.
In summary, the modified Kalman approach developed in this study allowed the variance parameters of time-discrete adaptive systems observed under varying feedback conditions to be estimated. This method allowed modeling of the temporal changes of the trial-to-trial variance and autocovariance of the pointing direction in a complex training paradigm in visuomotor adaptation. The results showed that trial-to-trial variability increased during the early learning phase similar to motor skill learning. This increase was due to an increase of planning noise rather than execution noise and could be modeled by signal-dependent fast planning noise.

A1: Model definition
The structure of a two-rate model of sensorimotor adaptation is shown in Fig. 1. This model is almost identical with the two-rate adaptation model proposed and applied by many previous studies (Smith et al. 2006;Criscimagna-Hemminger and Shadmehr 2008;Albert and Shadmehr 2017;Ethier et al. 2008;Zarahn et al. 2008). Figure 1 visualizes the mathematical relations between the internal memory states represented here by the two components of the twodimensional state vector (x n ), the observed pointing direction (d n ), and the experimentally induced visuomotor distortion (u n ). Earlier studies did not fully investigate the implications of this model for the variance/covariance structure of the pointing sequence ( {d} N−1 0 ). In particular, the dependence of the variance of pointing direction on the trial type (closedloop, error-clamp) was never compared with the predictions of this model. Therefore, the current study focuses on that aspect.
n: Trial number. The experiment is composed of N trials which are numbered from n = 0 to n = N − 1. In the following, quantities that are specific for each trial appear with the index n. d n (deg): Pointing direction. The pointing direction was defined as the direction of the line connecting the start point of the movement with the intersection of the movement trajectory and circle with a radius of 3 cm around the start point. The pointing direction is specified with respect to the straight-ahead direction. Positive values indicate rightward directions.
g n (deg): Motor goal. The angle g specifies the average direction of the pointing movement in the non-adapted state.
It is identical to the sum of the target direction and potential motor biases for a given target and is defined relative to the straight-ahead direction. Positive values of g n indicate movements to the right. The motor goal is defined as the mean pointing direction that is adopted when the subject is fully familiarized with the pointing device in the absence of any artificial visuomotor rotation (r n ). Therefore, the motor goal is also called the baseline pointing direction and is specific for the direction of the target presented in trial n. The functional dependence of the motor goal on target direction is assumed to be independent of time. Thus, the index n in g n does not reflect temporal changes of that function but only that the target direction differed from trial to trial. vd n (deg): Visually guided pointing direction. This angle is the pointing direction that would be executed in the absence of planned motor exploration, external transient mechanical perturbations, and motor noise. All these types of noise are subsumed under the term "execution noise" (v n ) v n (deg): Execution noise. We assume that the execution noise is distributed normally with zero-mean and variance σ 2 r n (deg): Visuomotor rotation. The angle r n specifies the rotation of the cursor with respect to the pointing direction of the hand. Positive r n indicates clockwise rotation of the cursor. The actual visual feedback (h n ) is the sum of the pointing direction and the visuomotor distortion The artificially introduced visuomotor rotation is the quantity that causes the systematic errors driving the motor adaptation. Clockwise visuomotor rotations induce counterclockwise changes in the visually guided pointing direction. To use the same sign convention for the driving stimulus and for the resulting adaptive change, we also define the signal u n . u n (deg): Visuomotor distortion. This is the negative of the visuomotor rotation e n (deg): Visual feedback error. The visuomotor distortion is not directly accessible. Therefore, the adaptation to this distortion is indirectly driven by the visual feedback error. It is defined as the difference between a visual reference orientation ( ĥ n ) and the actual feedback (h n ). Like the visuomotor distortion, positive values of e n induce positive (clockwise) changes of the visually guided pointing direction. Consequently, e n is defined as With this definition, the visual reference orientation ĥ n is implicitly defined as the visual feedback that, when consistently applied, leads finally to the baseline pointing direction (g n ). Thereby, ĥ n is not necessarily a deterministic variable but may be interpreted as a random variable (like the actual visual feedback h n ). This seems natural because the reference orientation ĥ n has been interpreted previously as the "expected visual feedback" (Synofzik et al. 2006(Synofzik et al. , 2008 represented by neural activity. Therefore, we do not assume that ĥ n = g n but only that the expectation of the random variable ĥ n equals the motor goal g n . Another reason to assume that ĥ n is a random variable is that previous studies suggested that error sources affecting the actual visual feedback, such as transient external perturbation sources (Wei and Körding 2009) or errors due to planned exploration (Collins and Wallman 2012) can be ignored and do not contribute to adaptation. To achieve this, the expected visual feedback must account for the ignored error components. Our model does this by assuming that the visual reference orientation ĥ n is derived from the motor goal by adding a certain fraction (q) of the execution noise v (0q1).
q: Noise-compensation gain: The visual feedback error e n drives the adaptation in the following way: where A is a diagonal 2 × 2 matrix with the retention rates a = a s ;a f applied to the slow and fast memory state, i.e., the components of the state vector x n . The vector b = b s ;b f is a 2-dimensional column vector containing the error sensitivities of the slow and the fast memory state, and the two-dimensional random vector w n denotes the two noise components added to the memory states. This reflects the idea that the generation of a specific internal motor plan (in this case that of a visually guided movement) is adapted with respect to a memory of its previous state and that this memory is subject to noise (van Beers 2009). We assume here that w n is normally distributed with zero mean and a diagonal matrix of covariance In the simplest version of the model, it is assumed that W is constant (independent of time) like 2 , the variance (A5) e n =ĥ n − h n .
(A7a) x n+1 = A x n + be n + w n (A7b) of the execution noise. The scalar product û n = c T x n on the right side of (Eq. A7b) denotes the projection of the memory states onto the motor output. The adaptive change û n = c T x n of the adaptation dynamics can be interpreted as an estimator of the visuomotor distortion u n . This estimate is added to the motor goal to obtain the visually guided pointing direction vd n , which is then contaminated by the execution noise v n to obtain the actual pointing direction d n . The projection weight c T is a row vector with both elements identical to one. This assumption is motivated by the problem that the relation between input ( e n ) output ( d n ) of this system is unaffected by linear transformations of the filter coefficients of the type Thus, the assumption of c being fixed avoids ambiguities in model identification.
y n (deg): relative pointing direction with respect to baseline. This angle is defined as the difference between the pointing direction d n and the motor goal This definition is useful because it allows the adaptive process to be expressed independent of the direction of the motor goal as follows. For closed-loop trials, when the visual feedback signal h n depends on the movement of the subject, Eqs. (A1-A7) can be used to express the feedback error in relation to the state vector: Inserting Eqs. (A9) and (A10) in Eq. (A7a/b) leads then to the following time discrete filter In error-clamp trials, the visual feedback does not depend on the subject's action but is determined by the experimenter. The cursor moves at a fixed angle with respect to the target direction and consequently also at a fixed angle ( − n ) with respect to the motor goal According to Eqs. (A5/A6/A12), the feedback error evaluates in error-clamp trials to , and Eq. (A11a) must be replaced in error-clamp trials by Following Albert and Shadmehr (2017), we also consider set breaks that may occur after a closed-loop trial or after (A13) e n = n + qv n , (A14) x n+1 = A x n + b n + qv n + w n .
an error-clamp trial with the index n. In those cases, the state updates for the rial n + 1 result from Eqs. (A11a) and (A13), by left-multiplying the right side of these equations by a matrix D n which is assumed to be a diagonal matrix of the form where p n is larger than zero for set breaks after the trial n and equals zero otherwise. Equations (A11a/b, 13), and (A15) can be put together into a single time-variant discrete system of the form. For greater generality, we use here an output gain-vector c n that is also time-dependent: where A n denotes the time-dependent system matrix b n denotes the time-dependent input gain z n denotes the deterministic input and n denotes the transfer gain of the execution noise into the feedback error e n Concerning the model predictions of the expected output y N−1 0 , Eq. (A16a-f) is identical with the standard two-rate model mentioned above (Smith et al. 2006). The new aspect of the formulation given in Eq. (A16) is that the state update (Eq. A16a) is contaminated by both execution noise ( v n ) and state noise ( w n ), whereas the system output (Eq. A16b) is only disturbed by execution noise. In addition, the noisecompensation gain (q) allows partial compensation of the adaptation input for execution noise to be taken into account.
In the literature (Smith et al. 2006), the filter coefficients ( a s , a f , b s , b f ) are often estimated by computing the expected output ( y n ) of the model by simulating Eq. (A16a/b) in the absence of noise ( v n = 0 , w n = 0 ) and by minimizing the sum of squared residuals is a stationary and white random process. In this case, minimizing the sum of squared residuals maximizes the likelihood of the observed output. However, for a system modeled as above, this would only be the case in the absence of the state noise ( W = 0 ) and if the execution noise did not contribute to the feedback error (γ n = 0). Therefore, LSQ provides estimates of the filter coefficients at a lower precision than a maximum-likelihood estimator based on the correct variance model. 2 Beside the potential benefits of a correct variance model on the precision of the estimated filter coefficients, correctly estimating the variance components W and σ 2 is an interesting question because it allows central and peripheral sources of execution noise to be distinguished (van Beers 2009). The development of a suitable variance model and of the corresponding maximumlikelihood estimator is therefore important for quantitative understanding of visuomotor adaptation. Previous studies (Tanaka et al. 2012;Albert and Shadmehr 2017) have already used maximum-likelihood methods based on the standard Kalman observer. The next section describes this approach and its advantages and drawbacks.

A2: The standard Kalman observer
The standard Kalman approach for optimal state estimates of linear dynamic systems from noisy output observations (Kalman 1960) is made under the assumption of two distinct and independent noise sources: measurement noise and process noise. It assumes that the internal states of the system are iteratively updated by multiplication with the system matrix A and linear summation with a weighted input signal s n . In addition, the state update is contaminated by the process noise w n : The scalar output ( y n ) is the weighted sum of the process states ( x n ), contaminated by the measurement noise v n : In addition, the standard Kalman observer assumes that both measurement noise and process noise are Gaussian with zero-mean and that both are independent of each other: (A18a) x n+1 = A x n + bs n + w n (A18b) y n = c T n x n + v n For this system, the state vector x n is also normally distributed because it results from linear operations on the deterministic signal s n and the normally distributed process noise w n .
The Kalman observer provides an iterative update of the mean and the covariance matrix of the states. Thus, the Kalman observer provides with x n+1 and n+1 the time course of the prior distribution of the state vector x n+1 conditioned on all previous observations {y} n 0 . This iterative update is performed in two steps per trial. First, the prior estimates are updated by the information provided by the measurement of y n , yielding the so-called posterior estimates of the means ( x p n ) and covariance matrix ( p n ) of the state vector: The elements of the vector k denote the so-called Kalman gains, i.e., the sensitivity of the posterior mean states with respect to the difference between the actual and expected observation. In the second step of the iteration, the posterior estimates are updated to the prior estimates of the next time step n + 1 by considering the system dynamics and the process noise: The iteration Eqs. (A21/A22) start with the distribution parameters of the initial states x 0 : F o r a g i ve n s e t o f s y s t e m p a r a m e t e r s = A, b, 2 , W, 0 , x 0 , the prior distributions of x n , provided by the Kalman observer, can be used to compute the negative log-likelihood (NLL) of the observation ( {y} N−1 0 ) conditioned on these parameters: Minimizing the negative log-likelihood (Eq. A24) can then be used not only to fit the system's filter coefficients A, b, and the expectation of the initial state vector x s0 x f 0 , but also the variance parameters ( 2 v , W, 0 ). Previous studies (Albert and Shadmehr 2017;Tanaka et al. 2012;Zarahn et al. 2008) minimized the negative loglikelihood (Eq. A24) of data acquired in the context of an adaptation experiment to visuomotor rotation (the same type of adaptation investigated by the current study). They used the same two-rate adaptation model (Smith et al. 2006;Criscimagna-Hemminger and Shadmehr 2008)  feedback error ( e n ) was identified with the system's input signal s n (Eq. A18a), and the relative pointing direction with respect to the baseline represented the system's output sequence y n (Eq. A18b).
The above-mentioned studies differed in that Zarahn et al. (2008), as well as Tanaka et al. (2012) minimized the negative log-likelihood directly, whereas Albert and Shadmehr (2017) used an expectation-maximization (EM) algorithm (Dempster et al. 1977). However, this difference concerns mainly the convergence properties of the applied algorithms because the EM-algorithm converges at the same solution as the maximum-likelihood approach. In the following, we describe the various aspects of the variance structure defined by Eq. (A16) that are simplified by this approach.
First, by identifying the feedback error e n with the input signal s n of the standard Kalman approach ignores the fact that the input to the state update equation ( z n + γ n • v n , Eq. A16a) is deterministic only for error-clamp trials with zero noise-compensation gain (q = 0). In all other cases, the stochastic nature of the system input z n + γ n • v n will cause the distribution of the output {y} N−1 0 to differ from that predicted by the simplified model. Second, in the simplified model (Eqs. A18), the system matrix A is the same for errorclamp and closed-loop trials, whereas in the more general formulation of the adaptation model (Eq. A16), the system matrix A n depends on the trial type. We therefore expect that the asymptotic covariance matrix ( ∞ ) of the system states differs between trial types, whereas the simplified model does not predict such a difference. Third, the variance of the noise (w n , Eq. A18a) entering the simplified state update does not depend on the trial type. In contrast, the sum of the noise components entering the state update of the more general model ( w n + b • n • v n , Eq. A16a) depends on the trial type because the transfer gain of the execution noise into the feedback error ( n ) depends on the trial type. Equation (A16f) shows that a dependence of the state noise on the trial type exists independent of the assumptions about the noise compensation gain (0 ≤ q ≤ 1). This is another reason why the general but not the simplified model predicts that n | |n→∞ depends on the trial type. In conclusion, previous applications of the Kalman observer for computing the likelihood of observations in visuomotor adaptation were not consistent with the variance/covariance structure defined by Eq. (A16) because these approaches did not consider that in visuomotor adaptation the peripheral execution noise reenters the system's input via the error signal.
In the current study, we want to develop a maximum-likelihood estimation method for the variance structure defined by the model as defined by Eq. (A16). Therefore, we must clarify how the above Kalman observer can be extended to be applied to the closed-loop model. The dependence of the system matrix and of the state noise on time can easily be included in the standard Kalman approach by supplying A or the W in Eq. (A22a/b) with the index n. In contrast, the transfer of execution noise to the state update constitutes a problem for the standard Kalman approach because it causes the state noise ( w n + b • n • v n , Eq. A16a) and the measurement noise ( v n , Eq. A16b) to depend statistically on each other. This violates an important precondition (Eq. A19) of the standard Kalman observer and therefore prohibits its direct application for computing the likelihood of an observation {y} N−1 0 of the adaptation system (Eq. A16). These considerations underline the fact that previous approaches to maximum-likelihood estimators in visuomotor adaptation mark important progress in identifying the tworate learning model compared to the simpler least-square approach. However, the use of the standard Kalman observer implied a simplification of the variance structure of the adaptation system that is not fully compatible with the system defined by Eq. (A16). To compute the exact likelihood of {y} N−1 0 for this system, we developed two different methods. First, we will present an algorithm to compute the mean y (dimension N × 1) of the observation vector y ∶= {y} N−1 0 and its covariance matrix Y (dimension N x N). The inverse Y −1 is then needed to compute the log-likelihood of the multivariate normal distribution of the observationy . The second method, which is numerically much more efficient, will extend the standard Kalman observer to the case when the measurement noise (or a fraction of it) is fed back into the system's input as in the model defined by Eq. (A16).

A3: The variance/covariance structure of the observation
Testing the variance/covariance structure of the pointing directions predicted by the model (Eq. A16) requires a comparison of the expected covariance matrix ( Y = [Y i,j ] N−1 i,j=0 ) with the empiric data ( y ). Ideally, to estimate the empiric covariance matrix, it would be desirable to observe repeated adaptation time courses in the same subject under identical conditions, i.e., with identical initial conditions of the internal visuomotor memory and with the same underlying adaptation dynamics. Such empirical data are difficult to obtain, first, because the memory states are affected by previous experiences and, second, because it was also suggested that the adaptation dynamics depend on previous experience (Herzfeld et al. 2014;Huang and Shadmehr 2009). Like most previous studies, we did not acquire many repeated adaptations in a single subject but estimated the characteristics of the within-subject noise by evaluating the variance and the autocovariance of the residual error r n = y n − y n within a window of length N = 2w + 1 centered around the trial n. These empiric measures were compared with the corresponding model predictions that were obtained as follows: For the given trial window, the expected autocovariance at lag Δ can be derived from the expected covariance matrix Y where Y i,j denotes the element of Y in row i and column j. As a special case, we obtain also the expected variance of the residual ( var_r n = acov_r n (Δ = 0) ) in this window. We evaluated the time course of the observed residual variance and autocovariance in a running window with the length of 20 trials and compared it with the corresponding model predictions obtained by Eq. (A25).
To test this model prediction of the expected autocovariance, we need to evaluate the covariance matrix Y. This was achieved by the following algorithm. Within a first loop, the expectation x Eq. (A27a/b), but not with the state mean x n and the covariance matrix n in the algorithm for the generalized Kalman observer (Eqs. A41-A45) because the later ones are conditioned on the observations {y} n−1 0 . To account for the time variant covariance W n of the state noise in the generalized Kalman observer, W n was computed in parallel with and before each step of the iteration Eq. (A27a/b). The resulting time variant W n was then used to replace the constant W in the generalized Kalman algorithm (Eq. A44).