Analysis of human data
A subset of the data previously collected for the study published in Donchin and Shadmehr (2004) was analyzed in order to determine the amount of variance in human movements and the degree of correlation between different state variables. The methods are described in full in the earlier paper. Briefly, subjects performed a standard curl field paradigm, making 10 cm movements in one direction (straight away from the body, the Y direction) with the robot assisting in passive return of the hand to the starting position after each movement. Data were analyzed from 4 subjects who each performed 3 sets of 150 movements with no catch trials. We calculated the correlation of X and Y position and of X and Y velocity across all time steps of all movements for each subject. We also calculated, for each subject, the mean X and Y starting position across all 450 trials and the mean starting velocity over the first 30 ms of movement in each of the 450 trials. Finally, we calculated the covariance over all four of these variables.
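As an illustrative sketch (the variable names are ours, and the data below are placeholders for the recorded trajectories), these per-subject statistics amount to a few lines of MATLAB:

```matlab
% Sketch of the starting-state statistics. "starts" is an assumed
% 450-by-4 matrix, one row per trial, with columns [x0 y0 vx0 vy0]:
% starting position, and the mean velocity over the first 30 ms.
starts = 0.001 * randn(450, 4);   % placeholder for the recorded data
startMean = mean(starts, 1);      % mean starting position and velocity
startCov  = cov(starts);          % 4-by-4 covariance over the four variables
```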
Task
We created a model capable of accomplishing a simplified reaching task in a simulated two-dimensional space (Todorov and Jordan 2002). The model end effector began each movement either from the point (0, 0) or from a starting position and velocity drawn from a normal distribution with mean and covariance matched to the mean and covariance of starting positions and velocities in the human movements. It was then required to move to a target located at Y = 10 cm. Each trial was composed of 100 discrete time steps (0.01 s each), in each of which the controller generated a motor command u and the state of the plant was updated in response to the motor command and the plant dynamics:
$$ {{\mathbf{x}}_{n + 1}} = {\mathbf{A}}{{\mathbf{x}}_n} + {\mathbf{B}}{{\mathbf{u}}_n} + {{\mathbf{w}}_n},{{\mathbf{w}}_n} \sim N\left( {0,{\Sigma_w}} \right) $$
(1)
where \( {\mathbf{x}}_n \) is the state vector at time step n, \( {\mathbf{u}}_n \) is the command vector at time step n (a force that affects acceleration), A is the dynamics matrix, B is the command matrix, and \( {\mathbf{w}}_n \) is the process noise, distributed normally with covariance matrix \( {\Sigma_w} \).
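As a minimal sketch, one such time step can be written in MATLAB using the null-field matrices A and B defined below in Eqs. (5) and (6); the process-noise covariance here is an assumed placeholder:

```matlab
% One time step of Eq. (1) in the null field.
dt = 0.01;                                 % time step (s)
A = zeros(8);  A(1:4, 1:4) = eye(4);
A(1,3) = dt;  A(2,4) = dt;                 % position integrates velocity
A(3,5) = dt;  A(4,6) = dt;                 % velocity integrates acceleration
A(7,7) = 1;   A(8,8) = 1;                  % target position is static
B = zeros(8, 2);  B(5,1) = 1;  B(6,2) = 1; % commands act as forces on acceleration
Sigma_w = 1e-8 * eye(8);                   % assumed process-noise covariance
x = [0; 0; 0; 0; 0; 0; 0; 0.1];            % start at the origin, target at y = 10 cm
u = [0; 0.5];                              % example motor command
w = sqrtm(Sigma_w) * randn(8, 1);          % w_n ~ N(0, Sigma_w)
xNext = A * x + B * u + w;                 % Eq. (1)
```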
Simulated movements were made either in free space (null field) or in a perturbing curl force field (Shadmehr and Mussa-Ivaldi 1994). In the null field, the only force acting on the plant was the one caused by the motor command. In the curl field, an external force perpendicular to the instantaneous velocity of the end effector was added, with magnitude proportional to speed (\( {{\mathbf{F}}_{ext}} = {\mathbf{C}} \cdot {\mathbf{v}},\;{\mathbf{C}} = \left[ {0, - b;b,0} \right] \)). Curl fields are commonly used in experiments on adaptation of reaching movements (Shadmehr et al. 2005), and human subjects are able to adapt to them: trajectories gradually shift to the curved movements that are optimal for the curl field (Izawa et al. 2008).
Cost
An analytic solution for the generation of control signals in an OFC controller can be found only when the cost function has a quadratic form. Hence, as is common in such simulations, we used a cost function that was a quadratic function of the state and command:
$$ C = \sum\limits_{n = 1}^N {\left( {{\mathbf{x}}_n^T{{\mathbf{Q}}_n}{{\mathbf{x}}_n} + {\mathbf{u}}_n^T{\mathbf{R}}{{\mathbf{u}}_n}} \right)} $$
(2)
The state-dependent cost matrix (Q) penalized the distance of the end effector from the target position and velocity. The penalty was imposed only in the final 20% (200 ms) of the trial. To minimize state-dependent cost, the controller had to reach the target within 800 ms and bring the end effector to a stop. The action-dependent cost matrix (R) was constant and diagonal, so it simply penalized effort. This meant that movements were made using as little force as possible. An optimal movement is thus one that reaches the target within 800 ms using as little effort as possible and stays there. In total, written without matrix notation, the cost was
$$ C = {\alpha_{pos}}{\left( {{x_{pos}} - {x_{tar}}} \right)^2} + {\alpha_{vel}}x_{vel}^2 + {\alpha_{action}}{u^2} $$
(3)
with
$$ \alpha_{pos} = \begin{cases} 100, & \text{if } t > 800\ \text{ms} \\ 0, & \text{otherwise} \end{cases} \qquad \alpha_{vel} = \begin{cases} 0.2, & \text{if } t > 800\ \text{ms} \\ 0, & \text{otherwise} \end{cases} \qquad \alpha_{action} = \frac{10^{-5}}{N} $$
(4)
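For concreteness, the cost of a single simulated trial under Eqs. (3) and (4) can be sketched as follows; the trajectory and commands are placeholders, while the weights and the 800 ms cutoff are those given above:

```matlab
% Cost of one trial, Eqs. (3)-(4).
N = 100;  dt = 0.01;
xTar = 0.1;                               % target at y = 10 cm
xPos = linspace(0, xTar, N)';             % placeholder position trajectory
xVel = [diff(xPos); 0] / dt;              % placeholder velocity trajectory
u    = zeros(N, 1);                       % placeholder commands
t    = (1:N)' * dt;
late = t > 0.8;                           % penalty only in the final 200 ms
aPos = 100 * late;                        % alpha_pos
aVel = 0.2 * late;                        % alpha_vel
aAct = 1e-5 / N;                          % alpha_action
C = sum(aPos .* (xPos - xTar).^2 + aVel .* xVel.^2 + aAct * u.^2);
```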
Plant
The plant used in all simulations was a point mass. Its dynamics were inertial and were driven by the forces generated by the motor command and, when present, by the force field.
The state vector was defined as \( {\mathbf{x}} = {\left[ {p_x \; p_y \; v_x \; v_y \; a_x \; a_y \; T_x \; T_y } \right]^T} \), with \( p_x \) and \( p_y \) representing position in the x and y coordinates, respectively, and \( {v_\bullet } \), \( {a_\bullet } \), and \( {T_\bullet } \) representing the corresponding coordinates of velocity, acceleration, and target position.
The state was calculated at every time step according to the dynamics equation (Eq. (1)), using the dynamics matrix A
$$ {\mathbf{A}} = \begin{bmatrix} 1 & 0 & dt & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & dt & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & dt & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & dt & 0 & 0 \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & 0 & 0 & 0 & 0 \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} $$
(5)
and the command matrix B
$$ {\mathbf{B}} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} $$
(6)
Thus, motor commands were applied as forces affecting acceleration. When the curl field was applied, the dynamics matrix changed to:
$$ {{\mathbf{A}}_{CF}} = \begin{bmatrix} 1 & 0 & dt & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & dt & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & dt & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & dt & 0 & 0 \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & -\mathbf{2} & 0 & 0 & 0 & 0 \\ \mathbf{0} & \mathbf{0} & \mathbf{2} & \mathbf{0} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} $$
(7)
The differences between the dynamics matrices in Eqs. (5) and (7) are shown in bold face. These entries describe the effects of position and velocity on acceleration.
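In code, the curl field therefore amounts to two entries in the force rows of the dynamics matrix. A sketch, starting from the null-field A constructed after Eq. (1) and using the value 2 that appears in Eq. (7):

```matlab
% Curl-field dynamics, Eq. (7): the field adds a force F_ext = C*v,
% C = [0 -b; b 0], perpendicular to the instantaneous velocity.
b = 2;                 % field gain, as it appears in Eq. (7)
A_CF = A;              % start from the null-field A built after Eq. (1)
A_CF(5, 4) = -b;       % y velocity drives x acceleration (negatively)
A_CF(6, 3) =  b;       % x velocity drives y acceleration (positively)
```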
Feedback, \( {\mathbf{y}}_n \), was returned from the plant to the control system according to the equation:
$$ {{\mathbf{y}}_n} = {\mathbf{H}} \cdot {{\mathbf{x}}_n} + \eta, \eta \sim N\left( {{\mathbf{0}},{{\mathbf{\Sigma }}_\eta }} \right) $$
(8)
where H is the observation matrix and η is the observation noise, normally distributed with zero mean and covariance \( {\Sigma_\eta } \). The observation matrix H is diagonal in its first six rows, so the system receives feedback about the actual position, velocity, and acceleration in both the x and y coordinates.
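A sketch of this feedback channel, assuming one plausible shape for H (6 × 8, passing position, velocity, and acceleration) and an assumed observation-noise covariance:

```matlab
% Feedback step, Eq. (8).
H = [eye(6), zeros(6, 2)];             % observe p, v, a, but not the target rows
Sigma_eta = 1e-6 * eye(6);             % assumed observation-noise covariance
x = [0; 0; 0; 0; 0; 0; 0; 0.1];        % example plant state
eta = sqrtm(Sigma_eta) * randn(6, 1);  % eta ~ N(0, Sigma_eta)
y = H * x + eta;                       % Eq. (8)
```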
Simulations
All simulations were run using MATLAB 7.6.0 (MathWorks, Natick, MA). In order to implement adaptation in different parts of the OFC model, we had to construct our software in such a way that every part of the OFC model could be altered and manipulated independently. We used OFC code made publicly available by Emanuel Todorov (http://www.cs.washington.edu/homes/todorov/software.htm) as a framework, but modified it so that different parts of the model could be made adaptive. The modified code is available at http://www.mll.org.il/AprasoffForwardModel.
We first modified the code to allow adaptation of the forward model. We denote the output of the forward model \( {\widehat{{\mathbf{x}}}_{n + 1|n}} \): the prediction of the state at time step n + 1 given sensory information from time step n. The inputs to the forward model are the current state estimate \( {\widehat{{\mathbf{x}}}_{n|n}} \) (the estimate of state n given sensory information from time n; the output of the state estimator) and the current command \( {\mathbf{u}}_n \). The forward model then calculates the a priori estimate of the next state using its estimates of the dynamics matrix (\( \widehat{{\mathbf{A}}} \)) and the command matrix (\( \widehat{{\mathbf{B}}} \)):
$$ {\widehat{{\mathbf{x}}}_{n + 1\left| n \right.}} = \widehat{{\mathbf{A}}}{\widehat{{\mathbf{x}}}_{n\left| n \right.}} + \widehat{{\mathbf{B}}}{{\mathbf{u}}_n} $$
(9)
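As a sketch, the prediction step is a single line once \( \widehat{{\mathbf{A}}} \) and \( \widehat{{\mathbf{B}}} \) are maintained as state; initializing them to the null-field matrices of the earlier sketch is our assumption here:

```matlab
% Forward-model prediction, Eq. (9), continuing the sketches above
% (A and B as constructed after Eq. (1)).
Ahat = A;  Bhat = B;                    % assumed initialization: null-field estimates
xHatPost = zeros(8, 1);                 % current estimate xhat_{n|n}
u = [0; 0.5];                           % current command
xHatPred = Ahat * xHatPost + Bhat * u;  % a-priori estimate xhat_{n+1|n}
```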
Forward model adaptation was based on two assumptions: (1) the forward model is implemented by the cerebellum (Shadmehr and Krakauer 2008), and (2) the cerebellum is involved in supervised learning (Doya et al. 2001). Therefore, learning can be approximated as a process of gradient descent. Indeed, some analytical models of behavior have suggested that adaptation is well modeled as a process of gradient descent (Donchin et al. 2003; Hwang et al. 2003). Thus, we used gradient descent learning on the prediction error in order to adapt the forward model in our simulation.
The forward model's estimated dynamics \( \widehat{{\mathbf{A}}} \) were updated after every trial. The prediction error on a given trial was defined as:
$$ E = \sum\limits_{n = 1}^N {{{\left( {{{\mathbf{y}}_n} - {\mathbf{H}}{{\widehat{{\mathbf{x}}}}_{n\left| {n - 1} \right.}}} \right)}^T} \cdot \left( {{{\mathbf{y}}_n} - {\mathbf{H}}{{\widehat{{\mathbf{x}}}}_{n\left| {n - 1} \right.}}} \right)} $$
(10)
To minimize this error, \( \widehat{{\mathbf{A}}} \) must be changed along the negative gradient of the error with respect to the estimated dynamics. Substituting Eq. (9) for \( {\widehat{{\mathbf{x}}}_{n|n - 1}} \) gives:
$$ \begin{aligned} \frac{\partial E}{\partial \widehat{\mathbf{A}}} &= \frac{\partial}{\partial \widehat{\mathbf{A}}} \sum_{n=1}^{N} \left( \mathbf{y}_n - \mathbf{H}\widehat{\mathbf{x}}_{n|n-1} \right)^T \left( \mathbf{y}_n - \mathbf{H}\widehat{\mathbf{x}}_{n|n-1} \right) \\ &= \frac{\partial}{\partial \widehat{\mathbf{A}}} \sum_{n=1}^{N} \left( \mathbf{y}_n^T \mathbf{y}_n - \mathbf{y}_n^T \mathbf{H}\widehat{\mathbf{x}}_{n|n-1} - \widehat{\mathbf{x}}_{n|n-1}^T \mathbf{H}^T \mathbf{y}_n + \widehat{\mathbf{x}}_{n|n-1}^T \mathbf{H}^T \mathbf{H} \widehat{\mathbf{x}}_{n|n-1} \right) \\ &= \frac{\partial}{\partial \widehat{\mathbf{A}}} \sum_{n} \Big( \mathbf{y}_n^T \mathbf{y}_n - \mathbf{y}_n^T \mathbf{H}\big( \widehat{\mathbf{A}}\widehat{\mathbf{x}}_{n-1|n-1} + \widehat{\mathbf{B}}\mathbf{u}_{n-1} \big) - \big( \widehat{\mathbf{A}}\widehat{\mathbf{x}}_{n-1|n-1} + \widehat{\mathbf{B}}\mathbf{u}_{n-1} \big)^T \mathbf{H}^T \mathbf{y}_n + \big( \widehat{\mathbf{A}}\widehat{\mathbf{x}}_{n-1|n-1} + \widehat{\mathbf{B}}\mathbf{u}_{n-1} \big)^T \mathbf{H}^T \mathbf{H} \big( \widehat{\mathbf{A}}\widehat{\mathbf{x}}_{n-1|n-1} + \widehat{\mathbf{B}}\mathbf{u}_{n-1} \big) \Big) \\ &= -2\left( \mathbf{H}^T \sum_{n} \mathbf{y}_n \widehat{\mathbf{x}}_{n-1|n-1}^T - \mathbf{H}^T\mathbf{H}\widehat{\mathbf{A}} \sum_{n} \widehat{\mathbf{x}}_{n-1|n-1} \widehat{\mathbf{x}}_{n-1|n-1}^T - \mathbf{H}^T\mathbf{H}\widehat{\mathbf{B}} \sum_{n} \mathbf{u}_{n-1} \widehat{\mathbf{x}}_{n-1|n-1}^T \right) \end{aligned} $$
(11)
Hence, after every trial, we calculated the gradient and changed the estimated dynamics matrix, \( \widehat{{\mathbf{A}}} \), by a fraction of the gradient in the direction that reduced the error. The steps were always 0.005 of the gradient. We let the forward model make the a priori assumption that only the forces acting on the plant were likely to change and that the Newtonian laws relating acceleration to velocity and velocity to position were likely to remain fixed. This meant that we confined adaptation to the rows of the dynamics matrix that calculate the force as a function of the other state variables (indicated with a bold font in Eqs. (5) and (7)).
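Continuing the earlier sketches, one post-trial update might look as follows; the per-trial arrays Y, Xhat, and U are assumed inputs (placeholders below), while the step size of 0.005 and the restriction to the force rows are those described above:

```matlab
% Gradient-descent update of Ahat after one trial, Eqs. (10)-(11),
% continuing the sketches above (H, Ahat, Bhat already in scope).
N = 100;
Y    = zeros(6, N);                          % placeholder: observations y_n
Xhat = zeros(8, N);                          % placeholder: estimates xhat_{n|n}
U    = zeros(2, N);                          % placeholder: commands u_n
stepSize = 0.005;                            % step size from the text
G = zeros(8);                                % accumulates dE/dAhat
for n = 2:N
    xPrev = Xhat(:, n-1);                    % xhat_{n-1|n-1}
    err = Y(:, n) - H * (Ahat * xPrev + Bhat * U(:, n-1));  % prediction error
    G = G - 2 * (H' * err) * xPrev';         % gradient of the squared error
end
mask = false(8);  mask(5:6, :) = true;       % adapt only the force rows
Ahat(mask) = Ahat(mask) - stepSize * G(mask);  % step down the gradient
```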
We then modified the state estimator to make it partially adaptive. The altered state estimator is the optimal adaptive Kalman filter (see Eq. (1) in online resource 1), as suggested in Todorov (2005). The optimal filter needs to be adaptive because the size of the signal generated on a specific movement affects the amount of noise in the state and, thus, the variance of our state estimate. However, the adaptive Kalman filter still relies on the known system dynamics for the estimation of the variances. Hence, adaptation of the Kalman filter will not prevent unknown dynamics from biasing state estimates.
The controller remained the same as in the original Todorov formulation. Its input is the current estimated state of the plant, as estimated by the Kalman filter:
$$ {\widehat{{\mathbf{x}}}_{\left. {n + 1} \right|n + 1}} = {\widehat{{\mathbf{x}}}_{\left. {n + 1} \right|n}} + \widehat{{\mathbf{A}}} \cdot {{\mathbf{K}}_n}\left( {{{\mathbf{y}}_n} - {\mathbf{H}} \cdot {{\widehat{{\mathbf{x}}}}_{\left. n \right|n}}} \right) $$
(12)
Note that we altered the Kalman filter somewhat, by taking the dynamics matrix multiplication out of the Kalman gain and writing it separately. This is purely notational, but it enables us to change the estimated dynamics used by the Kalman filter without re-calculating the Kalman gains. The controller’s output is the command to the plant. The command is calculated linearly from the state according to the equation:
$$ {{\mathbf{u}}_n} = - {\mathbf{L}} \cdot {\widehat{{\mathbf{x}}}_n} $$
(13)
The controller gain matrix L contains the optimal feedback control gains, calculated as in Todorov (2005) using the estimated dynamics \( \widehat{{\mathbf{A}}} \) and \( \widehat{{\mathbf{B}}} \) of Eq. (9).
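Continuing the sketches above, one closed-loop step can then be written as follows; the Kalman gain Kn and control gain L are placeholders for the gains precomputed from \( \widehat{{\mathbf{A}}} \) and \( \widehat{{\mathbf{B}}} \) as in Todorov (2005):

```matlab
% One closed-loop step, Eqs. (12)-(13), continuing the sketches above
% (H, Ahat, Bhat, y, u, xHatPost already in scope).
Kn = zeros(8, 6);                            % placeholder: precomputed Kalman gain
L  = zeros(2, 8);                            % placeholder: precomputed control gains
innov    = y - H * xHatPost;                 % innovation at time n
xHatPred = Ahat * xHatPost + Bhat * u;       % Eq. (9): a-priori estimate
xHatPost = xHatPred + Ahat * (Kn * innov);   % Eq. (12): a-posteriori estimate
u = -L * xHatPost;                           % Eq. (13): next motor command
```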
We ran three types of adaptation simulation. All simulations were run for 800 trials. In the first type, only the forward model was adapted. At the beginning of the simulation, the OFC model was optimized to work in a null field (Eq. (5)) although the actual dynamics were curl force field dynamics (Eq. (7)). Throughout the simulation, the dynamics driving the controller and state estimator were kept constant and only the forward model was allowed to adapt.
In a second type of simulation, we used what we call a shared internal model. We began, as before, with a model optimized for the null field dynamics. Again the real dynamics were the curl force field dynamics, and the forward model adapted using gradient descent. However, in the shared internal model simulations the controller and state estimator were re-optimized every 15 trials according to the dynamics learned by the forward model (\( \widehat{{\mathbf{A}}},\widehat{{\mathbf{B}}} \) in Eq. (9)). Thus, the controller and state estimator shared the internal model learned by the forward model.
We ran three variations of this simulation. The first variation had a fixed starting point and zero initial velocity at the beginning of movements, and all movements were made towards the same target. In the second variation, the starting point and initial velocity were distributed normally with mean and covariance matched to those of the human movements, as described above. The third variation returned to the fixed starting point and zero initial velocity, but there were nine different targets distributed uniformly along an arc of radius 0.1 m running from the positive x axis (0°) to the negative x axis (180°). Target order was selected randomly, as in the sketch below.
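For the third variation, the target set follows directly from the description above (a sketch):

```matlab
% Nine targets on a 0.1 m arc from 0 to 180 degrees, in random order.
angles  = linspace(0, pi, 9);                 % 0 to 180 deg
targets = 0.1 * [cos(angles); sin(angles)];   % target x-y positions (m)
order   = randperm(9);                        % random presentation order
```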
In the third type of simulation, we used a linear approximation of controller adaptation in order to examine the behavior of the system when the controller and forward model were adapted independently. In OFC theories of motor control, the controller is thought to be located outside of the cerebellum and to learn through unsupervised or reinforcement learning (Shadmehr and Krakauer 2008). To avoid the complexities of such adaptive controllers, we had the controller make a linear transition from the optimal controller and state estimator for the null field condition to the optimal controller and state estimator for the curl field:
$$ {{\mathbf{L}}_k} = \frac{{\left( {K - k} \right) \cdot {{\mathbf{L}}_{Null}} + k \cdot {{\mathbf{L}}_{FF}}}}{K} $$
(14)
Here, k is the trial number and K is the total number of trials. The forward model was adapted using gradient descent, as in the other simulations. The transition from one optimal controller to the other was thus a simple linear interpolation between the two sets of gains.
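A sketch of this interpolation, with placeholder gain matrices standing in for the precomputed optima:

```matlab
% Linear interpolation of controller gains, Eq. (14).
K = 800;                                    % total number of trials
L_Null = zeros(2, 8);                       % placeholder: optimal null-field gains
L_FF   = zeros(2, 8);                       % placeholder: optimal curl-field gains
k = 100;                                    % example trial number
Lk = ((K - k) * L_Null + k * L_FF) / K;     % gains used on trial k
```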
For a full list of parameters and values used in the simulations, please refer to online resource 2.