# Active inference, eye movements and oculomotor delays

DOI: 10.1007/s00422-014-0620-8

- Cite this article as:
- Perrinet, L.U., Adams, R.A. & Friston, K.J. Biol Cybern (2014) 108: 777. doi:10.1007/s00422-014-0620-8

## Abstract

This paper considers the problem of sensorimotor delays in the optimal control of (smooth) eye movements under uncertainty. Specifically, we consider delays in the visuo-oculomotor loop and their implications for active inference. Active inference uses a generalisation of Kalman filtering to provide Bayes optimal estimates of hidden states and action in generalised coordinates of motion. Representing hidden states in generalised coordinates provides a simple way of compensating for both sensory and oculomotor delays. The efficacy of this scheme is illustrated using neuronal simulations of pursuit initiation responses, with and without compensation. We then consider an extension of the generative model to simulate smooth pursuit eye movements—in which the visuo-oculomotor system believes both the target and its centre of gaze are attracted to a (hidden) point moving in the visual field. Finally, the generative model is equipped with a hierarchical structure, so that it can recognise and remember unseen (occluded) trajectories and emit anticipatory responses. These simulations speak to a straightforward and neurobiologically plausible solution to the generic problem of integrating information from different sources with different temporal delays and the particular difficulties encountered when a system—like the oculomotor system—tries to control its environment with delayed signals.

### Keywords

Oculomotor delays, Tracking eye movements, Smooth pursuit eye movements, Generalised coordinates, Variational free energy, Active inference

## 1 Introduction

### 1.1 Problem statement

This paper considers optimal motor control and the particular problems caused by the inevitable delay between the emission of motor commands and their sensory consequences. This is a generic problem that we illustrate within the context of oculomotor control, where it is particularly pressing (see, for instance, Nijhawan (2008) for a review). Although we focus on oculomotor control, the more general contribution of this work is to treat motor control as a pure inference problem. This allows us to use standard (Bayesian filtering) schemes to resolve the problem of sensorimotor delays by absorbing them into a generative or forward model. Furthermore, this principled and generic solution has some degree of biological plausibility because the resulting active (Bayesian) filtering is formally identical to predictive coding, which has become an established metaphor for neuronal message passing in the brain. We will use oculomotor control as a vehicle to illustrate the basic idea using a series of generative models of eye movements that address increasingly complicated aspects of oculomotor control. In short, we offer a general solution to the problem of sensorimotor delays in motor control, using established models of message passing in the brain, and demonstrate the implications of this solution in the particular setting of oculomotor control.

Eye movements can anticipate predictable stimuli, such as the sinusoidal movement of a pendulum (Barnes and Asselman 1991; Dodge et al. 1930; Westheimer 1954); for a review, see Barnes (2008). Interestingly, ocular tracking can compensate for sensorimotor delays after around one or two periods of sinusoidal motion—producing a tracking movement with little discernible delay (Barnes and Asselman 1991). This suggests that the oculomotor system can use sensory information from the past to predict its future sensory states (including its actions), despite the fact that these sensory changes can be due to both movement of the stimulus and movement of the eyes. The time taken to compensate for delays increases with the unpredictability of the stimulus (Michael and Jones 1966), though the system can adapt quickly to complex waveforms, with changes in velocity (Barnes and Schmid 2002), single cycles (Barnes et al. 2000) or perturbed periodic waves—where subjects appear to estimate their frequency using an average over recent cycles (Collins and Barnes 2009). Further studies suggest that different sources of information, such as auditory or verbal cues (Kowler 1989) or prior knowledge about the nature of sensory inputs (Montagnini et al. 2006), can evoke anticipatory eye movements.

The aim of this work was to establish a principled model of optimal visual motion processing and oculomotor control in the context of sensorimotor delays. Delays are often ignored in treatments of the visual–oculomotor system; however, they are crucial to understanding the system’s dynamics. For instance, delays may be important for understanding the pathophysiology of impaired oculomotor control: schizophrenic smooth pursuit abnormalities are due to impairments of the predictive (extra-retinal) motion signals that are required to compensate for sensorimotor delays (Nkam et al. 2010; Thaker et al. 1999). Surprisingly, delays may also explain a whole body of visual illusions (Changizi 2001; Changizi and Widders 2002; Changizi et al. 2008; Vaughn and Eagleman 2013), even for visual illusions that involve a static display. Delays are also an important consideration in control theory and engineering. Finally, neuronal solutions to the delay problem speak to the representation of time in the brain, which is essential for the proper fusion of information in the central nervous system.

### 1.2 Existing solutions and the proposed hypothesis

A principled approach to optimal oculomotor control is provided by Bayesian filtering schemes that use probabilistic representations to estimate visual and oculomotor states. These states are *hidden*; i.e. they cannot be measured directly. A popular scheme for linear control problems is the Kalman filter (Kalman 1960). The Kalman scheme can be extended to accommodate biomechanical constraints, such as transmission delays (e.g. fixed-lag smoothers). However, their solutions can become computationally complex when delays are large in relation to discretisation time and are not biologically plausible. We have previously considered generalised Bayesian filtering in continuous time as a metaphor for action and perception. This approach has been applied to eye movements (Friston et al. 2010b) and saccades in particular (Friston et al. 2012a). However, these applications ignored sensorimotor delays and their potentially confounding effects on optimal control.

Crucially, the active inference schemes we have considered previously are formulated using representations in *generalised coordinates of motion*; that is, states (such as position) are represented along with their higher-order temporal derivatives (such as speed, acceleration and jerk). This means that one has an implicit representation of hidden states in the recent past and future that can be used to finesse the problems of delays. For example, it has been shown that acceleration is a necessary component of the predictive drive to eye movements (Bennett et al. 2007). In brief, generalised representations can be projected to the past and to the future using simple (linear) mixtures of generalised motion. Note that a representation of generalised motion can be explicit or implicit using a population coding scheme—as has been demonstrated for acceleration (Lisberger and Movshon 1999). Representations of generalised motion may be important for modelling delays when integrating information in the brain from distal sources—such as other cortical columns in the same cortical area or other areas that are connected with fixed but different delays (Roelfsema et al. 1997). The integration of information over time becomes particularly acute in motor control, where the products of sensory processing couple back to the sampling of sensory information through action.

In the context of action, active inference finesses the problems with delayed control signals in classical formulations of motor control by replacing command signals with descending corticospinal predictions. For instance, the location of receptive fields in the parietal cortex in monkeys is shown to shift transiently before an eye movement (Duhamel et al. 1992). These predictions are fulfilled at the peripheral level, using fast closed loop mechanisms (peripheral reflex arcs). In principle, “these predictions can anticipate delays if they are part of the generative model,” (Friston 2011); however, this anticipation has never been demonstrated formally. Here, we show how generalised Bayesian filtering, as used in active inference, can compensate for both sensory and motor delays in the visual–oculomotor loop.

It is important to mention what this work does not address. First, we focus on tracking eye movements (pursuit of a single-dot stimulus for a monocular observer with fixed head position): we do not consider other types of eye movements (vergence, saccades or the vestibulo-ocular reflex). Second, we take an approach that complements existing models, such as those of Robinson et al. (1986) and Krauzlis and Lisberger (1989). Existing models account for neurophysiological and behavioural data by refining block diagram models of oculomotor control to describe *how* the system might work. We take a more generic approach, in which we define the imperatives for any system sampling sensory data, derive an optimal oculomotor control solution and show *why* this solution explains the data. Although the two approaches should be consistent, ours offers a principled approach to identifying the necessary solutions (such as predictive coding) to a given problem (oculomotor delays). We hope to demonstrate the approach by modelling pursuit initiation and smooth pursuit—and then consider the outstanding issue of anticipatory responses: in previous treatments (Robinson et al. 1986), “[anticipation] has not been adequately modelled and no such attempt is offered (...) only unpredictable movements are considered”.

### 1.3 Outline

The main contributions of our work are described in the subsequent five sections. First, Sect. 2 summarises the basic theory behind active inference and attempts to link generalised filtering to conventional Bayesian filters used in optimal control theory. This section then considers neurobiological implementations of generalised filtering, in terms of predictive coding in generalised coordinates of motion. This formulation allows us to consider the problem of delayed sensory input and motor output in Sect. 3, and how this problem can be finessed in a relatively straightforward way using generalised representations. Having established the formal framework (and putative neuronal implementation), the final three sections deal with successively harder problems in oculomotor control. We start in Sect. 4 by considering *pursuit initiation* using a simple generative model of oculomotor trajectories. Using simulations, we consider the impact of motor delays, sensory delays and their interaction on responses to a single sweep of a visual target. The subsequent section turns to *smooth pursuit eye movements*, using a more sophisticated generative model of oculomotor trajectories, in which prior beliefs about eye movements enable the centre of gaze to predict target motion using a virtual or fictive target (see Sect. 5). In the final section, we turn to hierarchical models of target trajectories that have explicit memories of hidden dynamics, which enable anticipatory responses (see Sect. 6). These responses are illustrated using simulations of anticipatory pursuit movements using (rectified) hemi-sinusoidal motion. Where possible, we try to simulate classic empirical results in this field, at least heuristically.

In short, these theoretical considerations lead to a partition of stimulus-bound eye movements into pursuit initiation, smooth pursuit and anticipatory pursuit, where each mode of oculomotor control calls on formal additions to the underlying generative model. However, these models all use exactly the same scheme and basic principles; in particular, they all use the same solution to the oculomotor delay problem. These simulations illustrate that the active inference scheme can reproduce classical empirical results in three distinct experimental contexts.

## 2 From predictive coding to active inference

This section sets out the basic theory, before applying it to the special problem of oculomotor delays in the following sections. We first introduce the general framework of active inference in terms of generalised Bayesian filtering and variational free energy minimisation. In brief, active inference can be regarded as equipping standard Bayesian filtering schemes with classical reflex arcs that enable action, such as an eye movement, to fulfil predictions about hidden states of the world. Second, we will briefly describe the formalism of active inference in terms of differential equations describing the dynamics of the world and internal states of the visual–oculomotor system. The neurobiological implementation of these differential equations is considered in terms of predictive coding, which uses prediction errors on the motion of hidden states—such as the location of a visual target. In the next section, we will turn to the special problem of oculomotor delays and how this problem can be finessed using active inference in generalised coordinates of motion. This solution will be illustrated in subsequent sections using simulations of pursuit initiation responses and smooth pursuit. Finally, we shall exploit the richness of hierarchical generative models—which underlie active inference—to illustrate anticipatory eye movements that cannot be explained by simply compensating for oculomotor delays.

### 2.1 From free energy to generalised filtering

Active inference rests on three assumptions:

- The brain minimises the free energy of sensory inputs defined by a generative model.
- The generative model used by the brain is hierarchical, nonlinear and dynamic.
- Neuronal firing rates encode the expected state of the world, under this model.

Under the first assumption, there is no need for an *ad hoc* value or loss function guiding action: action minimises the free energy of the system’s model. This permits the application of standard Bayesian solutions and simplifies the implicit neuronal architecture; for example, there is no need for an efference copy signal (Friston 2011). In this setting, desired movements are specified in terms of prior beliefs about state transitions or the motion of hidden states in the generative model. Action then realises prior beliefs (policies) by sampling sensory data that provide evidence for those beliefs.

The second assumption above is motivated by noting that the world is both dynamic and nonlinear, and that hierarchical structure emerges inevitably from a separation of temporal scales (Ginzburg 1955; Haken 1983). The third assumption is the Laplace assumption that, in terms of neural codes, leads to the *Laplace code*, which is arguably the simplest and most flexible of all neural codes (Friston 2009). In brief, the Laplace code means that probabilistic representations are encoded explicitly by synaptic activity in terms of their mean or expectation (while the second-order statistics such as dispersion or precision are encoded implicitly in terms of synaptic activity and efficacy). This limits the representation of hidden states to continuous variables, as opposed to discrete states; however, this is appropriate for most aspects of sensorimotor processing. Furthermore, it finesses the combinatoric explosion associated with discrete state space models. Restricting probabilistic representations to a Gaussian form clearly precludes multimodal representations. Having said this, the hierarchical form of the generative models allows for fairly graceful modelling of nonlinear effects (such as shadows and occlusions). For example, a Gaussian variable at one level of the model may enter the lower levels in a highly nonlinear way; we will see examples of this later. See Appendix 2 for a motivation of the Laplace assumption from basic principles.

By explicitly separating real-world states (hidden from the agent) from its internal states, one can clearly separate the generative model from the updating scheme that minimises the agent’s free energy. The first pair of coupled stochastic differential equations describes the dynamics of hidden states and causes in the world and how they generate sensory states. These equations are stochastic because sensory states and the motion of hidden states are subject to random fluctuations \((\varvec{\omega _x, \omega _\nu })\).

This scheme is known as *generalised filtering* or predictive coding and has the same form as standard Bayesian (Kalman–Bucy) filters; see also Beal (2003) and Rao and Ballard (1999). The first term is a prediction based upon a differential operator \(\mathcal{D}\) that returns the generalised motion of the conditional expectations, namely the vector of velocity, acceleration, jerk and so on, such that \(\mathcal{D}\tilde{\mu } = (\mu ^{\prime }, \mu ^{\prime \prime }, \mu ^{\prime \prime \prime }, \ldots )\). However, the expected velocity is not the velocity of the expectation and comprises both prediction and update terms: the second term reflects this correction and ensures the changes in conditional expectations are Bayes optimal predictions of hidden states of the world, in the sense that they maximise the free-energy bound on Bayesian model evidence. See Fig. 2 for a schematic summary of the implicit conditional dependencies implied by Eq. 1.
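As a sketch of the update described here, and assuming the form commonly used in the generalised filtering literature, the prediction and correction terms can be written as follows. The symbol \(F\) for variational free energy, the gradient notation \(\partial _{\tilde{\mu }}\) and the use of \(\mathcal{D}\) for the differential operator are notational assumptions for this illustration.

\[
\dot{\tilde{\mu }} \;=\; \underbrace{\mathcal{D}\tilde{\mu }}_{\text{prediction}} \;-\; \underbrace{\partial _{\tilde{\mu }} F(\tilde{s}, \tilde{\mu })}_{\text{correction}}
\]

Under this reading, the correction term vanishes when free energy is minimised, so that the motion of the expectation \(\dot{\tilde{\mu }}\) coincides with the expected motion \(\mathcal{D}\tilde{\mu }\).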

### 2.2 Hierarchical form of the generative model

The deterministic part of the model is specified by nonlinear functions of hidden states and causes \((f^{(i)} , g^{(i)})\) that generate dynamics and sensory consequences. Hidden causes link hierarchical levels, whereas hidden states link dynamics over time. Hidden states and causes are abstract quantities that the brain uses to explain or predict sensations, like the motion of an object in the field of view. In hierarchical models of this sort, the output of one level acts as an input to the next. This input can produce complicated convolutions with deep (hierarchical) structure. We will see examples of this later, in particular in the context of anticipatory movements.

### 2.3 Perception and predictive coding

It is difficult to overstate the generality and importance of Eq. 3—its solutions grandfather nearly every known statistical estimation scheme, under parametric assumptions about additive noise (Friston 2008). These range from ordinary least squares to advanced variational deconvolution schemes. In this form, one can see clearly the relationship between predictive coding and Kalman–Bucy filtering—changes in conditional expectations comprise a prediction (first term) plus a weighted mixture of prediction errors (remaining terms). The weights play the role of a Kalman gain matrix and are based on the gradients of the model functions and the precision of random fluctuations.
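The idea that expectations change according to a precision-weighted mixture of prediction errors can be illustrated with a deliberately reduced example. The sketch below assumes a static, scalar, linear-Gaussian model (one hidden cause, one sensation) rather than the hierarchical dynamic model of Eq. 3, so there is no prediction term; all names and parameter values (`infer_cause`, `g`, `eta`, `pi_s`, `pi_v`) are hypothetical and chosen for illustration only.

```python
def infer_cause(s, g=2.0, eta=0.0, pi_s=4.0, pi_v=1.0, dt=0.01, n_steps=2000):
    """Gradient descent on free energy for the toy model s = g * v + noise.

    F(mu) = 0.5 * pi_s * (s - g * mu)**2 + 0.5 * pi_v * (mu - eta)**2
    pi_s and pi_v are the (assumed) precisions of sensory noise and the prior.
    """
    mu = eta  # start from the prior expectation
    for _ in range(n_steps):
        eps_s = s - g * mu   # sensory prediction error
        eps_v = eta - mu     # prior prediction error
        # Each error is weighted by its precision; the sensory error is also
        # weighted by the gradient of the model function (here simply g).
        mu += dt * (g * pi_s * eps_s + pi_v * eps_v)
    return mu


def exact_posterior_mean(s, g=2.0, eta=0.0, pi_s=4.0, pi_v=1.0):
    """Closed-form posterior mean for the same linear-Gaussian model."""
    return (g * pi_s * s + pi_v * eta) / (g * g * pi_s + pi_v)


if __name__ == "__main__":
    print(infer_cause(1.0), exact_posterior_mean(1.0))
```

In this toy case the weights `g * pi_s` and `pi_v` play a role analogous to a gain: raising the sensory precision pulls the estimate towards the data, while raising the prior precision pulls it towards `eta`.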

### 2.4 Action

### 2.5 Summary

In summary, we have derived equations for the dynamics of perception and action using a free energy formulation of adaptive (Bayes optimal) exchanges with the world and a generative model that is both generic and biologically plausible. A technical treatment of the material above will be found in Friston et al. (2010a), which provides the details of the generalised filtering used to produce the simulations in the next section. Before looking at these simulations, we consider how delays can be incorporated into this scheme.

## 3 Active inference with sensorimotor delays

If action and sensations were not subject to delays, one could integrate (solve) Eq. 1 directly; however, in the presence of sensory and motor delays (\(\tau _s\) and \(\tau _a\), respectively), Eq. 1 becomes a (stochastic and nonlinear) delay differential equation because \(\tilde{s}(t) = \varvec{\tilde{s}}(t - \tau _s)\) and \(a(t) = \varvec{a}(t + \tau _a)\). In other words, the agent receives sensations from (sees) the past, while emitting motor signals that will be enacted in the future (we will only consider delays from the sensory and motor sub-systems and neglect delays between neuronal systems in this paper).

### 3.1 Summary

This section has considered how the differential equations describing changes in action and internal (representational) states can be finessed to accommodate sensorimotor delays. This is relatively straightforward—in the context of generalised schemes—using delay operators that take mixtures of generalised motion to project states into the future or past. Sensory delays can be (internally) simulated and corrected by applying delays to sensory input producing sensory prediction error, while motor delays can be simulated and corrected by applying delays to sensory prediction error producing action. Neurobiologically, the application of delay operators just means changing synaptic connection strengths to take different mixtures of generalised sensations and their prediction errors. We will now use these operators to look at the effects of sensorimotor delays with and without compensation.
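The projection of generalised states into the future or past can be sketched numerically. The example below assumes that the delay operator takes the Taylor-series form \(T(\tau ) = \exp (\tau D)\), where \(D\) shifts each generalised coordinate onto the next; the function names (`delay_operator`, `generalised_sine`) and the truncation order are hypothetical choices for this illustration, and a sinusoid is used only because its derivatives are known in closed form.

```python
import math

import numpy as np


def delay_operator(tau, order):
    """Return T(tau) = exp(tau * D) for generalised coordinates of length `order`.

    D has ones on the superdiagonal, so D @ (x, x', x'', ...) = (x', x'', ..., 0).
    Because D is nilpotent, the exponential series terminates after `order` terms.
    """
    D = np.eye(order, k=1)
    T = np.zeros((order, order))
    term = np.eye(order)
    for k in range(order):
        T += term / math.factorial(k)
        term = term @ (tau * D)
    return T


def generalised_sine(t, order):
    """Generalised coordinates of sin(t): the k-th derivative is sin(t + k*pi/2)."""
    return np.array([math.sin(t + k * math.pi / 2) for k in range(order)])


if __name__ == "__main__":
    order, t, tau = 6, 0.3, 0.064
    x = generalised_sine(t, order)
    ahead = delay_operator(tau, order) @ x    # project into the future
    behind = delay_operator(-tau, order) @ x  # project into the past
    print(ahead[0], math.sin(t + tau))
    print(behind[0], math.sin(t - tau))
```

Each row of the operator is a fixed linear mixture of generalised motion, which is the sense in which applying a delay amounts to changing connection weights. The accuracy of the projection depends on the truncation order and on how smooth the trajectory is over the interval \(\tau \).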

## 4 Results: pursuit initiation

This section focuses on the consequences of sensory delays, motor delays and their combination—in the context of pursuit initiation—using perhaps the simplest generative model for active inference. Our purpose is to illustrate the difficulties in oculomotor control that are incurred by delays and how these difficulties dissolve when delays are accommodated during active inference. We start with a description of the generative model and demonstrate its behaviour when delays are compensated. We then use this normal behaviour as a reference to look at failures of pursuit initiation induced by delays. In this section, responses to a single sweep of rightward motion are used to illustrate basic responses. In the next section, we consider pursuit of sinusoidal motion (with abrupt onsets) and the implications for generative models that may be used by the brain.

### 4.1 Generative model of pursuit initiation

The real world provides sensory input in two modalities: proprioceptive input from cranial nerve nuclei reports the angular displacement of the eye \(\varvec{s}_o \in \mathbb {R}^2\) and corresponds to the centre of gaze. Note that, using the approximation of relatively small displacements, we use Cartesian coordinates to follow previous treatments, e.g. Friston et al. (2010a). However, visual space is better described by bounded polar coordinates, and treatments of large eye movements should account for this. Exteroceptive (retinal) input reports the angular position of a target in a retinal (intrinsic) frame of reference \(\varvec{s}_t \in \mathbb {R}^2\). The indices \(o\) and \(t\) thus refer to states of the oculomotor system or of the target, respectively. Note that \(\varvec{s}_t \) is just the difference between the centre of gaze and target location in an extrinsic frame of reference \(\varvec{x}_t - \varvec{x}_o\). In this paper, we are modelling the online inference of target position, and we are ignoring the problem of how the causal structure of the environment is learned. We simply assume that this structure has already been learned accurately, and therefore, the dynamics of the real world and the generative model are the same. Clearly, this model of visual processing is an enormous simplification: we are assuming that place-coded spatial information can be summarised in terms of displacement vectors. However, more realistic simulations, using a set of retinotopic inputs with classical receptive fields covering visual space, produce virtually the same results. We will use more realistic models in future publications that deal with smooth pursuit and visual occlusion. Here, we use the simpler formulation to focus on delays and the different sorts of generative models that can provide top-down or extra-retinal constraints on visual motion processing.

The hidden states of this model comprise the true, real-world oculomotor displacement (\(\varvec{x}_o \in \mathbb {R}^2\)) and target location (\(\varvec{x}_t \in \mathbb {R}^2\)). The units of angular displacement are arbitrary, but parameters are tuned to correspond to a small displacement of 4 degrees of visual angle for one arbitrary unit (that is, approximately 4 times the width of a thumbnail at arm’s length). The oculomotor state is driven by action with a time constant of \(t_a=64~\hbox {ms}\) and decays (slowly) to zero through damping, with a time constant of \(t_o = 512~\hbox {ms}\). The target location is perturbed by hidden causes \(\varvec{\nu } \in \mathbb {R}^2\) that describe the location to which the target is drawn, with a time constant of \(t_m=16~\hbox {ms}\). In this paper, the random fluctuations on sensory input and on the motion of hidden states are very small, with a log precision of 16. In other words, the random fluctuations have a variance of \(\exp (-16)\). This completes our description of the process generating sensory information, in which hidden causes force the motion of a target location and action forces oculomotor states. Target location and oculomotor states are combined to produce sensory information about the target in an intrinsic frame of reference.
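The generative process described in this paragraph can be sketched as a pair of first-order equations. The code below is a minimal reading of that description, not the simulation code used for the figures: it assumes one spatial dimension instead of \(\mathbb {R}^2\), omits the (very small) random fluctuations, uses forward Euler integration, and holds action and the hidden cause constant. The function name `simulate_process` and its arguments are hypothetical.

```python
def simulate_process(action, cause, duration=8.0, dt=0.001,
                     t_a=0.064, t_o=0.512, t_m=0.016):
    """Integrate a noise-free, 1-D sketch of the generative process.

    Assumed equations of motion (time constants in seconds):
        dx_o/dt = action / t_a - x_o / t_o     (eye driven by action, damped)
        dx_t/dt = (cause - x_t) / t_m          (target drawn to the hidden cause)
    Sensory mapping:
        s_o = x_o            (proprioceptive: centre of gaze)
        s_t = x_t - x_o      (retinal: target relative to gaze)
    """
    x_o, x_t = 0.0, 0.0
    for _ in range(int(round(duration / dt))):
        dx_o = action / t_a - x_o / t_o
        dx_t = (cause - x_t) / t_m
        x_o += dt * dx_o
        x_t += dt * dx_t
    return {"x_o": x_o, "x_t": x_t, "s_o": x_o, "s_t": x_t - x_o}


if __name__ == "__main__":
    print(simulate_process(action=0.1, cause=1.0))
```

Under these assumptions a constant action settles the eye at `action * t_o / t_a`, which is one way to see why action need not return to zero when the eye is held at an eccentric position.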

The generative model has exactly the same form as the generative process but with one important exception: there is no action and the motion of the hidden oculomotor states is driven by the displacement between the target location and the central gaze (with a time constant of \(t_s=32~\hbox {ms}\)). In other words, the agent believes that its gaze will be attracted to the location of the target, which, itself, is being driven by some unknown exogenous force or hidden cause. The log precisions on the random fluctuations in the generative model were four, unless stated otherwise. This means that uncertainty about sensory input, (motion of) hidden states and causes was roughly equivalent.

Having specified the generative process and model, we can now solve the active inference scheme in Eq. 1 and examine its behaviour. Sensorimotor delays are implemented in the message passing from the generative process to the generative model. This generative model produces pursuit initiation because it embodies prior beliefs that the centre of gaze will follow the target location. This pursuit initiation rests on conditional expectations about the target location in extrinsic coordinates and the state of the oculomotor plant, where the location is driven by hidden causes that also have to be inferred.

The generative model described in this section provides the equations required to simulate active inference using the formalism of the previous section. In short, we now consider the generative model that defines the variational free energy and (Bayes) optimal active inference.

### 4.2 Simulations

The upper left panel shows the predicted sensory input (coloured lines) and sensory prediction errors (dotted red lines) along with the true values (broken black lines). Here, we see horizontal excursions of oculomotor angle (upper lines) and the angular position of the target in an intrinsic frame of reference (lower lines). This is effectively the distance of the target from the centre of gaze and reports the spatial lag of the target that is being followed (solid red line). One can see clearly an initial retinal displacement of the target that is suppressed after approximately \(20~\hbox {ms}\). This effect confirms that the visual representation of target position is predictive and that the presentation of a smooth predictable versus an unpredictable target would induce a lag between their relative positional estimates, as is evidenced in the *flash-lag effect* (Nijhawan 1994).

The sensory predictions are based upon the conditional expectations of hidden oculomotor (blue line) and target (red line) angular displacements shown on the upper right. The grey regions correspond to 90 % Bayesian confidence intervals, and the broken lines show the true values. One can see clearly the motion that elicits pursuit initiation responses, where the oculomotor excursion follows with a short delay of about \(64~\hbox {ms}\). The hidden cause of these displacements is shown with its conditional expectation on the lower left. The true cause and action are shown on the lower right. The action (blue line) is responsible for oculomotor displacements and is driven by proprioceptive prediction errors. Action does not return to zero because the sweep is maintained at an eccentric position during this simulation. This eye position slightly undershoots the target position: it is held at around 95 % of the target eccentricity in the upper right panel. Note that this corresponds roughly to the steady-state gain observed in behavioural data, which was modelled explicitly by Robinson et al. (1986). For our purposes, these simulations can be regarded as Bayes optimal solutions to the pursuit initiation problem, in which sensorimotor delays have been accommodated (discounted) via absorption into the generative model. We can now examine the performance in the absence of compensation and see how sensory and motor delays interact to confound pursuit initiation.

In effect, the active inference scheme has undergone a phase transition from a stable to an unstable fixed point. We have illustrated this bifurcation by increasing sensorimotor delays under a fixed motor precision or gain in Eq. 7. The results in Fig. 5 used a motor gain with a log precision of 2.5. We chose this value because it produced stable responses with sensory or motor delays alone and unstable dynamics with combined delays. These results illustrate the profound and deleterious effects of sensorimotor delays on simple pursuit initiation, using biologically plausible values, namely sensorimotor delays of 64 ms and a target velocity of about 16 degrees per second. This also illustrates the necessity of compensation for these delays so that the system can achieve a more robust and stable response. One would anticipate that, in the face of such failures, real subjects would engage interceptive saccades to catch the target, of the sort seen in schizophrenic patients (Levy et al. 1993). In the remainder of this paper, we will concentrate on the nature of pursuit initiation and smooth pursuit with compensated sensorimotor delays, using a reasonably high motor gain with a log precision of four.
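The qualitative point, that a fixed gain can be stable for one delay and unstable when delays add up, can be illustrated with a toy delayed feedback loop. This is not the active inference scheme of Eq. 7: it is a scalar linear loop in which a tracking error is corrected in proportion to its delayed value, and the gain of 16 per second is a hypothetical value chosen only so that the transition falls between 64 ms and 128 ms. The names `simulate_delayed_feedback` and `late_amplitude` are likewise illustrative.

```python
def simulate_delayed_feedback(gain, delay, duration=4.0, dt=0.001):
    """Euler integration of dx/dt = -gain * x(t - delay), with x = 1 for t <= 0.

    x stands for a tracking error that is corrected using delayed information.
    """
    n_delay = int(round(delay / dt))
    n_steps = int(round(duration / dt))
    x = [1.0] * (n_delay + n_steps + 1)  # the first n_delay + 1 entries are history
    for n in range(n_delay, n_delay + n_steps):
        x[n + 1] = x[n] - dt * gain * x[n - n_delay]
    return x[n_delay:]  # trace from t = 0 onwards


def late_amplitude(trace, dt=0.001, window=1.0):
    """Largest absolute error over the final `window` seconds of the trace."""
    n_window = int(round(window / dt))
    return max(abs(v) for v in trace[-n_window:])


if __name__ == "__main__":
    gain = 16.0  # hypothetical feedback gain (per second)
    for delay in (0.064, 0.128):  # a single delay versus combined delays
        amplitude = late_amplitude(simulate_delayed_feedback(gain, delay))
        print(f"delay = {delay * 1000:.0f} ms, late amplitude = {amplitude:.3g}")
```

For this linear loop the continuous-time stability boundary is gain × delay = π/2, so doubling the delay at fixed gain can move the loop from damped to growing oscillations. The scheme considered in the paper is nonlinear and higher dimensional, so the analogy is qualitative only.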

### 4.3 Pursuit initiation and visual contrast

### 4.4 Summary

In this section, we have seen that sensorimotor delays can have profound and deleterious effects on optimal oculomotor control. Here, optimal control means Bayes optimal active inference, in which pursuit initiation emerges spontaneously from prior beliefs about how a target attracts the centre of gaze. These simulations demonstrate that it is relatively easy to compensate for sensorimotor delays by exploiting representations in generalised coordinates of motion. Furthermore, the resulting scheme has some construct validity in relation to experimental manipulations of the precision or contrast of visual information. However, there are certain aspects of oculomotor tracking that suggest the pursuit initiation model above is incomplete: when presented with periodic target motion, the latency of motor gain (defined operationally in terms of the target and oculomotor velocities) characteristically reduces after the first cycle of target motion (Barnes et al. 2000). This phenomenon cannot be reproduced by the pursuit initiation model above.

## 5 Results: smooth pursuit

In this section, we consider a slightly more realistic generative model that replaces the prior beliefs about the target attracting the centre of gaze with the belief that both the target and centre of gaze are attracted by the same (fictive) location in visual space. This allows pursuit initiation to anticipate the trajectory of the target and pursue the target more accurately—providing the trajectories are sufficiently smooth. The idea behind this generative model is to account for the improvements in tracking performance that are not possible at the onset of motion and that are due to inference on smooth target trajectories.

### 5.1 Smooth pursuit model

[…] *target* and the centre of gaze but by the displacement between the *hidden cause* and the centre of gaze. In other words, the hidden oculomotor states are attracted by the hidden cause of target motion, not the target motion *per se*. The idea here is that inference about the trajectory of the hidden cause should enable an anticipatory optimisation of pursuit initiation, provided these trajectories are smooth; hence a smooth pursuit model. Note that the equation of motion in the oculomotor model \(\dot{x}_o = \frac{1}{t_s}(x_t-x_o)\) (see Eq. 8) is the (adiabatic) solution to the equation used to model smooth pursuit: \(\frac{1}{t_v}(\nu ^{(1)}-x_o) - \frac{t_s}{t_v}{x}^{\prime }_o =0 \) when \(\nu ^{(1)} = x_t\) (see Eq. 9). As a result (and as confirmed by simulations), this model behaved similarly for the sweep stimulus used in Figs. 4, 5 and 6.
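
The adiabatic relationship between the two equations of motion can be checked in one step. As a sketch, multiplying the smooth pursuit equation by \(t_v/t_s\), rearranging, and identifying \({x}^{\prime }_o\) with \(\dot{x}_o\) gives:

\[
\frac{1}{t_v}\left(\nu ^{(1)}-x_o\right) - \frac{t_s}{t_v}{x}^{\prime }_o = 0
\;\Longrightarrow\;
{x}^{\prime }_o = \frac{1}{t_s}\left(\nu ^{(1)}-x_o\right)
\;\overset{\nu ^{(1)} = x_t}{=}\;
\frac{1}{t_s}\left(x_t-x_o\right)
\]

The time constant \(t_v\) cancels, which is why the two models make the same prediction whenever the hidden cause coincides with the target.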

### 5.2 Simulations

### 5.3 Summary

## 6 Results: anticipatory pursuit

This section presents a full hierarchical model of anticipatory smooth pursuit eye movements that tries to account for anticipatory oculomotor responses that are driven by extra-retinal beliefs about the periodic behaviour of targets. This entails adding a hierarchical level to the model that enables the agent to recognise and remember the latent structure in target trajectories and suitably optimise its pursuit movements—which are illustrated here in terms of an improvement in the accuracy of target following after the onset of rectified target motion.

### 6.1 Anticipatory pursuit

As above, all the random fluctuations were assumed to have a log precision of four. Crucially, the mapping between the second-level (latent) hidden states and the motion of first-level hidden states encoding trajectories in visual (extrinsic) space is nonlinear. This means that latent periodic motion can be distorted in an arbitrary way. Here, we use a soft thresholding function \(\sigma (x) = \exp (4(x-1))\) to suppress negative (rightward) excursions of the target to model hemi-sinusoidal motion. This is the same function we used to generate the motion in Fig. 9. Note that if the precision of the noise at the second level falls to zero, so that there is no (precise) information at this level, the generative model assumes that the random fluctuations have an infinite variance. In this limit, the prediction at the level below in the hierarchical model simplifies to \(\nu ^{(1)} = \omega ^{(2)}_\nu \), and we recover Eq. 9 describing the smooth pursuit model. This precision therefore tunes the relative strength of anticipatory modulation.
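
The effect of the soft thresholding function can be seen by evaluating it over one cycle of a sinusoid. This is a minimal sketch: the function \(\sigma (x) = \exp (4(x-1))\) is as given above, but applying it directly to a unit-amplitude sinusoid, and the names `soft_threshold` and `hemi_sinusoid`, are assumptions made for illustration.

```python
import math


def soft_threshold(x):
    """sigma(x) = exp(4 * (x - 1)): close to 1 near x = 1, close to 0 for x < 0."""
    return math.exp(4.0 * (x - 1.0))


def hemi_sinusoid(t, period=1.0):
    """Soft-thresholded unit sinusoid (assumed form of the latent periodic motion)."""
    return soft_threshold(math.sin(2.0 * math.pi * t / period))


# Sample one period: the positive half-cycle survives, the negative one is squashed.
samples = [hemi_sinusoid(k / 8.0) for k in range(9)]
```

The positive peak of the sinusoid maps to one, whereas the negative peak maps to \(\exp (-8)\), so negative excursions are almost entirely suppressed.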

### 6.2 Summary

In conclusion, to account for anticipatory pursuit movements that are not immediately available in target motion, one needs to equip generative models with a hierarchical structure that can accommodate latent dynamics that may or may not be expressed at the sensory level. It is important to note that this model is a gross simplification of the complicated hierarchies that may exist in the brain. For instance, while some anticipation may be induced in smooth pursuit eye movements, some aspects, such as the aperture problem, may not be anticipated (Montagnini et al. 2006). In this model, the second-level hidden causes are simply driven by prediction errors and assume a constant frequency. As a consequence, prior beliefs about frequency are modelled as stationary. In the real brain, one might imagine that models of increasing hierarchical depth might allow for nonstationary frequencies and other dynamics that would better fit behavioural data. We have chosen to illustrate the basic ideas using a minimalistic example of anticipation in eye movements. Hierarchical extensions of this sort emphasise the distinction between visual motion processing and attending oculomotor control based purely upon retinal and proprioceptive input; they emphasise extra-retinal processing that is informed by prior experience and beliefs about the latent causes of visual input. We will exploit this anticipatory smooth pursuit model in future work, where visual occluders are used to disclose beliefs about latent motion.

## 7 Discussion

In this paper, we have considered optimal motor control in the context of pursuit initiation and anticipatory smooth pursuit. In particular, we have taken a Bayesian perspective on optimality and have simulated various aspects of eye movement control using predictive coding and active inference. This provides a solution to the problem of sensorimotor delays that reproduces the results of earlier solutions—but using a neuronally plausible (predictive coding) scheme that has been applied to a whole range of perceptual, psychophysical, decision theoretic and motor control problems beyond oculomotor control. Active inference depends upon a generative model of stimulus trajectories and their active sampling through movement. This requires a careful consideration of the generative models that might be embodied by the visual–oculomotor system—and the sorts of behaviours one would expect to see under these models. The treatment in this paper distinguishes between three levels of predictive coding with respect to oculomotor control: the first is at the lowest level of sensorimotor message passing between the sensorium and internal states representing the causes of sensory signals. Here, we examined the potentially catastrophic effects of sensorimotor delays and how they can easily render oculomotor tracking inherently unstable. This problem can be finessed—in a relatively straightforward way—by exploiting representations in generalised coordinates of motion. These can be used to offset both sensory and motor delays, using simple and neurobiologically plausible mixtures of generalised motion. We then motivated a model of smooth pursuit eye movements by noting that a simple model of target following cannot account for the improvement in visual tracking after the onset of smooth and continuous target trajectories. 
In this paper, smooth pursuit was modelled in terms of hidden causes that attracted both the target and centre of gaze simultaneously—enabling the trajectory of the target to inform estimates of the hidden cause that, in turn, provide predictions about oculomotor consequences. While this extension accounted for experimentally observed tracking improvements—under continuous trajectories—it does not account for anticipatory movements that have to accumulate information over time. This anticipatory behaviour could only be explained with a deeper hierarchical model that has an explicit representation of latent (periodic) structure causing target motion. When the generative model was equipped with a deeper structure, it was then able to produce anticipatory movements of the sort seen experimentally. Clearly, the simulations in this paper are just heuristic and do not represent a proper simulation of neurobiological processing. However, they can be taken as proof of principle that the basic computational architecture—in terms of generalised representations and hierarchical models—can explain some important and empirical facts about eye movements. In what follows, we consider the models in this paper in relation to other models and how modelling of this sort may have important implications for understanding the visual–oculomotor system.

### 7.1 Comparison with other models

The model that we have presented here speaks to and complements several existing models of the oculomotor system. First, it shares some properties with computer vision algorithms used for image stabilisation. Such models often use motion detection coupled with salient feature detection for the registration of successive frames (Lucas and Kanade 1981). A major difference is that these models are often applied to very specific problems or configurations for which they give an efficient, yet *ad hoc* solution. A more generic approach is to use—as our model does—a probabilistic method, for instance particle filtering (Isard and Blake 1998). Our model provides a constructive extension—as we integrate the dynamics of both sensation and action. In principle, this could improve the online response of feature tracking algorithms.

Second, using our modelling approach, we reproduce similar behaviours shown by other neuromimetic models of the oculomotor system. For example, the pursuit of a dot with known uncertainty can be modelled as the response of a Kalman filter (Kalman 1960). Both generalised Bayesian (active inference) and Kalman filtering predict the current state of the system using prior knowledge (about previous target locations) and refine these predictions using sensory data (prediction errors). This analogy with block diagrams from control theory was first highlighted by Robinson et al. (1986) and Krauzlis and Lisberger (1989)—and has since been used widely (Grossberg et al. 1997). For a recent treatment involving the neuromorphic modelling of cortical areas, see Shibata et al. (2005). However, it should be noted that the link with Kalman filtering is rarely explicit (but see de Xivry et al. 2013); most models have been derived heuristically, rather than as optimal solutions under a generative model. One class of such neuromimetic models uses neural networks that mimic the behaviour of the Kalman filter (Haykin 2001). This model was used to fit and predict the response of smooth pursuit eye movements under different experimental parameters (Montagnini et al. 2007) or while interrupting information flow (Bogadhi et al. 2011a). Developing this methodology—and by analogy with modular control theory architectures—these building blocks can be assembled to accommodate increasingly complex behavioural tasks. This can take the form of a multi-layered model for transparency processing (Raudies et al. 2011) or of an interconnected graph connecting the form and motion pathways (Beck et al. 2008). Such models have been used to understand adaptation to blanking periods and to tune the balance between sensory and proprioceptive inputs (Madelain and Krauzlis 2003). 
Our model is different in a key aspect: the Kalman filter is indeed the (Bayes) optimal solution under a linear generative model, but a cascade of such solutions is not the optimal solution to (nonlinear) hierarchical models (Balaji and Friston 2011). The active inference approach considers the (embodied) system as a whole and furnishes an optimal solution in the form of generalised Bayesian filtering. In particular, given the delays at the sensory and motor levels, it provides an optimal solution that accommodates (or compensates for) these delays. As shown in the results, the ensuing behaviour reproduces experimental results from pursuit initiation (Masson et al. 2010) to anticipatory responses (Avila et al. 2006; Barnes et al. 2000). The approach thus provides an inclusive framework, compared with heuristics used in neuromimetic models that focus on specific aspects of oculomotor control (see below).
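
The predict-and-correct cycle shared by Kalman filtering and generalised Bayesian filtering can be sketched for the simplest possible case. The example below is a scalar Kalman filter that estimates a constant target velocity from noisy samples, assuming random-walk dynamics; the function names, noise levels and the target velocity of 16 degrees per second are illustrative assumptions and do not reproduce any of the models cited above.

```python
import random


def kalman_step(mean, variance, observation, process_var, sensory_var):
    """One predict/update cycle of a scalar Kalman filter (random-walk state)."""
    # Predict: the state is expected to persist, with growing uncertainty.
    prior_mean = mean
    prior_var = variance + process_var
    # Update: correct the prediction by the gain-weighted prediction error.
    prediction_error = observation - prior_mean
    gain = prior_var / (prior_var + sensory_var)
    return prior_mean + gain * prediction_error, (1.0 - gain) * prior_var


def track_velocity(true_velocity=16.0, n_samples=200, sensory_sd=2.0, seed=0):
    """Filter noisy velocity samples, starting from a vague prior centred on zero."""
    rng = random.Random(seed)
    mean, variance = 0.0, 100.0
    for _ in range(n_samples):
        observation = rng.gauss(true_velocity, sensory_sd)
        mean, variance = kalman_step(mean, variance, observation,
                                     process_var=0.01, sensory_var=sensory_sd ** 2)
    return mean, variance


estimate, uncertainty = track_velocity()
```

The gain plays the role of a precision weighting: when sensory variance is large relative to the prior variance, prediction errors have little influence on the estimate.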

The model presented here shares many features with other probabilistic models. First, representations are encoded as probability densities. This allows processing and control to be defined in terms of probabilistic inference; for instance, by specifying a prior belief that favours slow speeds (Weiss et al. 2002). This approach has been successful in explaining a wide variety of physiological and psychophysical results. For example, it allows one to model spatial (Perrinet and Masson 2007) or temporal (Montagnini et al. 2007) integration of information, using conditional independence assumptions. Furthermore, recent developments have addressed the estimation of the shape and parameters of priors for slow speeds (Stocker and Simoncelli 2006) and for the integration of ambiguous versus non-ambiguous information (Bogadhi et al. 2011b). The active inference scheme used here relies on generative models that entail exactly the same sorts of priors. It has also been shown that free energy minimisation extends the type of probabilistic models described above to encompass retinal stabilisation and oculomotor reflexes (Friston et al. 2010b). A crucial difference here is that we have explicitly considered the problem of dynamics and delays. Our goal was to understand how the system could provide an optimal solution, when it knows (or can infer) the delay between sensing input (in the past) and processing information that informs action (in the future). This endeavour allowed us to build a model, using simple priors over the dynamics of the hidden causes, that reproduces the sorts of anticipatory behaviour seen empirically.
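
A prior favouring slow speeds has a simple closed form when both the prior and the sensory likelihood are taken to be Gaussian. The sketch below makes that assumption, with a zero-mean prior; the function name and the particular precisions are hypothetical, and the cited studies use richer prior shapes than this.

```python
def posterior_speed(observed_speed, sensory_precision, prior_precision=1.0):
    """Posterior mean and precision of speed under a zero-mean Gaussian prior.

    Precisions are inverse variances; the posterior mean is the observation
    shrunk towards zero by the relative weight of the slow-speed prior.
    """
    posterior_precision = sensory_precision + prior_precision
    posterior_mean = sensory_precision * observed_speed / posterior_precision
    return posterior_mean, posterior_precision


# Same observed speed, high versus low sensory precision (e.g. high vs low contrast).
high_contrast, _ = posterior_speed(16.0, sensory_precision=16.0)
low_contrast, _ = posterior_speed(16.0, sensory_precision=0.25)
```

Under these assumptions, reducing sensory precision pulls the estimated speed towards zero, which is the qualitative signature of a slow-speed prior.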

### 7.2 Limitations

Clearly, there are many aspects of oculomotor control we have ignored in this theoretical work. Foremost, we have used a limited set of stimuli to validate the model. Pursuit initiation was only simulated using a simple sweep of a dot, while smooth pursuit was studied using a sinusoidal trajectory. However, these types of stimuli are commonly used in the literature, as they best characterise the type of behaviour (following, pursuit) that we have tried to capture: see Barnes (2008) for a review. We have not attempted to reproduce the oscillations at steady state as in Robinson et al. (1986) or Goldreich et al. (1992), although this may help to optimise the parameters of our model in relation to empirical data. The hemi-sinusoidal stimulus is also a typical stimulus for studying anticipatory responses (Avila et al. 2006; Barnes et al. 2000). Further validations of this model would call on a wider range of stimuli and consider the accumulated wealth of neurophysiological and behavioural data (Tlapale et al. 2010).

In this paper, we have focused on inference under a series of generative models of oculomotor control. We have not considered how these models are acquired or learned. In brief, the acquisition of generative models and their subsequent optimisation in terms of their parameters (i.e. synaptic connection strengths) is an important, if distinct, issue. In the context of active inference, model acquisition and perceptual learning can be cast in terms of model selection and parameter optimisation through the minimisation of free energy. Under certain simplifying assumptions, this learning reduces to associative plasticity. A discussion of these and related issues can be found in Friston (2008).

The generative model used in this paper has no explicit representation of space but only the uncertain, vectorial position of a target. We have previously studied the role of prediction in solving problems that are associated with the detection of motion using a dynamical and probabilistic model of spatial integration (Perrinet and Masson 2012). Both that model and the current model entertain a similar problem: that of the integration of local information into a global percept, in both the temporal (this manuscript) and spatial (Perrinet and Masson 2012) domains. We have considered integrating sensory information in the spatial domain in terms of the prediction of sensory causes and their sampling by saccades (Friston et al. 2012b), and the effects on smooth pursuit of reducing the precision. This manipulation can account for several abnormalities of smooth pursuit eye movements typical of schizophrenia (Adams et al. 2012). In this paper, we have limited ourselves to integrating information over time. It would be valuable, in the future, to consider temporal and spatial integration simultaneously.

A final limitation of our model is the simplified modelling of the physical properties of the oculomotor system. For instance, owing to the biophysics of the eyes and photoreceptors, sensory input contains motion streaks that can influence the detection of motion (Barlow and Olshausen 2004). Furthermore, we have ignored delays in neuronal message passing among and within different levels of the hierarchy: for a review of quantitative data from monkeys, see Salin and Bullier (1995). Finally, we have not considered in any depth the finer details of how predictive coding or Bayesian filtering might be implemented neuronally. It should be noted that predictive coding in the cortex was attended by some early controversies; in particular, paradoxical increases in visual evoked responses were observed when prediction error should be minimal. For example, a match between sensory signals and descending predictions can lead to the enhancement of neuronal firing (Roelfsema et al. 1998). The neuronal implementation assumed in our work (see 2) finesses many of these issues. In this (hypothetical) scheme, predictions and prediction errors are encoded by the neuronal activity of deep and superficial pyramidal cells, respectively (Mumford 1992; Bastos et al. 2012). In this scheme, the enhancement of evoked responses is generally thought to reflect attentional gain, which corresponds to the optimisation of the expected precision (inverse variance) of prediction errors, via synaptic gain control (Feldman and Friston 2010). Put simply, attention increases the gain of salient or precise prediction errors that the predictions are trying to suppress. Indeed, the orthogonal effects of expectations and attention in predictive coding have been established empirically using fMRI (Kok et al. 2011). See Bastos et al. (2012) for a review of the anatomical and electrophysiological evidence that is consistent with the scheme used here.

### 7.3 Perspectives

Notwithstanding the limitations above, this approach may provide some interesting perspectives on neural computations in the oculomotor system. First, the model presented here can be compared to existing models of the oculomotor system. In particular, any commonalities of function suggest that extant neuromimetic models may be plausibly implemented using a generic predictive coding architecture. Second, the Bayes optimal control solution rests on a computational (anatomical) architecture that can be informed by electrophysiological or psychophysical studies. For example, we have considered only delays at the motor and sensory level. However, delays in axonal conduction between hierarchical levels—within the visual–oculomotor system—may have implications for intrinsic and extrinsic connectivity: in visual search, predictions generated in higher areas (say supplementary and frontal eye fields) may exploit a shorter path, by stimulating the actuator to sample more information (by making an eye movement) rather than accumulating evidence by explaining away prediction errors in lower (striate and extrastriate) cortical levels (Masson et al. 2010). By studying the structure of connections implied by theoretical considerations (see Fig. 3), our modelling approach could provide a formal framework to test these sorts of hypotheses. A complementary approach would be to apply dynamic causal modelling (Friston et al. 2003) to electrophysiological data, using predictive coding architectures, such that transmission delays (and their compensation or modelling) among levels of the visual–oculomotor system could be evaluated empirically. A recent example of using dynamic causal modelling to test hypotheses based upon predictive coding architectures can be found in Brown and Friston (2012). This example focuses on attentional gain control in visual hierarchies.

Third, this work may provide a new perspective for experiments, in particular for the generation of stimuli. We have previously considered such a line of research by designing naturalistic, texture-like pseudo-random visual stimuli to characterise spatial integration during visual motion detection (Leon et al. 2012). We were able to show that the oculomotor system exhibits an increased following gain when stimuli have a broad spatial frequency bandwidth. Interestingly, the velocities of these stimuli were harder to discriminate relative to narrow bandwidth stimuli in a two-alternative forced-choice psychophysical task (Simoncini et al. 2012). In this work, the authors used competitive dynamics based on divisive normalisation. Moreover, textured stimuli were based on a simple forward model of motion detection (Leon et al. 2012). This may call for the use of more complex generative models to generate such textures. In addition, the use of gaze contingent eye-tracking systems allows real-time manipulation of the configuration (position, velocity, delays) of the stimulus, with respect to eye position and motion. By targeting different sources of uncertainty, at the different levels of the hierarchical model, one might be able to get a better characterisation of the oculomotor system.

The confounding influence of delays inherent in neuronal processing is a strong biophysical constraint on neuronal dynamics. Representations in generalised coordinates of motion provide a potential resolution that may have enjoyed positive evolutionary pressure. However, it remains unclear how neural information, represented in a distributed manner across the nervous system, is integrated with exteroceptive, operational time. The “binding” of different information, without a central clock, seems essential, but the correlate of such a temporal representation of sensory information (independent of delays) has never been observed explicitly in the nervous system. Elucidating the neural representation of temporal information would greatly enhance our understanding of both neural computations themselves and our interpretation of measured electromagnetic (EEG and MEG) signals that are tightly coupled to those computations.

We have made a slight approximation here because \(T(\tau _a) \tilde{\mu }(t-\boldsymbol{\tau }_a) = T(\tau _a-\boldsymbol{\tau }_a) \tilde{\mu }(t)\) when, and only when, the free energy gradients are zero and \(\dot{\tilde{\mu }}(t) = \mathcal{D} \tilde{\mu }(t)\). Under the assumption that the perceptual destruction of these gradients is fast, in relation to action, this can be regarded as an adiabatic approximation.

## Acknowledgments

LuP was supported by EC IP project FP6-015879, “FACETS” and FP7-269921, “BrainScaleS” and by the Wellcome Trust Centre for Neuroimaging. RAA and KJF are supported by the Wellcome Trust Centre for Neuroimaging.

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.