1 Introduction

Human perceptual skills rely on the activity of thousands of neurons within sensory cortex. The activity of each of these neurons contributes to the perception of sensory information in our environment. Exactly how the pattern of neuronal activities is combined to inform perceptual decisions is one of the central problems in theoretical and systems neuroscience (Dayan and Abbott 2001). Solutions to this problem generally need to address (1) how sensory information is encoded in the pattern of activated neurons, and (2) how this information is read out to ultimately inform a perceptual decision. Here we propose a new theoretical framework providing insights into both aspects of neuronal decoding and apply it to perceptual performance in a fine discrimination task.

Neurons convey information about sensory inputs by producing graded responses to a continuously varying stimulus feature such as the direction of visual motion. The response profile of a neuron is typically characterized by its tuning function. For visual motion processing, tuning functions are well described by a bell-shaped Gaussian profile indicating that a motion-sensitive neuron responds with a peak amplitude only for a narrow range of motion directions, and that response strength levels off for directions of motion offset from the preferred direction (Britten 2003; Born and Bradley 2005). The peak region and width parameter provide a complete description of a neuron’s response to different directions of motion.

Based on these tuning characteristics, various theoretical approaches have been proposed to decode neuronal population activity in order to predict the sensory stimulus evoking particular neuronal responses and to infer the perceptual decision about the input (Fig. 1). However, it is far from clear how the neuronal response characteristics are used by the brain to extract the perceptually relevant information. Most commonly used approaches rely on vector averaging of neuronal responses, maximum-likelihood or Bayesian probability estimation (Paradiso 1988; Vogels 1990; Seung and Sompolinsky 1993; Kim and Wilson 1997; Zemel et al. 1998; Pouget et al. 2000; Jazayeri and Movshon 2006). In population vector models the relevant sensory information is derived directly from the neuronal firing rate, which is considered as the “weight” of a neuron in a global population vector of firing rates (Vogels 1990; Seung and Sompolinsky 1993). In contrast, likelihood based models rely on the calculation of the probability that a particular response is observed in response to an input stimulus and Bayesian approaches compute the probability that an input was presented given the observed response. Individual neurons contribute by means of their probability function of observing a neuronal response to a given stimulus (Seung and Sompolinsky 1993; Zemel et al. 1998; Pouget et al. 2000; Jazayeri and Movshon 2006).

Fig. 1

a Illustration of the visual motion stimulus used for the center-surround paradigm in the experiment and model. b Theoretical psychometric function of a subject for discriminating the direction of motion of the center target from the vertical (upward) reference direction. It represents the proportion of “rightward” answers as a function of the center motion direction, and helps to visualize the perceived vertical reference direction (midpoint) as well as the discrimination threshold for reliably (above p=0.84) seeing a deviation from the perceived reference. Theoretical models need to predict the perceived value of the stimulus. The discrimination threshold is the error on the perceived value

However, it is also possible to calculate the information conveyed by a neuronal population about the sensory input explicitly, as an information profile that directly represents the statistical knowledge of the input. It can be represented through a measure known as Fisher information (FI) (Fisher 1925; Paradiso 1988; Kass et al. 2005) that is used for arriving at an optimal estimate of available sensory information. The advantage of the FI measure is that its definition is related to the underlying variance of the estimate (Fisher 1925; Paradiso 1988) and therefore to the error on the predicted stimulus (the discrimination threshold, e.g. Paradiso 1988). In other words, extracting FI for a given neuronal system provides a representation of the minimum variance estimate within the neuronal population. The FI encoded by neurons is directly related to their tuning functions. The most informative part of the tuning function for estimating the sensory stimulus is a region offset from the peak, i.e. the neurons most sensitive to the stimulus are not the ones with the highest firing rate (Paradiso 1988; Seung and Sompolinsky 1993; Dayan and Abbott 2001). However, the definition of FI implies that it should allow estimating the identity of the sensory stimulus that has evoked a neuronal population response. In human perceptual terms, decoding FI from the neuronal population should make it possible to predict which stimulus is actually perceived.

In the following we calculate information profiles of theoretical neuronal population activity during a motion discrimination task. We show that actual perceptual decisions directly correspond to the stimulus value that leads to the same information content for the different alternative motion directions, which need to be discriminated. The proposed model is based on the assumptions (1) that behaviorally relevant information in neuronal population responses is represented as FI and (2) that decoding this information can be achieved by a simple equalization procedure. The model is used to predict perceived stimulus values and it is compared to the prediction of standard decoding methods considered in the literature (population vector average, maximum-likelihood and maximum a posteriori inferences). All models provide testable predictions of the neuronal tuning parameters underlying the perceptual performance. Additionally, the variation of discrimination sensitivity is also explained through the modulation of the total Fisher information content (e.g. Paradiso 1988).

We apply the models to perceptual performance during motion discrimination of a foveally presented target stimulus surrounded by task-irrelevant motion of varying directions across trials (Fig. 1). The paradigm is known to give rise to motion repulsion, a systematic overestimation of the angular difference between two motion directions (Marshak and Sekuler 1979; Kim and Wilson 1997). In particular, subjects misperceive the physical direction of the target motion when the task-irrelevant surround moves at directions around 30 to 60° away from the target direction (Kim and Wilson 1997). Motion repulsion is often explained by inhibitory interactions between motion-tuned neurons (Marshak and Sekuler 1979; Allman et al. 1985; Born and Bradley 2005), and attributed in particular to neurons in area MT/V5 tuned to unidirectional visual motion (Britten 2003; Born and Bradley 2005). In the following we utilize these tuning properties to propose a novel approach to decode the information content of neuronal population responses during a motion repulsion paradigm, and we compare its predictive power to standard decoding schemes.

2 Modeling human perception

In the following we describe the theoretical background of the various decoding methods applied to the psychophysical results. We first provide in detail the rationale underlying our novel approach to decode perceptual decisions from neuronal information profiles. Secondly, we summarize the models based on population vector average, maximum-likelihood and Bayesian maximum-a-posteriori inference for performing the decision process. We use the term standard models for these last three models because they are commonly considered in the literature and we demonstrate that in the experimental paradigm considered here they give rise to identical estimates for the value of the stimulus despite different theoretical origins.

2.1 Theoretical background

All decoding models are based on theoretical and functional hypotheses. Here, we assume that the decision process is based on the response of those neurons tuned to the main stimulus feature—here the direction of motion, and these neuronal responses are used to decode the visual input with a given theoretical decoding scheme. Furthermore, we assume a continuous population of direction of motion sensitive neurons based on the insights from cortical area MT/V5 (see Fig. 2a). This visual area contains a hypercolumn representation of motion direction at each location of the visual field and the neuronal activity is modulated by surround-to-center spatial interactions (Britten 2003; Born and Bradley 2005). Importantly, the neurons’ response to various directions of visual motion follows a characteristic unimodal bell-shaped tuning function that is well represented by a Gaussian function written as:

$${r_i}({\theta _0}) = {A_i}\exp \left( { - {{{{({\theta _0} - {\theta _i})}^2}} \over {2\sigma _i^2}}} \right)$$
(1)

In this equation \(r_i(\theta_0)\) is the mean firing rate of the neurons with preferred direction of motion \(\theta_i\) in response to a motion direction \(\theta_0\), \(\sigma_i\) is the standard deviation of the tuning curve, and \(A_i\) is the mean maximum firing rate of the neurons tuned to \(\theta_i\). Therefore, for a single presented stimulus the hypercolumn population response follows the same Gaussian function (Fig. 2b, thick solid line) (Pouget et al. 2000). To simplify the subsequent calculations, we assumed that all neurons could be modelled by a characteristic tuning curve with amplitude \(A_0\) and tuning width \(\sigma_0\) and that the responses of neurons follow an independent Poisson process (cf. Dayan and Abbott 2001; Jin et al. 2005).

Fig. 2

Illustration of population activity and Fisher information (FI) within a theoretical neuronal population and its hypothetical modulation by surround motion. a Schematic of a population of neurons sensitive to all directions of motion and arranged in a hypercolumn of motion sensitive neurons in macaque cortical area MT/V5. b The corresponding theoretical population response function (thick line) based on neurons with all possible preferred directions of motion in response to a zero degree motion direction. Error bars denote the Poisson variability of the firing rate of the neurons. The thin line represents the FI information profile of the neuronal population. c Modulation of neuronal tuning curve amplitudes induced by a second motion stimulus in the surround moving +40° away from zero degree. The suppressive influence on the amplitude is shown in thick grey (Eq. 2). d The corresponding population activity and FI profile modulated by the presence of the surround (thick and thin line, respectively). The FI estimate is shown as vertical white line and the Standard models estimate as vertical black line (A i0 = 0.8, σ = 30). Note that both models provide different estimates (the x-position of the vertical lines) given identical population characteristics

We were particularly interested in the modulation of perceived motion directions of the center stimulus in a center-surround stimulus configuration with the surround being task-irrelevant (Fig. 1a). Motion in the surround does not activate directly the population of neurons within the directional hypercolumn responding to the central target, but indirectly modulates the strength of the neuronal response (Allman et al. 1985; Born and Bradley 2005). According to neurophysiological evidence (Allman et al. 1985), the surround motion (subscript \(i_0\)) is modelled to reduce the maximal response amplitude of neurons responding to the target motion. The response modulation varies as a function of the difference between the surround motion direction (\(\theta_{i_0}\)) and the neuron’s preferred direction (\(\theta_i\)), with the surround assumed to have a multiplicative effect on neuronal responses to the target motion when they have similar preferred directions (no effect from opponent motion directions is considered, see Appendix A). We capture this amplitude modulation effect with a Gaussian function centred at \(\theta_{i_0}\) and a standard deviation \(\sigma _{i_0}\), written as:

$${A_i}({\theta _i},{\theta _{{{\rm{i}}_0}}}) = {A_0}\left( {{\rm{1 - }}{A_{{i_{\rm{0}}}}}\exp \left( { - {{{{({\theta _i} - {\theta _{{i_{\rm{0}}}}})}^2}} \over {2\sigma _{{i_{\rm{0}}}}^2}}} \right)} \right)$$
(2)

Here \(A_{i_0}\) is the maximum amplitude of the inhibition. Figure 2c illustrates the suppressive effect of a second, irrelevant motion on the amplitude of the population tuning curves.
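As an illustration, Eqs. 1 and 2 can be sketched in a few lines of Python. The function names are hypothetical, directions are treated on a linear (non-circular) axis in degrees, and the default values (amplitude 1, width 30°, suppression 0.8) are taken from the caption of Fig. 2 or chosen for convenience; none of this is the authors' implementation.

```python
import math

def tuning_rate(theta0, theta_i, amplitude=1.0, sigma=30.0):
    """Eq. 1: mean rate of a neuron preferring theta_i for stimulus theta0 (deg)."""
    return amplitude * math.exp(-((theta0 - theta_i) ** 2) / (2.0 * sigma ** 2))

def surround_amplitude(theta_i, theta_s, a0=1.0, a_s=0.8, sigma_s=30.0):
    """Eq. 2: tuning amplitude of neuron theta_i when the surround moves at theta_s."""
    return a0 * (1.0 - a_s * math.exp(-((theta_i - theta_s) ** 2) / (2.0 * sigma_s ** 2)))

def population_response(theta0, theta_s, preferred):
    """Population activity for target theta0, modulated by surround theta_s."""
    return [tuning_rate(theta0, p, surround_amplitude(p, theta_s)) for p in preferred]

# Assumed sampling of preferred directions every 10 deg (illustration only).
preferred = list(range(-180, 181, 10))
response = population_response(0.0, 40.0, preferred)
```

With the surround at +40°, neurons preferring directions near +40° are suppressed most, so the population response to an upward target becomes asymmetric around 0°.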

In deriving most models, we need to define the probability distribution \(P(R|\theta_0)\) of observing a pattern of activity \(R = (x_1, x_2, \ldots, x_n)\) in the hypercolumn population given the presence of the stimulus \(\theta_0\). Based on the independent Poisson process for neuronal firing rates and assuming a reading time \(T\) of the neuronal response giving individual firing rates \(x_i\) in the trial, the probability can be written as (e.g. Dayan and Abbott 2001):

$$P(R{\rm{|}}{\theta _{\rm{0}}}) = \prod\limits_{i = 1}^n {P({x_i}|{\theta _0})} $$
(3)
$$ = \prod\limits_{i = 1}^n {{{{{({r_i}({\theta _0})T)}^{{x_i}T}}} \over {({x_i}T)!}}\exp ( - {r_i}({\theta _0})T)} $$
(4)

This equation represents the likelihood function for observing a population response \(R\) for the presented stimulus \(\theta_0\) with known mean tuning characteristics \(r_i(\theta_0)\) (here \(x_i T\) represents the number of spikes fired by neuron \(i\) during the reading time of the neuronal population).
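The logarithm of Eq. 4 is what most decoders actually evaluate. A minimal sketch follows, assuming spike counts \(x_i T\) are supplied directly and all mean rates are strictly positive; the function name is hypothetical.

```python
import math

def poisson_log_likelihood(spike_counts, rates, T):
    """ln P(R|theta0) for independent Poisson neurons (log of Eq. 4).

    spike_counts[i] is the number of spikes x_i*T of neuron i,
    rates[i] is its mean rate r_i(theta0), T is the reading time.
    """
    total = 0.0
    for n, r in zip(spike_counts, rates):
        mean = r * T                       # expected spike count r_i(theta0)*T
        # ln[(mean^n / n!) * exp(-mean)], with ln(n!) = lgamma(n + 1)
        total += n * math.log(mean) - mean - math.lgamma(n + 1)
    return total
```

Working in the log domain turns the product over neurons into a sum and avoids numerical underflow for large populations.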

2.2 Fisher information decoding method

To calculate information profiles of a theoretical neuronal population we derive Fisher information (FI) across the whole neuronal population (as in Paradiso 1988; Seung and Sompolinsky 1993; Dayan and Abbott 2001) (cf. Fig. 2). When the neuronal population is activated by a single stimulus with motion in one direction (Fig. 2b), the total FI represented by the neuronal population response is reflected in the mean curvature of the log-likelihood function (Fisher 1925; Paradiso 1988; Dayan and Abbott 2001; Kass et al. 2005). It is defined as:

$${I_F}({\theta _0}) = \left\langle { - {{{\partial ^2}} \over {\partial \theta _0^2}}\ln (P(R|{\theta _0}))} \right\rangle $$
(5)

For neuronal responses based on Poisson-like noise processes, it can be shown that the FI about a single presented stimulus conveyed by neurons having the same preferred stimulus \(\theta_i\) is proportional to the square of the tuning curve derivative divided by the tuning curve itself (Paradiso 1988; Seung and Sompolinsky 1993; Dayan and Abbott 2001):

$${I_F}({\theta _i},{\theta _0}) = {(({\rm{d}}/{\rm{d}}{\theta _0}){r_i}({\theta _0}))^2}/{r_i}({\theta _0})$$
(6)

The total FI above (Eq. 5) is the sum of the FI across the neuronal population with different preferred values.

Neuronal tuning functions for the direction of motion in area MT/V5 are known to follow a Gaussian function (Eq. 1). Thus, the mean FI carried by the neurons with a preferred direction \(\theta_i\) about the stimulus \(\theta_0\) is:

$${I_F}({\theta _i},{\theta _0}) = {{{A_0}} \over {\sigma _0^2}}\left( {{{{{({\theta _0} - {\theta _i})}^2}} \over {\sigma _0^2}}} \right)\exp \left( { - {{{{({\theta _0} - {\theta _i})}^2}} \over {2\sigma _0^2}}} \right)$$
(7)

The neurons that provide the most FI about a given direction of motion are those with preferred directions at \(\sigma \sqrt 2 \) from the stimulus direction, i.e. where small changes of motion direction result in the strongest differences in neuronal response compared to the underlying variability of the firing rate (see Fig. 2b) (Dayan and Abbott 2001).
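The location of the FI peak can be checked numerically by applying Eq. 6 directly to the Gaussian tuning curve of Eq. 1. The sketch below assumes unit amplitude, a width of 30° and a 0.1° search grid; the function name is hypothetical.

```python
import math

def fisher_information(theta0, theta_i, amplitude=1.0, sigma=30.0):
    """Eq. 6 for a Gaussian tuning curve: (dr/dtheta0)^2 / r."""
    d = theta0 - theta_i
    r = amplitude * math.exp(-d * d / (2.0 * sigma ** 2))
    dr = -r * d / sigma ** 2               # analytic derivative of Eq. 1
    return dr * dr / r

# Scan preferred directions from 0 to 180 deg away from the stimulus.
offsets = [0.1 * k for k in range(1801)]
fi = [fisher_information(0.0, off) for off in offsets]
best_offset = offsets[fi.index(max(fi))]   # close to sigma * sqrt(2)
```

For a width of 30° the maximum falls on the grid point nearest to \(30\sqrt 2 \approx 42.4\)°, while the neuron tuned exactly to the stimulus carries no FI.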

We were particularly interested in the modulation of neuronal information profiles induced by a second task-irrelevant motion direction in the spatial surround of the target motion (cf. Fig. 1a). Motion in the surround does not activate directly the population of neurons within the directional hypercolumn responding to the central target, but indirectly modulates the strength of the neuronal population response (Allman et al. 1985; Born and Bradley 2005). As a consequence, this spatial segregation leads to a single FI term for the hypercolumn model. Figure 2c, d illustrate the suppressive effect of a second irrelevant motion on the population tuning curves together with the modulation of the population response and its information content.

We model the subjectively estimated perceived motion direction as an information equalization process. It is obtained by assuming that human subjects perceive the stimulus value that leads to the same amount of information being present for either alternative. It can be written in different forms (ratio, squared or absolute value of difference), and here we use the mathematical expression:

$$G(\theta ) = \left| {\int\limits_{\theta - \pi }^\theta {{I_F}({\theta _i},{\theta _0}){\rm{d}}{\theta _i} - \int\limits_\theta ^{\theta + \pi } {{I_F}({\theta _i},{\theta _0}){\rm{d}}{\theta _i}} } } \right|$$
(8)
$${\theta _{{\rm{FI}}}} = \rm{argmin} G(\theta )$$
(9)

G(θ) represents the difference in information content conveyed by the neuronal populations “left” and “right” of the reference value θ.

For example, when a single direction of motion is presented to the neuronal hypercolumn population without a surround motion (Fig. 2b), it is clear that the amount of information for directions of motion leftward and rightward from the exact physical direction of motion is equal (cf. dark and light grey areas in Fig. 2b). When a second stimulus is presented in the surround and changes the response strength of a part of the neuronal population (Fig. 2c), the equalization point is moved further away from the surround direction (white vertical line in Fig. 2d).
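The equalization read-out of Eqs. 8 and 9 can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' code: function names are hypothetical, \(A_0 = 1\), tuning and surround widths are both 30°, directions live on a linear axis in degrees (so ±π becomes ±180°), integrals use a midpoint rule with 0.5° bins, and the search is restricted to ±90° around the target.

```python
import math

def fi_density(theta_i, theta0, theta_s, a_s, sigma=30.0, sigma_s=30.0):
    """FI carried by neurons preferring theta_i (Eq. 6 applied to Eqs. 1 and 2)."""
    d = theta0 - theta_i
    amp = 1.0 - a_s * math.exp(-((theta_i - theta_s) ** 2) / (2.0 * sigma_s ** 2))
    return amp * (d * d / sigma ** 4) * math.exp(-d * d / (2.0 * sigma ** 2))

def info_imbalance(theta, theta0, theta_s, a_s, step=0.5):
    """G(theta) of Eq. 8: |FI left of theta - FI right of theta|."""
    n = int(round(180.0 / step))
    left = sum(fi_density(theta - (j + 0.5) * step, theta0, theta_s, a_s)
               for j in range(n))
    right = sum(fi_density(theta + (j + 0.5) * step, theta0, theta_s, a_s)
                for j in range(n))
    return abs(left - right) * step

def decode_fi(theta0, theta_s, a_s, step=0.5, search=90.0):
    """Eq. 9: candidate direction that minimizes the information imbalance."""
    m = int(search / step)
    candidates = [theta0 + k * step for k in range(-m, m + 1)]
    return min(candidates,
               key=lambda th: info_imbalance(th, theta0, theta_s, a_s, step))

theta_fi = decode_fi(0.0, 40.0, 0.8)   # upward target, surround at +40 deg
```

Without suppression (`a_s = 0`) the read-out returns the physical direction; with a surround at +40° the estimate becomes negative, i.e. it is pushed away from the surround, and mirroring the surround mirrors the estimate.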

The FI computation can be implemented by postsynaptic neurons in a higher-level neuronal population coding for direction of motion. For example, calculation of the individual FI terms requires neurons to compute the square of the tuning curve derivative divided by the tuning curve (see Eq. 6). In neuronal terms it involves basic computations: subtracting the responses of neurons with adjacent stimulus preference and squaring (\((r_{i+1} - r_{i-1})^2\)), and dividing the outcome by the neuronal response (\(r_i\)). These calculations could easily be implemented in dendritic trees equivalent to two-layer neural networks (London and Häusser 2005). As a final step, the read-out of this FI population representation is performed through Eq. 9.

2.3 Standard models of decoding

2.3.1 Population vector average

The population vector average model assumes that the final percept \(\theta_{\rm perc}\) is obtained by performing a weighted average across the firing rates of neurons with preferred direction \(\theta_i\) (e.g. Dayan and Abbott 2001). It is written as:

$${\theta _{{\rm{PV}}}} = {{\int {{r_i}({\theta _0}) \times {\theta _i}{\rm{d}}{\theta _i}} } \over {\int {{r_i}({\theta _0}){\rm{d}}{\theta _i}} }}$$
(10)

2.3.2 Maximum-Likelihood estimate

The maximum likelihood estimate of the input is obtained by computing the log-likelihood function \(L(\theta) = \ln(P(R|\theta))\) for the observed firing pattern \(R\) in the neuronal population encoding the direction of motion. The likelihood function for different motion directions \(\theta\) can be represented in a higher-level neuronal population (e.g. Jazayeri and Movshon 2006) and could therefore allow extraction of the stimulus value that maximizes this function, i.e. the most probable input value. The final perceived value is mathematically tractable, and gives:

$${\theta _{{\rm{ML}}}} = {{\int {{r_i}({\theta _0}) \times {\theta _i}{\rm{d}}{\theta _i}} } \over {\int {{r_i}({\theta _0}){\rm{d}}{\theta _i}} }}$$
(11)

2.3.3 Maximum-a-posteriori estimate

The Bayes rule relates the conditional probability that a stimulus was presented given the neuronal response to the global probabilities of neuronal responses and stimulus presentations and their conditional relation. It is written as:

$$P(\theta |R) = P(R|\theta )P(\theta )/P(R)$$
(12)

where P(θ|R) is the conditional probability that stimulus θ was presented given the observed activity pattern R across the neuronal population, P(R|θ) is the conditional probability of observing a pattern of activity R given the presence of stimulus θ, and P(θ) and P(R) are respectively the probabilities of stimulus θ being presented and observing response R (e.g. Chap. 3, “Neural decoding”, in Dayan and Abbott 2001).

By using the theoretical hypotheses (Sect. 2.1) plus the assumption that the probability of a stimulus θ being presented is unimodal and can be represented with a Gaussian probability density function centred on θ prior and width σ prior, one obtains the maximum-a-posteriori estimate θ MAP as (cf. Dayan and Abbott 2001):

$${\theta _{{\rm{MAP}}}} = {{T\int {{r_i}({\theta _0}){\theta _i}/\sigma _0^2\,{\rm{d}}{\theta _i}} + {\theta _{{\rm{prior}}}}/\sigma _{{\rm{prior}}}^2} \over {T\int {{r_i}({\theta _0})/\sigma _0^2\,{\rm{d}}{\theta _i}} + 1/\sigma _{{\rm{prior}}}^2}}$$
(13)

If the prior distribution is centred on the MAP estimate, i.e. θ MAP = θ prior , which is equivalent to saying that the prior distribution of the stimulus is centred on the mean perceived value by the subject, it can be easily shown that Eq. 13 becomes:

$${\theta _{{\rm{MAP}}}} = {{\int {{r_i}(} {\theta _0}) \times {\theta _i}{\rm{d}}{\theta _i}} \over {\int {{r_i}({\theta _0}){\rm{d}}{\theta _i}} }}$$
(14)

This demonstrates an interesting result for the condition described above, that the maximum-a-posteriori estimate is independent of the prior width. In Sect. 4, we show that the staircase psychophysical method used in our experimental design entails exactly this property, that is, the global a priori probability distribution of the target direction of motion is centred on the mean perceived value by the subject. Therefore, a direct comparison of the MAP estimate to the human perceptual data is possible.
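The step from Eq. 13 to Eq. 14 can be spelled out as follows. The shorthand \(S_0\) and \(S_1\) is introduced here only for compactness and is not part of the original notation:

$$S_0 = \int r_i(\theta_0)\,{\rm d}\theta_i, \qquad S_1 = \int r_i(\theta_0)\,\theta_i\,{\rm d}\theta_i$$

Multiplying Eq. 13 by its denominator gives

$$\theta_{\rm MAP}\left( \frac{T S_0}{\sigma_0^2} + \frac{1}{\sigma_{\rm prior}^2} \right) = \frac{T S_1}{\sigma_0^2} + \frac{\theta_{\rm prior}}{\sigma_{\rm prior}^2}$$

Setting \(\theta_{\rm prior} = \theta_{\rm MAP}\) cancels the two terms in \(1/\sigma_{\rm prior}^2\), so that

$$\theta_{\rm MAP}\,\frac{T S_0}{\sigma_0^2} = \frac{T S_1}{\sigma_0^2} \quad \Rightarrow \quad \theta_{\rm MAP} = \frac{S_1}{S_0}$$

which is Eq. 14 and contains neither \(\sigma_{\rm prior}\) nor \(T\).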

2.3.4 Decision process

All these standard decoding processes have the same mathematical description under different theoretical assumptions. The PV model is based on the assumption that each neuron codes its preferred direction of motion and contributes to the perceived value with a weight equal to its firing rate. The ML estimate is based on the assumption that a higher-level neuronal population computes the different probabilities of observing the activity profile in MT/V5 for different possible input values and the decision process simply reads-out the stimulus value with the highest probability (e.g. Jazayeri and Movshon 2006). The MAP estimate could be implemented by assuming that a higher-level neuronal population calculates the a posteriori distribution based on knowledge of both stimulus distribution across the experiment and firing rate distribution of the neuronal population (e.g. Zemel et al. 1998).

As demonstrated in the model derivations these three decision processes provide the same prediction of the stimulus value, and therefore the exact modulation of the perceived value follows the same behaviour in the three cases. When a single stimulus is presented to the hypercolumn population without a surround, the firing rate weighted-average of the preferred directions is representing the stimulus estimate. It is the exact presented stimulus value (Fig. 2b, peak of the Gaussian population response). When a second stimulus is presented in the surround and changes the response amplitude of a part of the neuronal population, the estimate is moved further away from the surround direction, consistent with the psychophysically observed repulsion effect (Fig. 2d, black vertical line).
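The common estimate of the standard models (Eqs. 10, 11 and 14) can be sketched numerically in the same way. The function name is hypothetical, and the sketch assumes \(A_0 = 1\), equal tuning and surround widths of 30°, a linear direction axis and a discrete sum over preferred directions within ±180° of the target in place of the integral.

```python
import math

def population_vector(theta0, theta_s, a_s, sigma=30.0, sigma_s=30.0, step=0.5):
    """Firing-rate-weighted average of preferred directions (Eq. 10)."""
    num = 0.0
    den = 0.0
    n = int(round(180.0 / step))
    for k in range(-n, n + 1):
        theta_i = theta0 + k * step
        # Eq. 2: surround-dependent amplitude, then Eq. 1: Gaussian tuning.
        amp = 1.0 - a_s * math.exp(-((theta_i - theta_s) ** 2) / (2.0 * sigma_s ** 2))
        rate = amp * math.exp(-((theta0 - theta_i) ** 2) / (2.0 * sigma ** 2))
        num += rate * theta_i
        den += rate
    return num / den

theta_std = population_vector(0.0, 40.0, 0.8)   # upward target, surround at +40 deg
```

As in the text, the estimate equals the physical direction when no suppression is applied and moves away from the surround direction (here to negative values) once neurons tuned near the surround are suppressed.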

2.4 Discrimination thresholds: error on the perceived value of a stimulus

In addition to the measured perceived direction of motion we can infer the discrimination threshold, i.e. the amount of direction change required for a subject to report the correct direction difference with respect to his/her perceived reference (e.g. Paradiso 1988; Dayan and Abbott 2001). We can estimate a lower bound on the discrimination thresholds through the total Fisher information representing the best decoding error one can achieve with an unbiased decoding algorithm. The squared discrimination threshold is inversely proportional to the total FI present in the neuronal population, and it is written as (Cramer-Rao bound, cf. Dayan and Abbott 2001):

$${({\sigma _{\exp }})^2} \ge {\left( {1 + {{\rm{d}} \over {{\rm{d}}{\theta _0}}}{\theta _{{\rm{bias}}}}} \right)^2}/{I_F}({\theta _0},{\theta _{{i_0}}})$$
(15)

In the described center-surround paradigm, the mathematical computations for the total FI contained in the neuronal population activity are practicable (integral over θ i of Eq. 7), and lead to the following expression:

$${I_F}({\theta _0},{\theta _{{i_0}}}) = I_F^0\left[ {1 - {{{A_{{i_0}}}} \over {{{(1 + {k^2})}^{3/2}}}}(1 + 2{Y^2}) \times \exp ( - {Y^2}/{k^2})} \right]$$
(16)

by replacing:

$$Y = {k^2}({\theta _0} - {\theta _{{i_0}}})/\left( {\sqrt {1 + {k^2}} \sqrt 2 {\sigma _0}} \right)$$
(17)
$$k = {\sigma _0}/{\sigma _{{i_0}}}$$
(18)

Equation 16 represents the total FI present in the neuronal hypercolumn population about the reference direction \(\theta_0\) given the presence of the modulatory surround direction \(\theta_{i_0}\). Here, \(I_F^0\) is the FI contained in the total population activity when no surround direction of motion \(\theta_{i_0}\) is present (cf. Dayan and Abbott 2001). Thus, it is also possible to predict the exact variation of the discrimination thresholds for the central target motion with different surround motion directions.
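For the case \(k = 1\) used throughout the paper, the closed form of Eq. 16 can be cross-checked against a direct numerical sum of the single-neuron FI terms. The sketch below makes several assumptions: \(A_0 = 1\), a linear direction axis, a rectangle-rule sum over ±270°, hypothetical function names, and a threshold elevation obtained by treating the bound of Eq. 15 as an equality while neglecting the bias-derivative term.

```python
import math

def fi_ratio_closed_form(delta, a_s, sigma0=30.0):
    """I_F / I_F^0 from Eq. 16 with k = 1; delta = theta0 - surround direction."""
    y = delta / (2.0 * sigma0)                      # Eq. 17 with k = 1
    return 1.0 - a_s / 2.0 ** 1.5 * (1.0 + 2.0 * y * y) * math.exp(-y * y)

def fi_ratio_numeric(delta, a_s, sigma0=30.0, step=0.5, span=270.0):
    """Same ratio by summing Eq. 6 over preferred directions (theta0 = 0)."""
    theta_s = -delta
    n = int(round(span / step))
    total = 0.0
    for k in range(-n, n + 1):
        theta_i = k * step
        amp = 1.0 - a_s * math.exp(-((theta_i - theta_s) ** 2) / (2.0 * sigma0 ** 2))
        total += amp * (theta_i ** 2 / sigma0 ** 4) * math.exp(
            -theta_i ** 2 / (2.0 * sigma0 ** 2))
    fi_without_surround = math.sqrt(2.0 * math.pi) / sigma0   # I_F^0 for A0 = 1
    return total * step / fi_without_surround

def threshold_elevation(delta, a_s, sigma0=30.0):
    """Threshold with surround / threshold without surround (assumed equality)."""
    return 1.0 / math.sqrt(fi_ratio_closed_form(delta, a_s, sigma0))
```

Under these assumptions the elevation is 1 without suppression and exceeds 1 whenever the surround removes information from the population.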

2.5 Predictions of human perceptual parameters modulation

The described decision processes predict a systematic variation of the perceived upward reference direction of motion as a function of the direction of motion in the spatial surround (Fig. 3a). It is important to note that for given theoretical population characteristics (\(\sigma_0\), \(\sigma _{i_0}\) and \(A_{i_0}\) fixed) the repulsion amplitudes of the models are substantially different (see also Fig. 2d). This is due to the high sensitivity of the FI decoding method to both tuning curve parameters, the width and the amplitude of the firing rate, whereas the Standard models need strong amplitude modulation to create a repulsion effect. As a consequence, for predicting similar repulsion amplitudes the models need very different population characteristics. The modulation of the total FI content predicts variation of the discrimination threshold of the target stimulus as a function of surround motion direction (Fig. 3b). We assumed that \(k = 1\), i.e. \(\sigma_0 = \sigma _{i_0}\), and optimal decoding for the total variance estimate (these assumptions are made throughout the paper).

Fig. 3

a Illustration of the prediction of the perceived reference direction (0°) for the FI (thick black line) and the Standard models (thin grey line) as a function of the distracting surround motion direction (different y-scales for the two curves). b Elevation of the discrimination thresholds as a function of surround motion direction (ratio between discrimination threshold for the center-with-surround stimulus and the center-without-surround stimulus). An elevation of 1.0 indicates that adding a surround with the corresponding motion direction (x-axis) leaves the discrimination threshold unchanged relative to the center-without-surround stimulus

3 Psychophysics of center-surround motion discrimination

3.1 Methods

3.1.1 Subjects

We collected data from 20 subjects (including the first author) for fitting and extracting model parameters. Subjects had normal, or corrected to normal, vision and gave written consent for participating in the experiment (mean age: 27.6 ± 5.6 years).

3.1.2 Apparatus and stimuli

The experiment was conducted on a 21-inch CRT monitor at a refresh rate of 85 Hz and a resolution of 40 pixels per degree of visual angle, controlled by a Macintosh G4 computer. Stimuli were random dot patterns (RDP) presented at the center of a white screen (luminance: 80.2 cd/m²). RDPs contained 8 dots per square degree with each dot extending 4 pixels and set to the lowest luminance resulting in a contrast of 22.6 cd/m² (the contrast of an RDP is defined as \(C = \sum {{p_i}{{({L_i} - {L_0})}^2}} \); the sum is over all dots present in the RDP; \(p_i\) is the proportion of surface in the RDP of dot \(i\); \(L_i\) is its luminance; \(L_0\) is the luminance of the background). Dots moved through a circular or annular aperture at a speed of 8°/s in a unidirectional translational motion for the center stimulus and the surround in the test condition. For the noise-control condition, the surround contained dots moving at 8°/s in random directions, with each dot being assigned a fixed direction of motion until it disappeared from the screen. Upward motion was defined as zero degrees with “leftward” motion directions as negative values. The target RDP had a radius of 1.5°. In the target-alone control condition, its allowed directions of motion were in the ±30° range sampled in one-degree steps. In the other two experimental conditions the range of the target RDP directions was ±29° sampled every 2°, and the target RDP was surrounded by an annular aperture (inner/outer radius: 1.5°/4.5°) (see Fig. 1a). For the test condition, the surround motion was 100% coherent in one of the 36 directions of motion spanning the whole 360°. The variation allowed us to test the influence of different angles between target and surround motion on the perceived upward reference direction of the central target RDP, i.e. the motion repulsion effect.

3.1.3 Procedure

Subjects were seated in a dimly lit room 57 cm in front of the monitor. A chin rest was used to stabilize the head. They were instructed to fixate a small dark square centered on the screen and to attend to the central target stimulus, the surround being task-irrelevant. They started each trial by pressing the space bar, and 353 ms after the offset of the fixation square the stimulus was presented for 117 ms (10 frames) at the center of the screen. Subjects had to report whether the direction of motion of the small central RDP (the target) was to the left or right relative to their internal reference direction of upward motion by pressing either the “4” or “6” key on the computer keyboard.

Two control and one test condition were presented to each subject. In the control conditions the target stimulus was presented either alone or with random motion in the surround. Each of these two conditions had a total of 120 trials. In the test condition, the target was shown together with the surround containing 100% coherent motion. The test condition was split in two experimental blocks, one containing 18 directions of motion of the surround (from −170 to +170° with steps of 20°), the second the remaining directions of motion (from −160 to +180° with steps of 20°). The test design contained a small number of total presentations of each annular stimulus direction (30 times) combined with a short presentation of the stimuli to the subject at each trial (117 ms).

To measure each subject’s response curve to the direction of motion of the central RDP, we used a weighted up-down staircase procedure for stimulus presentation (Kaernbach 1991). The theoretical convergence hit rate of a given staircase was either 75 or 25%, corresponding to up-down step-size ratios along the psychometric function of 3/1 and 1/3, respectively. For example, in the test condition with a convergence point of 75%, the step was 3×2° to the left when a “right” response was given, and 1×2° to the right when a “left” response was given. In the test and noise-control conditions, each staircase run started at ±21° deviation from zero degrees upward motion, on the side of the midpoint opposite to the convergence point. Such a staircase run for a 75% convergence point is shown in Fig. 4a, where the run starts at +21° and the algorithm converges rapidly to the stochastic region. In addition, the convergence points of 25 and 75% were equally distributed across the staircases within a given experimental block in order to avoid biases due to unequal numbers of “left” and “right” answers. In the control conditions, four independent staircases were interleaved to prevent subjects from learning the experimental procedure (Cornsweet 1962). Within one test block, the eighteen staircases, one for each motion direction of the surround, were presented in a pseudo-random order on a trial-by-trial basis (Bonnet 1986). The short presentation time (117 ms), the randomness of surround directions from trial to trial, and the small number of total trials (30) per surround direction prevented any build-up of information (learning or adaptation). This global design prevented potential problems arising from asymmetric sampling of motion directions (Rauber and Treue 1999), adaptation effects, eye tracking (Rauber and Treue 1999), or stimulus statistics (Mahani et al. 2005) as explanations of motion repulsion.
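The staircase rule described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: the simulated observer is a logistic function with illustrative parameters (`a`, `b` are not fitted data), the function names `p_left` and `run_staircase` are hypothetical, and the 25% rule is assumed to mirror the 75% example given in the text.

```python
import math
import random


def p_left(x, a=0.0, b=-0.44):
    """Probability of a "left" response at target direction x (deg).

    Logistic observer; a and b are illustrative values, not fitted data.
    """
    return 1.0 / (1.0 + math.exp(-b * (x - a)))


def run_staircase(n_trials=30, start=21.0, target=0.75, unit=2.0, seed=0):
    """Simulate one weighted up-down staircase converging on P("left") = target."""
    if target not in (0.75, 0.25):
        raise ValueError("only the 75% and 25% convergence points are sketched")
    # 75%: 3 units leftward after "right", 1 unit rightward after "left".
    # 25% (assumed mirror image): 1 unit leftward after "right",
    # 3 units rightward after "left".
    if target == 0.75:
        after_right, after_left = -3 * unit, unit
    else:
        after_right, after_left = -unit, 3 * unit
    rng = random.Random(seed)
    x = start
    levels, said_left = [], []
    for _ in range(n_trials):
        left = rng.random() < p_left(x)  # simulated response on this trial
        levels.append(x)
        said_left.append(left)
        x += after_left if left else after_right
    return levels, said_left


levels, said_left = run_staircase()
print(levels[:5], sum(said_left) / len(said_left))
```

Because the leftward and rightward steps must balance in the long run, the proportion of “left” responses in a long simulated run approaches the chosen convergence point, whatever the observer's midpoint.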

Fig. 4

a Example of a staircase run that started by presenting a target direction of 21° and “walked” toward the 75% convergence point, around which it followed a typical random walk (condition with surround motion of −70°). b The corresponding pooled responses (dots), i.e. proportions of “leftward” answers as a function of target direction of motion, together with the fitted logistic function (solid curve; a = −5.93, b = −0.44). The number of trials at each visited target motion direction is shown above/below each datum. This example shows the repulsion effect on the psychometric function, with the midpoint shifted closer to the surround, such that the subject perceived the physical vertical direction (0°) as being further away from the surround. The discrimination threshold was extracted from the psychometric function as illustrated in Fig. 1b. c Histogram of all target motion directions presented across subjects. The distribution peaks around 0° (mean: −0.073; SE: 0.054; n = 21,600), demonstrating that globally the staircase method presented a mean upward target. Therefore the mathematical condition for applying the MAP estimate is met (see text for details)

3.1.4 Raw data extraction

We extracted the staircase runs and obtained the response curves in the control conditions, and for each surround motion direction in the test condition. The response curve represents the number of “left” answers of the subject as a function of each motion direction of the target RDP. Figure 4a,b presents a staircase run and the pooled “left” responses at each target direction of motion. Using the maximum likelihood method and the simplex algorithm (Press et al. 1997), each response curve was fitted with a logistic model of the form:

$$p(x) = {1 \over {1 + \exp ( - b(x - a))}}$$
(19)

where x is the direction of motion of the target RDP, p(x) is the corresponding hit rate, a corresponds to the midpoint of the function and b is related to its steepness. Thus, a reflects the internal reference direction of upward motion, i.e. the direction for which subjects are equally likely to give a “left” or “right” response. Parameter b allows the discrimination threshold to be computed, defined as \(\sigma_{\exp} = x_{p=0.84} - x_{p=0.50} = (1/b)\log(21/4)\). It represents the direction deviation in degrees that allows the subject to discriminate the target direction from his/her internal zero degrees upward reference in 84% of the trials (for an example, see Fig. 4a,b). In the raw results, in one case out of the 720 experimental values (36 surround directions × 20 subjects) the fitted psychometric function was flat, with a correspondingly very high discrimination threshold (>100°). This psychometric function was not included in the data analysis.
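As a rough illustration of the quantities in Eq. 19, the sketch below writes out the binomial negative log-likelihood that a maximum-likelihood fit would minimize, and the threshold that follows from b. The function names and the `(x, n_left, n_total)` data layout are assumptions made for this example, and the optimizer itself (the simplex algorithm) is not reproduced. The threshold is reported as a magnitude, which is an assumption about the sign convention when b is negative.

```python
import math


def logistic(x, a, b):
    """Eq. 19: response probability at target direction x (deg)."""
    return 1.0 / (1.0 + math.exp(-b * (x - a)))


def neg_log_likelihood(params, data):
    """Binomial negative log-likelihood of pooled responses.

    data: iterable of (x, n_left, n_total) tuples (hypothetical layout).
    """
    a, b = params
    nll = 0.0
    for x, n_left, n_total in data:
        # Clip to avoid log(0) for extreme predicted probabilities.
        p = min(max(logistic(x, a, b), 1e-9), 1.0 - 1e-9)
        nll -= n_left * math.log(p) + (n_total - n_left) * math.log(1.0 - p)
    return nll


def discrimination_threshold(b):
    """Distance between the 84% and 50% points, as a magnitude (deg)."""
    return math.log(21 / 4) / abs(b)


# Threshold implied by the slope quoted in the caption of Fig. 4 (b = -0.44).
print(discrimination_threshold(-0.44))
```

The factor log(21/4) arises because p = 0.84 requires exp(−b(x − a)) = 0.16/0.84 = 4/21.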

Previous studies have demonstrated that the slope, and thus the discrimination threshold, is a biased and very noisy parameter (Leek et al. 1992; Vågerö and Sundberg 1999; Kaernbach 2001). Therefore, we conducted Monte Carlo simulations using the staircase procedure described above to assess the biases and variability of both parameters in the experimental design. As reported previously, we observed that the slope parameter b has a strong bias and variability. Additionally, the simulations showed that the midpoint a has a small bias corresponding to less than 5% of its total variance (see Sect. 4).

3.2 Results

In the experiments, subjects had to judge whether the direction of motion of a foveally presented stimulus deviated rightward or leftward from upward motion (defined as 0°, with rightward deviations positive) (see Fig. 1). Varying the direction of a task-irrelevant motion in the spatial surround induced changes in the perceived upward motion direction in all 20 subjects (motion repulsion, cf. Figs. 4 and 5), complementing previous reports (Kim and Wilson 1997; Tzvetanov et al. 2006). Figure 5a shows the between-subjects mean value of the perceived vertical reference direction of motion across different surround motion directions. The figure plots the direction of motion of the center target that the subjects reported to perceive as moving upward. The physical direction of motion perceived as moving upward is closer to the surround motion direction, thus demonstrating the repulsion effect. Subjects misperceive the upward direction particularly for surround directions around ±30–60° from the true zero degrees upward direction. Interestingly, the average results showed a small bias induced by surround stimuli moving at ±130–150°, i.e. opposite to the direction that caused the strongest motion repulsion. This influence of motion opponency was evident in only a subset of the subjects and is integrated into the model in more detail in Appendix A.

Fig. 5

Psychophysical results and model fit. a Average psychophysical repulsion curve (dots) obtained by computing the between-subjects mean perceived vertical upward reference direction of the target motion as a function of the direction of the surround motion. The model fits are shown as solid lines: FI model in black, standard models in grey. Light grey data points represent deviations from the predicted motion repulsion (see Sect. 4). b Average normalized discrimination thresholds of the target motion (dots) and model fit (line) as a function of the surround motion direction. Error bars denote SE (n = 20)

Surround motion also modulated the threshold for discriminating the target motion direction (Fig. 5b) with elevated thresholds for surround directions around ±40–50° (test thresholds normalized to the noise-control condition; noise-control mean discrimination threshold of 2.7 ± 0.2°, SEM; n = 20). This is in the range of the maximum repulsion effect (Pearson correlation coefficient computed between the absolute value of perceived upward motion and discrimination thresholds: r = 0.29, p < 0.001).

4 Model fit to the data

4.1 Methods

4.1.1 Perceived value

The logistic fits to the psychophysical results provided the perceived vertical upward reference direction of motion and the threshold in the a and b parameters respectively. After subtracting the individual mean of the data for adjusting to zero, the variation of the a parameter with different modulatory directions of motion in the surround can be directly compared to the models.

The FI and standard model predictions were fit to the data by minimizing the mean squared error between the perceived upward motion direction predicted by the model and the true physical 0° upward direction (using the simplex algorithm, cf. Press et al. 1997). For a given set of theoretical model parameters, the perceived upward direction predicted by the model is obtained by the following procedure. The two values of (1) the measured physical direction of motion reported as upward (parameter a) and (2) the corresponding surround direction (θ surr) are introduced in Eqs. 12, and the neuronal activity and information profiles are then computed. Model predictions were extracted through the standard model estimates and the equalization point of the FI profile (computing the cumulative of the population information profile and extracting the stimulus value giving half of the total information content). These calculations were performed by discretizing the theoretical hypercolumn population in steps of 0.01° (this method is computationally intensive). Each model has 2 parameters to be adjusted: σ 0, \(A_{i_0}\). Interestingly, we found that some individual subjects’ data showed clear attraction effects for opposite directions of motion that were only marginally evident in the group average. To account for this finding we extended the model to incorporate motion opponency, described in detail in Appendix A.
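The equalization point described above (the value at which the cumulative information reaches half of the total) can be sketched on a discretized population. The example assumes Gaussian tuning with Poisson noise, for which a single neuron's Fisher information is the squared tuning-curve derivative divided by the rate. The `surround_gain` function is a purely hypothetical stand-in for the surround modulation and is not the model's Eq. 12. Circular wrap-around is ignored, and a 0.1° grid is used instead of 0.01° to keep the sketch fast.

```python
import math


def fisher_information(pref, stimulus, sigma, gain=1.0):
    """FI contributed by one neuron (Gaussian tuning, Poisson noise)."""
    d = stimulus - pref
    rate = gain * math.exp(-d * d / (2.0 * sigma ** 2))
    return rate * d * d / sigma ** 4


def equalization_point(prefs, info):
    """Preferred direction at which the cumulative information reaches half."""
    half = sum(info) / 2.0
    running = 0.0
    for pref, value in zip(prefs, info):
        running += value
        if running >= half:
            return pref
    return prefs[-1]


def surround_gain(pref, surround, sigma, strength):
    """Hypothetical suppression centred on the surround direction."""
    d = pref - surround
    return 1.0 - strength * math.exp(-d * d / (2.0 * sigma ** 2))


step = 0.1  # deg; illustrative grid, coarser than the 0.01 deg of the text
prefs = [i * step for i in range(-1800, 1801)]
sigma = 20.0  # illustrative tuning width, not a fitted value

plain = [fisher_information(p, 0.0, sigma) for p in prefs]
suppressed = [
    fisher_information(p, 0.0, sigma, surround_gain(p, 40.0, sigma, 0.3))
    for p in prefs
]

print(equalization_point(prefs, plain), equalization_point(prefs, suppressed))
```

With a symmetric profile the equalization point stays at the stimulus direction. Suppressing neurons on the surround side removes information from that side, so the half-information point moves away from the surround.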

For fitting the MAP model to the data, we first tested the assumption that the prior distribution of the target stimulus, which is the experimental distribution of the presented target motion direction, is centered on the perceived value. This condition needs to be met for the MAP estimate to be independent of the width of the target distribution, which allows a simple Bayesian prediction of the human perceived value. Figure 4c presents the distribution of presented target directions of motion for all trials and across subjects (21,600 trials). The mean is −0.073 ± 0.054 (SE, n = 21,600), showing that the distribution of presented target directions is well centered on the real vertical direction across all trials and subjects, as expected from the staircase algorithms used in the experiment. Therefore, the mathematical conditions for applying the MAP estimate (Eq. 14) are met.

4.1.2 Discrimination threshold

We also predicted changes in motion discrimination thresholds as a function of the direction of motion in the surround. We computed the ratio of thresholds in the test condition to those in the control condition. According to Eqs. 15 and 16 and using \(\sigma_{\exp}^0 = 1/\sqrt{I_F^0}\) (the discrimination threshold of the control condition), we can express the above-mentioned ratio as:

$${{{\sigma _{\exp }}} \over {\sigma _{\exp }^0}} = {B_0} + \sqrt {I_F^0/{I_F}({\theta _0},{\theta _{{i_0}}})} $$
(20)

with B 0 reflecting a global constant shift of the curve. This model prediction was applied on the average normalized thresholds across subjects, normalized to the noise-control condition, and fit with the Levenberg-Marquardt algorithm (Press et al. 1997). Additionally, we considered optimal information extraction, i.e. the smallest attainable discrimination threshold by assuming equality of left and right sides in Eq. 15 and no bias of the curve. Equation 20 provides three parameter estimates: B 0, \(A_{i_0}\), σ 0. Using the second target-alone condition for normalization of threshold estimates gave nearly identical results (not shown).

4.1.3 Statistical tests

The between-subject mean repulsion data were compared to the mean fitted model with a standard χ 2 test of how well a model predicts the data points. In an initial fit of the mean repulsion curve and the discrimination threshold variation obtained across subjects, we found a statistical difference between models and data (see Appendix B). Therefore, we decided to restrict all model fits and tests to 32 out of the 36 data points where the repulsion effect is due to inhibitory interactions and not affected by yet unknown supplementary interactions (discarding surround angular deviations of ±10° and ±20°, see Fig. 5, grey data points).

It should be noted that any influence from the small bias of the midpoint estimate mentioned in Sect. 3 (“Raw data extraction”) on the main statistical outcome would decrease the match between the model and data. The bias is always in a direction away from the midpoint and toward the theoretical convergence point. As a consequence of the fixed experimental design, with half points slightly biased toward the 25% and half toward the 75% convergence points, the χ2 test would show less statistically reliable model prediction.

The discrimination threshold fit was assessed with an F-test between global mean residual variance of the data and model residual variance. This test allows estimating if the model predicts the data better than the global data mean.
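The two test statistics described in this section can be sketched as follows. The exact formulas are assumptions: the χ 2 statistic is taken as the sum of squared residuals scaled by the standard error of each data point, and the F ratio as the variance about the global data mean divided by the model residual variance. The function names are hypothetical and the p-value lookup is omitted.

```python
def chi_square(data, model, se, n_params):
    """Weighted sum of squared residuals and its degrees of freedom."""
    stat = sum(((d - m) / s) ** 2 for d, m, s in zip(data, model, se))
    return stat, len(data) - n_params


def f_ratio(data, model, n_params):
    """Variance about the data mean divided by the model residual variance."""
    n = len(data)
    mean = sum(data) / n
    df_mean, df_model = n - 1, n - n_params
    var_mean = sum((d - mean) ** 2 for d in data) / df_mean
    var_model = sum((d - m) ** 2 for d, m in zip(data, model)) / df_model
    return var_mean / var_model, df_mean, df_model


# Toy numbers only, to show the call pattern.
print(chi_square([1.0, 2.0, 3.0], [1.5, 2.0, 2.5], [0.5, 0.5, 0.5], n_params=1))
print(f_ratio([1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.1, 3.9, 5.1], n_params=2))
```

Under these assumptions, 32 data points with two fitted parameters give 30 degrees of freedom for the χ 2 test, and 32 points with three fitted parameters give 31 and 29 degrees of freedom for the F ratio.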

4.2 Results

We evaluated the outlined models by fitting them to the psychophysical data. The resulting fits of the data pooled across subjects are shown in Fig. 5a (FI in black solid line; standard models in grey solid line) and reveal that the models captured the observed behavioural motion repulsion across varying surround motion directions (FI: χ 2 = 28.0, df = 30, p > 0.05; standard models: χ 2 = 20.2, df = 30, p > 0.05). The FI model parameters underlying this fit corresponded to a tuning width of σ = 18.2° and a surround modulation of the response strength to the target motion of 2.4% (\(A_{i_0}\) = 0.024). The parameters of the standard models corresponded to a tuning width of σ = 30.1° and a surround modulation of the response strength to the target motion of 50% (\(A_{i_0}\) = 0.50). Comparing the capability of the FI and the standard models to predict the data showed no statistically different predictions (F(30,30) = 1.38, p > 0.05).

In addition to changes in the perceived upward motion we also fit the variation of the discrimination threshold as a function of the surround motion direction. Despite the inherent variability of discrimination thresholds the model reproduces the basic shape of the surround influence on human sensitivity to motion of the center stimulus (Fig. 5b; comparison to the global data mean: F(31, 29) = 2.48, p < 0.01; parameter estimates: σ = 25.3, \(A_{i_0}\) = 1.34, B 0 = 0.14).

5 Discussion

This study delineated a novel approach to decode sensory information from neuronal population responses based on the extraction of local Fisher information (FI) representing the underlying variance estimate of the sensory input. We applied the model to a theoretical hypercolumn population of area MT/V5 tuned to direction of motion, and predicted for a motion discrimination paradigm the global perceptual misjudgements of the center motion direction when flanked with surround motion. For comparison, we applied more commonly used standard models (population vector average, maximum-likelihood, and maximum-a-posteriori (Bayesian) inference) to the same data and found that these models showed similar statistical prediction power for the global perceived direction of motion. However, the standard approaches and the FI decoding algorithm provided clearly divergent estimates of the tuning parameters of the neuronal population that could underlie the psychophysically observed effect. In addition, our approach demonstrated that the same surround modulation of the population response also gave rise to a systematic change in the sensitivity to motion direction.

Our psychophysical results revealed a systematic change of the perceived motion direction and discrimination thresholds of a target stimulus due to the presence of a task-irrelevant motion signal in the spatial surround. Subjects misperceived the vertical upward motion maximally, by 6° on average, for surround directions of motion deviating by about ±40° from the upward reference direction. The magnitude and direction of this effect agree with previous reports of motion repulsion (Hiris and Blake 1996; Kim and Wilson 1997; Rauber and Treue 1999; Tzvetanov et al. 2006). Moreover, the average thresholds of human subjects for discriminating a target motion varied in accordance with the repulsion effect. Elevations of discrimination thresholds were strongest when the angle between center and surround motion was ±40–50° (similar to results of a previous study which used an adaptation paradigm, Hol and Treue 2001).

The psychophysically observed effects could be predicted by decoding the neuronal population response as an information profile. The motion direction that was subjectively perceived was calculated as the value that led to the same information content for either behavioural alternative (rightward vs. leftward direction of motion) across the whole neuronal population. This finding together with the representation of Fisher information within the neuronal population (Paradiso 1988; Seung and Sompolinsky 1993; Dayan and Abbott 2001) suggests that human responses can be conceived of as reading information explicitly from the neuronal response profiles. This finding extends the more standard views on neuronal decoding, which also accounted for the observed effects. However, FI and standard models clearly differed with respect to the neurophysiological population parameters that underlie motion discrimination in the center-surround configuration: compared to the standard models, the FI model predicted a far narrower neuronal tuning width (about 40% smaller) and required only a fraction of the surround amplitude modulation to account for the perceptual results.

The discrimination threshold variations with surround motion directions were well predicted based on modulation of the total FI content. This finding parallels an earlier report from the orientation domain (Paradiso 1988) that demonstrated how the experimental modulation of orientation sensitivity to a target line by a superimposed irrelevant line segment (Westheimer et al. 1976) could be well predicted by changes in the total FI in a theoretical neuronal population tuned to orientation. We applied these insights to the center-surround configuration in a fine motion discrimination task that allowed simplified mathematical computations.

The combined computational modelling of both human perceptual parameters, that is, the exact perceived value as well as the sensitivity to this value, demonstrates that a theoretical framework is available for fully predicting human perceptual decisions. Regarding the prediction of perceived stimulus value, it is still contentious how decoding schemes could be implemented in the brain (for example, see the different propositions in Vogels 1990; Seung and Sompolinsky 1993; Pouget et al. 2000; Jazayeri and Movshon 2006). The proposed Fisher information decoding could be implemented in a higher-level neuronal population similar to maximum-likelihood inference, but by reading out the Fisher information represented in the neuronal population instead of probability distributions. It should be noted that the FI model and the various standard models share common simplifying neurophysiological assumptions allowing tractable mathematical derivations: identical Gaussian tuning curve characteristics across neurons (amplitude and width), firing rate variability following independent Poisson noise, and a theoretical hypercolumn arrangement of neuronal tuning. A challenging task for future research is to consider the biological complexity in more detail and explore the influence of various noise functions or correlation structures on decoding performance of the various models (Vogels 1990; Zemel et al. 1998; Abbott and Dayan 1999; Pouget et al. 2000).

The proposed FI model is built around two essential ingredients that allowed us to bridge the gap from neuronal encoding of sensory information to exact predictions of human perceptual performance. First, the model assumes that sensory responses are decoded in the theoretically optimal way by extracting the FI contained in the population of motion-tuned neurons in area MT/V5. Second, we propose, in the context of a two-alternative forced choice discrimination task, that the subjective decision about the perceived stimulus is achieved by finding the stimulus value that leads to equal FI available for either alternative of the discrimination task (“leftward” versus “rightward” from the upward direction of motion).

The described mechanism critically depends on the insight that maximal information about a stimulus is conveyed by neurons with tuning preferences offset from the stimulus value (Regan and Beverley 1985; Hol and Treue 2001). In particular, for neurons in macaque area MT/V5 with known tuning width for the direction of motion of about 42–51° (Britten 2003), the most “Fisher” informative neurons are those with tuning preferences 60–72° away from the presented motion (at \(\sigma \sqrt 2 \)). Importantly, these theoretical FI values gain strong support from neuronal recordings in macaque area MT/V5 (Purushothaman and Bradley 2005). Using a fine motion discrimination task, this study revealed that most information about discrimination performance is obtained from neurons tuned to directions around 60–70° away from the task direction, which exactly corresponds to the prediction of the FI model. These neurophysiological findings along with the FI model suggest that the peaks of misperception correspond to those pools of neurons that contribute the most to the Fisher information. The individual neuronal contribution from the remaining population is lower but also conveys information and cannot be discarded. Thus, perceptual decisions are based on the whole population of activated neurons, while particular subsets of neurons contribute more than others due to their tuning parameters in relation to the incoming sensory stimulus.
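The \(\sigma \sqrt 2 \) offset quoted above can be checked with a short derivation, sketched here under the simplifying assumptions shared by the models (Gaussian tuning, independent Poisson noise). The symbols A (peak rate), \(\theta_i\) (preferred direction of neuron i) and d are notation introduced for this sketch only, and the Fisher information is written per unit counting time:

$$f_i(\theta) = A\exp \left( - {{(\theta - \theta_i)^2} \over {2\sigma^2}} \right),\qquad I_{F,i}(\theta) = {{f_i'(\theta)^2} \over {f_i(\theta)}} = {{A\,d^2} \over {\sigma^4}}\exp \left( - {{d^2} \over {2\sigma^2}} \right),\quad d = \theta - \theta_i$$

$${{\partial I_{F,i}} \over {\partial d}} = {{A\,d} \over {\sigma^4}}\left( 2 - {{d^2} \over {\sigma^2}} \right)\exp \left( - {{d^2} \over {2\sigma^2}} \right) = 0\quad \Rightarrow \quad |d| = \sigma \sqrt 2 $$

For σ = 42° and σ = 51° this gives offsets of about 59.4° and 72.1°, in line with the 60–72° range stated above.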

Using FI as the critical decoding variable clearly diverges from models using maximum-likelihood and Bayesian inference methodologies to match perceptual decisions. These approaches generally involve computing probability distributions from the observed neuronal activity pattern. In contrast, we propose that the variance coded locally in the corresponding neuronal population is the only relevant variable used for the decision. This provides a new way of thinking about how the brain processes information. If response variance serves as the neuronal parameter critical for the read-out and decision mechanisms, then the brain may actually implement statistical variance tests on neuronal population responses and find the stimulus value equalizing the variances. This can be considered a novel theoretical framework of how the brain processes information, complementing a recently proposed general framework based on weight of evidence (Gold and Shadlen 2001, 2002).

In summary, we delineated an information theoretic model to account for human perceptual performance and compared it to standard decoding methods. The good performance of the models highlights the relevance of theoretical constraints on neuronal processing. In particular, it demonstrates that human perception could be the result of computational strategies that are optimal given the known tuning properties of neurons in cortical areas. This finding bridges the existing gap between theoretical accounts of neuronal information processing and perceptual performance of humans and suggests that humans can extract sensory information at their best given the neuronal population dynamics.