A neurophysiological explanation for biases in visual localization

Moreland, James C.; Boynton, Geoffrey M.

doi:10.3758/s13414-016-1251-z

Published: 01 December 2016

Volume 79, pages 553–562, (2017)

Attention, Perception, & Psychophysics

James C. Moreland¹ &
Geoffrey M. Boynton¹

Abstract

Observers show small but systematic deviations from equal weighting of all elements when asked to localize the center of an array of dots. Counter-intuitively, with small numbers of dots drawn from a Gaussian distribution, this bias results in subjects overweighting the influence of outlier dots – inconsistent with traditional statistical estimators of central tendency. Here we show that this apparent statistical anomaly can be explained by the observation that outlier dots also lie in regions of lower dot density. Using a standard model of V1 processing, which includes spatial integration followed by a compressive static nonlinearity, we can successfully predict the finding that dots in less dense regions of an array have a relatively greater influence on the perceived center.

Introduction

Much of our environment consists of “stuff” that contains complex statistical structure. The human visual system is exquisitely sensitive to the summary statistics that describe this structure across a wide variety of domains, including orientation, size, location, speed, and facial expression (Albrecht & Scholl, 2010; Alvarez & Oliva, 2009; Ariely, 2001; Chong & Treisman, 2005; Hubert-Wallander & Boynton, 2015; Parkes, Lund, Angelucci, Solomon, & Morgan, 2001). Less well understood are the mechanisms by which these summary representations are formed.

Here we examine how the statistics of a sample’s distribution influence the weighting of individual items in the context of one of the most commonly studied summary statistics – the ability of individuals to extract the mean or central tendency of a group of similar objects (Alvarez & Oliva, 2008; Greenwood, Bex, & Dakin, 2009; Hubert-Wallander & Boynton, 2015). Findings that representations of the group are more accurate than those of any individual element have led many to conclude that the process includes all items in the group (Allik, Toom, Raidvee, Averin, & Kreegipuu, 2014; Ariely, 2001; Chong & Treisman, 2005; Haberman & Whitney, 2007; Oriet & Brand, 2013; Robitaille & Harris, 2011). It remains an open question, however, what contribution each element makes and how the visual system arrives at a summary of these elements.

Some studies have shown that elements that deviate far from the group mean contribute less to the mean estimate than those that are more similar to the mean, consistent with the idea of a “robust estimator” that treats outliers as being less reliable. Such “robust weighting” has been shown both for category judgments of shape and color (de Gardelle & Summerfield, 2011), and for estimates of the centroid of a set of dots (Juni, Singh, & Maloney, 2010). However, other studies have shown the opposite finding, despite using quite similar stimuli. When observers were asked to saccade to the center of an array of dots they showed biases towards regions with fewer dots (McGowan, Kowler, Sharma, & Chubb, 1998).

In our study, subjects estimated the center of a two-dimensional array of dots using a procedure similar to that of McGowan et al. (1998), but with a mouse click response rather than a saccade. We thought it possible that explicit judgments (Juni, Singh, & Maloney, 2010) and saccades (McGowan et al., 1998) might be mediated by different estimates of the center location, which could explain the differing results. To further examine any possible dissociation between these measures, we also collected free-viewing eye movements alongside explicit mouse click responses. The prediction from the previous literature would be that eye movements are biased towards more isolated dots whereas click responses show equal weighting. Our results replicated those of McGowan et al. for both clicks and fixations – dots that lay further from other dots were weighted more heavily than those in denser regions. A second experiment serves to rule out a more object-based model, in which weighting depends on the number of items per unit space, in favor of a perceptually based filtering model.

Such results cannot easily be explained as being mediated by the process of statistical estimation per se – sensible statistical estimators of central tendency (mean, median, mode, trimmed mean, and so forth) never weight outliers more heavily than other items. We suggest here that “anti-robust” biases in statistical estimation may reflect an earlier stage of visual processing, and show that our results can be explained using a simple model in which estimates of the mean are generated through unbiased averaging of nonlinear V1 responses that are themselves influenced by dot density.

Experiment 1

Methods

Participants

Fifteen students from the University of Washington were recruited from the Department of Psychology. All received payment of US$20 for their participation. Data collection for each subject was completed in under 60 min, across two sessions separated by a minimum of 4 h. All participants had normal or corrected-to-normal vision. Recruitment and study procedures in all experiments presented here were conducted in accordance with the ethical policies set forth by the University of Washington’s Human Subjects Division, and those in the Declaration of Helsinki.

Analysis of simulated data using variance based on two similar studies (Juni et al., 2010 - see Supplementary Material; McGowan et al., 1998) suggested that ten subjects would be sufficient to detect the predicted effect and would replicate previous findings. We set out to recruit 15 observers to ensure sufficient power, given that we expected to exclude trials due to failures in eye-tracking.

Apparatus and materials

Observers sat in a dimly lit room, 50 cm from a CRT monitor subtending 40.4° × 30.8° of visual angle. Two-dimensional movements of one eye were recorded by an ASL Eye-Trac® 6 at ~100 Hz. The subject’s other eye remained uncovered. A chin and forehead rest was used to stabilize the head position and maintain the distance from the screen. All stimuli were generated by custom software written using Psychophysics Toolbox (Brainard, 1997) for MATLAB.

Procedure

A schematic of the task design is shown in Fig. 1. Participants viewed a fixation cross in the center of the screen for a period that varied randomly between 500 and 1,000 ms. Then ten dots (radius of 0.3°) appeared simultaneously. Participants were free to move their eyes as soon as the dots appeared. After 300 ms the dot array was removed. After a further 300 ms a cursor appeared in a random location on the screen and participants made their response by moving the cursor with the mouse and clicking on the location they perceived to be the mean of the dots shown.

The instructions to the participant were to maintain fixation on the cross until it disappeared from the screen and to click on the location they thought was the center of the dots shown. It was made clear that this was not a sample from a larger population whose mean they were estimating.

A response had to be made within 2,000 ms of the cursor appearing or the trial would be discarded and a warning displayed. In order to make it clear that the saccade was not an explicit response, both eye movement data and click responses were collected within the same trials.

For each trial, the dot locations were sampled from a bivariate Gaussian probability distribution with a standard deviation of 2.3°. The center of the bivariate Gaussian distribution was drawn randomly on each trial from a uniform rectangular region, subtending the inner 70% of the full screen (20.2° × 15.1°). If a dot’s sampled location was outside the borders of the screen, it was resampled until it was within the full screen region.
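The sampling procedure above can be sketched as follows. This is a minimal illustration, not the authors' Psychophysics Toolbox code: it assumes an isotropic Gaussian (independent x and y with the same SD), coordinates in degrees relative to the screen center, and the region dimensions exactly as reported. The function and constant names are hypothetical.

```python
import random

SCREEN_W, SCREEN_H = 40.4, 30.8  # full screen in degrees (from the text)
INNER_W, INNER_H = 20.2, 15.1    # central region for the array center, as reported
DOT_SD = 2.3                     # SD of the Gaussian in degrees
N_DOTS = 10

def sample_trial(rng):
    """Return (center, dots) for one trial, in degrees relative to screen center."""
    # Array center: uniform within the central rectangle.
    cx = rng.uniform(-INNER_W / 2, INNER_W / 2)
    cy = rng.uniform(-INNER_H / 2, INNER_H / 2)
    dots = []
    while len(dots) < N_DOTS:
        # Assumption: x and y are independent with equal SD.
        x = rng.gauss(cx, DOT_SD)
        y = rng.gauss(cy, DOT_SD)
        # Resample any dot that falls outside the full screen.
        if abs(x) <= SCREEN_W / 2 and abs(y) <= SCREEN_H / 2:
            dots.append((x, y))
    return (cx, cy), dots

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
center, dots = sample_trial(rng)
```

Rejection of off-screen dots slightly truncates the distribution for arrays centered near the edge of the central region, which is one reason the sample mean rather than the generating center is the natural reference point for responses.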

Participants completed practice trials before beginning the experimental trials and were given the opportunity to clarify any instructions. Each participant completed 8–10 blocks of 50 trials with eye-tracking recalibration between each block.

Eye-tracking inclusion criteria

Fixations were defined as periods of eye position velocity below .015°/s for interpolated first derivatives of the eye position.

Trials were excluded if there were disturbances in the eye-tracking data collection, such as no detected fixations, eye-tracking locations beyond the screen area, or periods of no eye-tracking (including blinks). Trials were also excluded if the first fixation was not within 1.5° of visual angle of the fixation cross at the beginning of the trial, if the second fixation occurred before 200 ms or after 600 ms, or if the second fixation was made further than 7° of visual angle from the trial array center (given the dot distribution SD of 2.3°, this region would include almost all possible dots).
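The three fixation-based criteria can be expressed as a single predicate. The sketch below is illustrative only: the function name and argument layout are hypothetical, positions are assumed to be (x, y) pairs in degrees, and the timing of the second fixation is assumed to be measured from dot onset, which the text does not state explicitly. The data-quality checks (missing samples, off-screen positions, blinks) are not covered.

```python
import math

def keep_trial(first_fix, second_fix, second_onset_ms, cross, array_center):
    """Return True if a trial passes the three fixation-based criteria."""
    # First fixation must lie within 1.5 deg of the fixation cross.
    if math.dist(first_fix, cross) > 1.5:
        return False
    # Second fixation must begin between 200 and 600 ms (reference time assumed).
    if second_onset_ms < 200 or second_onset_ms > 600:
        return False
    # Second fixation must land within 7 deg of the array center.
    if math.dist(second_fix, array_center) > 7.0:
        return False
    return True

ok = keep_trial((0.5, 0.0), (3.0, 3.0), 350, (0.0, 0.0), (4.0, 2.0))
too_early = keep_trial((0.5, 0.0), (3.0, 3.0), 150, (0.0, 0.0), (4.0, 2.0))
```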

A number of subjects struggled to consistently meet these criteria and were excluded. In total, five of 15 subjects were excluded from all analyses on this basis, leaving ten subjects with 300–500 trials each.

Results

Our measures of interest were the click response reporting the subjects’ explicit estimate of the center of the dots with a mouse click, and the location of fixation at the end of the first saccade (note that subjects were not specifically told to fixate on the center of the dot array). Both responses consistently landed near the mean of dot arrays. This can be seen in Fig. 2 where the response location is centered on the mean of the dot array. Error in locating the mean of the array was smaller in the click responses (M = 0.64°, SD = 0.43°) than in the first fixation (M = 1.22°, SD = 0.82°). There was no difference in size of error dependent on the eccentricity of the trial (Click: t(9) = 0.58, p = .58, 95% CI [−0.001, 0.002]; Fixation: t(9) = 0.80, p = .44, 95% CI [−0.003, 0.006]). All subjects showed an under-shoot in the saccades with a group mean proportion of 0.91 (SD = 0.23) compared with a group click mean proportion of 1.00 (SD = 0.14). This can be seen in Fig. 2b where the distribution of fixation responses is shifted to the left of the origin. These undershoots are consistent with the previous saccade-to-target literature (e.g., McGowan et al., 1998).

Dot weighting as a function of distance from the true mean

We used maximum likelihood estimation to estimate the weight given to dots as a function of the distance from the true mean. Dots were sorted into distance bins with edges defined using a cumulative normal distribution so that each bin contained, on average, the same number of dots.

We then calculated the average weight applied to each dot within a bin by assuming that the perceived center of the array on trial t is the average of the individual dot locations, weighted by how far each dot fell from the mean. Let X_t = {x_1t, x_2t, …, x_10t} and Y_t = {y_1t, y_2t, …, y_10t} be the X and Y positions of the 10 dots for trial t, and let the weights for the bins be w[1], w[2], …, w[nbins]. The perceived center G(w, X_t), G(w, Y_t) for trial t is then predicted by:

$$ G\left(w,{X}_t\right)=\frac{\sum_{i=1}^{10}{x}_{it}\,w\left[{\beta}_{it}\right]}{\sum_{i=1}^{10}w\left[{\beta}_{it}\right]}\quad \mathrm{and}\quad G\left(w,{Y}_t\right)=\frac{\sum_{i=1}^{10}{y}_{it}\,w\left[{\beta}_{it}\right]}{\sum_{i=1}^{10}w\left[{\beta}_{it}\right]} $$

where β_it is the index [1, 2, …, nbins] for the bin that dot i falls in on trial t. If the weights are all equal, then the model predicts that subjects will estimate the center of the dots to be located at the true center of gravity of the dots:

$$ G\left({X}_t\right)=\frac{\sum_{i=1}^{10}{x}_{it}}{10}\quad \mathrm{and}\quad G\left({Y}_t\right)=\frac{\sum_{i=1}^{10}{y}_{it}}{10} $$

The best-fitting weights were found using a maximum likelihood method that assumed that the variability in the subject’s responses has a bivariate normal distribution and is independent across trials (supported by observing descriptive plots of the responses, Fig. 2a and b) (Wilks, 2011).
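To make the binned-weight model concrete, the sketch below computes the weighted center for one trial and an error term summed over trials. It assumes isotropic Gaussian response noise with a fixed variance, in which case maximizing the likelihood is equivalent to minimizing the summed squared error; the authors' actual likelihood and optimizer may differ. The function names and the zero-based bin indices are illustrative choices.

```python
def weighted_center(dots, bins, w):
    """Weighted centroid G(w, X_t), G(w, Y_t) for one trial.

    dots: list of (x, y); bins: zero-based bin index for each dot; w: weight per bin.
    """
    total = sum(w[b] for b in bins)
    gx = sum(x * w[b] for (x, _), b in zip(dots, bins)) / total
    gy = sum(y * w[b] for (_, y), b in zip(dots, bins)) / total
    return gx, gy

def sum_squared_error(trials, w):
    """Squared distance between predicted and observed centers, summed over trials.

    trials: list of (dots, bins, (response_x, response_y)).
    Assumption: with isotropic Gaussian noise, minimizing this maximizes likelihood.
    """
    sse = 0.0
    for dots, bins, (rx, ry) in trials:
        gx, gy = weighted_center(dots, bins, w)
        sse += (rx - gx) ** 2 + (ry - gy) ** 2
    return sse

dots = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
bins = [0, 0, 1]                                  # the last dot is in the outer bin
equal = weighted_center(dots, bins, [1.0, 1.0])   # plain centroid
outer = weighted_center(dots, bins, [1.0, 2.0])   # outer bin weighted twice as much
```

Because the predicted center depends only on the ratios of the weights, multiplying every weight by the same constant leaves the prediction unchanged, so some normalization is needed before weights can be compared across subjects.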

While the slope of the best-fitting weights for the click responses appears qualitatively to increase as a function of distance, the deviation from equal weighting is just shy of statistical significance (Mean slope = 0.089, SD = 0.12, t(9) = 2.18, p = .057; Fig. 3b). Equal weighting as a function of distance is more apparent with the fixation response. The location of the first fixation does not deviate significantly from the flat line prediction of a weighting of 1 regardless of dot distance (Mean slope = −0.003, SD = 0.13, t(9) = −0.08, p = .94; Fig. 3c).

Dot weighting as a function of dot proximity

Because our dots were drawn from a Gaussian distribution, density falls off with distance from the sample mean. Thus, an alternative explanation for our findings is that subjects put more weight on dots that are isolated from other dots. We first tested this by measuring how dot contribution is influenced by the number of other dots in near proximity.

For each trial, dot proximity was measured as the average distance to the other 9 dots. For isolated dots this results in a larger number. This is a parameter free measure of density. Weights as a function of average proximity were then estimated by fitting the model described above, but this time dots were sorted into discrete average proximity bins instead of into discrete distance to the mean bins.
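The proximity measure described above is simple enough to state directly in code. The sketch assumes dot positions are (x, y) pairs in degrees; the function name is hypothetical.

```python
import math

def average_proximity(dots):
    """Mean Euclidean distance from each dot to all other dots in the array."""
    n = len(dots)
    return [
        sum(math.dist(p, q) for j, q in enumerate(dots) if j != i) / (n - 1)
        for i, p in enumerate(dots)
    ]

# Two neighboring dots and one isolated dot: the isolated dot gets the largest value.
prox = average_proximity([(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)])
```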

Unlike the distance results we do not see equal weighting as a function of proximity for the click responses (Mean slope = 0.011, SD = .007, t(9) = 4.88, p < .001; Fig. 3e). Greater weight is associated with dots that lie further from other dots. Weights for fixation responses did not deviate from equal weighting (Mean slope = 0.03, SD = 0.14, t(9) = 0.60, p = .56; Fig. 3f).

Dot weighting as a function of density

Average proximity as implemented above is a measure of density over the whole array of ten dots and is therefore a somewhat implausible calculation for the brain. An alternative which can be measured over a more local area implements a linear filter which has been widely observed in the visual system. Our third model takes this approach.

For each trial, dot density was determined by convolving an image of the dot field with a 2-D Gaussian. The standard deviation of the Gaussian used to define density was allowed to be a free parameter for each observer which led to slightly different boundaries of the density bins. Density was then defined as the amplitude of this “density map” at each dot location. An isolated dot takes the lowest possible density of one. Weights as a function of density were then estimated by fitting the model described above, but this time dots were sorted into discrete density bins instead of discrete distance bins. The results reported here were found by repeating the bin fits using the average Gaussian width separately for click responses (M = 0.83, SD = 0.17) and first fixations (M = 0.80, SD = 0.23). We tested a range of Gaussian standard deviations between 0.4° and 2.0° and found the weights to be robust to the choice of Gaussian standard deviation.
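A compact way to obtain the density at each dot location is to sum Gaussians centered on all dots, which is what sampling the convolved image at the dot positions amounts to if each dot is treated as a point. That point treatment, and the scaling of the Gaussian to a peak of 1 (so that an isolated dot has density 1, as stated above), are assumptions of this sketch; the function name is hypothetical.

```python
import math

def dot_density(dots, sigma):
    """Density at each dot: sum of unit-peak Gaussians centered on every dot.

    dots: list of (x, y) in degrees; sigma: SD of the Gaussian filter in degrees.
    """
    densities = []
    for px, py in dots:
        d = sum(
            math.exp(-((px - qx) ** 2 + (py - qy) ** 2) / (2 * sigma ** 2))
            for qx, qy in dots  # includes the dot itself, which contributes 1
        )
        densities.append(d)
    return densities

# Two close dots and one far away; sigma = 0.83 is the mean fitted width for clicks.
dens = dot_density([(0.0, 0.0), (0.5, 0.0), (20.0, 0.0)], sigma=0.83)
```

In this example the far dot has a density very close to 1, while the two neighboring dots share a higher density, so binning on this value separates isolated dots from clustered ones.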

Consistent with the proximity results, we do not see equal weighting as a function of density for the click responses (mean slope = −0.10, SD = 0.07, t(9) = −4.54, p = .001; Fig. 3h); lower weights were assigned to dots in high density regions (Fig. 3h). Free viewing data examining the location of the first fixation shows the same pattern of results (mean slope = −0.05, SD = 0.06, t(9) = −2.75, p = .022; Fig. 3i).

A linear-nonlinear model for localization

We next show how our results are consistent with a linear-nonlinear (LN) model, in which the perceived center of mass is computed by linear spatial filtering followed by a static compressive nonlinearity. To implement the model we generated a density map, V, as described above by convolving an image of the dot field for each trial with a 2-D Gaussian. The density map was then passed through a power function, U = V^p. The perceived center was calculated as the two-dimensional centroid of the modified density map, U. If p < 1, then V^p is a compressive nonlinearity that increases the relative influence of dots in regions with lower density. This model of linear spatial filtering followed by a static nonlinearity is consistent with known psychophysical (Legge, 1981; Legge & Foley, 1980) and physiological evidence (D. G. Albrecht & Hamilton, 1982), as well as well-established “normalization” models of early visual processing (Carandini & Heeger, 2012; Heeger, 1992).

Figure 4 illustrates this model for an example set of dots. The left panel shows the center of mass after linear spatial filtering, which remains identical to the Euclidean mean of the unfiltered dots. The right panel shows the center of mass after the output of the linear spatial filters is passed through a saturating nonlinearity (p = 0.5). The effect is a small shift of the center of mass of the image away from high-density regions.
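The LN computation can be sketched on a coarse grid using only the standard library. This is an illustration under stated assumptions rather than the authors' implementation: dots are treated as points, the Gaussian has a peak of 1, and the grid step and padding are arbitrary choices. The function name `ln_center` is hypothetical.

```python
import math

def ln_center(dots, sigma, p, step=0.25):
    """Centroid of (Gaussian-filtered dot map) ** p, evaluated on a grid.

    dots: list of (x, y) in degrees; sigma: filter SD in degrees; p: exponent (> 0).
    """
    # V ** p falls off roughly like a Gaussian of SD sigma / sqrt(p), so pad accordingly.
    margin = 5.0 * sigma / math.sqrt(p)
    x0 = min(x for x, _ in dots) - margin
    y0 = min(y for _, y in dots) - margin
    nx = int((max(x for x, _ in dots) + margin - x0) / step) + 1
    ny = int((max(y for _, y in dots) + margin - y0) / step) + 1
    total = sum_x = sum_y = 0.0
    for i in range(nx):
        gx = x0 + i * step
        for j in range(ny):
            gy = y0 + j * step
            # Linear stage: density map V as a sum of Gaussians centered on the dots.
            v = sum(
                math.exp(-((gx - dx) ** 2 + (gy - dy) ** 2) / (2 * sigma ** 2))
                for dx, dy in dots
            )
            u = v ** p  # static nonlinearity, compressive when p < 1
            total += u
            sum_x += u * gx
            sum_y += u * gy
    return sum_x / total, sum_y / total

# A tight cluster of three dots plus one isolated dot to the right.
dots = [(-0.3, 0.0), (0.0, 0.0), (0.3, 0.0), (6.0, 0.0)]
mean_x = sum(x for x, _ in dots) / len(dots)
linear_x, _ = ln_center(dots, sigma=0.83, p=1.0)
compressed_x, _ = ln_center(dots, sigma=0.83, p=0.5)
```

With p = 1 the centroid of the filtered map matches the arithmetic mean of the dots up to grid error, whereas with p = 0.5 it moves toward the isolated dot, which is the direction of the bias reported for observers.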

For each subject we estimated the value of p that minimized the difference between predicted and obtained center of mass estimates. The best fitting value of p ranged between 0.31 and 0.84 for our ten subjects. The mean value of p across subjects was significantly lower than 1 (M = 0.63, SD = 0.16), t(9) = −7.28, p < .001, 95% CI [0.52, 0.75]. Thus, adding a compressive nonlinearity significantly improves our ability to predict subjects’ performance.
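As a rough arithmetic check, the test statistic and confidence interval can be approximated from the rounded summary values reported above. The sketch assumes a one-sample t-test of the fitted exponents against 1 with n = 10; the critical value 2.262 is the two-tailed 95% value of the t distribution with 9 degrees of freedom.

```python
import math

n = 10          # subjects
mean_p = 0.63   # reported mean exponent (rounded)
sd_p = 0.16     # reported SD (rounded)

se = sd_p / math.sqrt(n)        # standard error of the mean
t_stat = (mean_p - 1.0) / se    # test against p = 1 (no compression)
t_crit = 2.262                  # two-tailed 95% critical value, df = 9
ci = (mean_p - t_crit * se, mean_p + t_crit * se)
```

This gives a t value of about −7.3 and an interval of roughly 0.52 to 0.74, close to the reported values; the small differences are what one would expect from working with rounded inputs.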

We next demonstrate the relationship between the LN model and the binned distance and density models by simulating responses generated by the LN model and fitting the simulated data with the weighted models. If the LN model is valid, we expect to find weights similar to those obtained for the observed data.

Responses were generated for the same dot stimuli that were presented to the observers, using the power function U = V^p and the value of p found for each observer. Noise was added to the simulated responses by drawing from a zero-mean 2-D Gaussian whose standard deviation matched each observer’s own click-response standard deviation.

The weights for the density and distance binning models are shown in Fig. 5. The weights found for the predicted responses closely follow those observed in the behavioral data.

Interim discussion

This first experiment investigated whether the visual system equally weights all elements when localizing a group of dots. We found that observers were not equally weighting all dots. Instead there is support for an overweighting of dots in low density regions as defined by both a parameter free proximity model and a linear filter model. We were able to account for these findings by using a simple LN model of early visual processing where a linear spatial filter is followed by a non-linear compression leading to an emphasis on lower densities relative to higher densities.

Experiment 2

Experiment 1 showed that observers were not equally weighting all of the dots when estimating the sample center. These results are consistent with two main classes of models. For the “position-based” model, elements are assigned weights based entirely on their position relative to other elements, so that outlying dots and/or dots in regions of low density receive relatively higher weights. For the “perceptually-based” model, the center of mass is computed on the representation of the stimulus after an early stage of perceptual processing. A compressive nonlinearity in the early filtering process leads to an effective overweighting of dots in low-density regions.

Experiment 2 was designed to distinguish between these two classes of models by varying the contrasts of the dots within each array. A position-based model considers only dot location and should therefore not be affected by dot contrast. However, the filtering process in the LN model should reduce the influence of low-contrast dots on the perceived center.