Sensory uncertainty leads to systematic misperception of the direction of motion in depth
Although we have made major advances in understanding motion perception based on the processing of lateral (2D) motion signals on computer displays, the majority of motion in the real (3D) world occurs outside of the plane of fixation, and motion directly toward or away from observers has particular behavioral relevance. Previous work has reported a systematic lateral bias in the perception of 3D motion, such that an object on a collision course with an observer’s head is frequently judged to miss it, with obvious negative consequences. To better understand this bias, we systematically investigated the accuracy of 3D motion perception while manipulating sensory noise by varying the contrast of a moving target and its position in depth relative to fixation. Inconsistent with previous work, we found little bias under low sensory noise conditions. With increased sensory noise, however, we revealed a novel perceptual phenomenon: observers demonstrated a surprising tendency to confuse the direction of motion-in-depth, such that approaching objects were reported to be receding and vice versa. Subsequent analysis revealed that the lateral and motion-in-depth components of observers’ reports are similarly affected, but that the effects on the motion-in-depth component (i.e., the motion-in-depth confusions) are much more apparent than those on the lateral component. In addition to revealing this novel visual phenomenon, these results shed new light on errors that can occur in motion perception and provide a basis for continued development of motion perception models. Finally, our findings suggest methods to evaluate the effectiveness of 3D visualization environments, such as 3D movies and virtual reality devices.
Keywords: Motion: in depth; 3D perception: *other; Binocular vision: neural mechanisms and models
The accurate perception of object motion is critical to survival. Although we have made major advances in our understanding of motion perception based on the processing of lateral (2D) motion signals on computer displays, the vast majority of motion in the real (3D) world occurs outside of the plane of fixation, and motion directly towards or away from the observer tends to have particular behavioral relevance.
One insight gained from the study of 2D motion perception is that when two objects with physically identical speeds but different contrasts translate on a computer screen, observers tend to report that the lower contrast object moves more slowly (Thompson, 1982; Stone & Thompson, 1992). This phenomenon has been explained in terms of perceptual inference, whereby the reduction in contrast, and the associated increase in sensory uncertainty, results in a proportionally larger contribution of prior expectations. Because prior experience tells us that most objects in our environment tend to be stationary or move slowly, poorly visible stimuli appear to move more slowly (Yuille & Grzywacz, 1988; Weiss, Simoncelli, & Adelson, 2002; Stocker & Simoncelli, 2006).
Since misperception of the motion of an approaching object can have serious consequences, we would like to know if similar effects occur for perception of 3D motion. Indeed, previous work has reported systematic biases in the estimation of both real and virtual object motion in depth, such that objects appear to move more sideways (Harris & Dean, 2003; Welchman, Tuck, & Harris, 2004; Harris & Drga, 2005; Gray, Regan, Castaneda, & Sieffert, 2006; Poljac, Neggers, & van den Berg, 2006; Lages, 2006; Rushton & Duke, 2007; Welchman, Lam, & Bülthoff, 2008; Duke & Rushton, 2012). This lateral bias is thought to arise based on the geometry of 3D motion perception and the mechanism for 2D speed perception described above (Welchman et al., 2008).
However, such bias is somewhat puzzling. Everyday behavior does not seem to be routinely affected by laterally biased estimates of object motion. The goal of the current study, then, is to revisit this bias and systematically investigate the accuracy of 3D motion perception.
We first assessed performance under relatively optimal conditions, and subsequently investigated changes in performance under two manipulations of sensory noise. In the first manipulation, we increased sensory noise through reductions in stimulus contrast, similar to manipulations employed in the study of biases in 2D motion perception. In the second manipulation, we varied the target’s position in depth relative to fixation. The shift of target position in depth does not affect the sensory uncertainty associated with the two retinal images, but we reasoned that it should impact sensory uncertainty due to reduced sensitivity to stimulus properties, such as binocular disparity away from the plane of fixation (Westheimer & Tanzman, 1956; Blakemore, 1970; Schumer & Julesz, 1984; Landers & Cormack, 1997). We further reasoned that if both sensory noise manipulations have similar consequences from a perceptual inference perspective, they should have similar impacts on behavioral performance.
To anticipate, we find little evidence for the previously reported bias in 3D motion perception. Instead, we find relatively accurate performance under optimal conditions. In addition, we find a surprising novel phenomenon, whereby observers systematically confuse the motion in depth direction of a target’s motion (i.e., the observer will report approaching motion when the object is in fact receding and vice versa). Because these confusions become more prevalent with manipulations of both contrast and target position, we conclude that sensory uncertainty in general leads to these systematic confusions of the direction of motion in depth.
Experiment 1: Behavioral performance in a 3D motion extrapolation task (“3D Pong”)
The goal of Experiment 1 was to establish performance under relatively optimal, low-sensory noise conditions. We measured performance with high-contrast targets moving near the fixation plane. Because our initial observations were inconsistent with previously reported biases in the perception of 3D motion, we verified the robustness of this tendency across five additional levels of relatively high target contrast.
Five experienced observers, including one author (JMF), and four inexperienced observers participated in the experiment. All had normal or corrected-to-normal vision. The experiments were performed in accordance with the guidelines of The University of Wisconsin - Madison Institutional Review Board, and all observers gave informed written consent.
The experiments were performed using Matlab and the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) on a Windows 7 computer with an Nvidia Quadro 4000 video card. All stimuli were presented on a 29-cm × 51-cm 3D LCD display (Planar, 120 Hz, 1920 × 1080 pixels) at a viewing distance of 90 cm (29.54° × 16.62° of visual angle) such that 1 pixel subtended 0.015° of visual angle. Stimuli were anti-aliased to achieve subpixel resolution. Observers viewed the display through Nvidia 3D shutter glasses, which were synchronized with the refresh rate of the display (60 Hz per eye). The experiment was conducted in a dark room, with the display being the only source of illumination. The luminance of the display was linearized using standard gamma-correction procedures with mean luminance = 3.85 cd/m² when viewed through the shutter glasses. Observers used a keyboard to make responses and completed the trials at their own pace.
On each trial, a 0.43-cm diameter dot (“target”), 0.25° at the 90-cm viewing distance, appeared at fixation (i.e., at the midpoint of the screen plane). The target was rendered with one of six contrast levels (Weber fractions: 434.34, 62.16, 44.87, 38.85, 18.55, and 10.28), which corresponded to luminance values of 5.79 cd/m², 0.84 cd/m², 0.61 cd/m², 0.53 cd/m², 0.26 cd/m², and 0.15 cd/m² when viewed through the shutter glasses. (A fully white object was 5.79 cd/m² when viewed through the shutter glasses.) Note that these Weber fractions are all >1, meaning that all contrast levels in this experiment were significantly larger than the corresponding Michelson contrast (100 %) typically used in 2D motion experiments. We will turn to the effect of Weber fractions <1 in Experiment 2.
The target followed a random trajectory defined by independently chosen random speeds in the x (lateral) direction and the z (motion in depth) direction, with no change in y (vertical direction), before disappearing. Velocities in x and z were independently chosen from uniform distributions with magnitudes ranging from 0.02 to 6.2 cm/s. The velocities were then given a random sign, so that the stimulus approached on approximately 50 % of the trials and receded on the remainder. Given the stimulus presentation time of 1 second, the target’s motion produced an average maximum binocular disparity of 0.27°. The independently chosen x- and z-velocity components were critical for preventing observers from adopting strategies that bypass the need for accurate estimation of both x- and z-velocity components from the retinal signals.
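The sampling scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Matlab code. It assumes an axis convention in which +x is rightward and +z points away from the observer, so that 0° is rightward and 270° is head-on approach, matching the angles quoted elsewhere in the text. The function and variable names are hypothetical.

```python
import math
import random

SPEED_MIN, SPEED_MAX = 0.02, 6.2  # cm/s, magnitude range stated in the text


def sample_velocity(rng):
    """Draw one velocity component: uniform magnitude, then a random sign."""
    magnitude = rng.uniform(SPEED_MIN, SPEED_MAX)
    sign = rng.choice((-1.0, 1.0))
    return sign * magnitude


def sample_trial(rng):
    """Return (vx, vz, direction_deg) for one hypothetical trial.

    Assumed convention: +x is rightward, +z is away from the observer,
    so vz < 0 means the target approaches.
    """
    vx = sample_velocity(rng)
    vz = sample_velocity(rng)
    direction_deg = math.degrees(math.atan2(vz, vx)) % 360.0
    return vx, vz, direction_deg


rng = random.Random(0)  # fixed seed so the sketch is repeatable
trials = [sample_trial(rng) for _ in range(10000)]
approach_fraction = sum(1 for _, vz, _ in trials if vz < 0) / len(trials)
```

Because the sign of each component is drawn independently of its magnitude, the fraction of approaching trials should settle near one half, and neither component alone determines the trajectory direction.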
A 3D rectangular block (paddle), whose faces consisted of a 1/f noise pattern, was positioned within the display at a simulated radial distance of 12.4 cm from the target’s initial start position. During adjustment, the paddle moved along a circular path around the target’s initial start position in the x-z plane, with the same side of the paddle facing the target’s start position at all times. No translation occurred in y. The 12.4-cm distance of the 1-cm wide paddle was chosen to keep the target’s motion within the fuseable area while providing appreciable depth to the displays. A movie demonstrating the stimulus trial sequence can be viewed in the Supplementary Material.
Observers were first provided with written and oral instructions from the experimenter. They then performed 10-15 practice trials in the presence of the experimenter to become familiar with the apparatus. During this time, the observer also was asked to note whether the target was approaching or receding. If observers were unable to make these judgments, reported difficulty themselves, or were judged to have difficulty viewing the stimulus in any other way, they were excluded from further participation (this amounted to <10 % of potential observers for the entire study). All remaining participants completed the experimental trials in two sessions (Experiment 1) and one session (Experiments 2 & 3), and all data collected were included in the subsequent analyses. No feedback was provided for either the practice or experimental trials.
On each trial, the observer fixated the region between the Nonius lines at the center of the screen. The target appeared at fixation and followed a linear trajectory defined by the random independent velocities in x and z chosen for that trial, sometimes appearing to come out of the screen (“approaching”), sometimes appearing to move back into the screen (“receding”). After 1 second, the target disappeared and the paddle immediately appeared. On the first trial, the paddle appeared to the far right of fixation (i.e., at 0°). On subsequent trials, the paddle appeared at the last location set by the observer. The observer was instructed to extrapolate the visible portion of the target’s trajectory and adjust the paddle’s position so that it would have intercepted the target if the target had continued along its trajectory. We asked observers to extrapolate the trajectory so that responses would be based on perceived motion direction rather than some heuristic, such as the location of target disappearance. Explicit visual and verbal instructions, as well as the task familiarization phase, validated that observers understood the task as an extrapolation task. During the setting phase of the trial, the observer’s eyes were free to move about the display. When the observer was satisfied with the paddle setting, they resumed fixation and pressed the spacebar to initiate a new trial (Fig. 1b).
The angle corresponding to the midpoint of the paddle setting (reported direction) on each trial provided the observer’s estimate of the target motion direction (presented direction) on that trial. To quantify overall performance, we computed the error for each trial as the circular distance between the presented motion trajectory and the reported paddle position (Fig. 1c) using the CircStats toolbox for MATLAB (Berens & Velasco, 2009). To draw comparisons to the existing literature that has consistently reported lateral biases in observer reports, we took the direction (sign) of the observer’s report relative to the physical trajectory into account (Fig. 1c). Settings that were laterally biased as in previous studies, i.e., biased towards the fixation plane, were assigned positive values. Conversely, settings that were medially biased, i.e., biased towards the vertical plane through the viewing direction (the midsagittal plane), were assigned negative values. Thus, the circular mean of these signed errors provides not only a measure of each observer’s overall accuracy for the particular condition, but also a measure of any systematic biases (lateral or medial) in their estimates: negative values correspond to medial bias and positive values correspond to lateral bias.
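The error measure described above can be sketched as follows. This is an illustrative Python reading of the text, not the CircStats code the authors used: the circular distance and circular mean follow the standard unit-vector definitions, while the sign rule (positive when the report lies closer to the fixation plane than the presented direction) is an assumption about how lateral and medial errors were distinguished. All function names are hypothetical, and angles follow the convention in which 0° and 180° are lateral and 90° and 270° lie in the midsagittal plane.

```python
import cmath
import math


def circ_dist_deg(a_deg, b_deg):
    """Circular difference a - b in degrees, wrapped to [-180, 180]."""
    za = cmath.exp(1j * math.radians(a_deg))
    zb = cmath.exp(1j * math.radians(b_deg))
    return math.degrees(cmath.phase(za / zb))


def circ_mean_deg(angles_deg):
    """Circular mean: the angle of the summed unit vectors."""
    total = sum(cmath.exp(1j * math.radians(a)) for a in angles_deg)
    return math.degrees(cmath.phase(total))


def elevation_from_fixation_plane(direction_deg):
    """Angle (0 to 90 deg) between a direction and the lateral (x) axis."""
    s = abs(math.sin(math.radians(direction_deg)))
    return math.degrees(math.asin(min(1.0, s)))


def signed_error_deg(presented_deg, reported_deg):
    """Positive = lateral bias, negative = medial bias (assumed sign rule).

    If the report is mirrored exactly across the fixation plane, the two
    elevations tie and the sign is arbitrary in this sketch.
    """
    magnitude = abs(circ_dist_deg(reported_deg, presented_deg))
    is_lateral = (elevation_from_fixation_plane(reported_deg)
                  < elevation_from_fixation_plane(presented_deg))
    return magnitude if is_lateral else -magnitude
```

For example, a head-on trajectory (270°) reported at 280° yields a positive (lateral) error of 10°, whereas a 300° trajectory reported at 290° yields a negative (medial) error of 10°.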
To summarize performance as a function of sensory uncertainty, we used two basic summary measures of performance: 1) confusions in the motion in depth direction of the target’s motion (i.e., approaching vs. receding); 2) confusions in the lateral direction of the target’s motion (Fig. 1d).
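The two summary measures can be illustrated with a short Python sketch. It assumes the angle convention in which the sign of the sine of a direction gives its depth component (approaching versus receding) and the sign of the cosine gives its lateral component (leftward versus rightward). The function names and the example trials are hypothetical and are not data from the study.

```python
import math


def is_depth_confusion(presented_deg, reported_deg):
    """True when presented and reported z-components have opposite signs."""
    p = math.sin(math.radians(presented_deg))
    r = math.sin(math.radians(reported_deg))
    return p * r < 0


def is_lateral_confusion(presented_deg, reported_deg):
    """True when presented and reported x-components have opposite signs."""
    p = math.cos(math.radians(presented_deg))
    r = math.cos(math.radians(reported_deg))
    return p * r < 0


def confusion_rates(trials):
    """trials: list of (presented_deg, reported_deg) pairs.

    Returns (depth_confusion_rate, lateral_confusion_rate).
    """
    n = len(trials)
    depth = sum(is_depth_confusion(p, r) for p, r in trials) / n
    lateral = sum(is_lateral_confusion(p, r) for p, r in trials) / n
    return depth, lateral


# Hypothetical trials for illustration only.
example_trials = [(300, 60), (300, 310), (240, 230), (45, 135)]
depth_rate, lateral_rate = confusion_rates(example_trials)
```

In the hypothetical list, only the first trial flips the depth direction and only the last trial flips the lateral direction, so each rate is one in four.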
To compare performance to previous reports (Welchman et al., 2004; Welchman et al., 2008), we also briefly consider performance in response to target motion within the range of −64:64° relative to head-on motion (270°), denoted by the inset in Fig. 2a and the data in Fig. 2b. The between-subject mean signed error was −4.25° [bootstrapped 95 % CI −5.30, −3.19°], indicating a small medial bias within this subset of presented target motion; that is, the settings were closer to the midline than the true target trajectories, which is inconsistent with the pattern reported in previous studies.
Owing to the decline in performance in the lowest target contrast condition, reduced target contrast is associated with a significant increase in the tendency to confuse whether the target is approaching or receding (Fig. 3b; repeated-measures ANOVA, F(5,20) = 10.2, p < 0.001). On the other hand, reducing target contrast has no significant effect on the proportion of trials in which observers confuse the lateral direction of the target’s motion (Fig. 3c; repeated-measures ANOVA, F(5,20) = 0.73, p = 0.61).
In contrast to previous work, these results reveal highly accurate performance under high-contrast conditions, with little evidence for systematic biases in the judgment of 3D motion direction. When contrast is reduced, the pattern of responses indicates that observers begin to confuse motion in depth, but not lateral motion, revealing a previously undocumented illusion in the perception of 3D motion. Rather than a lateral bias, observers seem to confuse approaching and receding motion. Given the potentially problematic nature of such confusions in the real world, we further explore this phenomenon in the next two experiments. As an aside, O3 & O4 seem to be confusing the direction of motion in depth, for receding, but not approaching, motion even under optimal conditions. We believe this is due to a separate bias in judging the position of stimuli as a function of contrast. We will return to this point in the discussion.
Experiment 2: Manipulating target contrast
Experiment 1 demonstrated that, over a broad range of target contrasts, 3D motion in depth trajectory extrapolation performance is relatively accurate. However, for the lowest contrast level, larger errors did emerge so that all observers became more likely to confuse the direction of the target’s motion in depth (i.e., reporting that the target was approaching, when it was in fact receding) even when observers’ ability to identify the lateral direction of the target’s motion was not impacted. Having established the type of impact target contrast has on 3D trajectory extrapolation, we designed Experiment 2 to explore the impact of target contrast by further reducing target contrast. This was achieved by changing the stimulus aperture to mid-gray. We then used three target Weber contrast levels <1.
Five observers from the UW-Madison community with intact stereovision as determined by the task acclimation phase (General Methods) participated in the experiment. Two were authors (J.F. and M.R.) and three were new and naïve to the purpose of the experiment. The authors’ performance did not differ from the naïve observers and thus their data are combined in the analyses.
The stimuli were identical to those of Experiment 1 with the exception that the aperture was set to mid-gray with luminance = 3.66 cd/m² when viewed through the shutter glasses. The target was rendered with one of three Weber fraction contrast levels (0.58, 0.33, 0.17), which corresponded to luminance values of 5.79 cd/m², 4.87 cd/m², and 4.24 cd/m² when viewed through the shutter glasses. The three target contrast levels were counterbalanced and presented in pseudorandom order. Each contrast level was tested 200 times for a total of 600 trials per observer.
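As a check on how the contrast levels relate to the luminance values, the Weber fraction can be computed with the standard definition, (target − background) / background. The sketch below assumes that the mid-gray aperture luminance of 3.66 cd/m² serves as the background term; the function name is hypothetical.

```python
def weber_contrast(target_luminance, background_luminance):
    """Standard Weber fraction: (L_target - L_background) / L_background."""
    return (target_luminance - background_luminance) / background_luminance


BACKGROUND = 3.66            # cd/m^2, mid-gray aperture (assumed background term)
TARGETS = (5.79, 4.87, 4.24)  # cd/m^2, target luminances stated in the text

contrasts = [weber_contrast(t, BACKGROUND) for t in TARGETS]
```

With the luminances as reported, the first two values round to the stated 0.58 and 0.33. The third comes out slightly below the stated 0.17, plausibly because the published luminances are rounded to two decimals.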
Experiment 3: Manipulating position-in-depth
In the preceding experiments, we demonstrated that reductions in target contrast were associated with (i) an increased tendency to confuse the motion in depth direction of the target’s motion, and (ii) no significant impact on the ability to estimate the lateral motion direction of the target’s motion. We argued that these impacts resulted from increased target motion uncertainty due to increased sensory noise, rather than the contrast of the target per se. If our reasoning is correct, the specific source of the sensory uncertainty should be irrelevant, and other manipulations that increase sensory noise should be associated with similar impact on observer performance. To test this assertion, we fixed the target contrast to the highest level used in Experiment 2 but varied the reliability of binocular information through shifts in the target’s position in depth relative to the plane of fixation.
Ten new, naïve observers from the UW-Madison community with intact stereovision as determined by the task acclimation phase (General Methods) participated in this experiment along with two of the authors (J.F. and M.R.).
Each of the ten naïve observers was assigned randomly to one of the two start position conditions (5 per group). Each completed 200 trials. The two authors participated in both conditions. The authors’ performance did not differ from that of the naïve observers, and thus their data are included in the analyses, yielding a total of seven observers per group. All other experimental details are identical to Experiment 2.
Figure 5a shows the reported direction as a function of the presented direction for all seven observers in each of the two target start position conditions. For comparison, the nonshifted (“At Fixation” start position) data for the five observers from Experiment 2 also are shown. As expected, we find a significant increase in the proportion of trials in which the motion in depth direction of the target was confused for the two shifted target position conditions (F(2,16) = 8.243, p < 0.01; Fig. 5b). The shift in the target position relative to fixation did not significantly alter the proportion of trials in which observers confuse the lateral motion component (F(2,16) = 0.464, p = 0.64; Fig. 5c).
Quantifying the impact of uncertainty on 3D motion estimation
In three experiments, we have provided new insight into the impact of sensory uncertainty on 3D motion estimation. One consequence of increased uncertainty, as revealed by performance in our task, is a tendency for observers to misreport the direction of the motion-in-depth component. Although reported lateral motion did not differ significantly across levels of uncertainty, this may simply have been due to the low proportion of lateral motion confusions in our data (<3 %). We therefore wanted to test whether the 3D motion estimation process breaks down more generally with increased sensory uncertainty, in which case an effect of uncertainty on reported lateral motion should nonetheless be detectable.
Towards this end, we compared the relative reliability of observers’ estimates of the two motion components (x and z) in our task. Observers’ responses reflect the combination of the perceived lateral and motion-in-depth speed components of the target’s motion on each trial, so we cannot test the reliability of those estimates separately. However, we can index the uncertainty in the motion-in-depth component on a trial-by-trial basis according to whether or not the observer reported the correct direction of motion in depth. We assume that sensory uncertainty was lower for trials in which the observer reported the correct direction of motion and that sensory uncertainty was higher for trials in which the observer reported the incorrect direction of motion. This binary measure is simply a categorization of a continuum of uncertainty. The lateral judgment error variance conditioned on the accuracy of the corresponding depth judgment provides a measure of uncertainty in lateral motion component estimation.
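The conditioning step can be sketched as a simple split of trials by depth-direction accuracy, followed by a variance computation within each group. The text does not specify exactly how the lateral judgment error was computed, so the sketch below takes precomputed lateral errors as input. The function name and the example values are hypothetical and are not data from the study.

```python
import statistics


def lateral_variance_by_depth_accuracy(trials):
    """Split lateral errors by whether the depth direction was reported correctly.

    trials: iterable of (lateral_error, depth_correct) pairs, where
    lateral_error is a precomputed number and depth_correct is a bool.
    Returns (variance_when_correct, variance_when_incorrect).
    Each group needs at least two trials for a sample variance.
    """
    trials = list(trials)
    correct = [err for err, ok in trials if ok]
    incorrect = [err for err, ok in trials if not ok]
    return statistics.variance(correct), statistics.variance(incorrect)


# Hypothetical lateral errors for illustration only.
example = [
    (1.0, True), (-1.0, True), (0.5, True), (-0.5, True),
    (3.0, False), (-3.0, False), (2.0, False), (-2.0, False),
]
var_correct, var_incorrect = lateral_variance_by_depth_accuracy(example)
```

Plotting the variance for incorrect trials against the variance for correct trials, one point per observer and condition, gives a comparison against the identity line: points above the line indicate noisier lateral judgments on trials with depth confusions.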
The extent to which motion-in-depth and lateral measures of sensory uncertainty vary together is indicative of the nature of the breakdown in the 3D motion estimation process. The common source of sensory information (i.e., the retinal motion) used to derive these estimates predicts a correlation between the measures, such that uncertainty in the motion-in-depth direction should be associated with greater variability in lateral judgments. Previous work has reported that observers do not rely on the z-component of motion when reporting trajectory direction (Harris & Dean, 2003; Harris & Drga, 2005). This would result in the two measures not being correlated. Such an outcome would suggest that estimation of motion in depth and lateral motion for the targets in our task share no common mechanisms (which could be the case, for example, if lateral motion is velocity-derived and motion in depth is disparity-derived).
The lateral judgment variance for the two categories of trials (i.e., correct/incorrect motion-in-depth direction reports) for all observers and experimental conditions is plotted in Fig. 6b. The majority of the data points fall above the identity line, including the group mean, indicating that the lateral judgment error variance was larger when the motion-in-depth direction was judged incorrectly than when it was judged correctly, under all levels of uncertainty. These results suggest that estimation of both an object’s lateral motion component and its motion-in-depth component are subject to a common source of sensory uncertainty. We previously did not find significant differences in lateral motion confusions with manipulations of sensory uncertainty. However, when related back to the stereotypical pattern of motion-in-depth direction confusions that emerged across these experiments, the present results can be interpreted as providing direct evidence that trial-to-trial sensory uncertainty, as revealed by motion-in-depth confusions, has a similar, albeit much smaller, effect on lateral motion estimates.
We can rule out that the motion in depth direction confusions are the result of a random approaching versus receding response choice, independent of the lateral setting, as would be revealed by the data points falling along the identity line. Lastly, we can further rule out the use of a simple response heuristic in which observers provide a default response on trials in which they are uncertain about the motion in depth direction, as would be revealed by the data points falling below the identity line.
To conclude, the results indicate that the lateral and motion in depth components of the observer’s report are affected by a common source of sensory uncertainty. The effect on the motion in depth component is simply much more apparent than the effect on the lateral component.
In the current study, we examined observer reports of the direction of targets moving in depth under manipulations of sensory noise. We uncovered a novel and surprising tendency to confuse the direction of target motion (i.e., confusing approaching and receding motion). These behavioral effects were consistent and systematic across conditions of increased sensory noise, suggesting that confusion in the direction of motion in depth is due to fundamental limitations of the sensory inference of 3D motion, rather than specific properties of the stimuli. These results shed new light on failures of human motion perception and provide a criterion to assess the quality of 3D visualization environments, such as 3D movies and virtual reality devices.
Although initial investigation indicated that the tendency to confuse the lateral component of the target’s motion did not significantly increase with sensory noise, further investigation revealed that these effects are in fact related, such that an increase in uncertainty in motion-in-depth component estimation is associated with an increase in uncertainty in lateral motion component estimation on a trial-by-trial basis. Thus, the two motion components share a common source of uncertainty. The impact of sensory uncertainty is simply much more apparent in the observer’s estimation of the z (depth) motion component.
These results contribute to the understanding of the types of errors that occur in 3D motion processing. It has been shown previously that 2D motion appears slower under high-noise conditions, but that even under low-noise conditions 3D motion appears to be systematically laterally biased—i.e., observers report motion as moving more sideways (Harris & Dean, 2003; Welchman et al., 2004; Harris & Drga, 2005; Gray et al., 2006; Poljac et al., 2006; Lages, 2006; Rushton & Duke, 2007; Welchman et al., 2008; Duke & Rushton, 2012). We do not find evidence for such bias in our data. The systematic decline of observer performance under the noisy conditions in our study supports the overall notion that uncertainty in 3D target motion gives rise to errors in the reported direction of motion (Welchman et al., 2008; Lages, 2006), but this seems to have altogether different effects than an increase in lateral bias.
Why might we observe results different from those reported previously? One distinction between our task and those used in most previous studies is that the full 360° space was utilized for both the stimuli and responses (but see Lages, 2006). In previous work, observers were often provided with a response space that was markedly inconsistent with the true set of stimuli. Specifically, the expected stimulus space comprised a wider (i.e., more lateral) range than the actual stimulus space. Observers were typically told that the stimulus could approach at any angle across the 180° range, whereas the stimuli in actuality were constrained within smaller ranges around the midsagittal plane. We hypothesize that invoking prior beliefs about the distribution of possible target motion trajectories is a strategy observers may use in overcoming the uncertainty in estimating motion in depth. We hypothesize that such a mismatch between one’s prior beliefs and the true distribution of possible target motion trajectories may have been a factor in the lateral estimation biases reported previously. In the context of our study, the range of presented trajectory directions did span the full 360° space; however, we note that the use of uniformly distributed x- and z-velocities establishes anisotropies in the range such that presented trajectories near the 45/135/225/315° directions were more likely than those closer to the cardinal directions. Thus, although we do not see strong evidence in observers’ settings, our observers may nevertheless have developed a prior for such a distribution. Future work focusing specifically on the range of presented trajectory directions, including work that utilizes a truly isotropic distribution of presented trajectory directions, might provide greater insight into the role and impact of prior beliefs on 3D motion direction estimation.
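The anisotropy noted above can be checked with a small simulation. The sketch below draws x- and z-velocities as described in the Methods (uniform magnitudes between 0.02 and 6.2 cm/s, random signs) and compares how often the resulting direction falls within 10° of a cardinal direction versus within 10° of a diagonal. The 10° window, the seed, and all names are arbitrary choices made for illustration.

```python
import math
import random


def sample_direction_deg(rng, lo=0.02, hi=6.2):
    """Direction of a trajectory with independent uniform |vx| and |vz|."""
    vx = rng.choice((-1.0, 1.0)) * rng.uniform(lo, hi)
    vz = rng.choice((-1.0, 1.0)) * rng.uniform(lo, hi)
    return math.degrees(math.atan2(vz, vx)) % 360.0


def offset_from_nearest_cardinal(direction_deg):
    """Distance in degrees (0 to 45) to the nearest of 0/90/180/270."""
    r = direction_deg % 90.0
    return min(r, 90.0 - r)


rng = random.Random(1)  # fixed seed so the sketch is repeatable
offsets = [offset_from_nearest_cardinal(sample_direction_deg(rng))
           for _ in range(100000)]

# An offset of at most 10 deg is near a cardinal direction;
# an offset of at least 35 deg is within 10 deg of a diagonal.
near_cardinal = sum(o <= 10.0 for o in offsets) / len(offsets)
near_diagonal = sum(o >= 35.0 for o in offsets) / len(offsets)
```

Under this sampling scheme roughly 30 % of directions fall near a diagonal and roughly 18 % fall near a cardinal direction, whereas an isotropic distribution would place about 22 % of directions in each band.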
The directional confusions we report may have been observed previously in motion-in-depth estimation tasks. Lages (2006) established errors in the reported quadrant of the 360° space as a criterion to screen out observers before experimental participation. That is, directional confusions previously have been treated as indications that observers are unable to perform the task. We interpreted these confusions not as failures of stereovision per se, but rather as an index of uncertainty in estimation of the depth component. All our observers easily met the Lages criterion in that they correctly reported the motion in depth direction of the target on 75 % or more of trials for the majority of the contrast levels used. That observers (inexperienced and authors alike) can perform the task and meet the criterion under the best conditions, but start to confuse the direction of motion in depth under poor conditions, indicates that the confusions are a real phenomenon arising from increased sensory uncertainty rather than a fixed characteristic of an observer’s stereomotion acuity.
Secondary to our main effects, observers seemed more likely to report receding motion as approaching under high-contrast conditions and approaching motion as receding under low-contrast conditions. This pattern of results is consistent with a “brighter is closer” heuristic. This heuristic mirrors the relationship between contrast and depth in natural scenes, sometimes known as proximity-luminance covariance (Coules, 1955; Schwartz & Sperling, 1983; Dosher, Sperling, & Wurst, 1986) and recently has been shown to bias observer reports of perceived depth in natural images (Cooper & Norcia, 2014). Thus, in the context of our study, observers may be influenced by a prior expectation of target position in addition to any prior expectations of target motion, especially in cases where sensory uncertainty is already high.
Finally, one could propose that these motion-in-depth confusions are due to observers not seeing the target on certain trials. However, if observers did in fact miss the target on a particular trial, the response on that trial should be random. Random responses would not only yield motion-in-depth direction confusions, but also lateral motion confusions (i.e., misreports of whether the target was traveling leftward or rightward). Importantly, we saw few lateral motion confusions in our data (<3 % of all trials), ruling out that potential explanation for these surprising errors.
This work extends our understanding of motion perception and contributes to a principled approach to understanding errors that arise in 3D motion perception. We have shown that the human ability to estimate motion in depth, in particular, the direction of motion in depth (i.e., approaching vs. receding) is limited by sensory uncertainty, which may arise from a variety of sensory noise sources.
This work was supported by the Netherlands Organization for Scientific Research (NWO) Veni Grant 451-09-030 to B. Rokers. Additional support was provided by NSF award SMA-1004961. The authors thank Andrew Welchman, Michael Landy, and one anonymous reviewer for helpful comments on a previous version.
- Berens, P., & Velasco, M. J. (2009). The circular statistics toolbox for Matlab. MPI Technical Report No. 184.
- Welchman, A. E., Lam, J. M., & Bülthoff, H. H. (2008). Bayesian motion estimation accounts for a surprising bias in 3D vision. Proceedings of the National Academy of Sciences of the United States of America, 105(33), 12087–12092. doi: 10.1073/pnas.0804378105