Representation of motion onset and offset in an augmented Barlow-Levick model of motion detection
Kinetic occlusion produces discontinuities in the optic flow field, whose perception requires the detection of an unexpected onset or offset of otherwise predictably moving or stationary contrast patches. Many cells in primate visual cortex are directionally selective for moving contrasts, and recent reports suggest that this selectivity arises through the inhibition of contrast signals moving in the cells’ null direction, as in the rabbit retina. This nulling inhibition circuit (Barlow-Levick) is here extended to also detect motion onsets and offsets. The selectivity of extended circuit units, measured as a peak evidence accumulation response to motion onset/offset compared to the peak response to constant motion, is analyzed as a function of stimulus speed. Model onset cells are quiet during constant motion, but model offset cells activate during constant motion at slow speeds. Consequently, model offset cell speed tuning is biased towards higher speeds than onset cell tuning, similarly to the speed tuning of cells in the middle temporal area when exposed to speed ramps. Given a population of neurons with different preferred speeds, this asymmetry addresses a behavioral paradox—why human subjects in a simple reaction time task respond more slowly to motion offsets than onsets for low speeds, even though monkey neuron firing rates react more quickly to the offset of a preferred stimulus than to its onset.
Keywords: Acceleration; Accretion and deletion; Occlusion; Visual cortex; Visual motion
The human visual system operates in depth, separating even the simplest images into a figure and its background (Rubin 1921). Kinetic occlusion (Michotte et al. 1991; Kaplan 1969) is one figure-ground segregation cue that may be processed early in the visual hierarchy. The ubiquity and primacy of motion processing across species provides some evidence for a low-level kinetic occlusion mechanism: for example, both humans (van Doorn and Koenderink 1982) and bees (Srinivasan et al. 1990) can find edges defined by motion parallax alone; conversely, prey are better hidden when both camouflaged and still (Heatwole 1968). However, the neural mechanisms that process local motion signals, while modeled in animals like the fly (Hassenstein and Reichardt 1956) and rabbit (Barlow and Levick 1965), are not fully understood in primates.
Kinetic occlusion produces discontinuities in the optic flow field, whose saliency increases with surface texture density. A patch of contrast on a far surface will move through the optic flow field until it suddenly stops at the occluding boundary and disappears from view (texture deletion; Kaplan 1969). If the far surface is instead being uncovered, patches of contrast suddenly appear at the occluding boundary (texture accretion). The sudden onset and offset of contrast affects motion discrimination in a way that suggests it produces a strong transient signal (Churan et al. 2009). Accretion and deletion are not necessary for depth ordering from kinetic occlusion (Yonas et al. 1987), but they are local cues that have proven useful in computer vision models of depth ordering (Black and Fleet 2000; Feldman and Weinshall 2008). During kinetic occlusion the change in texture is accompanied by the onset and offset of local motion signals. Detection of these motion onsets and offsets may thus be an important early step in kinetic occlusion perception.
Our understanding of motion onset and offset neural mechanisms is guided by reaction time (RT) studies, which have yielded two general results: subjects report their perception of onset and offset after a time that is inversely proportional to the speed of an object while it moves (Dzhafarov et al. 1993; Kawakami et al. 2002), and they respond slightly more slowly to motion offsets than onsets (Kreegipuu and Allik 2007). Monkey neurophysiology studies have searched for sustained acceleration and deceleration signals in visual areas, of which onsets and offsets are extremes. These studies have not found cells in visual motion areas whose tonic firing varies linearly with acceleration, but many cells produce transient responses to both the onset and offset of motion (Lisberger and Movshon 1999). Generalizing to all accelerations, the studies suggest that adaptation of middle temporal area (MT) cell activities to moving stimuli may allow for a population-level representation of acceleration (Priebe and Lisberger 2002; Price et al. 2005; Schlack et al. 2007). This adaptation may also explain the reaction time results mentioned above (Dzhafarov et al. 1993).
We have synthesized these results into a circuit model that detects the unexpected onset or offset of stimulus motion. Based on evidence that Meynert cells in layer six of primary visual cortex (V1) use a nulling inhibition mechanism for motion detection (Livingstone 1998), we use the Barlow-Levick detector (Barlow and Levick 1965) as an elementary motion detector rather than a correlative (Hassenstein and Reichardt 1956) or energy (Adelson and Bergen 1985) model. The output of the Barlow-Levick model is the input to a similar circuit, which prefers strong accelerations/decelerations by inhibiting responses to constant motion, just as the original circuit inhibits against the null direction of motion. The cells in this new model layer respond selectively to stimulus motion onset and offset over a limited range of speeds, the distribution reflecting responses of MT cells to accelerations and decelerations (Schlack et al. 2007). We show that, given a simple model of reaction time (Ratcliff 1978), the speed-dependent response of onset and offset cells also qualitatively explains the difference in human subject reaction times when responding to the onset and offset of stimulus motion (Kreegipuu and Allik 2007). The key model insight is that, in order to produce a positive offset response to an absence of neural activity corresponding to motion, the system must produce excitatory activity (tonic excitation, predictive priming, etc.) that underlies both a faster neural response and a slower behavioral response relative to motion onsets.
2 Model specification
2.1 Barlow-Levick circuit
For simplicity, the input to the model (squares in Fig. 1(e)) is specified as an undirectional contrast signal I_i at positions denoted by the spatial index i. The input is explicitly defined as a function of time t and position i, rather than as a differential equation. Undirectional cells correspond to cells in magnocellular lateral geniculate nucleus (LGNm), which respond to sudden increases or decreases in contrast (Benardete and Kaplan 1999).
Undirectional cells activate both directional cells and their associated inhibitory interneurons. The interneurons provide nulling inhibition to adjacent positions, where an input would otherwise activate both directions at that position. These interneurons do not use an explicit time delay but instead have a slow passive decay rate, which leaves an activity trace (“delay”) after an input disappears. In some models (Grossberg et al. 2001; Berzhanskaya et al. 2007) directional cells decay quickly, but for simplicity we use the same decay parameter for all model cells.
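The nulling inhibition described above can be illustrated with a minimal sketch. It is not the model's actual equations, which are not reproduced in this text: the function names, the subtractive form of the inhibition, the single interneuron trace per position, and the parameter values (dt, decay, inhib) are all assumptions chosen for illustration. The only properties taken from the description are that interneurons leave a decaying activity trace, that they inhibit directional cells at adjacent positions, and that all cells share one decay parameter.

```python
import numpy as np


def moving_input(n_pos=10, dwell_steps=20):
    """Contrast input that dwells at each position in left-to-right order."""
    inputs = np.zeros((n_pos * dwell_steps, n_pos))
    for t in range(inputs.shape[0]):
        inputs[t, t // dwell_steps] = 1.0
    return inputs


def barlow_levick(inputs, dt=0.01, decay=2.0, inhib=10.0):
    """Euler-integrate rightward and leftward directional cells.

    Hypothetical simplification: one interneuron trace per position,
    subtractive inhibition, and the same decay rate for every cell.
    """
    n_steps, n_pos = inputs.shape
    trace = np.zeros(n_pos)  # decaying interneuron activity ("delay")
    right = np.zeros(n_pos)
    left = np.zeros(n_pos)
    right_hist = np.zeros((n_steps, n_pos))
    left_hist = np.zeros((n_steps, n_pos))
    for t in range(n_steps):
        drive = inputs[t]
        # Interneuron traces at the neighboring positions (zero beyond the edges).
        trace_at_next = np.concatenate((trace[1:], [0.0]))
        trace_at_prev = np.concatenate(([0.0], trace[:-1]))
        # A rightward cell is nulled by a trace on its right (leftward motion),
        # a leftward cell by a trace on its left (rightward motion).
        right = np.maximum(
            right + dt * (-decay * right + drive - inhib * trace_at_next), 0.0)
        left = np.maximum(
            left + dt * (-decay * left + drive - inhib * trace_at_prev), 0.0)
        trace = trace + dt * (-decay * trace + drive)
        right_hist[t] = right
        left_hist[t] = left
    return right_hist, left_hist


right_hist, left_hist = barlow_levick(moving_input())
```

With these assumed parameters, a rightward-moving input drives rightward cells at every position, while leftward cells respond only at the first position, where no interneuron trace has yet been laid down.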
The filtered output of directional cells becomes the input to model onset and offset cells (described below), just as it forms the basis of some more elaborate motion processing models (Chey et al. 1998; Grossberg et al. 2001; Berzhanskaya et al. 2007).
2.2 Short range filter
2.3 Onset and offset detectors
Onset and offset model cells act similarly to the Barlow-Levick model in that they signal a large acceleration or deceleration, respectively, unless a nearby spatially and temporally displaced directional signal also occurs. When directional inputs activate neighboring positions in order, the excitation and inhibition to these cells are balanced; otherwise one or the other will become active according to whether the motion has suddenly stopped or started. In the course of constant motion, onset (offset) cells will be preemptively inhibited (excited) before the second input appears. While these model cells are used as comparators for MT speed tuning characteristics (Schlack et al. 2007) in the present work, model units with this connectivity pattern might instead be identified with cells in the second primate visual area (V2), some of which respond selectively to kinetic contours (Marcar et al. 2000).
Activity variable names and connectivities for all of the above cell types are shown in Fig. 1(e).
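The onset/offset connectivity described in this section can be sketched in the same spirit. This is an illustrative simplification under stated assumptions, not the model's equations: the directional input is idealized as a one-hot rightward signal that vanishes when motion stops, and the names (moving_direction_signal, onset_offset) and weights (trace_gain, offset_inhib) are hypothetical. Onset cells are excited by the directional signal at their own position and inhibited by a decaying trace from the preceding position; offset cells have the opposite signs.

```python
import numpy as np


def moving_direction_signal(n_pos=8, n_moving=5, dwell_steps=20, rest_steps=100):
    """Idealized rightward directional signal that visits n_moving positions, then stops."""
    signal = np.zeros((n_moving * dwell_steps + rest_steps, n_pos))
    for t in range(n_moving * dwell_steps):
        signal[t, t // dwell_steps] = 1.0
    return signal


def onset_offset(direction, dt=0.01, decay=2.0, trace_gain=10.0, offset_inhib=3.0):
    """Euler-integrate hypothetical onset and offset cells for rightward motion."""
    n_steps, n_pos = direction.shape
    trace = np.zeros(n_pos)  # decaying trace of the directional signal
    onset = np.zeros(n_pos)
    offset = np.zeros(n_pos)
    onset_hist = np.zeros((n_steps, n_pos))
    offset_hist = np.zeros((n_steps, n_pos))
    for t in range(n_steps):
        d = direction[t]
        prev_trace = np.concatenate(([0.0], trace[:-1]))  # trace at position i - 1
        # Onset: excited locally, preemptively inhibited by the preceding position.
        onset = np.maximum(
            onset + dt * (-decay * onset + d - trace_gain * prev_trace), 0.0)
        # Offset: preemptively excited by the preceding position, inhibited locally.
        offset = np.maximum(
            offset + dt * (-decay * offset + trace_gain * prev_trace - offset_inhib * d),
            0.0)
        trace = trace + dt * (-decay * trace + d)
        onset_hist[t] = onset
        offset_hist[t] = offset
    return onset_hist, offset_hist


onset_hist, offset_hist = onset_offset(moving_direction_signal())
```

In this sketch the onset cell responds only at the first stimulated position, and the offset cell just beyond the last stimulated position gives the largest offset response. Offset cells along the path also give a smaller response during constant motion, because their excitation arrives before the inhibition does.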
The sudden onset of a stimulus produces an onset signal as well as directional signals in both directions away from the onset position (Fig. 3, dotted lines). Once the input activity continues to the right, however, the signal becomes directional because leftward directional cells are preemptively inhibited before the input arrives (Fig. 3, solid lines). Offset cells respond vigorously when the stimulus ends (Fig. 3, dashed lines), but they also give a response during constant motion. This small, “incorrect” signal reduces the reliability of offset signals. The sizes of both correct and spurious (incorrect) responses vary as a function of stimulus speed, a relationship investigated in Section 3.1.
3.1 Speed-dependent model responses
The Barlow-Levick detector is sensitive to speed, just as a motion energy filter responds to stimuli within a limited speed range (Simoncelli and Heeger 1998). Onset and offset cell activities also change in amplitude with a change in stimulus speed: at speeds for which directional cells vigorously respond, onset/offset cells are also more active, both when the stimulus begins/ends (correct) and when the stimulus moves uniformly (incorrect). We measured this speed dependence with a selectivity measure that increases with correct activity and decreases with incorrect activity. In order to do this, we simulated the lumped activities of three distinct theoretical neuron populations that have different input connectivities for accumulating evidence of motion onset, a particular motion direction, and motion offset. We assume that evidence accumulating populations exist or are dynamically constructed for different tasks; we implement only those that are sensitive to aspects of the stimuli used in the presented simulations. Each accumulator population is excited by “correct” activity and inhibited by “incorrect” cell activity. Selectivity is defined as the maximum activity this evidence accumulator population reaches, which roughly corresponds to the time it takes to reach a threshold activity after taking into account unmodeled brain processes such as competition between evidence accumulators and higher-level motion grouping processes. Onset and offset selectivities are inverted in Section 3.2 to create a measure of the model’s reaction time to the presence of that stimulus aspect. This measure captures our assumption that some other brain region which controls the motor response of a subject in a reaction time experiment accumulates evidence for when a moving stimulus has changed while habituating to “incorrect” cell activity (Dzhafarov et al. 1993).
Figure 4(c) plots the temporal delay between a stimulus event and the generation of a response from the corresponding evidence accumulator. The time of the generated response is defined as the time at which the evidence accumulator activity y_type first exceeds 0.1. This latency generally decreases with increasing speed for the neural response to stimulus onset t_on and the establishment of directionality in a local area t_dir. Because stimulus offset signals are preemptively generated, the offset latency is lower still, and the offset response can occur before the actual stimulus motion offset at low speeds. The neural response to stimulus motion offset has a lower latency than the neural response to stimulus motion onset in this model, especially at low speeds, in accordance with the results of a VEP analysis paired with a reaction time experiment (Kreegipuu and Allik 2007). At high speeds all latencies increase because the stimulus moves too quickly to strongly activate the circuit.
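The two measures used in this section, selectivity (peak accumulator activity) and latency (time of the first threshold crossing relative to the stimulus event), can be sketched as follows. The accumulator here is a hypothetical leaky integrator, excited by “correct” activity and inhibited by “incorrect” activity; it stands in for the model's accumulator equations, which are not reproduced in this text. The function names and the dt and decay values are assumptions; the 0.1 threshold is the value stated above.

```python
import numpy as np


def accumulate(correct, incorrect, dt=0.01, decay=1.0):
    """Leaky, rectified evidence accumulator (hypothetical form)."""
    y = 0.0
    out = np.zeros(len(correct))
    for t in range(len(correct)):
        y = max(y + dt * (-decay * y + correct[t] - incorrect[t]), 0.0)
        out[t] = y
    return out


def selectivity(y):
    """Peak accumulator activity over the simulation."""
    return float(np.max(y))


def latency(y, event_step, dt=0.01, threshold=0.1):
    """Time from the stimulus event to the first threshold crossing.

    Returns None if the threshold is never exceeded. The result is negative
    when the accumulator crosses threshold before the event.
    """
    above = np.flatnonzero(y > threshold)
    if above.size == 0:
        return None
    return (above[0] - event_step) * dt
```

A negative return value from latency corresponds to the case described above, in which a preemptively generated offset response precedes the actual motion offset.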
Figure 4(d) shows a measure of model selectivities s_type for simulations run with different stimulus speeds. Model selectivity corresponds to peak evidence accumulator activity over a simulation (Eqs. (12)–(17)). All selectivities are limited at high and low stimulus speeds because LGN does not respond to these stimuli (Fig. 2), and at high speeds the input never remains at a position long enough to activate model cells. Onset and offset selectivities are generally higher than directional selectivity because the nonlinear input term in the SRF (Eq. (4)) expands its directional input activities. “Incorrect” offset signals are large at low stimulus speeds (Fig. 4(b), solid gray curves), which decreases the evidence for an actual motion offset and lowers offset selectivity s_off relative to onset selectivity s_on. At high stimulus speeds, the input is less directional because interneurons have no time to strongly inhibit the opposite direction; directional cells thus have low activity except at the last stimulus position, where no directional competition occurs. This asymmetry boosts offset selectivity relative to onset selectivity at high speeds.
Figure 5 shows that the speeds for which offset selectivity is greater than, approximately equal to, or lower than onset selectivity vary widely over the parameter space. For the parameters tested, however, onset selectivity is never higher than offset selectivity at low speeds, and onset selectivity is never lower than offset selectivity at high speeds, a trend that may be generically true by the model’s architecture. We have chosen a set of parameters that allows for a slight separation between onset and offset selectivities at low speeds and has high selectivity over a wide range of speeds.
3.2 Reaction times
Behavioral data on the perception of motion onset and offset comprise a set of reaction time studies that find an inverse relationship between stimulus speed and response time: the faster the stimulus moves, the shorter the response time of subjects recognizing that the stimulus changed (Dzhafarov et al. 1993; Kawakami et al. 2002; Kreegipuu and Allik 2007). These relationships take the general form RT = c · v^(−β) + r, where v is the velocity of the stimulus when it moves, β is a parameter controlling the convergence of reaction time to its minimum (generally chosen between 0.5 and 1), c is a scaling parameter, and r is an additive parameter independent of velocity.
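As a small worked example of this power-law relationship, the function below evaluates RT = c · v^(−β) + r for a given speed. The function name and the example values of c and r are hypothetical and are not fitted to any of the cited data sets; only the functional form and the typical range of β come from the text.

```python
def reaction_time(speed, c, beta, r):
    """Reaction time as a power-law function of stimulus speed.

    speed: stimulus speed v (must be positive)
    c:     scaling parameter
    beta:  exponent, typically between 0.5 and 1
    r:     speed-independent additive term (the asymptotic minimum)
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return c * speed ** (-beta) + r


# Hypothetical parameters: reaction time falls toward r as speed increases.
slow = reaction_time(1.0, c=100.0, beta=0.5, r=200.0)
fast = reaction_time(4.0, c=100.0, beta=0.5, r=200.0)
```

A smaller β flattens the curve, so reaction time approaches its minimum r more gradually as speed increases.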
4.1 Summary of simulation results
The model presented in this paper is built upon a bilocal motion detector; while we have used the Barlow-Levick model (Barlow and Levick 1965) based on evidence for its existence in primate visual areas (Livingstone 1998), the onset/offset layer will produce qualitatively similar results with any related motion detector, such as a correlative (Hassenstein and Reichardt 1956) or motion-energy (Adelson and Bergen 1985) model. Our motion detection scheme was derived from and can also form the basis of more complicated motion-processing models (Chey et al. 1998). The output of these motion detectors is gated by a short range filter before being used as the input to onset/offset cells. In our model the short range filter keeps spurious motion signals produced during motion onset from activating onset/offset cells, but it has been previously theorized for other reasons such as forming a speed-sensitive basis (Chey et al. 1998) and explaining psychophysical responses to transparent motion (Qian et al. 1994).
Our model of onset and offset cell connectivity is similar to a model of pigeon pretectal nucleus cell activities (Zhang et al. 2005). The Zhang et al. model explains how sustained cell activities can arise that linearly vary with stimulus acceleration rate. Their model input is a directional cell whose activity linearly increases as a result of some accelerating stimulus within the directional cell’s receptive field; this implies that stimulus speed changes appreciably while the stimulus is within the cell’s receptive field. Our model instead describes transient cell dynamics that occur when the stimulus speed changes dramatically across adjacent directional cell receptive fields. These two sets of results suggest that the same connectivity may produce either set of dynamics, contingent on the underlying directional cell properties.
Jumps in stimulus speed are associated with the perception of acceleration; the sudden appearance and constant movement of an object is perceived to decelerate from a faster speed, as if it were shot out of a cannon (Runeson 1974). The neural correlates of acceleration perception, however, have proven more elusive than those for motion itself. MT cell responses, for example, are generally insensitive to the rate of an accelerating stimulus (Price et al. 2005); attempts to explain acceleration tuning have so far focused on the response of a population of MT neurons with different rates of adaptation (Priebe and Lisberger 2002; Price et al. 2005; Schlack et al. 2007). Acceleration-sensitive cells have not been found in cat V1, V2, or the posteromedial lateral suprasylvian area (PMLS) (Price et al. 2006). Neurons with analog sensitivity to acceleration have been found in the pigeon pretectal nucleus (Cao et al. 2004), which has been modeled by a similar mechanism to ours (Zhang et al. 2005) and may correspond well to neurons in the superior colliculus and other areas involved in retinal slip during smooth pursuit eye movements in primates.
The perception of kinetic contours, however, involves the detection of speed jumps, which produce transient responses from retinal cells in the tortoise (Thiel et al. 2007) to MT cells in the primate (Lisberger and Movshon 1999). Recorded transient responses to motion offset have always been decrements in firing rate, while our model predicts that cells exist whose firing rate increases at the offset of motion in a visual location displaced from their classical receptive field. While MT cells are too sensitive to motion to respond to motion discontinuities (Marcar et al. 1995), area V2 is both direction selective (Lu et al. 2010) and selective for kinetic contours (Marcar et al. 2000). Possible neural analogs of model onset/offset detectors, then, could be MT cells because they are the primary output of V1 Meynert and stellate cells (Maunsell and van Essen 1983), V2 cells because they receive directional input from V1 and respond to kinetic contours, a subset of cells in V1 layer six that receive both lateral input from layer six cells and feedback from layer four (Callaway 1998), or even cells in the superior colliculus.
4.2 Speed-dependent model responses
Onset and offset cell responses to constantly moving stimuli differ because their connectivity produces different responses to the same null stimulus sequence (Fig. 1(c) and (d)). Because offset cells are preemptively excited during constant stimulus motion, they activate selectively at speeds that are fast enough to only weakly activate directional cells. Depending on chosen model parameters, this biases offset cell speed tuning (Section 3.1) towards higher speeds than that of onset cells. This configuration of speed tunings is similar to MT cell speed tuning when presented with speed ramp stimuli (Schlack et al. 2007). The speed tuning bias of MT cells towards higher speeds for decelerating stimuli and lower speeds for accelerating stimuli is usually attributed to an adaptation effect (Price et al. 2005; Schlack et al. 2007), which may also be a general mechanism for creating an offset or rebound response that increases with stimulus strength (Carpenter and Grossberg 1981; Francis et al. 1994; Baloch et al. 1999). Transient MT cell responses are also tuned to slightly higher speeds than sustained responses, which could produce a shifted peak in their difference towards higher speeds (Lisberger and Movshon 1999, Fig. 3). One challenge for our model, however, is that onset latencies decrease with increased speed (Lisberger and Movshon 1999, Fig. 5); offset latencies are not reported. False model offset cell signaling occurs when the cell receives an excitatory input for a significant amount of time before it is inhibited. If this excitatory input arrives later for slower stimuli, then the excitatory and inhibitory signals can be better matched, which produces less false signaling and correspondingly less bias in offset cell speed tuning.
Because the speed tuning of the proposed model relies on the amount of undirectional input that either saturates or silences the circuit, this speed tuning can be modulated by contrast strength, reflected in undirectional cell activity level. While the perception of speed is dependent on contrast (Thompson 1982), we believe contrast will minimally affect the proposed circuit for most speeds because the neural correlates of undirectional cells (LGNm) exhibit strong contrast gain control (Benardete and Kaplan 1999). A more careful modeling study of speed tuning and speed discrimination, built on the Barlow-Levick circuit, explores this relationship with contrast in more detail (Chey et al. 1998).
4.3 Reaction times
If the neural correlates of model onset/offset cell populations strongly influence perceptual performance on simple reaction time tasks for the onset and offset of motion, then according to one reaction time model (Ratcliff 1978), responses should be inversely related to the “drift rate” of evidence accumulation. This inverse relation assumes that decision-making areas directly accumulate evidence from early visual areas, an idea which has some support (Shadlen and Newsome 2001). We assume that the activity level of evidence accumulators is inversely related to reaction time and that the accumulators compete with each other for access to their preferred decision and motor response. Our simulations qualitatively fit reaction time data (Kreegipuu and Allik 2007), but the fit is improved when using an inverse exponent β closer to 0.5 (not shown). This suggests either that the decision making process is noisy, which flattens the reaction time curve, or that reaction time is not dependent on early visual areas, but rather on areas that are selective for more complicated stimuli (Dzhafarov et al. 1993).
The neural response timing to preferred stimulus offset has been shown to be faster and more consistent within and across stimulus variations than the neural response to preferred stimulus onset (Bair et al. 2002). This has recently been found to have psychophysical consequences: subjects can better discriminate between two gratings that stop moving at different times than between two gratings that start moving at different times (Tadin et al. 2010, Experiment 2). On one hand, this discrimination result is nominally unrelated to manual reaction time; the smaller discrimination thresholds in offset timing asynchrony further highlight the paradox that a reliable signal used in a discrimination task also produces slower reaction times (Kreegipuu and Allik 2007). On the other hand, our model predicts that the transformation from an offset in one population to an active transient signal in another cell population is inherently noisy, which might affect both manual reaction times and discrimination tasks. The referenced discrimination task, however, was performed at a speed of high selectivity for the motion system (4.8°/s), whereas our model predictions apply mainly at slow stimulus speeds.
The proposed model has longer reaction times for motion offset at slow speeds than for high speeds. While this effect was not always reported for reaction time studies over large speed ranges (Dzhafarov et al. 1993; Kawakami et al. 2002), it has been reported for studies of reaction time at the lowest speeds detectable by human subjects (Kreegipuu and Allik 2007). This difference was reported to be primarily a vertical shift, which corresponds to similar speed tunings but different selectivities. This model suggests that onset and offset selectivities are different mainly in speed tuning ranges, which would correspond to a horizontal shift in reaction time curves.
Because directional cells are not activated in this model by extremely fast stimuli, reaction times are expected to increase again at high speeds (uptick in Fig. 6(d)). Subjects perform well on both motion onsets and offsets at high speeds, with a negligible increase in reaction time for dots moving at 500°/s (Kawakami et al. 2002). At these speeds, however, a sudden stimulus change will produce a salient signal in other, undirectional circuits that can drive the perceptual and motor response. For example, a static noise image replaced by snow on a cathode ray tube display is a detectable stimulus change that does not create a consistent directional percept. This discrepancy may also be accounted for by using a more sophisticated population model, where small subpopulation responses, such as those from cells that respond selectively to high speeds (> 100°/s), are enhanced relative to the rest of the population.
The detection of motion onset or offset, while possibly the basis of kinetic contour perception, is not necessarily its equivalent. Multiple local events have to be spatially integrated to produce a contour defined by local changes like speed jumps (Shipley and Kellman 1994). Even for displays containing motion onsets and offsets, a perception of occlusion and amodal persistence occurs only when speed jumps occur for texture elements on only one side of a boundary (Kaplan 1969). A better understanding of the ecological constraints of kinetic occlusion will help guide research on the mechanisms that link neural responses to motion onsets and offsets to the amodal perception of kinetically occluded surfaces.
We thank Florian Raudies and four anonymous reviewers for their valuable comments on earlier versions of this manuscript. T.B. and E.M. were supported in part by the Center of Excellence for Learning in Education, Science and Technology (CELEST), a National Science Foundation Science of Learning Center (NSF SBE–0354378 and NSF SMA–0835976). E.M. was also supported in part by the Office of Naval Research (ONR N00014–11–1–0535).
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
- Carpenter, G. A., & Grossberg, S. (1981). Adaptation and transmitter gating in vertebrate photoreceptors. Journal of Theoretical Neurobiology, 1, 1–42.
- Grossberg, S. (1973). Contour enhancement, short term memory, and constancies in reverberating neural networks. Studies in Applied Mathematics, 52(3), 213–257.
- Hassenstein, B., & Reichardt, W. (1956). Systemtheoretische analyse der zeitreihenfolgen und vorzeichenauswertung bei der bewegungsperzeption des rüsselkäfers chlorophanus. Zeitschrift für Naturforschung, 11b(9–10), 513–524.
- MATLAB (2010). Version 7.10 [Computer software]. Natick, MA: The Mathworks.
- Michotte, A., Thinès, G., & Crabbé, G. (1991). Amodal completion of perceptual structures. In G. Thinès, A. Costall, & G. Butterworth (Eds.), Michotte’s experimental phenomenology of perception (pp. 140–167). Hillsdale, NJ: Erlbaum.
- Rubin, E. (1921). Visuell wahrgenommene Figuren [figure and ground]. In D. C. Beardslee & M. Wertheimer (Eds.), Readings in perception (pp. 194–203). Princeton, NJ: Van Nostrand Co., Inc.
- Srinivasan, M. V., Lehrer, M., & Horridge, G. A. (1990). Visual figure-ground discrimination in the honeybee: The role of motion parallax at boundaries. Proceedings of the Royal Society of London. Series B, Biological Sciences, 238(1293), 331–350.
- van Doorn, A. J., & Koenderink, J. J. (1982). Spatial properties of the visual detectability of moving spatial white noise. Experimental Brain Research, 45(1–2), 189–195.
- van Doorn, A. J., Koenderink, J. J., & van de Grind, W. A. (1984). Limits in spatio-temporal correlation and the perception of visual movement. In A. J. van Doorn, W. A. van de Grind, & J. J. Koenderink (Eds.), Limits in perception (pp. 203–234). VSP.