Introduction

Retinotopic and non-retinotopic representations in human vision

The optics of the eye map neighboring points in the environment to neighboring photoreceptors in the retinae, and these neighborhood relations, known as retinotopic organization, are preserved in early visual cortical areas. Under normal viewing conditions, due to the movements of the observer’s body, head, eyes, and due to the movements of objects, the stimuli impinging on retinotopic representations are highly dynamic and unstable. Thus, understanding ecological vision requires an understanding of how visual processes operate under these dynamic conditions. In order to explain the phenomenal stability of our environment, it is often postulated that the brain constructs non-retinotopic representations wherein the ego-centric representations (i.e., based on the observer, such as retinotopic representations) are transformed into exo-centric representations (i.e., based outside of the observer, such as spatiotopic representations). However, determining whether a given visual process operates in retinotopic or non-retinotopic representations and which visual processes operate in non-retinotopic representations remains one of the fundamental challenges in understanding ecological vision.

Retinotopy of visual masking assessed with the Saccadic Stimulus Presentation Paradigm

Saccadic eye movements constitute a major source of retinotopic instability. However, during these eye movements, the world appears phenomenally stable, suggesting that retinotopic shifts caused by saccades are either dismissed or compensated for by the visual system. Theories of dismissal propose that very little information is kept from one saccade to another and that vision starts tabula rasa after each saccade. Theories of complete compensation propose that all information is remapped across the saccade by taking into account the global shift caused by the saccade. Theories that take intermediate positions between these two extremes have also been proposed (Bridgeman, van der Heijden, & Velichkovsky, 1994). In general, compensation theories rely on three mechanisms: 1) before the initiation of the saccade, retinotopic information is stored in memory, 2) during and after the saccade, retinotopic information is suppressed to prevent inappropriate integration of pre- and post-saccadic images, 3) after the saccade, the new image is integrated with the contents of the memory by taking into account the retinotopic shift caused by the saccade. Because saccadic shifts generally take a few tens of milliseconds, sensory (iconic) memory (Footnote 1) and backward masking (Footnote 2) have been viewed as the major candidates to perform the memorization and suppression tasks, respectively.

Several studies investigated whether sensory memory, masking, and information integration occur in retinotopic or non-retinotopic coordinates across saccadic eye movements (Footnote 3). The Saccadic Stimulus Presentation Paradigm (SSPP) has been the classical experimental paradigm for these studies (Davidson, Fox, & Dick, 1973; Irwin, 1991; McRae, Butler, & Popiel, 1987; Melcher & Morrone, 2003). Figure 1 shows the SSPP used by Davidson et al. (1973) to investigate retinotopic versus non-retinotopic bases of backward masking.

Fig. 1

Saccadic Stimulus Presentation Paradigm (SSPP) used by Davidson et al. (1973). The observer makes a saccade from the first to the second fixation. Target stimuli, consisting of five letters are presented briefly just before the initiation of the saccade. A ring mask is presented after the saccade. The light gray target letters at the bottom show the relative position of the mask with respect to the target letters. In the actual stimulus, letters were only presented before the saccade. As one can see from the figure, the ring mask surrounds letter V according to spatiotopic coordinates and letter Y according to retinotopic coordinates. The nonoverlapping ring mask corresponds to the metacontrast condition. The experiments also had an overlapping pattern to examine pattern masking by structure

The observer is asked to make a saccade from one fixation point to a second one. Target stimuli (five letters) are presented briefly before the saccade, followed by a mask stimulus (either a nonoverlapping ring, as in Fig. 1, or an overlapping pattern) presented after the saccade. As depicted in Fig. 1, the mask stimulus surrounds (or covers) different letters according to retinotopic and non-retinotopic (spatiotopic) coordinates. By measuring which of the two letters is suppressed from perception, one can infer whether this mask operates in retinotopic or non-retinotopic coordinates. Davidson et al. (1973) reported that the mask suppressed the letter that shared its retinotopic coordinates but appeared to occupy the same position as the letter that shared its spatiotopic coordinates. They proposed that there is a retinotopic visible persistence at which trans-saccadic masking occurs, and a spatiotopic sensory memory at which trans-saccadic integration occurs. In a follow-up study, McRae et al. (1987) reported not only retinotopic but also spatiotopic masking. They suggested that the transition from retinotopic to spatiotopic representations takes time and that the reason Davidson et al. (1973) did not find evidence for spatiotopic masking could be the relatively shorter Inter-Stimulus Interval (ISI) used by Davidson et al. (ca. 80 ms) compared with the ISI used in McRae et al.’s study (153 ms). That masking is retinotopic at short ISIs was also confirmed by Irwin, Brown, and Sun (1988). These authors also presented evidence for a spatiotopic memory integrating information across saccades. However, their data suggested that spatiotopic integration of information was rather abstract, depending on position and identity information rather than on a detailed image-like fusion of trans-saccadic stimuli (van der Heijden, Bridgeman, & Mewhort, 1986).
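The logic of Fig. 1 can be illustrated with simple coordinate arithmetic. The following is only a minimal sketch: it assumes one-dimensional horizontal positions in units of one letter spacing, a hypothetical five-letter layout (only V and Y are taken from the figure description), and a saccade that shifts gaze by exactly one letter spacing. The function names are illustrative and none of the values are taken from Davidson et al. (1973).

```python
def retinal_position(screen_x, gaze_x):
    """Position on the retina, expressed as screen position relative to gaze."""
    return screen_x - gaze_x


def letters_matched_by_mask(letters, mask_x, gaze_before, gaze_after):
    """Return (spatiotopic_match, retinotopic_match) for a post-saccadic mask.

    letters maps each letter to its screen position; the letters are shown
    before the saccade (gaze_before), the mask after it (gaze_after).
    """
    # Spatiotopic match: the letter that shared the mask's screen position.
    spatiotopic = next((l for l, x in letters.items() if x == mask_x), None)
    # Retinotopic match: the letter that fell on the same retinal position
    # before the saccade as the mask does after the saccade.
    mask_retinal = retinal_position(mask_x, gaze_after)
    retinotopic = next(
        (l for l, x in letters.items()
         if retinal_position(x, gaze_before) == mask_retinal),
        None,
    )
    return spatiotopic, retinotopic


# Hypothetical layout: letters at integer positions 0..4 (assumed values).
letters = {"A": 0, "Y": 1, "V": 2, "D": 3, "E": 4}
# Assumed saccade: gaze moves from position 2 to position 3; mask at position 2.
print(letters_matched_by_mask(letters, mask_x=2, gaze_before=2, gaze_after=3))
```

With these assumed values the mask matches V in spatiotopic coordinates and Y in retinotopic coordinates, which is the dissociation the paradigm exploits.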

The observation of shifts of neuronal receptive fields in the direction of intended saccades (Duhamel, Colby, & Goldberg, 1992) generated a renewed interest for the problem of visual stability across saccades from this perspective (Melcher & Colby, 2008; Wurtz, 2008; Cavanagh, Hunt, Afraz, & Rolfs, 2010; Melcher, 2011). This “remapping of receptive fields” has been associated with shifts in the perceived positions of peri-saccadically presented targets (Ross, Morrone, Goldberg, & Burr, 2001). De Pisapia, Kaunitz, and Melcher (2010) suggested that these shifts, in turn, can help a target stimulus escape from masking. Moreover, they have presented evidence for spatiotopic masking for ISIs shorter (48 ms) than the ISIs reported in previous studies. Hunt and Cavanagh (2011) presented a brief target before the saccade followed by a long-duration mask that turned on before the saccade and remained on after the saccade until the subject responded. With this paradigm, Hunt and Cavanagh (2011) showed masking when the mask was presented at the post-saccadic retinotopic coordinates of the location where the target was presented. Taken together, these studies paint a complex picture for the retinotopy of masking. Part of the reason for this complexity may be due to the fact that many of the studies used different types of targets, masks and widely different parameters. It is known that masking is not a unitary phenomenon (Bachmann, 1994; Breitmeyer & Öğmen, 2006), and the differences between the studies may be due to differences in the types of masking functions and mechanisms evoked in different studies. Notwithstanding this issue, these studies show that SSPP provides a powerful method for exploring retinotopy of visual masking across saccades. However, SSPP involves eye-movement related processes, such as saccadic suppression and efference copy, and cannot be employed to study retinotopy of visual masking independent of eye movements.

Retinotopy of visual masking in the absence of eye movements

In a recent study, Lin and He (2012) investigated the retinotopy of masking in the absence of eye movements by using a modified version of the object-specific reviewing paradigm (Kahneman, Treisman, & Gibbs, 1992). We use an alternative method that directly pits retinotopic predictions against non-retinotopic predictions. We first introduce our approach and then compare, in Section “Discussion”, our methods and findings with those of Lin and He (2012).

The method that we have proposed for exploring non-retinotopic processing is based on the Ternus-Pikler paradigm (Öğmen, Otto, & Herzog, 2006; Boi, Ogmen, Krummenacher, Otto, & Herzog, 2009). The Ternus-Pikler display is an apparent motion stimulus, introduced by Gestalt psychologists about a century ago, and employed extensively since then to study the spatio-temporal aspects of human vision (Petersik & Rice, 2006; Pikler, 1917; Ternus, 1926). Figure 2 shows how the Ternus-Pikler stimulus has been adapted for studying non-retinotopic attribution of stimulus features (Öğmen et al., 2006). The basic Ternus-Pikler paradigm (Fig. 2a) consists of two display frames separated by a blank frame (ISI). The two display frames are identical, except that all elements in Frame 2 are shifted by one inter-element distance with respect to the elements in Frame 1.

Fig. 2

Ternus-Pikler paradigm for exploring non-retinotopic feature processing. (a) Ternus-Pikler Display: two display frames are separated by a blank interval called the Inter-Stimulus Interval (ISI). The two display frames are identical, except that all elements in Frame 2 are shifted by one inter-element distance with respect to the elements in the first frame. (b) Element Motion: For short ISIs (e.g., 0 ms) observers perceive the leftmost element in Frame 1 to be moving to the position of the rightmost element in Frame 2. In this case, no motion is perceived for the two central elements. (c) Group Motion: For long ISIs (e.g., 100 ms) observers perceive all elements to be moving as a group. (d) Results: A Ternus-Pikler display presented with an ISI of either 0 or 100 ms. The central element in the first frame included a small offset called the “probe-vernier.” Observers attended to one of the elements of the second frame labeled as 1, 2, or 3. (e) Control Stimulus: Only the elements that overlapped in the two Ternus-Pikler frames were shown, i.e., the leftmost element of the first and the rightmost element of the second frame of the stimulus shown in (d) were not displayed. No motion percept was elicited (adapted from Öğmen et al., 2006)

Depending upon the ISI, two different types of motion are perceived (Pantle & Picciano, 1976). For short ISIs (e.g., 0 ms) observers perceive Element Motion, in which the leftmost element in Frame 1 is perceived to be moving to the position of the rightmost element in Frame 2 (Fig. 2b). In this case, no motion is perceived for the other two elements. For long ISIs (e.g., 100 ms) observers perceive Group Motion, in which all elements are perceived to be moving together as a group (Fig. 2c). To study non-retinotopic feature attribution, a simple feature called a vernier offset is inserted into the central element of the first frame (Fig. 2d). A purely retinotopic hypothesis predicts that features of the central element in Frame 1 should be integrated into the leftmost element of Frame 2 for all ISIs within the window of temporal integration. Hence, performance should be well above chance level when subjects are asked to report the direction of the probe vernier perceived in the leftmost element in Frame 2, and near chance for the other elements in Frame 2. However, it was shown that performance depends on the ISI. When group motion is established between the Ternus-Pikler elements (ISI = 100 ms), performance is well above chance when subjects are asked to report the perceived direction of the vernier offset in the central element in Frame 2, and near chance for the other elements (Fig. 2d). On the other hand, in the case of element motion (ISI = 0 ms), performance is higher when subjects report the vernier offset perceived in the leftmost element in Frame 2. The illusory attribution of the vernier offset also depends critically on the elicitation of a motion percept. If the leftmost line of the first frame and the rightmost line of the second frame are omitted (Fig. 2e), no apparent motion is induced because the remaining elements spatially overlap. In this control display, the percentage of responses in agreement with the probe-vernier is high only for the element labeled 1 and at chance level for element 2 for both ISIs. These results indicate that feature attribution between elements of consecutive Ternus-Pikler display frames is governed by motion-induced grouping, i.e., it follows the dashed arrows in Fig. 2b and c. In other words, the perceived motion of the Ternus-Pikler elements serves as a non-retinotopic reference frame for feature attribution between elements of the Ternus-Pikler display frames.
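The correspondence rules described above can be written down compactly. The sketch below is only an illustration under stated assumptions: elements are indexed from 0 within Frame 1, Frame 2 labels run from 1 (leftmost) to 3 (rightmost) as in Fig. 2d, and the function names are hypothetical rather than taken from the original studies.

```python
def retinotopic_label(probe_index=1):
    """Frame 2 label that shares the probe's retinal position (None if none does).

    Frame 2 is shifted by one inter-element distance, so Frame 1 element i
    (0-based) overlaps Frame 2 element i - 1 (0-based), whose 1-based label is i.
    """
    return probe_index if probe_index >= 1 else None


def motion_label(percept, probe_index=1, n_elements=3):
    """Frame 2 label matched to the probe element by motion-induced grouping."""
    if percept == "group":
        # All elements move together, so each keeps its rank within the group.
        return probe_index + 1
    if percept == "element":
        # The leftmost element jumps to the rightmost slot; the remaining
        # elements are seen as stationary and keep their retinal position.
        return n_elements if probe_index == 0 else probe_index
    raise ValueError("percept must be 'group' or 'element'")


# Probe vernier in the central element of Frame 1 (index 1).
print(retinotopic_label())      # same label for every ISI
print(motion_label("element"))  # short ISI
print(motion_label("group"))    # long ISI
```

For the central probe element, the retinotopic rule and element motion both point to element 1, whereas group motion points to element 2, which is why only the long-ISI condition separates the two hypotheses.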

In the present study, we use a similar Ternus-Pikler paradigm to probe retinotopic and non-retinotopic bases of visual masking and to assess non-retinotopic perception during masking.

General methods and materials

All visual stimuli were generated via a Visual Stimulus Generator (VSG 2/5) card manufactured by Cambridge Research Systems. The stimuli were displayed on a 22-inch color monitor set at a resolution of 800 × 600 with a refresh rate of 100 Hz. Subject responses were collected by means of a joystick connected to the computer hosting the VSG card. The distance between the observer and the monitor was fixed at 0.5 m, and a head/chin rest was utilized to minimize subject head motion during the experiments. Observers were asked to maintain a stable gaze at a fixation cross that remained visible throughout the experiment at the center of the monitor. Our previous experiments indicate that observers are able to keep a stable fixation while viewing the Ternus-Pikler displays (Boi et al., 2009). Nevertheless, to rule out the involvement of eye movements, we ran control experiments with eye movement monitoring, and the results of these experiments are provided in the Appendix. All experiments were conducted in a dimly lit room. Background and Ternus-Pikler disk/square luminances for all experiments were set at 25 cd/m² and 10 cd/m², respectively. Target and mask luminance levels were chosen at 30 or 40 cd/m² depending upon subject thresholds for optimum masking. Four participants, aged 25 to 37 years, of whom three were naïve to the purpose of the study, took part in the experiments. The experiments were conducted according to a protocol approved by the University of Houston Committee for the Protection of Human Subjects. Informed consent was obtained from every participant, and practice trials were conducted to familiarize the observers with experimental procedures. The results of practice trials were not included in the data analysis.

Experiment 1: Retinotopy of metacontrast masking

Methods

In this experiment, we utilized a radial Ternus-Pikler display to study retinotopy of metacontrast masking. Two display frames, each of which contained two disks and a central square aligned on the perimeter of an invisible circle centered at the fixation, were displayed sequentially to create the perception of radial Ternus-Pikler apparent motion (Fig. 3a and b). The radius of this virtual circle was fixed at 2.5 degrees in all experiments. The target-mask combination shown in Fig. 3c was displayed at variable Stimulus Onset Asynchronies (SOAs). The target was predictably presented at the center of the square in the first frame of the Ternus-Pikler sequence, and subjects were asked to attend to it and report the location of the missing corner on the target diamond (left/right). Depending upon the spatial location of the mask within Frame 2, retinotopic and non-retinotopic masking effects were distinguished. Figure 3a displays the Retinotopic Mask Condition, where the mask in Frame 2 was presented at the same retinotopic location as that of the target diamond in Frame 1. Note that in the absence of eye movements, retinotopic and spatiotopic masking conditions are equivalent. Figure 3b, on the other hand, depicts the Non-Retinotopic Mask Condition. In this case, the mask was displayed in the central square of Frame 2. The two squares presented in Frames 1 and 2 of the Ternus-Pikler sequence correspond to one another only when group motion is established. In addition to the retinotopic and non-retinotopic mask conditions, two control conditions were included in this experiment. In the Static Control Condition, masking functions were obtained for each individual subject in the absence of Ternus-Pikler motion. Under this condition, the Ternus-Pikler elements remained visible throughout the experiment at the same spatial location as that of Frame 1 in Fig. 3a or b. In the No-Mask Control Condition, the target was shown in the absence of the mask.

Fig. 3

Experiment 1: Two display frames, each of which contained two disks and a square, were displayed sequentially to create the perception of radial apparent motion. The blank ISI frame is not displayed in this figure for the sake of simplicity. Subjects were asked to report the location of the missing corner on the target diamond, shown at the center of the middle square in the first frame. (a) Retinotopic Mask Condition: The mask was displayed in Frame 2, at the same spatial location as that of the target diamond in Frame 1. (b) Non-Retinotopic Mask Condition: The mask was displayed in the central square of Frame 2, which corresponded with the central square of Frame 1, only when disks were perceived to be in group motion. (c) Spatial Parameters of the Target and Mask: Variable “x” represents the size of the probe gap, which was varied to meet individual subject threshold requirements. (d) Timing Diagram: The ISI was fixed (0 or 40 ms) per block, and the target predictably appeared just before the ISI. The mask presentation time was randomized from trial to trial in order to allow for different ISI-SOA combinations per block

Spatial parameters of the target and mask are displayed in Fig. 3c. Variable “x” represents the size of the probe gap, which was varied in the range of 12’ to 25’ to meet each individual subject’s masking threshold requirements. The square and the disk in the Ternus-Pikler stimulus had, respectively, a side and a diameter equal to 1.5°. Figure 3d displays the timing diagram of a typical trial. The ISI was fixed (0 or 40 ms) for each experimental block, and the target always appeared just before the ISI. Target and mask stimuli were presented for 10 and 20 ms, respectively. Mask onset time was randomized from trial to trial to allow for different ISI-SOA combinations per block. The Ternus-Pikler disks were displayed at an eccentricity of 4° and for a duration of 120 ms. As shown in Fig. 3d, the Ternus-Pikler ISI limits the shortest masking SOA that can be used. Therefore, eccentricity, background luminance, Ternus-Pikler element shapes, and target/mask/disk contrasts were chosen in such a way that Ternus-Pikler group motion was perceived by all observers at a relatively short ISI (40 ms), whereas a strong masking effect was observed at the corresponding SOA (50 ms). In this study, we used only one contrast polarity for targets and masks. The luminance values for the target, mask, disk, and background were 40, 40, 10, and 50 cd/m², respectively. Thus, with respect to the disk within which they were presented, the target and mask had positive contrast polarity. Based on previous research, we would expect quantitatively different but qualitatively similar results for other contrast polarity combinations (Breitmeyer, Tapia, Kafalıgönül, & Öğmen, 2008). Ternus-Pikler radial motion (upward or downward) was also randomized from trial to trial. In a two-alternative forced-choice design, three naïve observers as well as one of the authors reported the perceived location of the missing corner of the target diamond (left/right).
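The constraint that the ISI places on the shortest usable SOA can be made explicit with a short calculation. This is a sketch under two assumptions that are read off the description rather than stated as formulas in the text: the target's offset coincides with the offset of Frame 1, and the mask cannot appear before the onset of Frame 2. The function names are illustrative.

```python
REFRESH_HZ = 100                # monitor refresh rate from the general methods
FRAME_MS = 1000 // REFRESH_HZ   # duration of one video frame in ms


def ms_to_frames(duration_ms):
    """Convert a stimulus duration to a whole number of video frames."""
    if duration_ms % FRAME_MS != 0:
        raise ValueError("duration is not a multiple of the frame duration")
    return duration_ms // FRAME_MS


def shortest_soa_ms(isi_ms, target_ms=10):
    """Smallest target-to-mask onset asynchrony for a given Ternus-Pikler ISI.

    Assumes the target ends when Frame 1 ends and the mask starts no earlier
    than the onset of Frame 2, so SOA_min = target duration + ISI.
    """
    return target_ms + isi_ms


print(ms_to_frames(10), ms_to_frames(20), ms_to_frames(120))  # target, mask, disks
print(shortest_soa_ms(0), shortest_soa_ms(40))                # ISI = 0 and 40 ms
```

Under these assumptions an ISI of 40 ms gives a shortest SOA of 50 ms, in line with the SOA quoted above for the group-motion condition.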

Note that in the non-retinotopic mask condition (Fig. 3b), the target and mask always stimulate distinct retinal areas. The retinotopic hypothesis predicts no masking effect for this condition, regardless of stimulus timing and Ternus-Pikler grouping. The non-retinotopic hypothesis predicts that, in such a case, the masking effect follows stimulus timing and the perceived Ternus-Pikler motion. As such, if the Ternus-Pikler elements are perceived to be in element motion, the non-retinotopic prediction is the same as that of the retinotopic hypothesis. However, when the Ternus-Pikler elements are perceived to be in group motion, the non-retinotopic hypothesis instead predicts a masking effect for the non-retinotopic mask condition. Figure 4 summarizes the respective predictions of the retinotopic and non-retinotopic hypotheses, based on the perceived motion grouping of the Ternus-Pikler disks.

Fig. 4

Predictions of Retinotopic and Non-Retinotopic Hypotheses for Experiment 1: Panels (a) and (c) depict predictions of the retinotopic hypothesis. Masking effect is expected only in retinotopic mask experiment condition, regardless of the perceived Ternus-Pikler motion (group or element). Panels (b) and (d) depict predictions of the non-retinotopic hypothesis. Masking effect for each experiment condition is expected to depend on perceptual grouping of Ternus-Pikler disks
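The predictions of the two hypotheses for Experiment 1 can also be stated as a small decision rule. The following is a minimal sketch of that rule as described in the text; the function name and the string labels are illustrative choices, not part of the original method.

```python
def masking_predicted(hypothesis, mask_condition, percept):
    """Return True if the given hypothesis predicts a masking effect.

    hypothesis:     "retinotopic" or "non-retinotopic"
    mask_condition: "retinotopic" or "non-retinotopic" (location of the mask)
    percept:        "element" or "group" (perceived Ternus-Pikler motion)
    """
    if percept not in ("element", "group"):
        raise ValueError("percept must be 'element' or 'group'")
    if hypothesis == "retinotopic":
        # Masking requires target and mask at the same retinal location,
        # whatever motion is perceived.
        return mask_condition == "retinotopic"
    if hypothesis == "non-retinotopic":
        # Masking follows the motion-based correspondence: under element
        # motion the central square is matched to the same retinal location,
        # under group motion to the central square of Frame 2.
        expected = "retinotopic" if percept == "element" else "non-retinotopic"
        return mask_condition == expected
    raise ValueError("unknown hypothesis")


for hyp in ("retinotopic", "non-retinotopic"):
    for percept in ("element", "group"):
        for mask in ("retinotopic", "non-retinotopic"):
            print(hyp, percept, mask, masking_predicted(hyp, mask, percept))
```

The two hypotheses disagree only under group motion, so that condition carries the diagnostic weight of the experiment.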

Results

Figure 5a shows performance as a function of the SOA for the static control condition. Repeated measures ANOVA indicates a significant masking effect (F(1,3) = 77.06; p = 0.003; ηp² = 0.963). The metacontrast masking function dips at SOA = 40 ms, indicating a type-B masking function (F(5,15) = 5.71; p = 0.004; ηp² = 0.658).
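For readers who wish to relate the reported effect sizes to the F statistics, partial eta squared can be recovered from an F value and its degrees of freedom with the standard identity ηp² = (F × df_effect) / (F × df_error-free numerator + df_error), i.e., F·df_effect / (F·df_effect + df_error). The sketch below applies it to the masking effect in the static control condition (F = 77.06 with 1 and 3 degrees of freedom); the function name is illustrative, and small discrepancies for other tests would be expected from rounding of the reported F values.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Masking effect in the static control condition of Experiment 1.
print(round(partial_eta_squared(77.06, 1, 3), 3))
```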

Fig. 5

Metacontrast masking. Percentage of correct responses in detecting the missing corner of the target diamond (left/right), averaged across the four observers. (a) Static control condition. Performance is near chance at an SOA of 40 ms with a Type-B masking function. (b) Element-Motion (ISI = 0 ms). Performance is near chance at SOAs near 40 ms only in the retinotopic mask condition. (c) Group-Motion (ISI = 40 ms). Masking is observed only for the retinotopic mask. Error bars correspond to ±1 SEM. In the case of No Mask condition, ±1 SEM are shown by gray horizontal lines

Figure 5b shows performance as a function of the SOA for the element motion condition (ISI = 0 ms). Two-way repeated measures ANOVA shows a significant effect of the mask condition (F(2,6) = 29.4; p = 0.012; ηp² = 0.907), as well as of the SOA (F(8,24) = 45.89; p = 0.007; ηp² = 0.939). However, when the retinotopic mask condition is removed from the analysis, the metacontrast masking effect (F(1,3) = 0.214; p = 0.675), as well as the SOA effect (F(8,24) = 36.06; p = 0.497), become nonsignificant. Figure 5c shows performance as a function of the SOA when the Ternus-Pikler disks are perceived to be in group motion (ISI = 40 ms). Once again, significant effects of the mask (F(2,6) = 126.09; p < 0.001; ηp² = 0.977) and the SOA (F(4,12) = 5.49; p = 0.009; ηp² = 0.647) are observed. Removal of the retinotopic mask condition from the analysis, once again, renders both the masking (F(1,3) = 1.000; p = 0.391) and SOA (F(4,12) = 1.876; p = 0.179) effects nonsignificant (Footnote 4).

Discussion

These results show clearly that metacontrast masking in the absence of eye movements is retinotopic. Regardless of the perceived motion of the Ternus-Pikler disks, the retinotopic mask significantly masks the target at optimum SOAs, whereas the non-retinotopic mask has no significant effect on performance. To generalize this result across mask and masking function types, we used a spatially overlapping mask that shared structural similarity with the target in the next experiment. In this structure masking paradigm, we chose a strong mask to generate a Type-A (i.e., monotonic) backward masking function instead of the Type-B (i.e., non-monotonic) backward masking function obtained in the metacontrast experiment.

Experiment 2: Retinotopy of structure masking with type-A masking function

Methods

Experimental design and procedures of Experiment 2 were identical to those of Experiment 1, with the exception of the target and mask. The target consisted of a square outline missing one side. Three bars were aligned on the screen, as depicted in Fig. 6, to form the target. The missing bar was randomly placed at the top or bottom of the square. The mask consisted of a collection of random horizontal and vertical bars, i.e., shared the same structural components as the target, to generate masking by structure (Fig. 6).

Fig. 6

Stimuli and Respective Parameters for Experiment 2: (a) Retinotopic and (b) Non-retinotopic masking conditions. (c) The target consisted of a square outline missing one side. Three bars were aligned on the screen to form the target. The missing bar was randomly placed at the top or bottom of the square. The mask consisted of a collection of random horizontal and vertical bars. (d) Stimulus timing was identical to that of Experiment 1

Figure 6d displays the timing diagram of a typical trial in Experiment 2. As in Experiment 1, the ISI was fixed (0 or 40 ms) for each experimental block and the target always appeared just before the ISI. Mask presentation time was again randomized from trial to trial to allow for different ISI-SOA combinations per block. Background luminance, Ternus-Pikler element shapes, and target/mask/disk contrasts were chosen as explained in Experiment 1. Ternus-Pikler radial motion (upward or downward) also was randomized from trial to trial. In a two-alternative forced-choice design, three naïve observers and one of the authors reported the perceived location of the missing side of the target square (up/down).

Results

Figure 7a shows performance as a function of the SOA for the static control condition. As expected, and supported by the significant effect of the mask condition (F(1,3) = 64.58; p = 0.004; ηp² = 0.958), as well as of the SOA (F(5,15) = 20.24; p < 0.001; ηp² = 0.871), a strong type-A masking function was found. Figure 7b shows performance as a function of the SOA for the element motion condition (ISI = 0 ms). Two-way repeated measures ANOVA shows a significant effect of the mask condition (F(2,6) = 76.45; p < 0.001; ηp² = 0.962), as well as of the SOA (F(4,12) = 5.42; p = 0.010; ηp² = 0.644). However, when the retinotopic mask condition is removed from the analysis, the masking effect (F(1,3) = 0.109; p = 0.763), as well as the SOA effect (F(4,12) = 1.06; p = 0.415), become nonsignificant. Figure 7c shows performance when the Ternus-Pikler disks are perceived to be in group motion (ISI = 40 ms). Once again, two-way repeated measures ANOVA shows a significant effect of the mask condition (F(2,6) = 26.29; p < 0.001; ηp² = 0.898), as well as of the SOA (F(4,12) = 8.02; p = 0.002; ηp² = 0.728). However, when the retinotopic mask condition is removed from the analysis, the masking effect (F(1,3) = 0.134; p = 0.738), as well as the SOA effect (F(4,12) = 2.151; p = 0.137), become nonsignificant.

Fig. 7

Masking by structure. Percentage of correct responses in detecting the missing side of the target square (up/down), averaged across the four observers. (a) Static control condition. Performance is near chance at SOA of 0 ms with a Type-A masking function. (b) Element-Motion (ISI = 0 ms). (c) Group-Motion (ISI = 40 ms). Error bars correspond to ±1 SEM. In the case of No Mask condition, ±1 SEM are shown by gray horizontal lines

Discussion

Together, the results of Experiments 1 and 2 show that backward masking is retinotopic and that this finding holds for metacontrast and structure masking, as well as for type-A and type-B masking functions.

In a recent study, Lin and He (2012) investigated the retinotopy of masking by using a modified version of object-specific reviewing paradigm (Kahneman et al., 1992). A rectangular object (frame) was presented for a preview period of 200 ms. The target was presented during the last 10 ms of this preview period in one of the two sides of the rectangle. This rectangular frame was then shifted to a new location and displayed for another 200 ms. The mask stimuli were presented during the first 30 ms of the shifted frame. One side of the frame contained a weak mask and the other side contained a strong mask. Neither mask occupied the same retinotopic location as the target but one of the masks occupied the same rectangle-relative position as the target (i.e., the same side). Observers performed worse when the strong mask occupied the same relative position as the target. Lin and He interpreted this finding as evidence for non-retinotopic frame-centered backward masking. While this interpretation is plausible, it is difficult to make inferences about masking without observing the complete masking functions and comparing directly retinotopic, non-retinotopic, and baseline conditions. At the single short SOA of 10 ms (corresponding to ISI = 0 ms) used in the experiment, it is difficult to assess whether the difference in performance across the two mask types is due to masking per se or other factors. In our experiments, we included baseline no-mask measures, multiple SOA values to reveal the full typical type-A and type-B masking functions and compared directly retinotopic and non-retinotopic masking conditions according to two different motion grouping conditions. Our results reveal only retinotopic masking.

Previous studies showed that features of a masked target can be observed as being part of the mask stimulus (Werner, 1935; Wilson & Johnson, 1985; Herzog & Koch, 2001; Otto, Ogmen, & Herzog, 2006; Öğmen et al., 2006; Breitmeyer, Herzog, & Ogmen, 2008). As indicated in Fig. 2, the vernier offset of the target in the first frame can be observed on the mask stimulus shown in the second frame even though no vernier is presented at this element or at this retinotopic location. Similarly, by using the sequential metacontrast paradigm, we have shown that features of a target whose visibility is suppressed can nevertheless be perceived along motion streams to which the target belongs (Otto et al., 2006; Herzog, Otto, & Ogmen, 2012). Our studies showed that the attribution of the target’s features to the mask stimulus is a consequence of motion grouping rather than masking itself (Öğmen et al., 2006; Breitmeyer et al., 2008). The goal of the next experiment was to study this motion-dependent non-retinotopic feature attribution in masking.

Experiment 3: Non-retinotopic feature attribution

In some trials of Experiments 1 and 2, subjects informally reported perceiving the target to be moving with the Ternus-Pikler elements, as one would expect from non-retinotopic feature attribution. In such cases, the target could be perceived at spatial locations different from where the target stimulus was actually presented. To formally study this, we removed the motion ambiguity from Experiments 1 and 2, and instructed our subjects to spread their attention as discussed in the following section, so as to facilitate the read-out of non-retinotopic feature attribution.

Methods

Experimental design and procedures of Experiment 3 were identical to those of Experiments 1 and 2, with the exception of the Ternus-Pikler motion. The Ternus-Pikler motion was made predictably upward in all trials, and the subjects were instructed to spread their attention to the central Ternus-Pikler square in both display frames. The target and mask design was identical to those of Experiments 1 and 2 for the respective metacontrast and structure masking conditions. Stimulus timing was also chosen to match that of the previous two experiments. Once again, the ISI was fixed (0 or 40 ms) for each experimental block, and the target always appeared just before the ISI. Mask presentation time was again randomized from trial to trial to allow for different ISI-SOA combinations per block. Background luminance, Ternus-Pikler element shapes, and target/mask/disk contrasts were chosen to match those of the previous experiments. In a two-alternative forced-choice design, three naïve observers as well as one of the authors reported the perceived missing corner of the target diamond or the location of the missing side of the target square for the metacontrast and structure masking conditions, respectively.

Results A: metacontrast masking

Figure 8a shows performance as a function of the SOA when the Ternus-Pikler disks are perceived to be in element motion (ISI = 0 ms). A two-way repeated-measures ANOVA shows a significant effect of the mask condition (F(2,6) = 102.09; p < 0.001; ηp² = 0.971), as well as of the SOA (F(8,24) = 4.587; p = 0.002; ηp² = 0.605), when all mask conditions are included in the analysis. When the retinotopic mask condition is removed from the analysis, the metacontrast masking effect (F(1,3) = 3.183; p = 0.172) and the SOA effect (F(8,24) = 0.972; p = 0.480) are no longer significant. These findings are in accordance with those of our previous experiments. However, when the disks are perceived to be in group motion (Fig. 8b), neither the masking effect (F(2,6) = 1.509; p = 0.294) nor the SOA effect (F(4,12) = 0.738; p = 0.584) is significant, even when both retinotopic and non-retinotopic mask conditions are included in the analysis.
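As a consistency check on the effect sizes reported above, partial eta squared can be recovered from an F statistic and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies this identity to the two significant effects in the element-motion analysis; the function name is ours and is used for illustration only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F values and degrees of freedom as reported in the text for ISI = 0 ms.
mask_effect = partial_eta_squared(102.09, 2, 6)
soa_effect = partial_eta_squared(4.587, 8, 24)
print(round(mask_effect, 3), round(soa_effect, 3))  # prints: 0.971 0.605
```

Both values agree with the reported effect sizes to three decimal places.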

Fig. 8

Metacontrast masking with predictable Ternus-Pikler motion. The observers attended to the central Ternus-Pikler square in both display frames. Percentage of correct responses in detecting the missing corner of the target diamond (left/right), averaged across the four observers. (a) Element-Motion (ISI = 0 ms). Performance is near chance at an SOA of 10 ms only in the retinotopic mask condition. (b) Group-Motion (ISI = 40 ms). No masking is observed. Error bars correspond to ±1 SEM. In the case of the No Mask condition, ±1 SEM is shown by gray horizontal lines.

Results B: masking by structure

A similar pattern of results was observed when a structure mask was used in the presence of predictable Ternus-Pikler disk motion. Figure 9a shows performance as a function of the SOA when the disks are perceived to be in element motion (ISI = 0 ms). A two-way repeated-measures ANOVA shows a significant effect of the mask condition (F(2,6) = 10.250; p = 0.012; ηp² = 0.774), as well as of the SOA (F(8,24) = 3.569; p = 0.007; ηp² = 0.543), when all mask conditions are included in the analysis. When the retinotopic mask condition is removed from the analysis, the masking effect (F(1,3) = 0.283; p = 0.631) and the SOA effect (F(8,24) = 0.852; p = 0.568) are no longer significant. These findings are in accordance with those of our previous experiments. However, when the disks are perceived to be in group motion (Fig. 9b), neither the masking effect (F(2,6) = 4.351; p = 0.068) nor the SOA effect (F(4,12) = 1.937; p = 0.169) is significant, even when both retinotopic and non-retinotopic mask conditions are included in the analysis.

Fig. 9

Masking by structure with predictable Ternus-Pikler motion. The observers attended to the central Ternus-Pikler square in both display frames. Percentage of correct responses in detecting the missing side of the target square (up/down), averaged across the four observers. (a) Element-Motion (ISI = 0 ms). Performance is near chance at an SOA of 10 ms only in the retinotopic mask condition. (b) Group-Motion (ISI = 40 ms). No masking is observed. Error bars correspond to ±1 SEM. In the case of the No Mask condition, ±1 SEM is shown by gray horizontal lines.

Discussion

In agreement with the results of Experiments 1 and 2, retinotopic masking is observed when the Ternus-Pikler disks are perceived in element motion (ISI = 0 ms). However, in contrast to the results of our previous two experiments, when observers can focus their attention on the Ternus-Pikler element in the second frame that is grouped with the first-frame element containing the target, they can identify the target based on its continued appearance along the motion path of that element. This finding is in agreement with our previous results from sequential metacontrast (Otto et al., 2006; Herzog et al., 2012) and Ternus-Pikler displays (Fig. 1). According to informal reports of our subjects, a faded but complete copy of the target is perceived at the non-retinotopic destination, in accordance with the motion of the stimulus.

General discussion

The functional significance of retinotopic masking in the absence of eye movements can be understood by considering how the visual system analyzes the form of moving targets. Under normal viewing conditions, a briefly presented stimulus remains visible for approximately 120 ms after the stimulus offset (Haber & Standing, 1970; Coltheart, 1980). Due to this visible persistence, one would expect moving objects to appear highly blurred with a comet-like trailing smear. Yet our normal perception of objects in motion is relatively clear and sharp (Ramachandran, Rao, & Vidyasagar, 1974; Bex, Edgar, & Smith, 1995; Westerink & Teunissen, 1995; Hammett, 1997), a phenomenon known as motion deblurring (Burr, 1980; Hogben & Di Lollo, 1985; Chen, Bedell, & Ogmen, 1995; Burr & Morgan, 1997). We have proposed a theory according to which dynamic form computation relies on a synergy between retinotopic masking and motion-based reference frames (Öğmen, 2007; Öğmen & Herzog, 2010). According to this theory, masking and motion mechanisms play complementary roles: masking operates in retinotopic representations to control motion blur, and motion mechanisms provide the reference frame used to compute the features of moving targets non-retinotopically. This is in contrast to theories suggesting that motion deblurring, dynamic form perception, and masking all result from motion mechanisms (Burr, 1980; Burr, Ross & Morrone, 1986) and those suggesting that computation of features and masking are linked by the common process of object substitution or updating (Enns & Di Lollo, 1997; Di Lollo, Enns, & Rensink, 2000; Enns, 2002; Enns, Lleras, & Moore, 2010). Whereas some models of backward masking causally linked backward masking and motion mechanisms (Kahneman, 1967; Burr, 1984), various studies showed that these two processes are largely independent and can be dissociated from each other (Weisstein & Growney, 1969; Stoper & Banffy, 1977; Breitmeyer & Horman, 1981). Below we discuss the independent but complementary roles that these processes play in the perception of dynamic form.
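To give a sense of scale for the smear expected from visible persistence, a first-order estimate is simply the distance a target travels while its trace remains visible, i.e., speed multiplied by persistence duration. The sketch below uses the approximately 120 ms persistence cited above; the target speeds are illustrative assumptions and are not taken from our experiments, and the linear estimate ignores any deblurring.

```python
def smear_extent_deg(speed_deg_per_s, persistence_ms=120.0):
    """First-order smear length: distance travelled during visible persistence."""
    return speed_deg_per_s * persistence_ms / 1000.0

# Hypothetical target speeds in degrees of visual angle per second.
for speed in (1.0, 5.0, 10.0):
    print(speed, "deg/s ->", smear_extent_deg(speed), "deg of expected smear")
```

Under these assumptions, a target moving at 10 deg/s would leave a trail of roughly 1.2 deg, which is far more blur than is typically perceived.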

Mechanisms of motion deblurring: Metacontrast, and not motion, mechanisms

Several models have been proposed to explain motion deblurring (Anderson & Van Essen, 1987; Burr, 1980; Burr, Ross & Morrone, 1986; Martin & Marshall, 1993). According to Burr (1980), motion estimation is achieved by the spatiotemporally oriented receptive fields of motion mechanisms (such as the Reichardt motion detector or equivalent motion energy models). These motion-based models predict that an isolated moving target should not produce motion blur provided that it sufficiently stimulates the motion mechanisms. However, this prediction does not agree with findings from various studies that show the perception of extensive blur for isolated moving targets (Bidwell, 1899; Chen et al., 1995; Lubimov & Logvinenko, 1993; McDougall, 1904; Smith, 1969a, b). By using several paradigms directly tailored to test the predictions of motion-based models for deblurring, we showed that the activation of motion mechanisms is not a sufficient condition for motion deblurring. These findings argue against the aforementioned motion-based models of deblurring. Our theoretical (Ogmen, 1993), experimental (Chen et al., 1995), and computational (Purushothaman, Ogmen, Chen, & Bedell, 1998) studies suggest that metacontrast, and not motion, mechanisms underlie motion deblurring. In agreement with these findings, several studies showed a strong correlation between motion smear and metacontrast in terms of their dependence on spatial separation, timing, and eccentricity (Castet, Lorenceau, & Bonnet, 1993; Chen et al., 1995; Di Lollo & Hogben, 1985; Farrell, 1984). Motion deblurring is closely related to “sequential metacontrast masking” (Herzog et al., 2012; Otto et al., 2006; Piéron, 1935), which is an extended form of metacontrast (Breitmeyer & Öğmen, 2006).
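For readers unfamiliar with the Reichardt-type detectors mentioned above, the following is a minimal textbook-style sketch of the delay-and-correlate principle, not an implementation of any of the cited deblurring models. The discrete time steps, the one-step delay, and the pulse signals are all illustrative assumptions.

```python
def reichardt_output(left, right, delay=1):
    """Opponent delay-and-correlate response for two neighbouring inputs.

    Positive values signal left-to-right motion, negative values the reverse.
    """
    total = 0.0
    for t in range(delay, len(left)):
        rightward = left[t - delay] * right[t]  # delayed left meets current right
        leftward = right[t - delay] * left[t]   # delayed right meets current left
        total += rightward - leftward
    return total

# Hypothetical luminance pulses: a spot crosses the left input, then the right one.
left_signal = [0, 0, 1, 0, 0, 0]
right_signal = [0, 0, 0, 1, 0, 0]
print(reichardt_output(left_signal, right_signal))  # left-to-right: 1.0
print(reichardt_output(right_signal, left_signal))  # right-to-left: -1.0
```

The sketch only signals the direction of motion; it contains no mechanism that would shorten the persisting trace of the target. This is compatible with the conclusion above that activation of motion mechanisms alone is not sufficient for deblurring, although such a toy example cannot by itself test the cited models.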

Mechanisms of non-retinotopic feature attribution: Motion, and not metacontrast, mechanisms

However, masking mechanisms only partly solve the motion blur problem. They can make motion streaks appear shorter, thereby reducing the amount of blur in the picture. Yet, although deblurred, moving objects would still suffer from having a ghost-like appearance (Öğmen, 2007). This is because, in the retinotopic space, a moving object will stimulate each retinotopically localized receptive field only briefly, and incompletely processed form information would spread across the retinotopic space just like the ghost-like appearances of moving objects in pictures taken at relatively slow shutter speeds. As a solution to this “moving ghosts” problem, we suggested that features of moving objects are processed according to motion-based non-retinotopic reference frames (Öğmen et al., 2006; Otto et al., 2006; Öğmen, 2007; Öğmen & Herzog, 2010). As depicted in Fig. 10, stimuli are grouped in the retinotopic space according to their motion; a common motion vector is extracted to serve as a reference frame; and this reference frame is used to map features of the stimuli onto non-retinotopic representations (manifolds), a process that we have termed non-retinotopic feature attribution. In agreement with this proposal, recent studies have indicated that visual attributes of a stimulus such as form (Öğmen et al., 2006), luminance (Shimozaki, Eckstein, & Thomas, 1999), color (Nishida, Watanabe, Kuriki, & Tokimoto, 2007), size (Kawabe, 2008), and motion (Boi et al., 2009) are computed according to motion-based non-retinotopic reference frames.

Fig. 10

Schematic representation of retinotopic and non-retinotopic interactions. The retinotopic space (retinotopic visual areas) is shown at the bottom of the figure. Three dots are moving in the rightward direction while four dots are moving in the upward direction. Masking takes place within these retinotopic representations according to spatio-temporal properties of the stimuli. Dots are grouped into two groups according to their motion vectors. A reference motion vector is extracted and serves as the reference frame whereby dots are mapped into non-retinotopic representations. Features of the dots are computed and attributed according to these reference frames in the non-retinotopic space. For example, the rightmost dot of the group on the left has the same motion vector as the reference frame motion vector. Therefore, according to this reference frame, it is represented as a stationary dot in the non-retinotopic manifold. As this dot moves in the retinotopic space, its features are mapped to the same non-retinotopic locus, yielding non-retinotopic integration of its features. Although the activity corresponding to this map can be suppressed at the retinotopic representations, observers can report the features of the dot by reading out its non-retinotopic representation. For more details, see Öğmen (2007) and Öğmen & Herzog (2010).
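The grouping and mapping steps summarized in Fig. 10 can be illustrated with a small geometric sketch. It is not an implementation of our model: it assumes constant-velocity dots, arbitrary position units, and invented start positions, and all function names are hypothetical. It only shows that a dot whose motion vector matches the reference vector changes position in retinotopic coordinates while staying at a fixed locus in the motion-based frame.

```python
from collections import defaultdict

def retinotopic_position(dot, t):
    """Position of a dot (x, y, vx, vy) in retinotopic coordinates at time t."""
    x, y, vx, vy = dot
    return (x + vx * t, y + vy * t)

def non_retinotopic_position(dot, reference_vector, t):
    """Position of the same dot in a frame that moves with reference_vector."""
    rx, ry = retinotopic_position(dot, t)
    ref_vx, ref_vy = reference_vector
    return (rx - ref_vx * t, ry - ref_vy * t)

def group_by_motion(dots):
    """Group dots that share a motion vector, as in the grouping stage."""
    groups = defaultdict(list)
    for dot in dots:
        groups[(dot[2], dot[3])].append(dot)
    return dict(groups)

# Hypothetical layout echoing Fig. 10: three rightward dots, four upward dots.
rightward = [(0.0, 0.0, 1.0, 0.0), (1.0, 0.0, 1.0, 0.0), (2.0, 0.0, 1.0, 0.0)]
upward = [(5.0, float(i), 0.0, 1.0) for i in range(4)]
groups = group_by_motion(rightward + upward)

for vector, members in groups.items():
    for t in (0.0, 1.0, 2.0):
        coords = [non_retinotopic_position(d, vector, t) for d in members]
        print(vector, t, coords)
```

For each group, the printed coordinates are identical at every time step, so features sampled at successive retinotopic positions would accumulate at a single non-retinotopic locus.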

In backward masking, it has long been known that features of the target can be perceived as being part of the mask, an effect termed feature transposition, feature inheritance, or feature attribution (Werner, 1935; Stewart & Purcell, 1970; Wilson & Johnson, 1985; Hofer, Walder, & Groner, 1989; Herzog & Koch, 2001; Enns, 2002; Sharikadze, Fahle, & Herzog, 2005; Öğmen et al., 2006; Otto et al., 2006). The close relationship between feature attribution and masking led some researchers to suggest that processing of features for moving stimuli and masking are linked causally by a common process, viz., the process of object substitution or updating (Enns & Di Lollo, 1997; Di Lollo et al., 2000; Enns, 2002; Enns et al., 2010). To test whether feature attribution results from masking or motion mechanisms, we measured magnitudes of feature attribution, motion, and masking and computed correlations between these variables (Breitmeyer, Herzog, et al., 2008). Our results showed that, when apparent motion occurs without masking, it correlates positively with feature attribution. Furthermore, when apparent motion occurs with masking, feature attribution remains positively correlated with apparent motion after the contribution of masking is factored out, but does not correlate with masking after the contribution of apparent motion is similarly factored out. Taken together, these findings support the view that feature attribution is based on motion and not on masking mechanisms.
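One standard way to factor out the contribution of a third variable is the first-order partial correlation, r_xy.z = (r_xy − r_xz r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)). Whether the cited study used exactly this statistic is an assumption on our part, and the ratings in the sketch below are invented solely to illustrate the logic; they are not data from any experiment.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear contribution of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Invented magnitudes (arbitrary units): attribution tracks motion closely,
# and masking happens to covary with motion.
motion = [1, 2, 3, 4, 5, 6]
masking = [2, 1, 4, 3, 6, 5]
attribution = [1.1, 2.0, 2.9, 4.2, 5.0, 5.8]

print(partial_correlation(attribution, motion, masking))  # masking factored out
print(partial_correlation(attribution, masking, motion))  # motion factored out
```

With these invented numbers, the first value stays close to 1 whereas the second is small in magnitude, mirroring the qualitative pattern described above.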

In our theory, masking operates in retinotopic representations while features of a stimulus can escape masking and become visible in non-retinotopic representations. Little is known in terms of neural correlates of visibility and non-retinotopic representations; however, we can propose an outline of how this dissociation between retinotopic masking and non-retinotopic visibility can take place following two general neural representation schemes.

In one scheme, which we call structural correlates, different types of neurons represent different types of information. For example, a group of neurons may represent the visibility of a stimulus in retinotopic coordinates, whereas another group of neurons may represent the visibility of features of a stimulus in motion-based coordinates (i.e., non-retinotopic representations). As shown in Fig. 11, an architecture in which masking operates on retinotopic neurons and feature information flows upstream before masking can explain our results in Experiment 3. With predictable motion, observers can reliably read out activity in non-retinotopic representations, and this activity is immune to masking, which operates at the level of retinotopic representations.

Fig. 11

Schematic representation of retinotopic masking and non-retinotopic visibility. Circles represent populations of neurons and dashed circles represent those that underlie visibility. Feature information is sent to both retinotopic and non-retinotopic representations, wherein they are represented according to retinotopic and motion-based reference frames, respectively. The locus of masking is at retinotopic representations

In a second scheme, which we call activational correlates, the same population of neurons can carry both types of information but through different activation patterns. For example, it has been suggested that synchronization between neurons at pre-determined frequency bands (e.g., alpha, gamma) may underlie conscious awareness of a stimulus. In this case, the results of Experiment 3 can be explained by masking mechanisms that disrupt synchrony at the retinotopic, but not at the non-retinotopic, level. Of course, structural and activational correlates are not mutually exclusive, and it is possible to find a mixture of the two. In general, our approach highlights the importance of motion-based reference frames and suggests a broad and distributed neural representation that requires coordination between ventral and dorsal streams to process features in terms of motion-based reference frames.

All of these observations are in agreement with the findings of retinotopic masking under conditions where the observer is stationary with the eyes under steady fixation. However, under normal viewing conditions, both the subject and objects move. Future studies will determine how reference frames generated by ego-motions (as in the case of eye movements) and exo-motions (as in the case of moving objects, studied herein) are coordinated to work in synergy.