In our dynamically changing visual environment, an important task of the visual system is to track and identify moving objects and to maintain their internal representations across time and space. Recent studies have devoted a great deal of attention to perceptual topics such as the identification of moving objects, and investigations of the perceptual characteristics of stream/bounce displays are prominent examples of such studies.

In a stream/bounce display, two discs approach each other, overlap at the center of the display, and then separate again. The discs in this ambiguous display can be interpreted either as streaming past or as bouncing off each other. The display therefore makes it possible to examine how the visual system interprets events involving object movement, as well as what information is selected and integrated to identify and represent moving objects in time and space. Accordingly, the visual interpretation of this bistable stream/bounce display has been tested using various intramodal (Bertenthal, Banton, & Bradbury, 1993; Kanizsa, 1979; Metzger, 1934; A. B. Sekuler & R. Sekuler, 1999) and intermodal stimulus manipulations (Kawachi & Gyoba, 2006; R. Sekuler, A. B. Sekuler, & Lau, 1997; K. Watanabe & Shimojo, 2001b). In general, the human visual system interprets the ambiguous stream/bounce display as streaming when it is presented alone, but this interpretation can shift to bouncing under intramodal and crossmodal perturbations. These perceptual tendencies reflect the inertial properties of our visual system (Anstis & Ramachandran, 1987), which bias the recruitment of local motion signals toward a straight motion path rather than a reversing path, as well as the vulnerability of the processes that maintain continuous motion to certain perturbations.

In addition to motion integration processes, depth is an important feature in the perception of stream/bounce displays. The ambiguity of the stream/bounce display arises from its two-dimensional nature: in the natural world, two objects stream past or bounce off each other depending on their three-dimensional spatial relationship. Perception of the stream/bounce display might therefore involve visual interpretation of the depth dimension. Bertenthal et al. (1993) tested this by adding depth information defined by binocular disparity to the stream/bounce stimuli and demonstrated that depth cues play a crucial role in the resulting percepts.

Although the characteristics of object identification in this kind of bistable motion perception have been well analyzed in humans, few behavioral studies have directly addressed these issues in other animals. One of the few exceptions is a field experiment on free-ranging rhesus macaques (Macaca mulatta), in which Flombaum, Kundey, Santos, and Scholl (2004) demonstrated tunnel effects, which also concern the individuation of moving objects. They showed the monkeys real objects in motion (a lemon and a kiwi fruit) in order to attract their attention. When the first object (the lemon) moved behind an occluder and the second object (the kiwi fruit) emerged from the other side of the occluder at the appropriate time, the monkeys generally failed to search for the first object, as if they had perceived only one continuously moving object. When the continuity of the motion was disrupted, the tunnel effect disappeared, as in human studies. These results suggest that properties of the motion integration process that maintains object identity may be shared by humans and monkeys. However, direct comparisons of the spatiotemporal characteristics of such motion integration processes between humans and nonhuman primates are lacking, so further comparative experiments using controlled stimuli would help to clarify the phylogenetic background of our visual recognition processes.

In the present study, we examined stream/bounce perception in chimpanzees, one of the closest living relatives of humans. We conducted five experiments using an object-tracking task to reveal how chimpanzees perceive the moving objects in stream/bounce displays and how this perception is influenced by depth cues. In the first experiment, we compared stream/bounce perception in chimpanzees and humans. In the second to fifth experiments, we examined the effects of two kinds of depth information, X-junctions (Exp. 2) and motion transparency cues (Exps. 3 and 4), on chimpanzees’ stream/bounce perception, and contrasted the results with those from humans (Exp. 5).

General method

Since the experiments reported here used similar methods, we first describe the procedures common to all of the experiments.

Subjects

Six chimpanzees, Ai (28 years old, female), Ayumu (4.5 years old, male), Chloe (24 years old, female), Cleo (4.5 years old, male), Pal (4.5 years old, female), and Pendesa (29 years old, female), participated in Experiments 1–4, in that order. Ai did not participate in Experiments 2, 3, and 4, and Pal did not participate in Experiment 2, because they were unavailable during the experimental schedule. The subjects were experienced in various perceptual–cognitive tasks, such as matching to sample (Matsuno, Kawai, & Matsuzawa, 2004; Matsuno & Tomonaga, 2007, 2008) and visual search (Matsuno & Tomonaga, 2006; Tomonaga, 2001), and they were accustomed to the experimental settings used in this study. The performance of the 3 young subjects was not qualitatively different from that of the 3 adults; therefore, the analyses were conducted on the pooled data for all subjects.

The subjects lived with 8 other chimpanzees in an environmentally enriched outdoor compound and the attached indoor residences (Ochiai & Matsuzawa, 1997). They were not deprived of food at any time during the study. The care and use of the chimpanzees adhered to the 2002 version of the Guide for the Care and Use of Laboratory Primates of the Primate Research Institute, Kyoto University. The research design was approved by the Animal Welfare and Animal Care Committee of the Institute.

Apparatus

Chimpanzees were tested in an experimental booth (approximately 1.8 × 1.8 × 2.0 m) with acrylic panels as walls on all four sides. The stimuli were generated on a Pentium-based computer and displayed on 21- and 22-in. CRT monitors (a Totoku CV-213PJ for Ayumu, Cleo, and Pal, and a Mitsubishi TSD-221 S for the others) equipped with capacitive and surface acoustic wave touch screens. This system presented the stimuli and accurately recorded the responses (touch locations). The monitor resolution was 1,024 × 768 pixels in 8-bit color mode. The refresh rate was 75 Hz, and each display was synchronized with the vertical retrace of the monitor. Subjects viewed the monitor from a distance of about 45 cm without head restraints. The viewing distance was limited by a transparent acrylic panel installed between the monitor and the subjects to protect the monitor from damage by the chimpanzees. A universal feeder (Biomedica, BUF-310) delivered small pieces of a food reward (apples or peanuts) into a food tray below the monitor.

Object-tracking task

To assess chimpanzees’ perception of ambiguous stream/bounce displays in the absence of verbal reports of their subjective experience, we adopted an object-tracking task (see, e.g., Pylyshyn & Storm, 1988; see also Fig. 1). In the stream/bounce display, two discs appeared, one on the left and one on the right side of the display, and one of the discs was cued by flickering at the beginning of each trial. The two discs then approached each other, completely overlapped at the center of the display, and separated again. The subjects were required to visually track the initially cued disc throughout the movement phase and to point to it after the two discs had stopped moving. Pointing to the disc on the side where the cued disc was initially located indicated that the subject perceived the discs as bouncing, whereas pointing to the disc on the opposite side indicated that the subject perceived them as streaming.
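The response rule above can be written as a small scoring function. This is a minimal sketch, not the authors' analysis code; the side labels and the names `classify_response` and `percent_streaming` are hypothetical.

```python
from enum import Enum


class Percept(Enum):
    BOUNCE = "bounce"
    STREAM = "stream"


SIDES = ("left", "right")


def classify_response(cued_side: str, chosen_side: str) -> Percept:
    """Map one probe-trial touch to the percept it implies.

    cued_side:   side on which the cued disc started ("left" or "right").
    chosen_side: side of the disc touched after the motion stopped.
    """
    if cued_side not in SIDES or chosen_side not in SIDES:
        raise ValueError("sides must be 'left' or 'right'")
    # Touching the disc on the cue's own side means the target was seen to reverse.
    return Percept.BOUNCE if chosen_side == cued_side else Percept.STREAM


def percent_streaming(trials: list[tuple[str, str]]) -> float:
    """Percentage of streaming responses over (cued_side, chosen_side) pairs."""
    if not trials:
        raise ValueError("no trials")
    n_stream = sum(classify_response(c, r) is Percept.STREAM for c, r in trials)
    return 100.0 * n_stream / len(trials)
```

Because the two discs are identical, this side-based coding is the only information the touch carries; percentages of streaming responses, as plotted in the figures below, are then simple proportions over probe trials.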

Fig. 1

Schematic diagram illustrating a trial in the object-tracking task

Experiment 1: Stream/bounce perception in chimpanzees and humans

Experiment 1 investigated chimpanzee perception of the stream/bounce stimulus. In the test sessions, two overlap conditions were tested at each of four movement speeds of the discs. In the 100% overlap condition (stream/bounce stimuli), the two discs completely overlapped at the center of the display. In the 50% overlap condition (intermediate partial-overlap stimuli), used as a control, the two discs reversed direction when the edge of each disc reached the center of the other disc. We expected that the subjects would perceive the partial-overlap stimuli as bouncing more frequently than the completely overlapping stream/bounce stimuli.

Method

Subjects

Six chimpanzees and 5 adult human females, aged 18 to 25 years (mean = 21.2 years), participated in the experiment. All of the human observers had normal or corrected-to-normal visual acuity.

Stimuli

The displays consisted of two gray discs, identical in shape and color, each subtending about 23 × 23 mm (2.9º × 2.9º of visual angle at a viewing distance of 45 cm), on a black background (see Figs. 2 and 3a–b). The two discs were initially separated horizontally by a center-to-center distance of 218 mm (27.8º). In the training phase with moving stimuli and in the test phase, the two discs moved horizontally (with a slight vertical displacement in some training conditions; see below and Fig. 2c and d). The movement continued until the discs reached approximately the horizontal positions at which the two discs were initially located. The separation of the terminal positions of the discs varied from 193 to 218 mm across trials, so that neither the final positions of the discs nor the event durations could serve as reliable cues for discriminating between bouncing and streaming events. The laboratory was dimly illuminated to prevent reflections on the computer screen.
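The visual angles quoted here can be checked with the standard formula θ = 2·atan(s / 2d) for an extent s centered on the line of sight at distance d. A minimal sketch, assuming the nominal 45-cm viewing distance (the chimpanzees' heads were unrestrained, so actual angles would vary with head position) and treating each speed as the extent travelled in one second:

```python
import math

VIEWING_DISTANCE_MM = 450.0  # nominal 45-cm viewing distance (no head restraint)


def visual_angle_deg(extent_mm: float, distance_mm: float = VIEWING_DISTANCE_MM) -> float:
    """Full visual angle (degrees) of an extent centred on the line of sight."""
    return math.degrees(2.0 * math.atan(extent_mm / (2.0 * distance_mm)))


# Disc diameter and the distance travelled in 1 s at two of the tested speeds.
for label, mm in [("disc", 23), ("287 mm/s", 287), ("574 mm/s", 574)]:
    print(f"{label}: {visual_angle_deg(mm):.1f} deg")
# disc: 2.9 deg
# 287 mm/s: 35.4 deg
# 574 mm/s: 65.1 deg
```

Applied to the 218-mm separation, the same formula gives about 27.2º; the reported 27.8º matches the small-angle approximation s / d instead, so the original conversion may have differed for some values.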

Fig. 2

Depictions of stimulus displays used in the training sessions and the baseline trials. Arrows indicate motion. (a) Unambiguous streaming display without vertical displacement. (b) Unambiguous bouncing display without vertical displacement. (c) Unambiguous streaming display with vertical displacement. (d) Unambiguous bouncing display with vertical displacement

Fig. 3

Depictions of stimulus displays used in the experiments. Arrows indicate motion. (a) Stream/bounce display with filled disc stimuli. (b) Partial-overlap display with filled disc stimuli. (c) Stream/bounce display with open ring stimuli. (d) Stream/bounce display with random-dot stimuli

First training phase (static condition)

Prior to the test sessions, the chimpanzees were trained to track the cued disc. In the first phase of the training, chimpanzees were trained under a static condition in which they were required to simply detect a cued disc at its position.

Each trial was initiated by presenting a warning stimulus (an empty gray square subtending 31 × 31 mm) at the bottom of the screen. After the subject touched the warning stimulus, it disappeared and two discs appeared. One of the discs flashed at 18.75 Hz for 600 ms and then turned the same gray color as the other disc. The two discs then remained stationary during a delay period (randomly varied from 680 to 760 ms across trials, 720 ms on average). After the delay period, a gray square (38 × 38 mm) appeared around each disc to signal the start of the response phase, and the chimpanzees were required to touch the cued disc. A correct response was followed by a chime sound and the delivery of a food reward. An incorrect response was followed by a buzzer sound and a 4-s time-out. The interval between the end of a trial and the presentation of the warning stimulus for the next trial was 2 s.
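At the 75-Hz refresh rate each frame lasts 1000/75 ≈ 13.3 ms, so the timings above correspond to whole numbers of frames: an 18.75-Hz flash is a 4-frame cycle, the 600-ms flash lasts 45 frames, and the 680–760-ms delay spans 51–57 frames. The sketch below is illustrative only; the helper names are hypothetical, and the 50% duty cycle of the flash is an assumption, since only its rate is reported.

```python
import random

REFRESH_HZ = 75  # monitor refresh rate; displays were synchronized to vertical retrace


def ms_to_frames(ms: float, refresh_hz: float = REFRESH_HZ) -> int:
    """Nearest whole number of refresh frames for a duration in milliseconds."""
    return round(ms * refresh_hz / 1000)


def flash_schedule(flash_hz: float = 18.75, flash_ms: float = 600,
                   refresh_hz: float = REFRESH_HZ) -> list[bool]:
    """Per-frame phase of the cue flash (True = 'on' phase).

    Assumes a 50% duty cycle: equal numbers of 'on' and 'off' frames per cycle.
    """
    frames_per_cycle = refresh_hz / flash_hz
    if frames_per_cycle != int(frames_per_cycle) or int(frames_per_cycle) % 2:
        raise ValueError("flash rate must give an even whole number of frames per cycle")
    half = int(frames_per_cycle) // 2
    return [(i // half) % 2 == 0 for i in range(ms_to_frames(flash_ms, refresh_hz))]


def delay_frames(rng: random.Random) -> int:
    """Delay period drawn uniformly from 680-760 ms, rounded to whole frames."""
    return ms_to_frames(rng.uniform(680, 760))


schedule = flash_schedule()
print(len(schedule), schedule[:8])
# 45 [True, True, False, False, True, True, False, False]
```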

A training session under the static condition consisted of 64 trials. The left–right position of the cued disc was counterbalanced within a session. The training phase was continued until the subject reached the criterion for learning, which was set as >90% accuracy in three consecutive sessions.
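The learning criterion can be stated precisely as a check over per-session accuracies. A minimal sketch with a hypothetical function name; whether the reported session counts include the three criterion sessions is not stated, so the sketch assumes they do and returns the session that completes the first qualifying run. The accuracies in the example are invented for illustration.

```python
from typing import Optional, Sequence


def criterion_session(accuracies: Sequence[float], threshold: float = 90.0,
                      run_length: int = 3) -> Optional[int]:
    """1-based index of the session that completes the first run of `run_length`
    consecutive sessions with accuracy strictly above `threshold` (percent),
    or None if the criterion was never met."""
    streak = 0
    for session, accuracy in enumerate(accuracies, start=1):
        # ">90%" is strict, so a session at exactly 90% breaks the run.
        streak = streak + 1 if accuracy > threshold else 0
        if streak == run_length:
            return session
    return None


print(criterion_session([85.9, 92.2, 93.8, 89.1, 95.3, 96.9, 92.2]))  # 7
```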

Second training phase (movement condition)

In the second phase of training, chimpanzees were trained to track the movement of the target disc along two types of movement path, unambiguous streaming and unambiguous bouncing. Four kinds of trials, two for the streaming movement and two for the bouncing movement, were prepared (Fig. 2). In the training trials, the two discs were initially offset vertically by one disc radius. Under one condition, the discs moved horizontally to the opposite side of the display (Fig. 2a). The discs partially overlapped at the center of the display, but their identities were not ambiguous because of the vertical misalignment; human observers perceived unambiguous streaming. Under another condition (Fig. 2b), the discs moved horizontally and reversed direction at the point where they touched an imaginary vertical line through the center of the display. The discs therefore did not overlap, and human observers perceived unambiguous bouncing. Under these two conditions, the vertical relationship of the two discs was maintained throughout each trial, so the chimpanzees could have identified the cued disc by attending to its vertical position without tracking it. To rule out this strategy, in the other two types of trials (Fig. 2c and d), the two discs moved with slight vertical displacements, so that their relative vertical positions were reversed at the center of the display. The two discs either moved on (streaming) or reversed direction at the point where they touched the imaginary vertical center line of the display (bouncing). The initial vertical positions (upper or lower position for the left or right stimulus) varied randomly across trials.

During the movement phase, the discs were horizontally displaced at 144, 287, 431, or 574 mm (18.1º, 35.4º, 51.1º, or 65.1º, respectively) per second. Each stimulus frame lasted 13.3 ms, and the displacement per frame was smaller than the disc radius, even at the highest speed.
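The claim that even the fastest disc moved less than one radius per frame follows from dividing each speed by the 75-Hz refresh rate. A quick check, taking the 23-mm disc diameter as exact (the helper name is hypothetical):

```python
REFRESH_HZ = 75
DISC_RADIUS_MM = 23 / 2          # disc diameter was about 23 mm
SPEEDS_MM_PER_S = (144, 287, 431, 574)


def step_per_frame_mm(speed_mm_per_s: float, refresh_hz: float = REFRESH_HZ) -> float:
    """Horizontal displacement of a disc between successive frames."""
    return speed_mm_per_s / refresh_hz


for speed in SPEEDS_MM_PER_S:
    step = step_per_frame_mm(speed)
    print(f"{speed} mm/s: {step:.2f} mm/frame ({step / DISC_RADIUS_MM:.0%} of the radius)")
```

At the fastest speed the step is about 7.65 mm per frame, roughly two thirds of the 11.5-mm radius, so successive disc positions always overlap substantially.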

Each trial proceeded as in the static condition, except that the disc movement phase described above was inserted immediately after the cueing flash ended. When the discs stopped moving, a gray square appeared around each disc to signal the start of the response phase, and the chimpanzees were required to touch the target disc.

A training session with disc movement consisted of 128 trials (8 trials for each combination of movement speed and movement path). Training continued until the subject reached the learning criterion, which was set as >90% accuracy in three consecutive sessions.
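The 128-trial session is a full factorial of 4 speeds × 4 movement paths × 8 repetitions. A minimal sketch of how such a session could be assembled and randomized; the path labels are hypothetical names for Fig. 2a–d, and balancing the cued side within each cell is an assumption (the source states only that it was counterbalanced within a session).

```python
import itertools
import random

SPEEDS_MM_PER_S = (144, 287, 431, 574)
# Hypothetical labels for the four movement paths in Fig. 2a-d.
PATHS = ("stream", "bounce", "stream_vertical_swap", "bounce_vertical_swap")
REPS_PER_CELL = 8


def training_session(rng: random.Random) -> list[dict]:
    """One 128-trial movement-training session in random order."""
    trials = []
    for speed, path in itertools.product(SPEEDS_MM_PER_S, PATHS):
        for rep in range(REPS_PER_CELL):
            trials.append({
                "speed": speed,
                "path": path,
                # assumption: cued side balanced within each cell (4 left, 4 right)
                "cued_side": "left" if rep % 2 == 0 else "right",
                # which disc starts in the upper position varied randomly across trials
                "upper_disc": rng.choice(("left", "right")),
            })
    rng.shuffle(trials)
    return trials
```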

Test phase

In the test phase, we tested how the stream/bounce stimuli and partial-overlap stimuli were perceived. In test probe trials, two discs were initially horizontally aligned and then horizontally moved toward one another. In the stream/bounce display (Fig. 3a), the discs completely overlapped and moved on to the sides of the display. In the partial-overlap display (Fig. 3b), the discs reversed their movement directions when the edge of each disc reached the center of the other. The discs moved at 144, 287, 431, or 574 mm (18.1º, 35.4º, 51.1º, or 65.1º, respectively) per second.
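The geometry of the two probe displays can be made concrete by computing the frame-by-frame horizontal position of the disc that starts on the left. A minimal sketch, assuming constant speed, a 23-mm disc, the 218-mm starting separation, and the 75-Hz refresh; the function names are hypothetical. In the 50% overlap display the discs turn when their centers are one radius apart, which is exactly when each disc's edge reaches the other's center.

```python
import math

DISC_DIAMETER_MM = 23.0
START_SEPARATION_MM = 218.0   # initial centre-to-centre distance
REFRESH_HZ = 75


def _ramp(start: float, stop: float, step: float) -> list[float]:
    """Positions after `start`, moving toward `stop` by `step` per frame, ending on `stop`."""
    distance = abs(stop - start)
    direction = 1.0 if stop >= start else -1.0
    n = max(1, math.ceil(distance / step))
    return [start + direction * min(i * step, distance) for i in range(1, n + 1)]


def left_disc_path(overlap_pct: float, speed_mm_per_s: float,
                   end_separation_mm: float) -> list[float]:
    """Frame-by-frame x position (mm, 0 = screen centre) of the disc starting on the left.

    100% overlap: the disc passes through the centre to the right side (the drawn
    display is identical whichever disc is labelled 'left', hence the ambiguity).
    50% overlap: it reverses when the centre-to-centre distance equals one radius.
    """
    step = speed_mm_per_s / REFRESH_HZ
    closest = DISC_DIAMETER_MM * (1 - overlap_pct / 100)  # centre-to-centre at turn point
    start_x = -START_SEPARATION_MM / 2
    turn_x = -closest / 2
    end_x = end_separation_mm / 2 if overlap_pct == 100 else -end_separation_mm / 2
    return [start_x] + _ramp(start_x, turn_x, step) + _ramp(turn_x, end_x, step)
```

In use, `end_separation_mm` would be drawn per trial from the 193–218 mm range described under Stimuli, for example with `random.uniform(193, 218)`.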

Probe trials were intermixed with baseline trials, which were the same as those in the training sessions with moving discs. For the chimpanzees, a test session consisted of 128 baseline trials and 8 probe trials, one for each combination of movement speed and overlap condition; the probe trials appeared at random positions within the session. The left–right position of the cued disc was counterbalanced within a session. Each chimpanzee participated in 20 test sessions. Feedback in the baseline trials was the same as in the second training phase. In the probe trials, no feedback was given, and the next trial started after a 2-s interval.
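Intermixing the 8 probes among the 128 baseline trials can be sketched as follows. This is illustrative only: the function name is hypothetical, and balancing the cued side across the 8 probes (4 left, 4 right) is an assumption about how counterbalancing was applied to probes.

```python
import itertools
import random

SPEEDS_MM_PER_S = (144, 287, 431, 574)
OVERLAPS_PCT = (100, 50)


def build_test_session(baseline_trials: list[dict], rng: random.Random) -> list[dict]:
    """Intermix one probe per speed x overlap cell with the baseline trials."""
    probes = [{"kind": "probe", "speed": s, "overlap_pct": o}
              for s, o in itertools.product(SPEEDS_MM_PER_S, OVERLAPS_PCT)]
    # Assumption: the cued side is balanced across the 8 probes (4 left, 4 right).
    sides = ["left", "right"] * (len(probes) // 2)
    rng.shuffle(sides)
    for probe, side in zip(probes, sides):
        probe["cued_side"] = side
    session = [dict(t, kind="baseline") for t in baseline_trials] + probes
    rng.shuffle(session)  # probes land at random positions among the baseline trials
    return session
```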

Test phase in humans

For humans, a test session consisted of 80 probe trials. Ten trials under each combination of the speed and overlap conditions were randomly intermixed in the session. Each human participated in one test session. Prior to the test session, each human observer received 16 baseline trials. Each observer was instructed to track an initially cued disc and to touch it after the discs had stopped.

Results

Training phase

In the first training phase, in which static stimuli were used, chimpanzees required 32 sessions on average to reach the learning criterion (53, 10, 5, 24, 80, and 21 sessions for Ai, Ayumu, Chloe, Cleo, Pal, and Pendesa, respectively). The individual differences did not reflect age but probably reflected the subjects’ motivation for the new task.

Performance in the second training phase, with moving stimuli (Fig. 2), was significantly better than chance (50%), even in the first session [71.1% correct on average; t(5) = 4.1, p < .01] both in the streaming (73.9%) and the bouncing (68.3%) conditions. The chimpanzees required 23 sessions on average to reach criterion (33, 4, 8, 18, 59, and 17 sessions for Ai, Ayumu, Chloe, Cleo, Pal, and Pendesa).
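The averages reported for the two training phases follow directly from the individual session counts. The overall first-session accuracy equals the mean of the streaming and bouncing accuracies only if the two conditions had equal numbers of trials, which holds for the session design described above.

```python
from statistics import mean

# Sessions to criterion for Ai, Ayumu, Chloe, Cleo, Pal, and Pendesa.
static_phase = [53, 10, 5, 24, 80, 21]
movement_phase = [33, 4, 8, 18, 59, 17]

print(f"{mean(static_phase):.1f} {mean(movement_phase):.1f}")  # 32.2 23.2
print(round(mean([73.9, 68.3]), 1))  # first-session % correct: 71.1
```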

The generalization of performance from the first training phase (static discs) to the second training phase (moving discs) indicates that the chimpanzees spontaneously tracked the cued disc when it moved. It also indicates that they did not solve the task by learning one-to-one stimulus–response associations (i.e., associations among two cue positions, four movement patterns, and left–right responses). This conclusion is also supported by another study (Matsuno & Tomonaga, 2011, unpublished), conducted after this one, in which 4 of the 6 chimpanzees were tested on tracking a target disc among four discs that moved rotationally along a shared path. The chimpanzees successfully tracked the target disc (74.5% correct on average) in the first session, despite the novelty of the movement patterns. These results confirm that performance in the following test probe trials reflected object-tracking abilities rather than other processes associated with specific cues in the training displays.

Test phase

The performance of the chimpanzees and humans in the probe trials is shown in Fig. 4. The response tendencies differed between the two groups. Humans predominantly perceived the ambiguous stream/bounce event (100% overlap condition) as streaming, even in the partial-overlap condition at high speed. However, chimpanzees predominantly perceived the two discs as bouncing, irrespective of the conditions.

Fig. 4

Collected response data from probe trials in Experiment 1. The vertical axis represents the percentage of streaming responses; 0% indicates a bounce response from all subjects in all of the trials, and 100% indicates complete streaming responses. The left four bars represent the 100% overlap condition, and the others represent the 50% overlap condition. Each bar indicates a different movement speed condition. Error bars denote 1 SE

A two-way ANOVA of the overlap and speed conditions in humans revealed a significant main effect of overlap condition, F(1, 4) = 63.4, p < .01, and a significant interaction, F(3, 12) = 10.0, p < .01. Post hoc simple main effect analyses revealed that performance under the partial-overlap condition varied with movement speed, F(3, 12) = 5.08, p < .05. The simple main effect of overlap condition was significant at the slowest speed, F(1, 5) = 13.8, p < .05, but not at the other speeds.

In chimpanzees, the main effects of movement speed, F(3, 15) = 0.8, p > .10, and degree of overlap, F(1, 5) = 6.1, p > .05, were not significant, but their interaction was, F(3, 15) = 4.5, p < .05. Analysis of the simple main effects revealed that performance varied with movement speed under the partial-overlap condition, F(3, 15) = 4.2, p < .05, but not under the 100% overlap stream/bounce condition, F(3, 15) = 1.7, p > .10. The simple main effect of overlap condition was significant only at the slowest speed, F(1, 5) = 13.8, p < .05.

The average streaming responses of chimpanzees in the 100% overlap stream/bounce display (29.2%) were significantly below chance, t(5) = 3.0, p < .05. In contrast, those of humans (81.0%) were significantly above chance, t(4) = 3.9, p < .01.

Discussion

Humans perceived the stream/bounce stimuli as streaming, consistent with the results of previous studies (Bertenthal et al., 1993; A. B. Sekuler & R. Sekuler, 1999). They also perceived the partial-overlap stimuli as streaming more frequently when the speed of movement was higher. The effects of speed could be due to the difference in the size of the stimulus displacement per frame. At higher speeds, the frame-by-frame displacement was larger relative to the size of the stimuli, and the motion correspondence between frames was more ambiguous.

In contrast to the humans, the performance of the chimpanzees unexpectedly showed a tendency to perceive both the stream/bounce stimuli and the partial-overlap stimuli as bouncing. Baseline trial performance was very accurate (92% correct on average), indicating that the chimpanzees tracked the target correctly, regardless of the type of movement. In addition, their performance varied depending on the combination of overlap and speed conditions, as did the humans’. Therefore, it is difficult to explain the results as a simple response bias such as neglecting the tracking task and merely selecting the disc on the side on which the cued disc initially appeared.

What do these results indicate about differences in visual interpretation between chimpanzees and humans? A difference was seen in the overall frequency of percepts of streaming. On the other hand, the similar speed effects in the partial-overlap condition and the shared direction of the effect of stimulus overlap suggest common perceptual mechanisms for resolving the ambiguous motion event. Therefore, the observed species difference might reflect differences in the degree, but not the kind, of perceptual functions needed to track and identify a moving object.

As noted previously, humans predominantly perceive bouncing under certain conditions (Bertenthal et al., 1993; A. B. Sekuler & R. Sekuler, 1999; R. Sekuler et al., 1997; K. Watanabe & Shimojo, 1998, 2001b). The human visual system has a default tendency to assume the continuous movement of objects in order to individuate them and maintain object identity. Thus, streaming percepts are dominant when a stream/bounce display is presented alone. However, when an external perturbation, such as the abrupt onset of a click sound or the sudden cessation of a movement, interrupts the continuous motion integration process, the percept changes to bouncing. One possible reason for the observed species difference, therefore, is that the integration of motion in one direction is more easily perturbed in chimpanzees than in humans. Although we did not include explicit perturbations in the display, the stimulus configuration of the stream/bounce display itself may have contained perturbing factors to which chimpanzees might have been more sensitive. Such factors might also account for the small number of trials on which humans perceived the 100% overlap stream/bounce displays as bouncing. Conceivably, a stimulus configuration that is less disruptive and that facilitates the individuation of each object at the moment of coincidence would make streaming percepts dominant in chimpanzees. To investigate this issue further, we tested the stream/bounce perception of chimpanzees in the following experiments.

Experiment 2: Stream/bounce perception in chimpanzees using ring stimuli

In the second experiment, we examined whether the streaming percept is also the default state in chimpanzees when no external perturbation exists and the two discs are more readily individuated. Because no distractor stimulus was used in Experiment 1, the factor that disturbed the chimpanzees’ integration of motion in a single direction could have been the overlap or fusion of the two discs themselves. When the two filled discs touched, parts of their edges merged and disappeared. Consequently, the local directional signal of the target disc decreased, and the two stimuli became difficult to individuate, which could impair motion integration. Thus, in this experiment, we used open ring stimuli (Fig. 3c), which provide an explicit depth cue (X-junctions) when two objects cross. When open ring stimuli partially overlap, their edges remain salient, and the junctions of the two contours can serve as an explicit signal that the two objects are crossing. In addition, the local motion signal is more salient with ring stimuli than with filled discs. Both properties might promote continuous motion integration in a single direction.

Method

Four chimpanzees participated in this experiment. The stimuli and procedures were the same as in Experiment 1 except as described here. The displays consisted either of the two gray discs used in Experiment 1 or of two gray open ring stimuli that matched the contours of the discs (Fig. 3c). During the movement phase, the objects were horizontally displaced at 144 mm (18.1º) per second.

In each test session, the configuration of stimuli (filled disc or open ring) was fixed, and the two stimulus conditions were presented in alternating sessions. A test session consisted of 128 baseline trials, which were the same as in Experiment 1 except for the stimuli (open rings in half of the sessions), and 8 probe trials, during which 4 trials in each overlap condition (stream/bounce condition with 100% overlap or partial-overlap stimuli with 50% overlap) were presented. All responses in probe trials were positively reinforced, as were correct responses in the baseline trials. Each chimpanzee participated in 10 sessions (5 sessions under each stimulus condition). No additional training sessions were conducted.

Results

Chimpanzee perception of the stream/bounce display with ring stimuli became predominantly streaming, whereas bouncing was perceived with the uniformly filled gray discs, as in Experiment 1 (Fig. 5).

Fig. 5

Percentages of streaming responses in Experiment 2. The left two bars represent the 100% overlap condition, and the others represent the 50% overlap condition. Each bar indicates a different stimulus condition. Error bars denote 1 SE

A two-way ANOVA revealed significant main effects of stimulus, F(1, 3) = 26.7, p < .05, and overlap condition, F(1, 3) = 154.7, p < .01, as well as a significant interaction, F(1, 3) = 55.4, p < .01. Analysis of the simple main effect confirmed that the percentages of perceived streaming differed significantly between stimulus conditions (filled discs or open rings) in the stream/bounce condition with 100% overlap of stimuli, F(1, 3) = 243.0, p < .01, but not in the partial-overlap condition, F(1, 3) = 0.3, p > .10. The simple effects of the overlap condition were significant under both filled disc and open ring conditions, Fs(1, 3) = 33.0 and 211.0, ps < .01.

The average streaming response to the 100% overlap ring stimuli (78.8%) was significantly above chance, t(3) = 4.0, p < .05. The chimpanzees performed very accurately in baseline trials with open rings (93% correct responses on average), similar to their performance with filled discs (94%), F(1, 3) = 1.2, p > .10.

Discussion

Chimpanzees predominantly perceived the 100% overlap stream/bounce display as streaming when open ring stimuli were used. The filled disc display was less frequently perceived as streaming, though the percentage increased with increasing overlap, as in Experiment 1. This change in the chimpanzees’ perception with the stimulus manipulation further confirms that they did not simply select the disc on the side on which the target initially appeared in probe trials. Furthermore, these results suggest that the difference between chimpanzee and human perception shown in Experiment 1 reflects differences in the degree, but not the kind, of the perceptual functions needed to track and identify a moving object.

When the two stimuli in a stream/bounce display overlap, the visual system tends to integrate the local motion signals along the same trajectory and to interpret the event as continuous smooth motion. In chimpanzees, however, this default streaming percept may be more easily perturbed. The open ring stimuli provided both an explicit cue to the crossover of the two objects in depth (X-junctions of the contours), which made the two stimuli easier to individuate, and unambiguous local motion signals. This enhanced saliency may compensate for the vulnerability of the motion integration process in chimpanzees.

Experiment 3: Stream/bounce perception in chimpanzees using random-dot stimuli

In Experiment 3, we examined the effect of another depth cue, motion transparency, on chimpanzees’ perception of stream/bounce displays. Overlapping sets of dots that move coherently in different directions produce the percept of motion transparency in humans (Braddick, Wishart, & Curran, 2002; Edwards & Greenwood, 2005). Using random-dot stimuli that moved coherently in opposite directions, we expected that the two objects would be perceived at different depths and would be easily discriminated when they crossed. We compared chimpanzees’ perception of the stream/bounce display with random-dot stimuli to their perception of a display with uniformly filled stimuli.

Method

A total of 5 chimpanzees participated in the experiment. The stimuli and procedures were the same as in Experiment 2, except as described here. The displays consisted of two identical gray squares or two identical random-dot squares of 10% density (Fig. 3d). Each stimulus subtended about 23 × 23 mm (2.9º × 2.9º of visual angle at a viewing distance of 45 cm). In the movement phase, the squares were horizontally displaced at 144 mm (18.1º) per second.
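A 10%-density random-dot square can be generated by lighting a fixed fraction of the pixels inside the square. A minimal sketch: the 40 × 40 pixel grid is hypothetical (the pixel size of the 23-mm square is not reported), and lighting exactly round(density × area) distinct pixels is one reasonable reading of "10% density", not necessarily the authors' method.

```python
import random


def random_dot_square(size_px: int, density: float,
                      rng: random.Random) -> frozenset[tuple[int, int]]:
    """Lit-pixel coordinates for a size_px x size_px square with the given dot density.

    Exactly round(density * size_px**2) distinct pixels are lit.
    """
    if not 0.0 <= density <= 1.0:
        raise ValueError("density must be between 0 and 1")
    cells = [(x, y) for x in range(size_px) for y in range(size_px)]
    return frozenset(rng.sample(cells, round(density * len(cells))))


SIZE_PX = 40  # hypothetical: the pixel size of a 23-mm square is not reported
dots = random_dot_square(SIZE_PX, 0.10, random.Random(0))
print(len(dots))  # 160
```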

In each test session, the configuration of the stimuli (filled squares or random-dot squares) was fixed, and the sessions in each stimulus condition alternated. A test session consisted of 128 baseline trials, which were the same as in Experiment 2 except for the stimuli (filled squares or random-dot squares), and 8 probe trials, in which 4 trials of each overlap condition were presented. All responses in the probe trials were positively reinforced, as were correct responses in the baseline trials. Each chimpanzee participated in 10 sessions (5 sessions in each stimulus condition), with no additional training sessions.

Results

Chimpanzees tended to perceive the stream/bounce display as streaming when the stimuli were random-dot squares, whereas bouncing was perceived with uniformly filled gray squares (Fig. 6). A two-way ANOVA revealed significant main effects of the stimulus, F(1, 4) = 42.0, p < .01, and of overlap condition, F(1, 4) = 115.1, p < .01, as well as a significant interaction, F(1, 4) = 15.8, p < .05. Analysis of the simple main effects confirmed that in the stream/bounce condition with 100% overlap of the stimuli, the percentages of perceived streaming differed significantly between stimulus conditions, F(1, 4) = 42.5, p < .01, but not in the partial-overlap condition, F(1, 4) = 1.1, p > .10. The simple main effects of the degree of overlap were significant under both filled-square and random-dot stimulus conditions, Fs(1, 4) = 13.4 and 76.9, ps < .05.

Fig. 6

Percentages of streaming responses in Experiment 3. The left two bars represent the 100% overlap condition, and the others represent the 50% overlap condition. Each bar indicates a different stimulus condition. Error bars denote 1 SE

The average percentage of streaming responses by chimpanzees to the 100% overlap random-dot stimuli (64%) was above chance, but the difference was only marginally significant in a two-tailed t test, t(4) = 2.5, p = .07. Performance in baseline trials with random-dot squares (93% correct responses on average) was highly accurate and did not differ significantly from that with filled squares (94%), F(1, 4) = 1.1, p > .10.
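The two-tailed p value reported here can be recovered from the t statistic without a statistics library, using the classical closed-form series for Student's t distribution with integer degrees of freedom. A minimal sketch with a hypothetical function name; in practice `scipy.stats.t.sf` would serve the same purpose.

```python
import math


def t_two_tailed_p(t: float, df: int) -> float:
    """Two-tailed p value for Student's t with a positive integer df (closed form)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        # P(|T| < t) = s * (1 + (1/2)c^2 + (1*3)/(2*4)c^4 + ... up to c^(df-2))
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (2 * k - 1) / (2 * k) * c * c
            total += term
        central = s * total
    else:
        # P(|T| < t) = (2/pi) * (theta + s*c*(1 + (2/3)c^2 + ... up to c^(df-3)))
        total = 0.0
        if df > 1:
            term, total = 1.0, 1.0
            for k in range(1, (df - 1) // 2):
                term *= (2 * k) / (2 * k + 1) * c * c
                total += term
        central = 2 / math.pi * (theta + s * c * total)
    return 1.0 - central


print(f"{t_two_tailed_p(2.5, 4):.3f}")  # 0.067
```

The value 0.067 rounds to the reported p = .07.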

Discussion

Chimpanzees tended to perceive the stream/bounce display as streaming when random-dot stimuli were used, whereas when filled gray objects were used, streaming was perceived much less frequently. These results further support the view that chimpanzees’ visual systems, like those of humans, interpret the bistable stream/bounce motion as streaming when the two stimuli are more easily individuated.

These results also suggest that chimpanzees have some sensitivity to the depth cues that induce subjective experiences of motion transparency in humans. To further confirm that the change in the perceptual interpretation of the stream/bounce stimuli depended on motion coherency, we used incoherently moving random-dot stimuli in Experiment 4.

Experiment 4: Stream/bounce perception in chimpanzees using incoherently moving random-dot stimuli

In addition to their motion coherency, the random-dot stimuli used in Experiment 3 had other features that differed from those of the filled square stimuli. For example, the luminance intensity of a random-dot square was much lower than that of a filled square. In addition, in the random-dot condition, the stimulus intensity (dot density) was doubled where the two stimuli overlapped. Such cues, rather than motion coherency, might have induced the observed alteration in stream/bounce perception. To test the effect of coherent dot motion on the perception of the overlapping stimuli, the previous experiment was replicated using incoherently moving random-dot stimuli.

Method

The same 5 chimpanzees as in Experiment 3 participated. Under the random-dot condition, the spatial arrangement of dots in the stimulus square was randomly refreshed at each displacement of the stimuli. Thus, the dot density was still doubled where the two squares overlapped, but observers could not detect coherent motion by temporally integrating the dot positions across successive displacements. The performance of the chimpanzees in the incoherent random-dot condition was compared with that in the filled-square condition, which was identical to the condition presented in Experiment 3. All other procedures were the same as in Experiment 3. No additional training sessions were given.
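The difference between the coherent (Experiment 3) and incoherent (Experiment 4) random-dot squares can be sketched as follows. This is a minimal Python illustration under assumed parameters (the dot count, square size, step per displacement, and the function names `dot_frames` and `dot_displacements` are all hypothetical), not the software used to generate the actual displays. In both versions each square carries the same number of dots, so dot density doubles wherever two squares overlap; only in the coherent version does every dot move by the same displacement as its square from one frame to the next.

```python
import random


def dot_frames(n_dots, size, step, n_frames, coherent, seed=0):
    """Per-frame absolute (x, y) dot positions for one square that moves
    rightward by `step` at each displacement.

    coherent=True:  one dot pattern is drawn once and translates with the square.
    coherent=False: the pattern is redrawn at every displacement, so the dots
                    carry no consistent frame-to-frame motion signal.
    """
    rng = random.Random(seed)

    def new_pattern():
        return [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n_dots)]

    pattern = new_pattern()
    frames = []
    for f in range(n_frames):
        if not coherent and f > 0:
            pattern = new_pattern()  # refresh the spatial arrangement of dots
        offset = f * step
        frames.append([(x + offset, y) for x, y in pattern])
    return frames


def dot_displacements(frames, index):
    """Horizontal frame-to-frame change of one dot slot. In the incoherent
    case a slot does not correspond to the same dot across frames."""
    return [frames[f][index][0] - frames[f - 1][index][0]
            for f in range(1, len(frames))]
```

With `coherent=True`, `dot_displacements` returns the square's step for every dot; with `coherent=False`, the values are scattered, leaving no consistent local motion to integrate over time.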

Results

The perception of the stream/bounce display was not at all biased toward streaming when the random dots were not coherently updated (Fig. 7). Performance in the random-dot condition differed little from that in the filled-square condition, even when the stimuli overlapped 100%.

Fig. 7

Percentages of streaming responses in Experiment 4. The left two bars represent the 100% overlap condition, and the others represent the 50% overlap condition. Each bar indicates a different stimulus condition. Error bars denote 1 SE.

A two-way ANOVA revealed a significant main effect of overlap, F(1, 4) = 17.0, p < .05, but neither the main effect of stimuli, F(1, 4) = 4.1, p > .10, nor the interaction, F(1, 4) = 2.5, p > .10, was significant. The streaming responses to the 100% overlap stream/bounce displays with random-dot stimuli (37% on average) were not significantly different from chance, t(4) = 2.0, p > .10. The performance levels in baseline trials with both random-dot squares (93% correct responses) and filled squares (91% correct responses) were similarly accurate, F(1, 4) = 1.8, p > .10.

A direct comparison between performance in the coherently moving random-dot condition in Experiment 3 and that in the incoherently moving random-dot condition in this experiment revealed that streaming was perceived significantly more frequently in the former condition. A two-way ANOVA revealed a significant main effect of overlap condition, F(1, 4) = 63.7, p < .01, and a significant interaction between overlap and coherency, F(1, 4) = 26.0, p < .01. Analysis of the simple main effects confirmed that the percentages of perceived streaming differed significantly between the experiments in the 100% overlap condition, F(1, 4) = 15.7, p < .05, but not in the partial-overlap condition, F(1, 4) = 0.0, p > .10.

Discussion

Although the stimuli used here differed from those in Experiment 3 only in the temporal coherence of the dots, this difference strongly influenced the chimpanzees’ perception. In this experiment, chimpanzees tended to perceive the stream/bounce display as bouncing, irrespective of stimulus type. Streaming responses were significantly less frequent with incoherently moving than with coherently moving random-dot stimuli. These results suggest that what promoted the perception of streaming in Experiment 3 was neither the low luminance intensity nor the doubling of the dot density at the overlap of the stimuli. Instead, coherent local motion and, conceivably, the perceived motion transparency arising from the coherent motion likely helped the chimpanzees maintain their continuous tracking of the target.

Experiment 5: Stream/bounce perception in humans using ring and random-dot stimuli

In Experiment 5, human subjects were tested with the filled disc, filled square, open ring, and coherent and incoherent random-dot stimuli used in Experiments 2–4, in order to reevaluate the chimpanzees’ performance in those experiments in comparison with that of humans.

Method

A group of 6 humans (1 male and 5 female) ranging in age from 22 to 29 years (mean = 24.4) were tested. The stimuli were the same as those used in Experiments 2 (filled discs and open rings), 3 (filled squares and coherently moving random-dot stimuli), and 4 (filled squares and incoherently moving random-dot stimuli). In the movement phase, the objects were horizontally displaced at 144 mm/s (18.1°/s), as in Experiments 2–4.

A test session consisted of 100 probe trials. Ten trials of each combination of the two overlap conditions (stream/bounce condition with 100% overlap, or intermediate, partial-overlap stimuli with 50% overlap) and the five stimulus configuration conditions (filled discs, open rings, filled squares, and coherently or incoherently moving random dots) were randomly intermixed in the session. Each subject received a single test session, preceded by 20 baseline trials as used in the tests with chimpanzees. The subjects were instructed to track an initially cued stimulus and to touch that stimulus after the two stimuli had stopped.

Results

The results of Experiment 5 are shown in Fig. 8. The data for circular stimuli (filled discs and open rings) and rectangular stimuli (filled squares and coherent and incoherent random dots) were analysed separately, corresponding to the analyses of Experiment 2 and of Experiments 3 and 4, respectively.

Fig. 8

Percentages of streaming responses in Experiment 5. The left graph represents the conditions using circular stimuli, as in Experiment 2. The right graph represents the conditions using rectangular stimuli, as in Experiments 3 and 4.

With circular stimuli, humans predominantly perceived streaming of the stream/bounce stimuli, irrespective of the stimulus configuration. In addition, with open ring stimuli, streaming was perceived more frequently than with filled discs.

A two-way ANOVA of the overlap and stimulus configuration conditions revealed significant main effects of overlap, F(1, 5) = 136.0, p < .01, and stimulus configuration, F(1, 5) = 15.1, p < .05. The interaction was not significant, F(1, 5) = 1.6, p > .10. The streaming responses to 100% overlap with open ring stimuli (98.3% on average) were significantly above chance, t(5) = 29.0, p < .01.

The subjects also predominantly perceived filled square and coherently and incoherently moving random-dot stimuli as streaming when the two stimuli completely overlapped. Partial-overlap events were perceived as streaming less frequently than were stream/bounce events, and the frequency varied with stimulus configuration. In the partial-overlap condition, subjects perceived both kinds of random-dot stimuli as streaming more frequently than filled squares.

A two-way ANOVA revealed significant main effects of overlap condition, F(1, 5) = 53.9, p < .01, and stimulus configuration, F(2, 10) = 7.6, p < .01, as well as a significant interaction, F(2, 10) = 10.9, p < .01. Post hoc simple main effects analyses revealed that performance under the partial-overlap condition varied with stimulus configuration, F(2, 10) = 10.8, p < .05, but performance under the stream/bounce condition did not, F(2, 10) = 0.7, p > .10. A multiple comparison (paired t test with Holm’s correction) of performance with the three stimulus configurations in the partial-overlap condition revealed that streaming responses to filled square stimuli were significantly less frequent than those to coherent and incoherent random-dot stimuli, ts(5) = 3.6 and 3.4, ps < .05. The two random-dot conditions were not significantly different from each other, t(5) = 2.0, p > .10. The streaming responses to the 100% overlap stream/bounce displays with coherently and incoherently moving random-dot stimuli (98.3% and 96.7% on average) were significantly above chance, ts(5) = 29.0 and 22.1, ps < .01.
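Holm’s correction, used above for the three pairwise comparisons, is a step-down procedure: for m comparisons, the p values are sorted in ascending order, the k-th smallest is multiplied by (m − k + 1), and the adjusted values are forced to be non-decreasing. A minimal Python sketch follows; the three p values in the example are hypothetical, since only t values are reported here, and the function name `holm_adjust` is our own.

```python
def holm_adjust(pvalues):
    """Holm-Bonferroni step-down adjusted p values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        adj = min(1.0, (m - rank) * pvalues[i])  # multiply by m, m - 1, ..., 1
        running_max = max(running_max, adj)      # keep adjusted p non-decreasing
        adjusted[i] = running_max
    return adjusted


# Hypothetical p values for three pairwise comparisons (NOT the study's).
raw = [0.01, 0.04, 0.03]
print(holm_adjust(raw))
```

A comparison is treated as significant when its adjusted p value falls below α (e.g., .05). The statsmodels function `multipletests` with `method='holm'` implements the same procedure.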

Discussion

Human subjects predominantly perceived the 100% overlap stream/bounce event with open rings and with random-dot stimuli as streaming. In addition, these stimuli were perceived as streaming more often than the filled stimuli. These results recall those of chimpanzees in Experiments 2–4, although the overall frequency of streaming percepts by humans was much higher, as also shown in Experiment 1.

One notable difference between chimpanzees and humans was that the latter did not change their responses as a function of the coherence of the random-dot stimuli. Unlike chimpanzees, human subjects showed more streaming responses to the incoherently moving random-dot stimuli than to the filled square stimuli (in the partial-overlap condition).

The more frequent streaming percepts with incoherent random-dot stimuli than with filled stimuli may be explained by the increased salience of the target identity when the two stimuli crossed over. As mentioned above, the filled target fused with the other filled stimulus, and target identity became ambiguous at crossover. Though incoherent random-dot stimuli lacked coherent local motion signals, the borders of the two crossed-over stimuli could still be detected from the difference in dot density between the areas where the two random-dot stimuli were and were not superimposed. Such cues may have helped human subjects to attentively track the object motion longer, biasing their percepts toward streaming. Chimpanzees may be insensitive to such cues, or the effect in chimpanzees may be too small to reach statistical significance.

General discussion

This study investigated stream/bounce perception in chimpanzees and compared it with such perception in humans. We first showed that the stream/bounce perception of chimpanzees with filled discs differed from that of humans. Whereas humans predominantly perceived the stimuli to be streaming, chimpanzees exhibited many more bounce responses. Further experiments using ring and random-dot stream/bounce stimuli revealed that chimpanzees also predominantly perceived the stimuli as streaming when the two stimuli were more salient and discriminable from each other due to additional depth cues.

These results reveal both differences and similarities between the perceptual processes of chimpanzees and humans. Chimpanzees, like humans, perceived ambiguous stream/bounce events as streaming when the two stimuli were readily discriminable from each other due to the addition of depth cues, thus indicating that the tendency to keep track of unidirectional and continuous movement, and its resultant default streaming percepts, is shared by chimpanzees and humans. However, this tendency appears to be more readily perturbed in chimpanzees. Previous studies have identified multiple factors that can alter stream/bounce perception (Bertenthal et al., 1993; Grassi & Casco, 2009; Kawabe & Miura, 2006; A. B. Sekuler & R. Sekuler, 1999; K. Watanabe & Shimojo, 2001a, 2001b), and it is difficult to identify a single mechanism that would fully explain the vulnerability of streaming perception in chimpanzees.

One possible explanation is that the spatiotemporal integration process of local motion signals may differ between the species. Previous studies in humans have proposed that the dominant streaming perception can be explained by an intrinsic directional bias involving temporal integration arising from the cooperative interaction between local motion detectors (e.g., Bertenthal et al., 1993). In our study, one prominent difference between the stimuli to which chimpanzees did and did not perceive more streaming (open ring and coherent random-dot vs. filled and incoherent random-dot stimuli) concerned the continuity of the local motion during the brief interval when target and nontarget stimuli crossed over. This suggests that the temporal integration window may be shorter in chimpanzees than in humans and that chimpanzees’ perception may rely more on the local motion mechanism. Although the temporal integration process has not been well studied in chimpanzees, several comparative studies between chimpanzees and humans have revealed species differences in spatial integration processes (see, e.g., Fagot & Tomonaga, 1999, 2001). In these studies, chimpanzees were less sensitive to the global configuration of visual stimuli. The relative local bias in chimpanzees’ visual processing may be common to the temporal and spatial domains.

Another possible explanation related to the motion integration process is a difference in the quality of sustained attention between chimpanzees and humans. Visual attention has a critical role in motion perception, selecting and integrating visual information across time and space, and keeping track of and identifying moving objects (see, e.g., Cavanagh, 1992; Choi & Scholl, 2004; Pylyshyn & Storm, 1988). Research on the effects of the state of visual attention on stream/bounce perception has revealed that reduced attentional resources directed to the moving object cause more frequent bouncing percepts (K. Watanabe & Shimojo, 1998), suggesting that attention of sufficient quality is required for continuous motion integration, and thus for the perception of streaming. Thus, the more frequent bouncing responses of chimpanzees might indicate that chimpanzees’ attention is more readily disrupted than is humans’.

Species differences in the quality of pursuit of moving objects may also explain the different perception of stream/bounce displays. In our experiments, both chimpanzee and human subjects were allowed to observe the display freely, without fixation. Pursuit eye movements could modulate perceived object motion (Baker & Graf, 2010; Kerzel, 2000) and promote streaming percepts in our displays. Therefore, the difference in pursuit eye movements between chimpanzees and humans could influence how each species perceives stream/bounce displays. However, previous studies have shown that humans perceived streaming of the stream/bounce stimuli even when the eyes were fixated (e.g., Bertenthal et al., 1993; A. B. Sekuler & R. Sekuler, 1999), and the frequency of streaming percepts in those studies (approximately 80%–95%) was similar to that in this study. Therefore, the higher rate of streaming perception by humans in our study cannot be explained simply by the effects of pursuit eye movement. Comparative data on smooth pursuit of moving objects in chimpanzees and humans are lacking; such data are needed to examine this issue in more detail.

The effect of visual experience during training sessions should also be considered. Prior to test sessions, our chimpanzees experienced equal numbers of unambiguous streaming and bouncing displays in training trials. However, in the natural world, a plausible assumption is that an object moving in one direction continues to move in that direction (Hall-Haro, Johnson, Price, Vance, & Kiorpes, 2008; Spelke, 1994), and any bouncing event is accidental (K. Watanabe & Shimojo, 2001a). This is consistent with the tendency of our visual system to interpret bistable ambiguous visual information, such as a stream/bounce display, as unidirectional movement (Anstis & Ramachandran, 1987; Bertenthal et al., 1993). Thus, equal exposure to streaming and bouncing events is unrealistic. This unusually frequent exposure to bouncing events may have distorted the chimpanzees’ prior expectations about the probability of each event, which may in turn have biased their responses toward bouncing.

However, our data do not seem to support this idea. First, the number of training sessions experienced by each chimpanzee (ranging from 4 to 59 sessions) and the percentages of perceived bouncing were not positively correlated (Pearson’s correlation: r = −.14, p > .10). Second, in additional tests, human subjects who received prolonged exposure to baseline trials maintained their predominant streaming percepts (see the supplemental materials: Experiment S1). The comparative test in humans, however, assessed the effect of a limited number of baseline trials (256 trials) and was not fully equivalent to the tests in chimpanzees. The effect of longer-term visual experience on stream/bounce perception should be evaluated in future studies.

An alternative interpretation of the results focuses on differences in the perception of depth when two-dimensional discs are presented on a flat monitor surface. Our interpretation of a computerised stream/bounce display may relate to structural constraints and the physical laws of the three-dimensional natural world, in which two solid objects on the same depth plane collide, whereas those on different planes pass each other (Scholl & Nakayama, 2002; A. B. Sekuler & R. Sekuler, 1999; K. Watanabe & Shimojo, 2001a). When humans perceive the stream/bounce stimuli as streaming, the two objects are perceived to be on different surfaces, not only in the open ring and random-dot conditions, but also in the filled disc condition. Chimpanzees may not perceive such depth differences on the planar surface of a CRT monitor when the objects provide no explicit depth cues, and their perceptual processing may, in keeping with physical laws, treat objects on the same surface as unable to pass through each other. Although evidence suggests that chimpanzees are capable of relating projected movies to the real world (e.g., Hirata, 2007; Leighty, Menzel, & Fragaszy, 2008; Menzel, Savage-Rumbaugh, & Lawson, 1985) and of perceiving depth from some two-dimensional pictorial cues (Imura & Tomonaga, 2003, 2009; Imura, Tomonaga, & Yagi, 2008), we cannot be sure that they employed these abilities in viewing our stimulus displays without such cues. Given that two-dimensional iconic expressions of the three-dimensional world are a human-specific innovation and that human and chimpanzee subjects differed enormously in previous exposure to such media, species differences in responding to such computerised graphical images would not be surprising.

According to this explanation, streaming perception with open ring and coherent random-dot stimuli may reflect chimpanzees’ sensitivity to two kinds of explicit depth cues, X-junctions and motion transparency. As noted above, the ambiguity of stream/bounce displays derives from a conflict between the perception of two moving objects separated in depth (streaming) and not separated in depth (bouncing). The increase in streaming percepts therefore supports the view that these stimulus manipulations functioned as depth cues for chimpanzees.

An X-junction is known to be a strong cue for the detection of transparency or overlapping of objects in human vision (e.g., Dresp, Durand, & Grossberg, 2002; Kanizsa, 1979; T. Watanabe & Cavanagh, 1993). Several studies in nonhuman primates have also investigated the role of junctions (T, L, or X) in the perception of object overlap, revealing that monkeys and apes use junction cues to perceive the occlusion or transparency of two-dimensionally displayed objects (e.g., Fujita & Giersch, 2005; Nagasaka, Nakata, & Osada, 2009; Sato, Kanazawa, & Fujita, 1997; Sugita, 1999). Our results, which demonstrate that the addition of X-junctions promoted streaming percepts, are consistent with the results of these studies.

Aggregations of dots moving coherently in opposite directions also strongly induce depth percepts of two different planes in humans (Braddick et al., 2002; Edwards & Greenwood, 2005). On the other hand, the perception of motion transparency has not received much attention in comparative perception studies, partly because it is difficult to find behavioral indices to assess such subjective percepts in nonverbal organisms. The method used here to measure the effects of motion transparency cues on stream/bounce perception may be valuable for assessing such visual sensitivity in other species.

In conclusion, our study is the first to demonstrate the perception of stream/bounce displays by chimpanzees. Visual interpretation in chimpanzees differed from that in humans, suggesting species differences in the process of identifying moving objects. Our study also provides evidence of chimpanzees’ sensitivity to two kinds of depth cues, X-junctions and motion transparency, which could be used to perceptually resolve interpretations of ambiguous motion events. Further studies comparing our results with those from other animals would be valuable for tracking the evolutionary origins of the perceptual mechanisms that underlie our representations of the dynamic visual world.