
Towards describing scenes by animals: Pigeons’ ordinal discrimination of objects varying in depth


The perception of a complex scene requires visual mechanisms that include identifying objects and their relative placement in depth. To examine apparent depth perception in birds, we tested four pigeons with a novel multiple-sequential-choice procedure. We created 3D-rendered scene stimuli containing three objects located at different apparent depths based on a variety of pictorial cues and placed small circular target response areas on them. The pigeons were trained to sequentially choose among the multiple response areas to report the object closest in apparent depth (ordinal position; front then middle object). After the pigeons learned this sequential depth discrimination, their use of three different monocular depth cues (occlusion, relative size, height in field) was tested, and their flexibility evaluated using three novel objects. In addition to the contribution to understanding apparent depth perception in birds, the use of more flexible open-ended choice discriminations, as employed here, has considerable promise for creating informative production-like tasks in nonverbal animals.

The absence of language in animals is simultaneously one of the fundamental advantages and disadvantages in examining their cognition. The human ability for open-ended language allows us to produce richly informative descriptions of both our perceived internal states and the perceived external world. While in some cases these self-reports are faulty, either lacking veridicality, as with perceptual illusions (Gregory, 1968), or showing blindness to implicit processing, as in prejudice (Greenwald, McGhee, & Schwartz, 1998), they are still psychologically revealing. These productive methods provide important clues and highlight complexities that would have taken much longer to discover with traditional experimental techniques (Delis, Robertson, & Efron, 1986; Porter & Coltheart, 2006). Thus, our capacities for language, expressive art, and behavioral flexibility allow us to engage in a wide variety of production tasks that effectively extend psychologists’ general experimental toolbox.

Within this tradition, the Gardners developed one innovative approach to examining cognition in animals by teaching chimpanzees to use sign language (Gardner & Gardner, 1969). While controversy surrounds the meaning of their findings for language in animals (e.g., Terrace, Petitto, Sanders, & Bever, 1979), the general approach undoubtedly gave researchers new tools and opportunities to see animal-produced signals that might reveal how they describe their external world or even reveal their (or others’) internal states (Woodruff & Premack, 1979).

In an important extension of this approach to birds, Irene Pepperberg began to train and test a highly vocal species of parrot with the idea of using these productive reports to test various aspects of psittacine cognition. Through her extensive testing of African grey parrots across a wide variety of domains over the years, Pepperberg’s work has offered new insights into the basic cognition of these animals and their use of referential labels (Pepperberg, 1999). Beyond their important contribution to understanding cognition in parrots, her results have consistently affirmed the potential value of tests in which an animal can engage in a broad array of open-ended productive responses to various stimuli and situations. However, for testing bird species lacking the capacity for complex vocal behavior, such as pigeons, the use of such production-like tasks has had little impact. Where practical, such as in spatial behavior procedures, they have been quite revealing (Cheng, Spetch, Kelly, & Bingman, 2006; Schiffner & Wiltschko, 2014; Spetch & Edwards, 1988). Nevertheless, the difficulty in the implementation, analysis, and interpretation of such production tasks has also limited their use (Cook & Fowler, 2014; Epstein, Kirshnit, Lanza, & Rubin, 1984).

Over the past several years, our laboratory has tried to find solutions by developing approaches that permit more open-ended responding by pigeons. The general strategy has been to continue to use well-established trial-based procedures based on their pecking behavior, while evolving them towards greater flexibility. One example of this approach was to develop and test continuously streaming, three-dimensional digital environments in which the birds had to constantly interact with multiple objects over extended periods of time on each trial (Qadri, Reid, & Cook, 2016). The results indicated that pigeons can simultaneously process and use at least two environmental context cues from the streaming environment to control their identification of the passing objects. A second and different approach has been to develop adaptive discriminations using a genetic algorithm, testing the pigeons with stimuli that adjusted their properties over sessions based on the pigeons’ responses (Cook & Qadri, 2013, 2014). The results of this approach have indicated that such open-ended techniques involving continually adjusting stimuli can successfully identify and track the behavioral choices of these birds.

In the current paper, we report a new and different attempt at an open-ended methodology by examining how pigeons might be induced to “describe” a perceptual scene using a multiple-choice procedure. This research builds upon previous work in different paradigms showing that pigeons see distinct objects in computer-presented scenes (Lazareva & Wasserman, 2012), can identify such 3D objects using rotation-invariant features (Gibson, Lazareva, Gosselin, Schyns, & Wasserman, 2007), and utilize numerous monocular depth cues to determine global object position (Cavoto & Cook, 2006; Cook, Qadri, Kieres, & Commons-Miller, 2012). Because of our prior experience establishing that pigeons have the capacity for perceiving apparent depth in complex scenes using various monocular pictorial cues (Cavoto & Cook, 2006), we thought this domain was ideal for combining our interest in avian visual cognition with the development of a more open-ended multiple-choice task. The use of such multiple-choice procedures in pigeons has been valuably explored in several other settings (e.g. Huber, Apfalter, Steurer, & Prossinger, 2005; Lea et al., 2018; Roitblat, 1980).

In designing the specifics of the task for studying depth perception in pigeons, we were strongly influenced by the prior work of Todd (2004) and his colleagues in examining human depth perception using curved shaded surfaces. In their work, humans reported which of several points on a surface were in the apparent front or back of a scene based on various shading cues, as indicated by the presence of target dots placed into the scene. Our thinking was to get pigeons to similarly report the apparent depth of objects within a scene by developing a procedure in which the pigeons received multiple opportunities to report the ordinal depth position of different objects placed within a 3D rendered scene.

To do so with the pigeons, we designed a multiple-choice task to examine whether they could identify the apparent positional ordering of objects based on different monocular depth cues. Using digital software, we created semirealistic 3D scenes in which three objects (sphere, cube, cylinder) appeared within the context of a flat, textured surface. In each scene, the three objects varied in color, size, orientation, interposition, and relative placement to create a large pool of stimuli. The pigeons were first taught to make choice responses to multicolored discs that were placed on the scene and served as the target response areas. Their task on each discrimination trial was to progressively choose the target associated with the object that appeared closest in pictorial depth. Thus, across the two choices within a trial, the pigeons had to respond to the objects’ depth relationships, with the general goal of having the pigeons “describe” the location of various objects within a scene: first the closest object, and then the object in the middle. The first experiment examined how the pigeons learned this discrimination, while the second experiment examined the pictorial depth cues controlling the discrimination and their generalizability by examining transfer to novel objects.

Experiment 1

The first experiment examined how the pigeons learned to indicate the ordinal position of the objects in this type of multiple-choice discrimination using response targets. After first shaping the pigeons to peck at the response targets in isolation, the first phase of the experiment examined how pigeons performed when only required to make one choice between two targets placed over the set of the three objects. After showing good evidence of learning this simple task, the second phase of the experiment added a third target, and the pigeons were required to make two successive choices to each display. A diagram of this latter procedure is provided in Fig. 1. In this case, after pecking a ready signal, a trial began with all three objects having a target area imposed on top of them. Following this three-alternative choice, there was a second choice test between the two targets not chosen on the first choice. The pigeons were rewarded after each choice for selecting the target on the pictorially closest object at the moment of each choice (front, then middle when required). We deliberately employed a large and diverse set of stimuli to prevent the pigeons from memorizing specific scenes or object configurations and to promote the creation of as flexible a response rule as possible.

Fig. 1

Example timeline sequence of a multiple-choice trial in Phase 2 of Experiment 1. *Food reward was contingent on making a correct choice to the target on the object closest in depth at the time of each choice; reinforcement was probabilistic (see text for details)



Method

Animals

Four male pigeons (Columba livia) were tested. One pigeon (I3) was experimentally naïve at the start of the experiment. The other pigeons (C1, G2, & L4) had previously been tested with different discriminations, including 3D action, time perception, and auditory processing. The pigeons were maintained at 80%–85% of their free-feeding weight based on daily weighing. They had free access to water and grit and were kept on a 12:12 light:dark cycle.


Apparatus

Testing occurred in two flat-black Plexiglas operant chambers. Both chambers were equipped with an infrared touch screen (EZscreen EZ-150-Wave-USB) on one wall and a 28-V houselight that was continuously lit. Food reinforcement was provided through a square 5 cm × 5 cm access hole centrally positioned below the touch screen. Stimuli were presented on the monitor (Dell E153FPf or NEC Accusync LCD51VM-BK, 1,024 × 768 resolution), situated directly behind the touch screen. Pigeons C1 & G2 were always tested in the first chamber, and I3 & L4 were tested in the second. All experimental events were controlled through a Microsoft Visual Basic program.


Stimuli

Scene stimuli were 10.5 × 10.5 cm square images rendered using 3DS Max (v2013, Autodesk). All scenes had a depicted ground, which was a brown, textured plane that was shaded to give it the appearance of receding in depth. By vertically flipping the images, this ground was presented at either the bottom or the top of the scene. Each scene also contained three objects: a cube, a hollowed cylinder, and a sphere (see Fig. 1). These objects could be colored matte red, blue, or green, with lighting and shading to provide the appearance of depth and edges within each object. A total of 27 different object/color combinations were tested, ranging from all three objects being different colors to all three being the same color, or combinations in between. The primary light source for shading the scene was situated in one of four locations equidistant from the origin of the scene: 20° to the left of the camera, 65° to the left of the camera, 20° to the right of the camera, or 65° to the right of the camera. These scene variations (27 color configurations × 4 light configurations × 2 ground position configurations = 216) were applied to all 3D scene configurations of the objects, which are described next.

The 3D scene configurations were generated by systematically varying the rendered placement and size of the three objects. This variation allowed for different pictorial depth cues—object interposition/occlusion, object relative size, and object height in field—to be present to varying degrees. Our systematic variation allowed for four different size configurations, four different height-in-field configurations, and eight different occlusion configurations.

Object size was judged using the difference between the highest and lowest pixel in the scene image (i.e., vertical extent). The four size configurations consisted of the following arrangements: One configuration had a size difference between each of the front, middle, and back objects, thus making relative size an informative depth cue in the comparison of all the objects. The second configuration had all three objects at the same size, making relative size an uninformative depth cue. The third configuration had a size difference between the front and the middle objects, while the middle and back objects were about equal in size, and the fourth configuration had a size difference between the middle and back objects, while the front and middle objects were about equal. Thus, these two size configurations permitted two of the objects to be compared using size as an informative cue. When configurations required size differences between objects, their vertical extent differed on average by 40%, with a minimum of 10% difference, and in rare cases as large as 125% (i.e., over twice as large in comparison). In absolute terms, the front object averaged approximately 41 mm, the middle object 36 mm, and the back object 31 mm, with standard deviations of 5.5 mm.

The four height-in-field configurations were organized analogously to the four size configurations. An object’s relative height in field was measured using the midpoint of the object. One configuration placed the objects so that all object comparisons were informed by this depth cue, while the second placed them at equal heights relative to the field. The third and fourth configurations permitted two of the objects to be judged using this cue, but not the third. To make this depth cue informative, the distance between the midpoints of objects had to be at least 20% of their averaged size, averaging a 78% difference in our stimulus set, with a maximum of 230% difference (i.e., more than two object-extents of separation). To prevent height on the screen from acting as a discriminative cue, half the trials were tested with the scene inverted so that the textured ground appeared towards the top of the image and the front object was located at the top of the array, thus inverting the height in field relative to the objects.
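The 20% informativeness criterion described above can be expressed as a simple check. The function below is an illustrative sketch; the function and variable names are our own and are not taken from the original experimental software:

```python
def height_cue_informative(midpoint_a_mm, midpoint_b_mm, size_a_mm, size_b_mm):
    """The height-in-field cue counts as informative for a pair of objects
    when the vertical distance between their midpoints is at least 20% of
    their averaged size (the criterion stated in the text)."""
    averaged_size = (size_a_mm + size_b_mm) / 2
    return abs(midpoint_a_mm - midpoint_b_mm) >= 0.20 * averaged_size

# Two 36-mm objects need midpoints at least 7.2 mm apart for the cue to count.
print(height_cue_informative(50.0, 40.0, 36.0, 36.0))  # True (10 mm apart)
print(height_cue_informative(50.0, 45.0, 36.0, 36.0))  # False (5 mm apart)
```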

Eight occlusion configurations were used, which varied the degree of interposition among the three objects. In the most informative condition, the front object partially overlapped both the middle and the back object, and the middle object partially overlapped the back object. In the completely unoccluded condition, there was no overlap among any of the three objects. The remaining six configurations varied the degree of overlap between different combinations of the objects (e.g., front and back overlap, but not the middle). On average, the overlapping portion was about 13% of the whole unoccluded object, with a minimum of 10% occluded and a maximum of 20% occluded. In addition to these three depth-relevant properties of the displays, the front-to-back and left-to-right orders of the objects’ positions were balanced across the three objects.
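The count of eight occlusion configurations follows from the combinatorics of the three possible pairwise overlaps (front over middle, front over back, middle over back), each of which can independently be present or absent. A brief illustrative sketch, with hypothetical names of our own:

```python
from itertools import product

# Each configuration specifies, for each of the three object pairs, whether
# the nearer object occludes the farther one: 2 x 2 x 2 = 8 configurations.
pairs = ("front-middle", "front-back", "middle-back")
configurations = [dict(zip(pairs, flags))
                  for flags in product((True, False), repeat=3)]

fully_occluded = {"front-middle": True, "front-back": True, "middle-back": True}
unoccluded = {"front-middle": False, "front-back": False, "middle-back": False}
print(len(configurations))               # 8
print(fully_occluded in configurations)  # True: the most informative condition
print(unoccluded in configurations)      # True: the completely unoccluded one
```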

With these different variations, we had a potential total of 995,328 distinct scene stimuli (4 size configurations × 4 height-in-field configurations × 8 occlusion configurations × 6 front-to-back orders × 6 left-to-right orders × 216 lighting and scene variations). Of course, certain combinations were impossible given the constraints of 3D physics and were consequently excluded. We also ensured that each scene configuration had at least one cue available for judging among the objects. As a result, each session pseudorandomly sampled from a pool of 233,280 possible scenes by sampling 1,080 scene configurations and rendering them using our 216 scene and lighting variations. We attempted to balance the counts of each configuration type. Thus, sessions were composed of all size configurations (276 of each informative condition, 252 of the uninformative condition), height-in-field configurations (276 of each informative condition, 252 of the uninformative condition), left-to-right orders (180 each), and front-to-back object orders (180 each). Occlusion was absent in half of the scene configurations (540), and the other half was composed equally of the seven remaining occlusion configurations.
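The arithmetic behind the reported pool sizes can be verified directly. This is an illustrative check only; the variable names are ours:

```python
# Potential pool: all combinations of the configuration factors.
size_configs, height_configs, occlusion_configs = 4, 4, 8
front_to_back_orders, left_to_right_orders = 6, 6
scene_and_lighting = 27 * 4 * 2   # color x light source x ground position = 216

potential_scenes = (size_configs * height_configs * occlusion_configs
                    * front_to_back_orders * left_to_right_orders
                    * scene_and_lighting)
print(potential_scenes)           # 995328

# Usable pool: 1,080 sampled scene configurations, each renderable with the
# 216 scene and lighting variations.
session_configs = 3 * 276 + 252   # e.g., three informative size conditions
                                  # plus the uninformative one
print(session_configs)                        # 1080
print(session_configs * scene_and_lighting)   # 233280
```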

Finally, the target response areas were identical 12-mm (outer diameter) annuli. They consisted of a series of colored bands around the ring so that they would be visible when placed against any shading of an object’s color (see Fig. 1). They were randomly placed in the interior of each object on each trial, with the constraint that they not touch any exterior edges.



Procedure

After the naïve pigeon (I3) was autoshaped to peck the display for food reward, the pigeons were all trained to peck the target area in isolation over a few sessions. Each trial started with a peck at a white 2.9-mm circular ready signal. Following this, the pigeons were first trained to peck at a single target placed on a solid black background. Once pecking was established, this was followed by training to peck at a target randomly located within one of 15 uniformly colored 15.5 × 10.5 cm rectangles that were randomly used across trials. Once pecking was established, the rectangles were replaced by the scenes described previously and discrimination training began.

Phase 1: One-choice discrimination training

The pigeons were first trained to make the essential ordinal depth discrimination using a simpler one-choice variation of the task. In this procedure, only two of the three objects in the scene were randomly selected to have targets placed on them.

After starting each trial with a peck to the ready signal, a scene with targets on two different objects was displayed, located randomly within the bottom third of the computer screen. Random scene placement on the screen was used to prevent absolute spatial biases and promote the pigeons’ processing of the scene and its contents. The pigeons were then required to peck (variable ratio = 6–9) at the target placed on the object appearing closest in depth to obtain food reward (hopper duration 2.5 s, except L4 who was given 4.5 s). Pecks at the other target on the display were considered incorrect selections and terminated the trial. Trials not completed within 10 min were terminated. After a choice, there was an intertrial interval (ITI) of 3 s.

Each session consisted of 216 trials, testing 216 scene configurations pseudorandomly selected to maintain a balanced selection of different scene configurations across every five sessions of testing. Due to a programming error, pigeons I3 and L4 received scenes lit only from the right at 20° or from the left at 65° during Experiment 1. Phase 1 consisted of 20 sessions for pigeons I3 and L4, and 32 sessions for C1 and G2. Only the first 20 sessions were analyzed.

Phase 2: Two-choice discrimination training

We then started the two-choice phase. Here, targets were placed on all three objects, and the pigeons were required to make two successive choices (each choice VR6–9). In the first choice, selecting the target located on the perceptually closest object of the three provided immediate food reward. Selecting the other two targets on the display was considered incorrect. Regardless of first-choice accuracy, the screen was blanked for 1 s, and the same scene would return, less the just-selected target. If reinforcement was delivered for the first choice, food was available during the 1-s blanked period and extended for 1.5 s past the scene’s return. The object behind the just-selected target remained present. The pigeons now needed to make a second choice between the remaining two targets. Again, selecting the target on the object closest in depth from the targets still available resulted in food reward, and an incorrect selection had no programmed consequences. Either outcome ended the trial and resulted in a 3-s ITI. Each session consisted of 216 two-choice trials. This phase lasted a total of 35–45 complete sessions depending on the bird. Incomplete sessions, possibly due to satiation from the 432 possible reinforcers, were omitted.

At this point, we made two procedural modifications to help integrate the two successive choices more seamlessly into a single trial. First, we made food reward probabilistic (50%) after correct first choices. Food reward after correct second choices remained at 100%. Second, on 64 randomly selected trials, we required a correct first choice in order to make a second choice. On these trials, if the pigeon made an incorrect first choice, the trial simply ended. These changes extended training for another 28–36 sessions. When evaluated, we found that neither of these modifications changed the pigeons’ choice behavior, and all of these sessions were combined for the analyses below. Depending on the bird, Phase 2 lasted 63–79 sessions. For the analyses of the results, we examined the first 60 sessions.


Results

Phase 1: One-choice discrimination training

The pigeons acquired the one-choice discrimination easily. The left portion of Fig. 2 shows the increase in average choice accuracy for the four pigeons over the first 20 sessions of Phase 1. Because only a single choice was required between the two target areas, chance was considered equal to 50%. The pigeons were significantly above chance by the seventh session (seventh session accuracy = 61%), t(3) = 5.7, p = .010, d = 2.9.

Fig. 2

Mean choice accuracy for the four pigeons during Phase 1 and Phase 2 of Experiment 1. Chance for Phase 1 was 50%, since only a single choice between two target areas was required. Because two choices were required in Phase 2, chance for the first choice was 33%, as three target areas were present, and was set at 50% for the second choice, as only two target areas remained. Error bars depict the standard error

To better understand how the pigeons learned the task, we examined their performance when the two targets were located on the objects at the three possible depth combinations (i.e., front vs. middle, front vs. back & middle vs. back). A repeated-measures (RM) analysis of variance (ANOVA; depth combination × four-session block) evaluating choice accuracy revealed a significant main effect of depth combination, F(2, 6) = 22.0, p = .002, ηp2 = .88, and a main effect of session, F(4, 12) = 3.8, p = .031, ηp2 = .56, but no interaction between the two factors, F(8, 24) = 2.0, p = .085, over Phase 1.

Looking at the levels of accuracy over the final four-session block of Phase 1, all four pigeons were most accurate when the targets were located on the front and back objects (mean accuracy = 70%), t(3) = 3.8, p = .032, d = 1.9, and poorest when they were located on the front and middle objects (59%), t(3) = 3.2, p = .048, d = 1.6, with the middle and back combination in between (64%), t(3) = 4.8, p = .017, d = 2.4.

Reaction time (RT) was measured as the time from stimulus onset to the first peck on a target. Long RTs over 10 s were excluded from this analysis (about 2% of responses). Over the last four-session block of Phase 1, mean RT for correct choice responses was faster (1,456 ms) than for incorrect choice responses (1,603 ms). This correct/incorrect RT difference was found in all four birds and across all three combinations of depth placement. Overall, RTs were fastest when the targets were on the front and middle objects (1,405 ms), slowest when the targets were on the middle and back objects (1,652 ms), and in between for the front and back combination (1,433 ms). An RM ANOVA evaluating reaction time in this block across these depth combinations confirmed a significant main effect, F(2, 6) = 16.6, p = .004, ηp2 = .85; RTs were log transformed prior to analysis for normality. Post hoc comparisons suggested that the pigeons were slower with the middle/back discrimination relative to both the front/middle (p = .009) and front/back conditions (p = .039). There was no difference between the front/middle and front/back conditions (p = .627).

Phase 2: Two-choice discrimination training

The right portion of Fig. 2 shows mean choice accuracy for all four pigeons during Phase 2. Choice accuracy for the first choice (black symbols), where three targets were available, was judged relative to chance set at 33%, while accuracy for the second choice (gray symbols), where two targets were available, was judged relative to chance set at 50%. Despite the increased difficulty of the first choice with the addition of the third target, second-choice accuracy in Phase 2 remained at levels comparable to those observed during Phase 1 and did not change over the phase, F(14, 42) = 0.8, p = .65. In contrast, first-choice accuracy grew slowly over the phase, F(14, 42) = 3.1, p = .002, ηp2 = .51. The pigeons gradually increased their accuracy on this more challenging first choice over the initial 40 sessions of Phase 2 before reaching an apparent asymptote at around 60%. Examining accuracy during just the first session of Phase 2, the pigeons exhibited above-chance discrimination upon the introduction of the third target (first-choice accuracy = 50%; chance = 33%), t(3) = 4.0, p = .028, d = 2.0, and the addition of a second choice (second-choice accuracy in the first session = 64%; chance = 50%), t(3) = 5.2, p = .014, d = 2.6.

Examining the last two blocks of this phase, the pigeons correctly selected the front object on average 68% of the time and made errors by selecting the middle object (23%) or back object (9%) on the remainder of the trials. The pigeons made two correct choices in a row 46% of the time, which should happen by chance 17% of the time. The pigeons made a correct first choice followed by an incorrect second choice 22% of the time, and made an incorrect first choice followed by a correct second choice 20% of the time. The pigeons made two incorrect choices in a row on 12% of trials where that was possible.
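The chance baseline for two correct choices in a row follows directly from the structure of the two successive choices; a quick illustrative check:

```python
from fractions import Fraction

# A random first choice among three targets is correct 1/3 of the time; a
# random second choice among the two remaining targets is correct 1/2 of the
# time, so two correct choices in a row occur by chance on 1/6 (~17%) of trials.
p_first_correct = Fraction(1, 3)
p_second_correct = Fraction(1, 2)
p_both_correct = p_first_correct * p_second_correct
print(p_both_correct)                       # 1/6
print(round(float(p_both_correct) * 100))   # 17 (percent)
```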

Certain depth combinations of objects supported better accuracy than others in Phase 1. We conducted a similar analysis for those depth combinations for the second choices in Phase 2. One important difference here is that now the pigeons had already made a first choice to one of the three targets (front, middle, or back object), which had been either correct or not. Examining the last two blocks of Phase 2, all four pigeons performed better on their second choice when the first choice was correct (66%) versus when the first choice was incorrect (62%). A more detailed RM ANOVA (depth combination) on second choice accuracy revealed a significant main effect of depth combination, F(2, 6) = 38.3, p < .001, ηp2 = .93.

When the pigeons’ second choice was between the middle and back objects (i.e., they had made a correct front-target first choice), they were successful at choosing the middle object (accuracy = 66%, chance = 50%), t(3) = 6.4, p = .008, d = 1.0. When the pigeons’ second choice was between the front and back objects (i.e., they had made an incorrect middle-object first choice), they were also accurate in choosing the front object (accuracy = 66%), t(3) = 4.1, p = .026, d = 0.9. When the pigeons’ second choice was between the front and middle objects (i.e., they had made an incorrect back-object first choice), they were at chance in choosing the front object (accuracy = 50%), t(3) = 0.1, p = .924.


Discussion

This experiment found that the pigeons can learn a multiple-choice discrimination involving the placement of target response areas on top of the different objects placed at varying apparent depths in a complex scene. All four pigeons learned to make both above-chance first and second choices as determined by the ordinal position of the objects in pictorial depth. Although using completely different approaches, these results converge with those reached by Cavoto and Cook (2006) in their examination of depth perception by pigeons. Their discrimination involved a go/no-go task in which pigeons had to peck at one particular depth order (S+) of specifically colored objects for food reward while suppressing their pecking to the five other possible orderings of the same objects (S−). In contrast, the current task required the pigeons to choose a target on the nearest object followed by choosing the middle object, but critically, the correct object identity and scene composition varied immensely from trial to trial. Despite these differences in the nature of the required discrimination, the response methodology, and the appearance of the stimulus sets, both studies provide good converging evidence that pigeons can perceive apparent depth relations among objects rendered in 3D scenes.

In general, when two of the three objects had targets, the pigeons’ best performance occurred when the back object was a possible response option (i.e., front vs. back or middle vs. back). There are a few possible factors that may contribute to this finding. First, the constraints of 3D physics impact the appearance of the objects even though the target areas associated with the objects were equal in size. When relative size was an informative cue, objects in front were larger than objects behind, and when occlusion was an informative cue, objects in front were less likely to be occluded (i.e., never for the front object and sometimes for the middle object). The back object was therefore sometimes smaller, and its full contour and surface features were sometimes not visible. Both of these features may have made the front object more salient as the rewarded response option independent of the processing of ordinal position, and the middle object would have also shared some of these same benefits.

Second, if pigeons have a particular difficulty with amodal completion (Qadri & Cook, 2015), for example, then the depth-independent shape recognition of the back objects would be more impacted than front objects and impair possible object-specific associations. Third, given that choosing the back object was never rewarded, the back object may have been more aversive overall. In fact, selection of the back object in the first choice led to chance level performance in the second choice, possibly indicating poor perceptual conditions for the depth discrimination or disengagement by the pigeons. Alternatively, the pigeons may have recognized that since they were on the second choice of the scene, they needed to select the middle object within the scene, which would be incorrect if the front object was still available, as is the case when they make an error on the first choice. Such an ordinal-object bias would boost performance in the middle versus back discrimination and impair the front versus middle discrimination, consistent with the observed data.

Lastly, the response rule used by the pigeons generalized well in dealing with the target areas used to judge an object’s position. This flexibility was reflected in several ways. First, when the third response area and second-choice requirement were added in Phase 2, the pigeons showed significant above-chance transfer. Second, the pigeons could use the rule successfully across two successive choices, first applying it to the front object and then to the middle object. Finally, this rule had to be general enough to deal with the very large number of variable situations employed here. Besides their difference in apparent depth, the objects varied in their identity, location, and color, along with the various scene and lighting characteristics present or absent in each trial. Given the number of possible scenes and number of training trials, stimuli rarely repeated more than a few times across the entire experiment, likely forcing the pigeons to generate as flexible a response rule as possible.

Experiment 2

With the pigeons successfully discriminating the ordinal position of the objects across successive choices in Experiment 1, the second experiment examined the basis for these discriminations. This experiment consisted of two parts.

We started with analyses of the degree of control by different pictorial depth cues, since food reward was specifically based on the ordinal position of the objects in the scene as determined by their pictorial depth. While the stimulus design of Experiment 1 created the desired variety and number of scenes, it did not allow for effective examination of the contribution by specific cues. To evaluate these contributions in Experiment 2, we constructed controlled scene stimuli that were designed to present the different pictorial depth cues in isolation. The three cues examined were (1) the occlusion caused by the interposition of the objects, (2) the relative size of the different objects, and (3) their relative height in field. These three pictorial cues had been shown by Cavoto and Cook (2006) to contribute to object depth perception in their go/no-go task, and we hypothesized that their use would be replicated in the present choice discrimination as well.

The second part of the experiment examined the transfer of the discrimination to three new objects. If the pigeons had indeed learned a generalized rule about ordinal processing of objects in depth, this rule should be effective in determining the ordinal position of novel objects placed within the scene. For this test, we introduced three novel objects (a pyramid, a spool, and a top) placed into the scenes using the same locational rules as for the trained objects. Using probe tests, any generalized rule should allow the pigeons to demonstrate above-chance accuracy to the target response areas associated with these new objects, as determined by their ordinal position. Following this transfer testing, the novel objects were integrated into training to further examine performance.


Animals and apparatus

The same birds and apparatuses were used as in Experiment 1.


Pictorial depth cue stimuli

New scenes were generated for these tests. In these scenes, only a single monocular depth cue was available for discriminating the ordinal position of the objects. Three cues were examined: (1) the occlusion caused by the interposition of the objects, (2) the relative size of the different objects, and (3) their relative height in field. In each case, the objects were first positioned at the same virtual depth in the scene, then only their relative sizes, height in field, or their relative interposition was manipulated. In all cases, the objects within a scene were the same color, though all three colors were tested. Representative examples of these stimuli are shown in Fig. 3.

Fig. 3
figure 3

Average choice accuracy during the isolated pictorial depth cue tests in Experiment 2. Dashed lines indicate the likelihood of a correct response by chance for the first (33%) and second (50%) choices. Example stimuli are included below the label for each condition (see text for details). Error bars depict the standard error

For tests of relative size, successive objects (perceptually successive in ordinal position) were approximately 28% larger than the previous item, resulting in one small, one medium and one large object per scene. For these scenes, the height in field was kept constant among the objects, but varied across the high, middle, and low placements in the scene across stimuli, and objects were not allowed to overlap. The transfer stimuli were counterbalanced for object size combinations and object left-right positions.
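The approximate 28% step between successive object sizes can be sketched as follows. This is only an illustration; the base size, its units, and the exact scaling function are assumptions, not values taken from the experiment:

```python
# Illustrative sketch of the ~28% relative-size scaling described above.
# The base size and its units are assumptions for illustration, not
# values taken from the experiment.
SCALE = 1.28  # each successive object is ~28% larger than the previous one


def object_sizes(base=1.0, n_objects=3):
    """Return object sizes from smallest to largest."""
    return [round(base * SCALE ** i, 3) for i in range(n_objects)]


print(object_sizes())  # one small, one medium, one large object
```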

Similarly, for stimuli varying in height in field, successive objects (perceptually successive in ordinal position) were vertically displaced by approximately half the object height. These scenes were counterbalanced for height-in-field combinations and object left–right positions. The objects were always the same size within a scene, but were made to vary (i.e., small, medium, large) across stimuli, and objects were not allowed to overlap. While training stimuli continued to be tested in both their “normal” and inverted orientations, the test stimuli were presented only in the “normal” orientation.

For stimuli testing the occlusion cue, the three objects were made to overlap such that occluded objects (middle and back) were each 80% visible. For this test, the objects were all of a fixed size (medium) and all vertically centered within the scene (middle). This arrangement allowed only left-to-right or right-to-left object orderings to be tested, and object left–right positions were counterbalanced.

Novel objects

For the novel transfer test, a new set of three objects was generated. These objects were labeled the pyramid, top, and spool. The spool was derived from the original cylinder primitive, but was rotated into a different orientation with a much smaller inner diameter opening. These objects were created using the same colors and range of sizes as the training objects. A representative example of these new objects is shown in Fig. 4. A comparable set of 233,280 scenes using these novel objects was generated using the same principles as the first stimulus set.

Fig. 4
figure 4

Examples of the novel objects tested in Experiment 2. The red object is an example of the “top,” the blue object is an example of the “pyramid,” and the green object shows the “spool.” Object color varied across trials


Pictorial depth cue tests

The pigeons were probe tested using the controlled scenes containing only a single monocular depth cue and the control scenes in which none of these cues were present. On these probe test trials, both choices were nonrewarded, with any choice advancing the trial to the next stage.

In a test session, 24 trials were randomly intermixed into the standard 216-trial baseline session. These 24 trials consisted of six occlusion-only test trials, six relative size-only test trials, six height-in-field test trials, and six no-depth control trials. A total of 12 test sessions were conducted, and test sessions were separated by one or more baseline sessions. Across these test sessions, there was equal testing of all combinations of left-to-right positions, object depth orders, and colors. Pigeon I3 had intermittent difficulty completing sessions during this time, including test sessions. For this pigeon, testing continued until 12 complete sessions were obtained.
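The composition of such a test session can be sketched as below. The condition names, and the assumption that the 24 probe trials were added to (rather than substituted for) the 216 baseline trials, are hypothetical details for illustration:

```python
import random

# Illustrative sketch of composing a depth-cue test session as described
# above: six probe trials of each of four conditions randomly intermixed
# with the baseline trials. Condition names, and the additive treatment
# of probes relative to the 216 baseline trials, are assumptions.
PROBE_CONDITIONS = ("occlusion", "relative_size", "height_in_field", "no_depth_control")


def build_test_session(n_baseline=216, probes_per_condition=6, seed=None):
    """Return a shuffled list of trial labels for one test session."""
    rng = random.Random(seed)
    trials = ["baseline"] * n_baseline
    for condition in PROBE_CONDITIONS:
        trials += [condition] * probes_per_condition
    rng.shuffle(trials)  # randomly intermix probes among baseline trials
    return trials


session = build_test_session(seed=0)
print(len(session), session.count("baseline"))  # 240 216
```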

Transfer to novel objects

This part of the experiment had two phases. In the transfer phase, the specific scenes with the new objects were tested as nonreinforced probe trials. After these transfer tests were completed, all scene types with the new objects were introduced into training and reinforced using the same contingencies as the original objects.

For the transfer test phase, 12 scenes from the larger novel object stimulus set were selected. These were selected to have unambiguous and redundant depth cues of all three types (occlusion, relative size, and height in field). Additionally, all three objects were tested as different colors from each other to prevent any confusion regarding their shapes (one red, one green, one blue, with object identity and color varied across trials). In a test session, 12 trials were randomly intermixed into the standard 216-trial baseline session. On novel object probe test trials, both choices were nonrewarded, with any choice advancing the trial to the next stage. A total of four novel transfer test sessions were conducted.

After completing this test, the full set of novel object scenes was integrated into the baseline testing. During this phase, the pigeons were tested equally with trials having scenes containing either the original or novel objects, and all choices for both object sets were rewarded using the contingencies described in the latter portion of Phase 2 of Experiment 1. Scenes with the novel objects also received probabilistic (50%) reinforcement for the first choice and separately were just as likely to require a correct first response to make a second choice. Within a session, the presented scenes were pseudorandomly selected from the larger pool of both the original stimulus set and the novel objects’ set to permit roughly equal testing of all configurations over several sessions, with balanced selections of object color, scene lighting, and ground position within each session. The specific combinations of object type (new/old) and scene variation were not counterbalanced, but the algorithm used to randomly select scenes attempted to minimize scene reuse before repeating scenes. This phase consisted of ten 216-trial sessions.


Pictorial depth cue tests

Overall, the pigeons demonstrated successful processing of the objects’ ordinal positions based on a single depth cue, at least on their first choice. Figure 3 depicts mean choice accuracy for all four pigeons with the different monocular test scenes in comparison to baseline trials and control scenes. The pigeons were consistently better on the familiar baseline trials in comparison with the combined depth cue tests: first choice paired, t(3) = 28.0, p = .001, d = 14.0; second choice paired, t(3) = 3.8, p = .033, d = 1.9. This was likely because of the reduced number of monocular cues present in the latter condition. On the depth cue tests, the pigeons on average chose correctly 47% of the time on their first choice (chance = 33%) and 57% on the second choice (chance = 50%). When no cues were present in the control scenes, pigeons chose at near-chance levels on their first (mean accuracy = 29%), t(3) = 1.0, p = .373, and second choice (mean accuracy = 48%), t(3) = 0.4, p = .716.

On their first choice, the pigeons were significantly above chance on the height-in-field tests (47%), t(3) = 3.7, p = .035, d = 1.8, and the occlusion tests (48%), t(3) = 7.0, p = .006, d = 3.5. The one-sample t test for the relative size test was nonsignificant, t(3) = 2.1, p = .125, likely because one pigeon (L4 = 66%) performed much better with this cue than the other pigeons (C1 = 38%, G2 = 42%, I3 = 42%), inflating the variance. Analyzing just the three pigeons with poorer performance confirmed their significant difference from chance, t(2) = 4.7, p = .042, d = 2.7. As a group across all test trials, the pigeons did not differ significantly from chance on their second choice in any of the three depth cue tests: height in field (58%), t(3) = 3.1, p = .053; occlusion (56%), t(3) = 1.2, p = .316; relative size (57%), t(3) = 1.6, p = .214.

Given that the results from Experiment 1 suggested that the pigeons’ second-choice performance might have been affected by first-choice accuracy, we looked for sequential effects in these depth tests. The birds were substantially better on their second choice in these cue tests if their first choice had been correct to the front object. The pigeons were significantly above chance for all three depth cues on their second choice after correct initial choices: height in field = 69%, t(3) = 5.7, p = .011, d = 2.9; occlusion = 66%, t(3) = 3.3, p = .047, d = 1.6; relative size = 61%, t(3) = 4.1, p = .026, d = 2.0, but not after incorrect initial choices: height in field = 49%, t(3) = 0.3, p = .794; occlusion = 46%, t(3) = 0.6, p = .568; relative size = 56%, t(3) = 0.8, p = .491. In the baseline trials from the same sessions, however, accuracy was above chance after both correct and incorrect initial choices: correct = 67%, t(3) = 6.0, p = .009, d = 3.0; incorrect = 65%, t(3) = 8.9, p = .003, d = 4.5.
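The sequential analysis above can be sketched as a simple computation over trial records. The record format here, a hypothetical pair of first-choice and second-choice outcomes per trial, is an assumption for illustration:

```python
# Minimal sketch of conditionalizing second-choice accuracy on the
# outcome of the first choice, as in the analysis above. Each trial
# record here is a hypothetical (first_correct, second_correct) pair.
def conditional_accuracy(trials):
    """Return second-choice accuracy split by first-choice outcome."""
    after_correct = [second for first, second in trials if first]
    after_error = [second for first, second in trials if not first]
    return {
        "after_correct": sum(after_correct) / len(after_correct) if after_correct else None,
        "after_error": sum(after_error) / len(after_error) if after_error else None,
    }


demo = [(True, True), (True, False), (False, False), (True, True), (False, True)]
print(conditional_accuracy(demo))  # after_correct ~0.67, after_error 0.5
```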

Transfer to novel objects

The pigeons showed excellent transfer of their learned discrimination to the scenes containing novel objects. During the four transfer sessions, mean first-choice accuracy on trials with novel objects was 85%, and second-choice accuracy was 83%. Both of these values were significantly above chance, ts(3) > 12.7, ps < .001, ds > 6.4, and significantly better than with the baseline objects, ts(3) > 6.8, ps < .006, ds > 3.4. The latter was likely due to the curated nature of the novel object test trials, which were selected to contain multiple redundant apparent depth cues.

Figure 5 depicts mean first-choice and second-choice accuracy for the four pigeons in Experiment 2 from the point where the new objects were integrated with the original objects in the training regime. Separate RM ANOVAs (object set × session) using first-choice or second-choice accuracy revealed a significant main effect of object set on first-choice accuracy, F(1, 3) = 62.9, p = .004, ηp2 = .95, but not on second-choice accuracy. There were no main effects or interactions with session in either ANOVA.

Fig. 5
figure 5

Mean choice accuracy for the four pigeons in Experiment 2 with the new and original objects on first choice (left) and second choice (right) across sessions. Error bars depict the standard error


In this experiment, each of the three tested pictorial depth cues supported accurate responding when presented in isolation, especially on the first choice. The presence of just object occlusion, just a difference in relative size, or just a difference in height in field each individually supported above-chance first-choice accuracy. These cues also supported above-chance second-choice accuracy, at least when the pigeons’ initial choices were correct. Overall accuracy on these isolated tests of the different depth cues was reduced in comparison with the baseline condition. This difference was expected, as the baseline scenes typically had multiple converging depth cues to support performance that were not present in the test situation. This redundant cue facilitation was also observed by Cavoto and Cook (2006), who found that scenes using three cues supported sharper discrimination than scenes with two cues or one cue.

The pigeons’ accuracy on the second choice in the depth cue tests was more complicated. When aggregated across all trials, their mean accuracy was not different from chance. However, when conditionalized on whether the first choice was accurate or not, second choice accuracy was found to be significantly above chance following an initial correct choice. This suggests that when the pigeons could correctly identify the front object, they were also able to distinguish between the middle and back objects. It was apparently more difficult for the pigeons to recover when the first choice was inaccurate. This may have been due to general inattention or ordinal-object bias on some subset of these trials that impaired both choices. While the exact source of the reduced second choice performance is difficult to know, it is clear from the results that the birds demonstrated sensitivity to each of the three depth cues designed into the stimuli under most conditions.

Experiment 2 also revealed further evidence of the considerable flexibility in the depth discrimination made by the pigeons. This was demonstrated by the excellent first- and second-choice discrimination transfer found in all birds for scenes containing novel objects. In these probe tests, the presence of the new objects in the scene did not impede the birds’ accurate determination of their ordinal positions.

We believe that at least a part of this excellent transfer is attributable to the curated nature of the scenes using the novel objects. Recall that the stimulus configurations with the novel objects contained all three pictorial cues. This deliberate selection likely resulted in another example of redundant facilitation, corroborated by the lower apparent accuracy when the novel objects were introduced into daily training using the diverse set of scene configurations. This discriminative flexibility in accurately placing both familiar and novel objects in proper ordinal depth is likely a result of the large number of scenes used during training (cf. Daniel, Wright, & Katz, 2015) and perhaps from having to deal with making multiple choices to the same stimulus within a trial.

General discussion

These two experiments reveal that pigeons can successfully discriminate among the apparent ordinal depth of objects placed into a rendered 3D scene across multiple choices. The results broaden our understanding of how pigeons perceive apparent depth and how they use different types of monocular depth cues. Using a new choice methodology, both experiments provide converging evidence that pigeons can use three types of monocular cues—object interposition, relative size, and height in field—to determine the pictorial depth relationships of objects. Further, there was consistent evidence of redundant cue facilitation, where the presence of multiple cues incrementally improved discrimination. Together, these experiments provide strong evidence that pigeons perceive a three-dimensional world and do so in part by employing the same monocular depth cues used by humans. Despite the use of different response methodologies, different discriminative requirements and different stimulus configurations, the results combine nicely with the prior results of Cavoto and Cook (2006) in finding the use of these same three cues and their redundant facilitation.

Perhaps this result is not surprising since readily available depth information would be an important and valuable visual property for any rapidly moving animal. Given the considerable differences in brain organization in birds and mammals, usage of the same cues suggests similarities across these different classes of animals likely imposed by the environmental constraints of navigating a rich, object-filled world.

Not to be overlooked, these experiments offer a possible new procedure for investigating these and similar questions by attempting to allow pigeons to, in one sense, “describe” a complex visual scene by requiring multiple, successive choices. In this task, the pigeons did not respond directly to the objects themselves as the basis for their discrimination. Instead, they learned to use target response areas associated with the objects’ apparent depth to indicate their choices. This separation and independence of the response area from the stimulus per se creates opportunities to ask the pigeons different types of questions using this procedure.

For example, these target response areas could be used to explore the perceived depth relationship of surfaces within a single object. Todd (2004) explored various aspects of surface curvature and 3D shape perception using different points located on parts of a complex curved surface. This could now be tested with the pigeons. Such results would allow the determination and computation of the degree of apparent depth supported by different surface cues within the same object. In the current experiments, we elected instead to use pictorial depth to organize the pigeons’ responses to different objects rather than a single object. Further, we chose depth because we were confident that pigeons could report this feature given prior investigation, and any comparable results would help validate this novel methodology. In principle, however, extending this strategy to single objects or to larger curved surfaces or textures could easily be done.

Several extensions of the current task require further investigation. One needed extension would be to train a set of birds to report the ordinal position of the objects from back to front rather than front to back, as was done here. In both the current experiment and in Cavoto and Cook (2006), the front object appeared easiest to discriminate. Given the visibility, stability, and salience of the front object, this is perhaps unsurprising. Thus, a back-to-front discrimination might be more difficult for the birds to acquire. Another valuable extension of the current procedure would be to have the objects, in addition to the targets, disappear as they were chosen. Finally, presenting the birds with additional scene complexities, such as more than three objects and more choices, would also be valuable.

An additional future step in this line of investigation would be to present the same scene multiple times to the pigeons. Through this sort of intensive repeated psychophysical testing, we could probabilistically trace out the perceived depth similarities and differences among the objects and ultimately “recreate” the scene as perceived by the pigeons. Relatedly, how the pigeons combine the features and perceive these scenes could be revealed using that same methodology (cf. Lappin, Norman, & Phillips, 2011). While we know each bird could perceive the ordinal apparent depth of the objects and use similar cues, this data set lacked the necessary repetitions to accurately estimate the correlation among the birds or to determine how much each bird weighted the contribution of the different pictorial cues toward its perception of the displays. For avian visual cognition broadly, verifying the interbird correlation and variation in perceived depth would give us greater certainty that we have reconstructed a general birds’-eye view. Finally, testing whether pigeons could transfer their current discrimination with these 3D-rendered stimuli to real physical 3D stimuli would give us more information about the relationship between the pigeons’ perception of apparent depth and real depth.

The development of the capacity for the pigeons to respond to different and multiple aspects of a scene based on these types of target areas offers new possibilities to explore various other discriminative dimensions and features. Besides the depth discrimination employed here, the same method could be used to explore additional scene or object features, such as their apparent color, brightness, size, motion, lighting source, camera position, or shape. For instance, using a discriminative vocabulary of different shape descriptors could open new approaches to studying amodal completion as created by the presence of occlusion in complex scenes. Variation in how successfully an animal can identify the occluded object with the same response or choice label as the unoccluded object should be informative as to what the bird perceives as the shape of the object (cf. Pepperberg, 2006; Pepperberg & Nakayama, 2016; Pepperberg, Vicinay, & Cavanagh, 2008).

One of the marked advantages of teaching animals language is that it gives them a richer vocabulary with which to interact with the world. Such a “vocabulary” and open-ended methodology could allow the pigeons to “describe” their perceptions more fully (Pepperberg, 1987, 1988). Using procedures with multiple trained choices, perhaps in succession, presents the possibility of giving nonverbal animals a comparable and richer set of alternatives. One important feature of having more alternatives is the opportunity for making informative errors. Two-choice alternatives are not particularly revealing on this front, but when multiple or open-ended alternatives are available, the distribution and types of choice errors can be highly informative as to the nature of the underlying representation (e.g., Blough & Blough, 1997; Cook & Rosen, 2010; Roitblat, 1980; Wasserman, Kiedinger, & Bhatt, 1988). This is one of the essential reasons why spatial search tasks have been so productive (Cheng et al., 2006).

We suggest that the current procedure moves us closer to a production-like task for pigeons, allowing them, for example, to make more open-ended responses with fewer discriminative constraints. In this investigation, that capacity revealed that pigeons were indeed sensitive to the depth cues and able to sequentially indicate objects in depth, primarily in sequences of correct responses. Of course, with increased response flexibility comes an associated increase in the challenge of analyzing complex and more conditionalized sequences of behavior. Modern digital analysis and visualization techniques offer new tools to meet these challenges. While perhaps not as open-ended as is possible with a language-trained animal, such as Alex or other vocal African grey parrots, the kind of multiple-successive-choice task used here can be more efficiently and effectively deployed with a much wider variety of animals.

Author notes

All of the data and materials for the experiments reported here are available upon request. None of the experiments were preregistered.


Author information

Corresponding author

Correspondence to Suzanne L. Gray.



Cite this article

Gray, S.L., Qadri, M.A.J. & Cook, R.G. Towards describing scenes by animals: Pigeons’ ordinal discrimination of objects varying in depth. Learn Behav 49, 85–98 (2021).



  • Pigeons
  • Depth perception
  • Production tasks
  • Monocular cues
  • Scene processing