Animal Cognition, Volume 9, Issue 4, pp 271–279

Towards a “virtual pigeon”: A new technique for investigating avian social perception


    • Department of Psychology, Keio University
  • Nikolaus F. Troje
    • Department of Psychology, Queen’s University
Original Article

DOI: 10.1007/s10071-006-0048-1

Cite this article as:
Watanabe, S. & Troje, N.F. Anim Cogn (2006) 9: 271. doi:10.1007/s10071-006-0048-1


The purpose of the present study is to examine the applicability of a computer-generated, virtual animal to the study of animal cognition. Pigeons were trained to discriminate between movies of a real pigeon and a rat. Then, they were tested with movies of the computer-generated (CG) pigeon. Subjects showed generalization to the CG pigeon. However, they also responded to modified versions in which the CG pigeon showed impossible movement, namely hopping and walking without head bobbing. Hence, the pigeons did not attend to these particular details of the display. When they were trained to discriminate between the normal and the modified version of the CG pigeon, they were able to learn the discrimination. The results of an additional partial occlusion test suggest that the subjects used head movement as a cue for the usual vs. unusual CG pigeon discrimination.


Social behavior of animals can often be elicited by quite simple features. For example, a red spot on a beak releases begging behavior of nestlings in black-headed gulls (Tinbergen 1951). Many other examples of releasers or key stimuli demonstrate similar kinds of behavior in the context of conspecific cognition. To identify critical features, many researchers have come up with different sorts of artificial social stimuli. Developing a suitable, well-controlled stimulus is also crucial for identifying the diagnostic features used for discrimination in operant conditioning experiments. The sophistication of artificial visual stimuli has greatly profited from the advent of computer graphics and animation and, more recently, from the possibility of generating virtual realities. The visual ability of birds is of particular interest because it is comparable to primate vision, although birds have smaller brains and a distant evolutionary relationship to primates (Cook 2000). Here we introduce experiments using a virtual bird and we study the potential of this technique for understanding animal cognition.

Still photographs

Historically, we can identify three major technical innovations for the study of animal visual cognition. The first is photography, and particularly the employment of a slide projector. Using a slide projector, the experimenter can present a huge number of stimuli to subjects (Herrnstein and Loveland 1964). A basic problem in using visual media instead of real objects is the inherent reduction of information. Photographic pictures are two-dimensional, while real objects are usually three-dimensional. Photographs cannot capture ultraviolet, while some species can perceive it. Even if the visible spectrum of an animal is similar to ours, their way of sampling it may be very different. The colors on a photographic image are designed to be metameric to the colors of the depicted scene for the human visual system but may appear very different to the animal.

Can animals perceive photographs as being real? Watanabe (1993) trained pigeons to discriminate real food (corn, hempseed, etc.) from non-food objects (stone, twigs, etc.). Each stimulus item was placed on a small box fixed on a motor-driven belt conveyer in front of a transparent pecking key so that pigeons could see the items successively through the transparent pecking key. After the pigeons learned the discrimination, they were tested with printed color photographs of food and non-food objects. They showed a clear transfer from the real objects to their photographs. Another group of pigeons were trained with the photographs first and were then tested with the real objects. The subjects also showed transfer of discrimination from photographs to real objects. Several researchers demonstrated the real objects-photograph transfer (Lumsden 1977; Delius 1992). As TV monitors became popular and affordable, photographs presented on slide projectors were gradually replaced by video and computer technology.

Still images on a TV monitor

The second technical innovation is the TV monitor. Video-recorded images can be presented on a computer monitor just as they are on a slide projector, but manipulating the images and generating a variety of artificial stimuli, such as chimera and morphed stimuli, became much easier. There are, however, a few problems with cathode-ray tube (CRT) video devices. D’Eath (1988) and Watanabe and Furuya (1997) pointed out several problems with the usage of video systems in animal experiments. As is the case with printed or projected photographs, video has been designed to match human perception of color. The color mixture system of video systems is trichromatic and suited to the particular spectra of the human cone pigments. Pigeons, on the other hand, can see ultraviolet light and they have at least four different spectral photoreceptor classes that are differently distributed across the spectrum compared to humans (Remmy and Emmerton 1987).

Another problem arising from the use of conventional CRT video monitors is the fact that they flicker with a relatively low frequency. Conventional TV video signals work with refresh rates of 50–60 Hz. While this flicker is virtually invisible to humans, pigeons have a higher critical flicker fusion frequency (CFF) (Powell 1967). The CFF depends on brightness. Using Powell's data, Watanabe and Furuya (1997) estimated a pigeon's CFF to be around 58 Hz at 70 cd/m², which is about the brightness of a normal TV monitor. Thus, if brightness of the monitor is kept low, the higher CFF of the pigeon is not expected to cause a serious problem. Another recommended way to avoid the problem of CFF is the employment of modern high-frequency CRT displays or the usage of liquid crystal displays (LCD). Even though the refresh rates of LCDs are not very high, they do not display any flicker between successive frames.

Computer-based image manipulation provides many possibilities that previously were unavailable. For example, Watanabe and Jian (1993) trained Bengalese finches on audio-visual discrimination of conspecific individuals and tested them with chimera birds that contained body parts from different individuals. After successful discrimination of individuals, the subjects were tested with the chimera images and with contact calls. When intact visual images were presented, the subjects used visual cues only. But when visual stimuli gave ambiguous (chimera) information, some of the birds switched to using contact calls for discrimination.

Morphing is another technique to investigate which cues are used in discrimination. Watanabe and Furuya (1997) trained pigeons to discriminate between still images of pigeons and starlings and then tested them with morphed versions. The subjects maintained their discrimination for morphs with a mixing rate of 20% of S+ and 80% of S−, but not for morphs with a mixing rate of 40% of S+ and 60% of S−.


Video technology allows the presentation not only of still images but also of motion sequences. In an early study, Piscreata (1982) demonstrated stimulus control in pigeons by the movement rate of stimuli displayed on a TV monitor. Pigeons could discriminate patterns of movement of dots (Bischof et al. 1999). Domestic hens, however, showed difficulty in Y-maze discrimination learning with video images of hens (Patterson-Kane et al. 1997). Using an auto-discrimination procedure (discriminative autoshaping), Dittrich and Lea (1993) could teach pigeons to discriminate moving images of conspecifics. The discrimination was maintained when novel images of pigeons under different viewing angles and other types of motion were presented. Thus, motion provides invariants that help pigeons to form categories. Pigeons could also discriminate particular movements such as “walking” and “pecking” (Dittrich et al. 1998; Jitsumori et al. 1999), and they could even discriminate between two words of Japanese sign language gestured by a video-taped human demonstrator (Watanabe and Furuya 1997) and between two kinds of movement patterns of dots, which to human observers appear “intentional” and “non-intentional” (Goto et al. 2002). In the sign-language study, the pigeons discriminated the words even when the images were played backwards or still images were presented, but not when a new person demonstrated the words. Thus, they might have used a particular aspect of motion displayed by a particular person. In other words, their discrimination was not discrimination of “words” per se. Quails can discriminate videos of normal, healthy conspecifics from those in an abnormal state induced by either a stimulant or a depressant (Yamazaki et al. 2004). They clearly discriminated hyperactive motion induced by methamphetamine and hypoactive motion induced by ketamine from normal motion displayed after saline injection. The authors concluded that quails have a concept of “abnormality”.

Findings on natural, spontaneous reactions to video footage of conspecifics are contradictory. A life-sized moving video of a conspecific did not cause any social reaction in pigeons (Ryan et al. 1994). In contrast, Shimizu (1998) could elicit courtship behavior of male pigeons by video images of female pigeons. Eliciting female courtship behavior by presenting video footage of male birds was demonstrated by Partan et al. (2005). The presence of an unfamiliar conspecific delayed the start of feeding in hens, but its video images failed to affect feeding behavior (D’Eath and Dawkins 1996). Evans and Marler (1991), however, demonstrated an audience effect on alarm calls induced by video displays of conspecifics in chicks. Zebra finches responded with song to video images of conspecifics depending on the content of the images (Adret 1997). Video displays of other conspecifics also produced a reinforcing effect. Bengalese finches showed a preference for their self-image over other TV images (Watanabe 2002) and zebra finches learned to peck a key in response to presentation of conspecific video footage (Adret 1997).

In summary, experiments using video displays of conspecifics showed (1) that they can be discriminated, (2) that they have reinforcing effects, and (3) that they can elicit spontaneous, natural behavior.

Computer graphics

Computer graphics technology provides new tools for the examination of animal perception. Animation techniques can be used to create and modify realistic motion stimuli in order to investigate which features are essential for animal cognition. Johansson stimuli, or so-called “biological motion”, is movement of point lights attached to the main joints of the human body (Johansson 1973). While still frames do not reveal any structure, a salient and stable percept of a moving person emerges as soon as the display is set into motion. Omori and Watanabe (unpublished data) examined discrimination of biological motion in pigeons. Pigeons could clearly discriminate point-light displays of a pigeon from that of a toy dog but transfer from the motion of dots to real objects was not obvious. Dittrich et al. (1998) also demonstrated perception of biological motion in pigeons using a different procedure.

Computer animation can also be used to create virtual realities for animals. Cook et al. (2001) created computer-generated moving objects and trained pigeons to discriminate two different ways of virtual approaching. The pigeons could readily learn to discriminate between different patterns of motion.

In this study, we used advanced animation techniques to generate a computer graphic (CG) pigeon and used it to examine visual cognition of conspecifics. On the one hand, the appearance of the animation is realistic enough to elicit spontaneous social responses in pigeons (Troje et al. unpublished data). On the other hand, we can manipulate the animation in order to produce impossible behavior in the CG bird. Pigeons have their species-specific movements. They walk, but they never show the hopping locomotion observed in many other bird species. While walking, pigeons show a typical head-bobbing behavior: Synchronized with the movements of the feet, the pigeon's head alternates between a hold phase, during which the head is kept stationary in space, and a thrust phase, during which the head is saccadically thrust forward. Pigeons may use features of this species-specific locomotion behavior for species recognition. We investigated this possibility by applying the respective manipulations to the CG pigeon.

Materials and methods


We used four experimentally naïve pigeons obtained from the Japanese Association of Racing Pigeons and maintained them at 80% of their free-feeding weights. The temperature of the animal room was maintained at 23°C and the artificial illumination cycle was 12L:12D. Water and grit were freely available in the cages.


Standard operant chambers were used (30×25×30 cm, MED). The front panel contained a rectangular transparent pecking key (10×7 cm) through which the subject could see an iMac (Power PC G4 with liquid crystal display) computer monitor. An electronic liquid shutter (UM glass, Tokyo) was placed between the key and the monitor. The distance between the key and the monitor was 15 cm. Stimuli were displayed on the display of the iMac using PowerPoint software. A computer with a MED-SKED system controlled the experiment.


Real animals

We recorded video material of a pigeon and a rat with a digital video camera (Panasonic, NV-MX3000). The same background was used in both cases. The footage was clipped to 30-s sequences. Four such sequences were created showing the pigeon and four sequences showing the rat. The size of the pigeon was approximately 12 cm in height and the body length of the rat was approximately the same (without tail) on the computer monitor.

CG pigeon

Animations of the CG pigeon were created as follows: The model of the bird was generated using the 3-D modeling and animation software Alias|Wavefront Maya. It consisted of an internal, invisible skeleton that was used to control the surrounding surface of the pigeon. The surface itself was modeled as a deformable NURBS object mapped with a texture and covered with feathers that gave the model a somewhat ruffled surface structure. The skeleton was an articulated chain of rigid segments. The head and the body were each modeled as one single rigid segment and were connected by a neck that consisted of eight articulated bones. The head and body segment could be explicitly animated by specifying time series of positions and orientations in three-dimensional space (six degrees of freedom for each segment). Using an IK-solver, the motion of the neck was determined only through head and body motion. The legs consisted of a chain of three segments arranged to model the positions of the femur, tibia, and foot bone. Two “toe” bones originated from the end of the foot bone, one controlling a single back toe, the other one controlling an array of three front toes. Specifying the position of the end of the foot bone and the positions of front and back toe controlled the motion of the feet and legs.

The motion data required to drive the skeleton were acquired using an optical motion capture system equipped with nine CCD cameras (Vicon, Oxford Metrics). The system provides the three-dimensional trajectories of a set of retroreflective markers (4 mm in diameter) with a spatial resolution in the sub-millimeter range and a sampling frequency of 120 Hz.

The markers were attached to the body of a sexually receptive female pigeon and courtship behavior was elicited by presenting her with a male bird. The female's motion was captured during a 1.5-min period and a central piece of 30 s was eventually used to animate the virtual pigeon.

This sequence was then modified in three different ways. In the first sequence, we replaced the motion of the right foot with a copy of the motion of the left foot. The bird now appeared to be hopping rather than walking. In the second sequence, we erased the head-bobbing movement of the head and fixed the head's position with respect to the body. The head now moved rigidly with the body rather than showing the typical head-bobbing behavior. Finally, we generated a sequence that combined both modifications so that it showed a hopping, non-head-bobbing bird. The size of the virtual pigeon was approximately the same as the real one on the computer monitor.
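The two manipulations described above can be illustrated with a minimal sketch. It is not the authors' Maya workflow: it assumes each body part is reduced to a list of (x, y, z) positions per frame, ignores segment orientation, and uses hypothetical function names. In particular, keeping the right foot's starting position when copying the left foot's motion is an assumption made here so that the two feet do not coincide in space.

```python
def add(a, b):
    """Component-wise sum of two 3-D points."""
    return tuple(x + y for x, y in zip(a, b))

def sub(a, b):
    """Component-wise difference of two 3-D points."""
    return tuple(x - y for x, y in zip(a, b))

def make_hopping(left_foot, right_foot):
    """Give the right foot a copy of the left foot's motion.

    Assumption: the left foot's frame-by-frame displacement is applied
    to the right foot's starting position, so both feet move in synchrony.
    """
    start_left, start_right = left_foot[0], right_foot[0]
    return [add(start_right, sub(p, start_left)) for p in left_foot]

def remove_head_bobbing(head, body):
    """Fix the head's position with respect to the body.

    Assumption: the head keeps its first-frame offset from the body,
    so it moves rigidly with the body (orientation is ignored here).
    """
    offset = sub(head[0], body[0])
    return [add(b, offset) for b in body]

# Hypothetical three-frame trajectories, for illustration only.
left = [(0, 0, 0), (1, 0, 1), (2, 0, 0)]
right = [(0, 1, 0), (0, 1, 0), (2, 1, 0)]
body = [(0, 0, 5), (1, 0, 5), (2, 0, 5)]
head = [(1, 0, 7), (1, 0, 7), (3, 0, 7)]

hopping_right = make_hopping(left, right)
rigid_head = remove_head_bobbing(head, body)
```

Applying both functions to the same sequence would correspond to the combined hopping, non-head-bobbing version.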



The subjects were first trained to peck a transparent key. They could see the computer monitor through the key, on which the images were displayed. Then, they were trained on a reinforcement schedule with a variable interval (VI) with a mean of 5 s, 10 s, and finally 20 s (VI 5 s, VI 10 s, VI 20 s).

Training 1: Real pigeon vs. rats

The subjects were trained to discriminate video of the real pigeon from the video of the rat. Their pecks were recorded but had no consequences on the reinforcement schedule during the first 10 s of each trial. After this, pecking to the pigeon video (S+) was rewarded by a 4-s period of access to a feeder after a variable interval with a mean of 7 s. After feeder access, the monitor was darkened for 5 s by a liquid shutter and then the next trial began. The video of the rat (S−) disappeared without reward if the subject pecked, after a similar 7-s variable interval had elapsed, or after 15 s without a peck. During the dark period, differential reinforcement of zero rate (DRO) was effective, that is, a peck prolonged the dark period for 5 s. S+ and S− were presented in a pseudo-random sequence, constrained so that images of S+ or S− never appeared more than three times in succession. Daily sessions consisted of 40 trials; 20 with S+, and 20 with S−. Both S+ and S− consisted of four different video clips, thus each single clip appeared five times during one session. The number of pecks during the first 10 s of stimulus presentation was used as an index of discrimination. Training continued until 80% of these pecks were made in response to S+ in two successive sessions. Test 1 started on the next day after reaching the criterion.
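The trial ordering and the learning criterion described above can be sketched as follows. This is an illustration under stated assumptions, not the MED-SKED program used in the experiment: the function names are hypothetical, the constraint is enforced by simple reshuffling, the criterion is read as "at least 80%", and a session with no pecks at all is arbitrarily scored as 0.5.

```python
import random

def longest_run(labels):
    """Length of the longest run of identical consecutive labels."""
    longest = run = 0
    previous = None
    for label in labels:
        run = run + 1 if label == previous else 1
        previous = label
        longest = max(longest, run)
    return longest

def make_trial_sequence(n_per_class=20, max_run=3, rng=None):
    """Shuffle S+ and S- trials until no stimulus class occurs more than
    max_run times in succession (rejection sampling, an assumed method)."""
    rng = rng or random.Random()
    trials = ["S+"] * n_per_class + ["S-"] * n_per_class
    while True:
        rng.shuffle(trials)
        if longest_run(trials) <= max_run:
            return list(trials)

def discrimination_ratio(pecks_s_plus, pecks_s_minus):
    """Proportion of pecks (first 10 s of each trial) made to S+."""
    total = pecks_s_plus + pecks_s_minus
    # Assumption: a session without any pecks counts as chance level.
    return pecks_s_plus / total if total else 0.5

def criterion_met(session_ratios, threshold=0.80, n_sessions=2):
    """True if the last n_sessions ratios all reach the threshold."""
    recent = session_ratios[-n_sessions:]
    return len(recent) == n_sessions and all(r >= threshold for r in recent)

sequence = make_trial_sequence(rng=random.Random(0))
```

Reshuffling is simple but not the only option; a constructive procedure that draws one trial at a time would give the same guarantee.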

Test 1

After training 1, the subjects were tested with the video of the pigeon used as the discriminative stimulus (S+), videos of a new, real pigeon, the CG pigeon and the video of the rat (S−). One of four clips of each of S+ and S− category was selected for the test. Other test stimuli were one video clip each. Each stimulus was presented five times, each lasting 10 s and separated by a 5-s blackout period. No reinforcement was available during the test. Test 1 was repeated twice, and the mean of the two tests was used for analysis. Ordinary training was given between the two tests to maintain the discrimination.

Training 2: Real and CG pigeons vs. real rat and background

During this training session, both the video of the real pigeon (two clips) and the animation of the CG pigeon (two clips) were presented as S+ and the video of the rat (two clips) and a still image of the background used for the animation were presented as S−.

The background of the CG pigeon animation was presented as S− because the background was slightly different from that of real animals and we wanted to exclude that pigeons could use this as a cue for discrimination. Each movie clip was presented five times separated by a 5-s blackout period. The background stimulus was presented ten times. Other procedures were identical to training 1.

Test 2

After the subjects accomplished the discrimination in training 2, they were tested with two versions of “unusual” pigeons. In addition to the two kinds of S+ (real pigeon and CG pigeon), we showed the walking CG pigeon without the head bobbing, the hopping CG pigeon, and two kinds of S− (rat and background) five times each in random order. One presentation lasted for 10 s followed by a 5-s blackout period. The whole test session was repeated twice and the mean of the two tests was used for analysis.

Training 3: Normal CG pigeon vs. modified CG pigeon

The subjects were trained to discriminate between the normal CG pigeon (S+) and the modified version featuring hopping locomotion without head bobbing (S−). Thus, the S− video was unusual in two aspects: hopping and the absence of head bobbing. Both S+ and S− consisted of four different video clips, thus one clip appeared five times during one session. S+ stimuli were the stimuli used in training 1. Otherwise, the training procedure was identical to the previous discriminative training.

Test 3

In addition to the S+ (normal CG pigeon) and the S− (unusual CG pigeon) video, a walking CG pigeon without head bobbing, a hopping CG pigeon with head bobbing, and backward playbacks of S+ and S− were presented five times in random order. Other than that, testing was the same as in the previous tests.

Test 4: Masking

After test 3, the subjects received two sessions of discriminative re-training (training 3) before they underwent two kinds of masking test. In the lower occlusion test, the lower half of the monitor was covered by a grey cardboard, while in the upper occlusion test, the upper half was covered. Reinforcement was available during the masking tests, so the procedure was the same as in the discriminative training, except for the occlusion. The subjects received two sessions of usual discriminative training between the two masking tests.


Figure 1 presents individual learning curves during training 1. The fastest bird accomplished the criterion in nine sessions and the slowest in 36 sessions. As shown in the figure, one feature of the learning curves is the sudden increase in discrimination ratio just before reaching the criterion. Three of the four birds showed such rapid increase in discrimination ratio.
Fig. 1

Individual learning curves in training 1 (pigeon vs. rat)

Results of test 1 are presented in Fig. 2. Repeated measures ANOVA revealed a significant effect of stimuli (F(3/15)=6.25, p<0.01). The birds showed clear generalization from the movie of the real pigeon to new movies of real pigeons: there was no statistically significant difference in relative response between the trained and the new movies (paired t-test, t(3)=1.30, p=0.28). The subjects also showed generalization to the CG pigeon. The difference in relative response between the trained images and the CG images was not statistically significant (t(3)=1.07, p=0.36). These results show that the pigeons classify the CG pigeon as a real pigeon after the discriminative training between real pigeons and real rats.
Fig. 2

Results of test 1. The pigeons showed generalization to the virtual pigeon. The four small panels show individual results and the bottom panel shows the mean. New and CG mean novel images of real pigeon and CG pigeon, respectively
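The pairwise comparisons reported for the tests are paired t-tests across the four birds, which gives three degrees of freedom. A minimal sketch of how such a statistic is computed follows; the function name is hypothetical and the numbers are placeholder values chosen for easy arithmetic, not data from the experiment.

```python
import math

def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two matched samples.

    Each subject contributes one value to x and one to y (for example,
    relative response to two stimuli). Assumes at least two subjects and
    that the differences are not all identical (non-zero variance).
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    t = mean / math.sqrt(variance / n)
    return t, n - 1

# Placeholder values for four subjects (not the published data).
stimulus_a = [3, 5, 7, 6]
stimulus_b = [2, 3, 4, 4]
t_value, df = paired_t(stimulus_a, stimulus_b)
```

With only four subjects, individual differences weigh heavily on the outcome, which is one reason the figures also show individual results alongside the mean.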

The subjects easily accomplished discrimination in training 2. The fastest bird required five sessions and the slowest nine sessions to reach the criterion. Figure 3 shows the results of test 2. The birds treated the modified CG pigeon movies similarly to the S+ stimuli. Repeated measures ANOVA revealed a significant effect of stimuli (F(5/23)=49.12, p<0.001). There was no statistically significant difference in relative response between the normal CG pigeon and the hopping CG pigeon (paired t-test, t(3)=2.29, p=0.10), nor between the normal CG pigeon and the walking CG pigeon without head bobbing (t(3)=2.07, p=0.13). Thus, the pigeons did not show decreased response to unusual or impossible images.
Fig. 3

Results of test 2. The four small panels show individual results and the bottom panel shows the mean. The pigeons responded to unusual CG pigeons. –Bob and Hopp indicate CG pigeon walking without head bobbing and hopping CG pigeon, respectively

Figure 4 presents the individual learning curves of training 3. The mean number of sessions to reach the criterion was 53.5 sessions. The slowest bird needed 70 sessions. The subjects showed a gradual improvement of performance in comparison with learning in training 1. Yet, all birds were able to learn the task eventually.
Fig. 4

Individual learning curves in training 3

Results of test 3 are shown in Fig. 5. The birds responded often to the images of the hopping pigeon but less to the images of the walking pigeon without head bobbing. Repeated measures ANOVA revealed a significant effect of stimuli (F(5/23)=23.4, p<0.001). There was a statistically significant difference between the S+ and the pigeon walking without head bobbing (paired t-test, t(3)=3.11, p=0.05) but not between the S+ and the hopping pigeon (t(3)=0.77, p=0.50). There was also a significant difference between the hopping pigeon and the pigeon walking without head bobbing (t(3)=4.07, p=0.03). Therefore, the subjects attended to head movement but not to leg movement. The subjects responded less to the reversed S+. Relative response to the backward S+ differed significantly from that to S+ (t(3)=3.07, p=0.05). The pigeons responded to the backward S+ similarly to S−.
Fig. 5

Results of test 3. The pigeons responded to a hopping CG pigeon but not to a pigeon walking without head bobbing. The four small panels show individual results and the bottom panel shows the mean. W/B; walking with head bobbing, H/B; hopping with head bobbing, W/-Bob; walking without head bobbing, RevS+; reversed display of S+, RevS−; reversed display of S−, H/-B; hopping without head bobbing

Figure 6 presents results of the masking tests (test 4). Generally, the masking impaired discriminative behavior. However, the subjects still showed reasonable discrimination performance when the upper half of the movie was visible, whereas performance dropped to chance level when the upper half was occluded. There was a statistically significant difference in the discrimination ratio between the upper occlusion test and the pre-test session (paired t-test, t(3)=15.3, p=0.0006), whereas there was no significant difference between the lower occlusion test and the pre-test session (t(3)=1.02, p=0.38). Thus, the upper part of the training images controlled the discriminative behavior of the subjects. The results are in accordance with those of test 3 that demonstrated the significance of head bobbing.
Fig. 6

Results of the masking test. The pigeons maintained good discrimination performance when the lower part of the stimuli was masked (left) but not when the upper part was masked (right). Error bars indicate standard error of the mean


The present results demonstrate that pigeons perceive the animation of our CG pigeon to be equivalent to a video of a real pigeon. There is a possibility that this identification is based on meaningless stimulus features that are simply associated with reinforcement without recognizing the contents of the display. However, Troje et al. (unpublished data) observed spontaneous courtship behavior of male pigeons to the virtual pigeon used in the present experiment. Hence, the present results suggest that the virtual pigeon is perceived as a real pigeon by the observing pigeons.

The impossible images of pigeons, namely hopping or walking without head bobbing, did not disturb the discrimination. Thus, these features of species-specific movement were not a cue for the species discrimination tested in this experiment. Training 1 and 2 involved the coarse discrimination between a pigeon and a rat. Since there are plenty of morphological differences between pigeons and rats, the subjects could use one of these differences for their discrimination. However, difficulty in learning of discrimination 3 suggests that unusual movement was not a very salient feature. If unusual movement was easily detected, the discrimination between the usual and unusual sequences should be easy to learn. However, there are observable differences in the saliency of the manipulated features. As shown in test 3, the absence of head-bobbing seems to be more disturbing than the unusual movement of the feet while hopping. The observation supports the one from test 4 that head and upper body are more important than feet and lower body.

When tested with video-recorded images of quails injected with psychoactive drugs, quails easily discriminated normal from abnormal quails (Yamazaki et al. 2004). Hopping or walking without head bobbing is a rather local abnormality of motion, while hyper- or hypoactive motion induced by the drugs in quails is a more general abnormality. Therefore, abnormality induced by the drugs might be more easily detected. Discrimination of the species-specific motion should be examined more clearly after the discrimination between images of a pigeon and naturally hopping species such as small birds. Low response to the reversed S+ was another interesting result. The S+ and the reversed S+ contained identical frames. The only difference between them is the direction of movement. The subjects responded to the reversed S+ as if it belonged to the S− category, thus the results suggest that the subjects learned to categorize usual vs. unusual locomotion.

Results of tests 3 and 4 suggest that the pigeons attended to the head part of the conspecific, a result that confirms earlier findings by Shimizu (1998). From this experiment it is not clear whether the subjects really attended to the head or simply attended to the upper half of the monitor. In a previous study, however, we used a similar apparatus to test pigeons' ability to discriminate between paintings and found no difference between upper and lower screen occlusions (Watanabe 2001). Hence, attending to the upper part of the display is not attributable to the apparatus but suggests the importance of the head for conspecific recognition.


This research was supported by the 21st Century Center Of Excellence Program (D-1) and the Volkswagen Foundation. We are also grateful to Alias|Wavefront who supplied us with a research donation of their software Maya. Treatment of the animals used in testing was in accordance with the Guidelines of Animal Experiments (Keio University).

Copyright information

© Springer-Verlag 2006