Introduction

Visual perspective taking in nonhuman animals has attracted much research attention in recent decades. Investigations have focused mainly on what Flavell (1978) would describe as Level 1 perspective taking: understanding that others, like oneself, visually perceive objects and events in the world around them, and recognizing the conditions under which they are or are not able to do so. Level 1 perspective taking includes, for instance, the knowledge that others cannot see an object occluded by another opaque object. Researchers have investigated a variety of abilities in attempting to discover the precise nature and extent of visual perspective taking in species spanning several taxa.

At the most basic level, a wealth of studies report gaze following in animals as diverse as dogs (Miklósi et al. 1998), goats (Kaminski et al. 2005), dolphins (Pack and Herman 2004; Tschudin et al. 2001), ravens (Bugnyar et al. 2004), and nonhuman primates including all four great apes and a variety of New and Old World primates (see Tomasello et al. 1998 and reviews in Emery 2000; Gómez 2005; Johnson and Karin-D’Arcy 2006). Within nonhuman primates, gaze following has been most extensively studied in chimpanzees, with results clearly indicating that it is not a mere automatic co-orienting response. Chimpanzees follow gaze past distracter objects to locations outside of their own visual field, often check back with the gazer when they find nothing interesting at the target location, and discontinue following the gaze of an individual who repeatedly gazes at nothing (Bräuer et al. 2005; Call et al. 1998; Itakura 1996; Povinelli and Eddy 1996, 1997; Tomasello et al. 1999; Tomasello et al. 2001). These behaviors suggest that chimpanzees expect another’s gaze to have some specific external referent, which they are able to accurately identify by following the gazer’s imaginary line of sight to its ultimate target destination.

Studies involving visual occluders provide ample evidence that chimpanzees recognize that opaque objects impede gaze, an appreciation human children develop within the first year-and-a-half of life (Butler et al. 2000; Moll and Tomasello 2004). In a study by Povinelli and Eddy (1996), for example, when an experimenter looked toward an opaque partition, chimpanzees rarely followed her gaze to the back wall of the room, where it would have landed had the partition been absent. More often, they moved around to look at the other side of the partition, in apparent recognition that the experimenter’s sight could not project through and beyond it. Tomasello and colleagues (1999) obtained similar results in a task involving four visual barriers, each requiring chimpanzees to move to a different location in an enclosure to see what an experimenter was gazing at. Gaze following around barriers has additionally been observed in other great ape species and ravens (Bräuer et al. 2005; Bugnyar et al. 2004; Schloegl et al. 2007).

Paradigms in which animals must steal or compete for food have also been used to test for an understanding that occluders impede visual perception. Hare et al. (2006) found that chimpanzees more often stole food from a competitive experimenter when they could approach the food from behind an opaque barrier, out of the experimenter’s sight. In another test in which chimpanzees competed with conspecifics, subordinates more often approached food that was hidden from a dominant’s view by an occluder than food that was out in the open and clearly visible (Bräuer et al. 2007; Hare et al. 2000; but see also Karin-D’Arcy and Povinelli 2002). Using this same paradigm, Hare and colleagues (2003) found no support for the hypothesis that subordinate capuchin monkeys recognized what dominants could or could not see. And while common marmosets tested with the same paradigm preferred food hidden from a dominant’s view by an occluder, they showed no understanding that an opaque barrier impeded an experimenter’s gaze (Burkart and Heschl 2007). Finally, while rhesus macaques stole food from a human competitor who held a visual occluder in front of his face more often than one who held the occluder in front of his body (Flombaum and Santos 2005), long-tailed macaques did not avail themselves of an opaque screen to drink undetected from a forbidden juice dispenser (Kummer et al. 1996). Once again, it is interesting that some corvids appear, like apes, to have a good understanding of how occluders block visual access. For example, ravens (Bugnyar and Kotrschal 2002) and scrub-jays (Dally et al. 2005) take advantage of barriers to cache food out of view of potential pilferers.

The studies described above represent just a small sampling of the extensive research to date investigating visual perspective taking in a variety of species. In contrast, very little work has explored animals’ abilities regarding their own visual experiences. In one rare study, Call and Carpenter (2001) manipulated the visual experiences of chimpanzees, orangutans, and 2-year-old children by either allowing or not allowing them to watch as a reward was hidden among an array of opaque tubes. Participants were then given the opportunity to look into the tubes before choosing. All three species responded differentially across conditions by looking into the tubes significantly more often when they had not witnessed the hiding, thereby demonstrating that they discriminated between their own past states of seeing versus not seeing. Call (2005) replicated this finding and extended it to bonobos and gorillas, and Hampton et al. (2004) found the same effect in rhesus macaques. In contrast, capuchin monkeys performing an equivalent task looked into the tubes in a high proportion of trials, even when they could see which tube was baited because the tubes were transparent (Paukner et al. 2006). Dogs also did not discriminate between conditions in which they witnessed or did not witness food being hidden, although their response pattern was to rarely search in either case (Bräuer et al. 2004).

In the current study, we sought to extend what is known about chimpanzees’ abilities regarding their own visual experiences. We used a variation on Call and Carpenter’s (2001) tubes paradigm described above to investigate chimpanzees’ understanding of what they would or would not be able to see from different visual perspectives (Footnote 1). Specifically, we asked if chimpanzees could adopt the correct visual perspective within an enclosure to gain visual access to occluded food. Our primary goal was to determine how refined chimpanzees’ appreciation is of where one must be in physical relation to occluded objects in order to see them. Our task involved presenting chimpanzees with a collection of open containers, one of which contained a hidden food item. Before making a choice, our subjects had the opportunity to look into the openings in the containers to locate the food. We used a variety of different container types, each of which provided visual access to its interior from a unique viewing angle. Subjects had to recognize which of a number of possible visual perspectives within their enclosure would be the appropriate one in any given trial, depending on the type of container used. Thus, unlike Call and Carpenter’s (2001) task, in which there was only one response subjects needed to make to see the food (crouching downward to look into the tubes in front of them), our task directly tested chimpanzees’ ability to compute visual angles. A secondary goal was to replicate Call and Carpenter’s (2001) finding that chimpanzees discriminated between situations in which they had or had not seen a hiding event. We therefore allowed our chimpanzees to observe the hiding in half the trials but blocked their view in the other half. Based on the results of past research, we expected subjects to look into the containers before choosing more often when they had not witnessed the experimenter hide the food.

Our project differed from earlier perspective taking studies employing visual occluders in two important respects. First, we required chimpanzees to demonstrate their understanding of visual perspective in the absence of any gaze cues. In previous studies involving gaze following around barriers (e.g., Bräuer et al. 2005; Tomasello et al. 1999), subjects could use the imaginary line of sight of the experimenter to help them determine where to move to see the gazed-at object. In contrast, our task demanded that subjects have a more general appreciation of what can or cannot be seen from where, because there were no gaze cues present for them to make use of. Second, in Tomasello et al. (1999) and Bräuer et al. (2005), subjects were judged as having made the correct response if they moved to the correct position at any point within a trial lasting 1 min, regardless of where they had moved before that point. In the current study, we considered subjects to have responded correctly only if the very first response they made in an attempt to see the hidden food was the appropriate one for the particular type of container used. By requiring subjects to adopt the correct visual perspective as their initial response, we could be more confident that they were anticipating, in advance of moving, which perspective would gain them visual access to the occluded food.

Experiment 1: perspective-shifting test

We first gave subjects a tubes task similar to that used by Call and Carpenter (2001), both to introduce them to the paradigm of searching for food in open containers and also to see if we could replicate the basic finding that chimpanzees looked more often in unseen than in seen trials. The perspective-shifting test that followed constituted the main experimental component. In those trials we tested whether chimpanzees anticipated the specific perspectives they needed to adopt to see into various types of containers. Before and after the perspective-shifting test, we also measured baseline levels of responding when food was hidden in completely closed, opaque containers. This allowed us to compare chimpanzees’ frequency of looking in the perspective-shifting test with their frequency of looking when the food was not potentially visible from any viewing angle. If the chimpanzees were making deliberate attempts to see the food when they did not know its location in the perspective-shifting test, we would expect to find differential looking across seen and unseen trials in that test but not in the baseline sessions.

Method

Subjects

Ten chimpanzees housed socially at the Wolfgang Köhler Primate Research Center (WKPRC) in Leipzig, Germany participated: two males and eight females, all captive-born, five hand-reared and five mother-reared, ranging from 5 to 30 years old at the start of the study (mean age = 13 years). All subjects had taken part in a variety of social and physical cognition experiments, with seven participating in a study 6 years earlier that involved searching in tubes for hidden food (Call 2005). Subjects were not food deprived and had access to water ad libitum.

Design

Subjects received, in succession, 12 pretest baseline trials, 18 tubes trials, 36 perspective-shifting trials, and 12 post-test baseline trials. In half the trials subjects witnessed the baiting of the containers (seen condition) and in half their view was blocked (unseen condition; Footnote 2). In the perspective-shifting test, each of the three container types was used the same number of times in each condition. Baiting condition, food location, and container type were randomized with the following constraints: (1) there could not be more than four consecutive seen or unseen trials, (2) the food could not be in the same location for more than three consecutive trials, and (3) in the perspective-shifting test the same container type could not be used for more than two consecutive trials. Subjects received a variable number of trials per session depending on their availability and willingness to participate (the modal number of trials per session was 12 for the baseline, 9 for the tubes trials, and 18 for the perspective-shifting test).
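The constrained randomization described above can be illustrated with a short rejection-sampling sketch. It is not the authors' procedure, which the text does not specify: the function names, the location labels, the fixed seed, and the choice to draw food locations independently at random on each trial are all assumptions made for illustration.

```python
import itertools
import random

CONDITIONS = ["seen", "unseen"]
CONTAINERS = ["cylinder", "triangle", "trapezoid"]
LOCATIONS = ["left", "middle", "right"]  # assumed labels for the three positions


def longest_run(values):
    """Length of the longest run of identical consecutive items."""
    return max(len(list(group)) for _, group in itertools.groupby(values))


def make_test_order(rng, per_cell=6, max_attempts=100_000):
    """Shuffle 36 perspective-shifting trials until all three constraints hold.

    Each container type appears per_cell times in each condition, as in the
    design. Food locations are drawn at random per trial (an assumption).
    """
    cells = [(cond, cont) for cond in CONDITIONS for cont in CONTAINERS] * per_cell
    for _ in range(max_attempts):
        rng.shuffle(cells)
        locations = [rng.choice(LOCATIONS) for _ in cells]
        if (longest_run([cond for cond, _ in cells]) <= 4        # constraint 1
                and longest_run(locations) <= 3                  # constraint 2
                and longest_run([cont for _, cont in cells]) <= 2):  # constraint 3
            return [(cond, cont, loc)
                    for (cond, cont), loc in zip(cells, locations)]
    raise RuntimeError("no valid order found; raise max_attempts")


order = make_test_order(random.Random(0))
```

Rejection sampling keeps the sketch short and makes each constraint a single readable check; a constructive generator would be faster but harder to verify against the written rules.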

Experimental set-up

Testing took place in two different testing rooms, each having a three-sided windowed booth (105 cm wide, 90 cm deep) set in between two adjoining enclosures (Fig. 1). The windows extended to 202 cm above floor level in one testing room and 272 cm in the other. The bottom section of the windows on the left and right sides of the booth comprised a 2.5-cm-thick Plexiglas panel (68 × 49 cm) inside a metal frame. A table (80 × 52 cm) with a sliding section on top (80 × 35 cm) was mounted below one of these panels, allowing the containers to be slid within or out of subjects’ reach. Subjects chose a container by poking a finger through one of three holes (diameter = 3.5 cm) spaced evenly across the bottom of the panel. This location was labeled the home window.

Fig. 1

Testing room. Note that the left and right sides of the booth, fanned outward in this diagram to make them visible, are actually at 90° angles to the center window

Stimuli

Baseline containers

Three identical opaque plastic boxes (16 × 11.5 × 4.5 cm) were used for the pre- and post-test baseline trials. The boxes had lids and were completely closed, so they did not allow visual access to the food from any viewing angle. Subjects had become familiar with such boxes in previous object-choice studies.

Tubes

For the tubes task, we used three identical rectangular tubes constructed of 1-cm-thick gray plastic (30 cm long with a 5 × 5 cm opening), similar to those used by Call and Carpenter (2001). The tubes were baited through an opening at the back, and duct tape occluded the top half of the front opening of each tube so that subjects had to clearly lower their head to see inside.

Perspective-shifting test containers

There were three container types (shown in Fig. 2) for the perspective-shifting test: (a) cylindrical containers open only at the top and painted white inside for better visibility, (b) triangular containers open only at the sides, and (c) trapezoidal containers open only at the back. The cylinders were cut from gray PVC piping and the triangles and trapezoids were constructed from 1-cm-thick gray plastic. We had three of each type of container, and in any given trial we always used three identical containers. The same containers were used for seen and unseen trials so that we could directly compare across conditions. Cylinders could only be seen into from above, triangles only from the sides, and trapezoids only from the back.

Fig. 2

Perspective-shifting test containers as viewed from the home window and also, for each container type, from the perspective allowing visual access to the food

Procedure

Familiarization

Subjects were first familiarized with the properties of the tubes and the perspective-shifting test containers. The containers were placed in a pile on the floor of an enclosure, and the subject was then let into the enclosure alone for 10–20 min. After they completed the tubes task subjects were again given exclusive access to the perspective-shifting test containers for 10–20 min to refresh their memories. All subjects showed interest in the containers by inspecting, biting, licking, sniffing, carrying, hitting, or throwing them. Interest waned quickly, however, and when the test began subjects no longer appeared to consider the containers to be interesting objects in their own right. The familiarization phase ensured that subjects knew where the openings in the different types of containers were located. Once they had this knowledge, we could then present them with the containers in position on the platform and see if they recognized specifically where they had to position themselves to see into the various openings.

Baseline, tubes, and perspective-shifting test trials

The same basic procedure was used for all trials. The experimenter (E) sat facing the home window and placed three identical containers onto the table, spaced evenly apart. She showed subjects that the containers were empty and then placed a food item into one of them. In seen trials subjects were allowed to watch as E did so, and they thus saw which container she baited. However, in unseen trials, E first blocked subjects’ view by positioning opaque screens at the home and center windows of the booth (the screen at the home window had a lip on top that also obstructed subjects’ view from above). If subjects moved to the opposite window during baiting or tried to see around the screens, E blocked the view of the containers with her body and waited until the subject moved away before proceeding. Thus, in unseen trials, subjects could only know where the food was located by waiting until E had finished the baiting process and then moving to the appropriate location for the containers used in any given trial.

When baiting was complete, E exited the booth and removed the screens in unseen trials (the screen at the center window first and then the screen at the home window, in quick succession). So as not to inhibit subjects’ movements, she sat or stood with her back to the booth for 20 s (60 s in baseline trials). She then re-entered the booth and slid the containers forward so subjects could choose. When subjects chose correctly E gave them the food inside the container. Otherwise, E showed them that the container was empty and then removed the food from the correct container and returned it to a nearby bucket.

Measures and coding

There were three dependent measures: choice of container, first look, and first move. Choice was coded live and looking and moving were coded from videotapes.

First look

Of most interest in the perspective-shifting test was the first look measure, defined as the very first attempt subjects made to look into the containers. Three possible responses were coded:

UP: Subjects moved upward and looked down at the containers from above. Doing so provided visual access into the cylinders but was ineffective for the other containers. Subjects’ head had to reach or exceed the height of the metal frame around the Plexiglas panel (see Fig. 1). An UP look could also occur at the center or opposite window, but only if it was not preceded by a look through the lower portion of that window (i.e., below the level of the metal frame).

SIDE: Subjects clearly examined the containers from the side by moving close to the center window and lowering their head. Alternatively, at the home window, they could lean far and low to one side (i.e., into the bottom left or right corner of the window) and examine the sides of the containers from that position. In this case, their head had to reach the edge of the metal frame and their chin had to reach the bottom of the frame for a SIDE look to be coded.

OPPOSITE: Subjects clearly examined the backs of the containers, either by moving to the opposite window or to the extreme far end of the center window. In the latter case their head had to reach the far edge of the center window frame. Note that to get from the home to the opposite window subjects obviously had to move past the center window. This was only counted as a SIDE response if subjects clearly hesitated or lowered their head to examine the containers before continuing on to the opposite window.

For all three responses, subjects had to come close to the window (within about half a meter) for a look to be scored. Coding began at the moment the food was potentially visible. In the seen trials this was when E had completed the baiting and removed her hand from the opening in the container. In unseen trials, our use of screens to block subjects’ view caused the food to become potentially visible at slightly different times for each type of container. For trapezoids, it was as soon as E moved out of the booth so that she no longer blocked subjects’ view through the opposite window; for triangles, it was when E next removed the screen from the center window; and for cylinders, it was when E finally removed the screen from the home window (recall that this screen had a lip blocking subjects’ view from above). Note that we could not begin coding while all screens were still in place because they obscured our view of the subjects’ faces, making it often impossible to tell if they were attempting to look into the containers. On the other hand, if we waited until all screens were removed we might have missed looks that occurred in the interim (e.g., when subjects ran over to the opposite window to look into the trapezoids after E had exited the booth but had not yet removed the screens from the other windows). Beginning coding as soon as the food was potentially visible was thus a compromise between these two options. For tubes trials, we also coded looks in seen trials during the couple of seconds it took E to perform the baiting. This was because subjects could look into the tubes while E deposited the grape into the opposite end. Looking during baiting in seen trials was not an issue with the other containers because subjects could not see the food until E had removed her hand.

In some cases (19% of trials) subjects had already adopted the UP, SIDE or OPPOSITE perspective before coding began, for example because they had moved there as E was leaving the booth or removing the occluding screens. In these cases, if subjects then clearly examined the containers from that perspective once coding began, this was coded as the first look. In addition to the first look, we also noted whether subjects looked from any other perspective throughout the remainder of the trial.
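The first-look rules above can be summarized in a small scoring sketch. The mapping from container type to correct perspective and to the moment coding begins in unseen trials follows the text; the event labels, the `(time, perspective)` record format, and the function names are hypothetical, and the sketch assumes that each look has already been judged by a human coder to meet the UP, SIDE, or OPPOSITE criteria.

```python
# Perspective that gives visual access to each container type (from the text).
CORRECT_PERSPECTIVE = {
    "cylinder": "UP",
    "triangle": "SIDE",
    "trapezoid": "OPPOSITE",
}

# Event at which the food first becomes potentially visible in unseen trials.
# The event names are illustrative labels, not terms used by the authors.
UNSEEN_ONSET_EVENT = {
    "trapezoid": "E_exits_booth",
    "triangle": "center_screen_removed",
    "cylinder": "home_screen_removed",
}


def first_look(looks, onset_time):
    """Return the perspective of the earliest coded look at or after onset.

    looks is a list of (time_in_seconds, perspective) tuples. Returns None
    when the subject never looked once coding had begun.
    """
    eligible = [look for look in looks if look[0] >= onset_time]
    if not eligible:
        return None
    return min(eligible, key=lambda look: look[0])[1]


def score_trial(container, looks, onset_time):
    """Classify a trial as 'correct', 'incorrect', or 'no look'."""
    perspective = first_look(looks, onset_time)
    if perspective is None:
        return "no look"
    if perspective == CORRECT_PERSPECTIVE[container]:
        return "correct"
    return "incorrect"
```

For example, a trapezoid trial with looks `[(1.0, "SIDE"), (3.0, "OPPOSITE")]` is scored as correct when coding begins at 2.0 s, because the earlier SIDE look falls before the food was potentially visible.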

First move

The looking measure was somewhat conservative because subjects had to not just move to the correct location but also clearly examine the containers. Instances in which subjects quickly glanced at the containers from the incorrect perspective could have therefore been missed. To address this, we also scored where subjects first moved irrespective of whether or not they looked. The same criteria were used as for first looks except that subjects did not have to approach the window nor visually examine the containers. However, for moves that occurred within the confines of the home window (i.e., UP moves that did not extend beyond the top of the metal frame and SIDE moves into the lower left or right corner of the window) the subject had to at least be oriented forward. As with the first look measure, we began coding from the moment the food was potentially visible to the chimpanzee. Because it was possible to see where subjects were moving even with all screens in place, we also did a second coding beginning from the moment E finished the baiting and before she had removed any of the screens.

Because looks were irrelevant for this measure, if subjects had already adopted the UP, SIDE or OPPOSITE perspective before coding began then the next location they moved to was coded as the first move. (Note that this meant that in some trials a look could be coded when a move had not been coded.) A few locations within the enclosures were considered to be neutral, in the sense that subjects could not see the food inside the containers from those locations. For example, subjects occasionally waited out the trial in the doorway between the two enclosures or on a raised bench at the back wall. These locations were counted as equivalent to the home position and were not coded as moves.

Reliability

An independent coder naïve to the hypotheses of the study coded 20% of test and baseline trials for choice of container, looking versus not looking, number of different kinds of look (none, one, or more than one kind), first look, and first move. Kappas were 1.00, 0.80, 0.76, 0.84 and 0.77, respectively (all P’s ≤ 0.001). A second naïve coder judged a third of tubes trials for subjects’ choice and whether or not they looked. Kappas were 1.00 and 0.65 (Footnote 3), respectively (both P’s ≤ 0.003). All trials were chosen randomly with the constraint that half be seen and half unseen.
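For readers unfamiliar with the agreement statistic, the sketch below computes Cohen's kappa for two coders. It assumes the reported kappas are Cohen's kappa, which the text does not state explicitly, and the function name and example labels are hypothetical rather than data from the study.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders' categorical labels."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same non-empty set of trials")
    n = len(coder_a)
    # Proportion of trials on which the two coders gave the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance from each coder's marginal frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


# Hypothetical first-look codes for four trials (not data from the study).
main_coder = ["UP", "UP", "SIDE", "OPPOSITE"]
second_coder = ["UP", "SIDE", "SIDE", "OPPOSITE"]
kappa = cohens_kappa(main_coder, second_coder)
```

In this made-up example the coders agree on three of four trials (0.75), chance agreement is 0.3125, and kappa is therefore 7/11, or about 0.64.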

Analysis

All statistical tests were non-parametric and reported P values are two-tailed unless otherwise noted. One-tailed tests were used only when results from previous research allowed strong directional predictions.

Results

Tubes task

The tubes task confirmed that our subjects understood the task of searching in containers for hidden food and were motivated to do so. Looking was in fact at ceiling levels, with subjects looking into the tubes in 100% of unseen trials and an average of 89% of seen trials. This difference, although not large, was statistically significant in a one-tailed exact Wilcoxon test (T+ = 15.00, n = 5 [5 ties], P = 0.031).
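The exact P value reported above can be checked by enumeration. The sketch below is a minimal illustration of an exact Wilcoxon signed-rank test, assuming n non-zero differences with no tied absolute values, so that the ranks are simply 1 to n; the function name is hypothetical and this is not the authors' software.

```python
from itertools import product


def exact_upper_tail(t_plus, n):
    """P(T+ >= t_plus) under the null, for n untied non-zero differences.

    Under the null hypothesis each rank 1..n is equally likely to carry a
    positive or a negative sign, so all 2**n sign patterns are equiprobable.
    """
    ranks = range(1, n + 1)
    hits = sum(
        1
        for signs in product((0, 1), repeat=n)
        if sum(rank for rank, sign in zip(ranks, signs) if sign) >= t_plus
    )
    return hits / 2 ** n


# Tubes task: T+ = 15.00 with n = 5 non-tied subjects, one-tailed.
p_one_tailed = exact_upper_tail(15, 5)
```

With n = 5 the only sign pattern that reaches T+ = 15 is the one in which every difference is positive, so the one-tailed probability is 1/32 = 0.03125, in line with the reported P = 0.031. Half-integer statistics such as 31.50 imply tied ranks, which this sketch does not handle.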

Perspective-shifting test

As Fig. 3 shows, in the baseline sessions looking was comparable across seen and unseen trials (pretest: T+ = 11.00, n = 6 [2 ties], P = 0.88; post-test: T+ = 29.00, n = 9 [1 tie], P = 0.49; Footnote 4). In the perspective-shifting test trials, however, subjects looked significantly more often in unseen than in seen trials (T+ = 55.00, n = 10 [0 ties], P = 0.002). Looking was effective in locating the food: in unseen trials, subjects chose correctly significantly more often than the chance proportion (0.33) when they looked into the containers (T+ = 55.00, n = 10 [0 ties], P = 0.002), whereas their performance was at chance when they did not look (T+ = 26.00, n = 8 [0 ties], P = 0.31). In the seen trials, subjects performed above chance whether they looked or not (for both analyses, T+ = 55.00, n = 10 [0 ties], P = 0.002). Overall, subjects looked far less often than they had in the tubes trials, likely because the effort required to look into the perspective-shifting test containers was greater.

Fig. 3

Mean proportion of trials (±SE) in which subjects attempted to look into the containers in seen versus unseen trials, across the pretest baseline (n = 8), perspective-shifting test (n = 10), and post-test baseline (n = 10) trials. **P < 0.01

First look

Subjects’ performance in the perspective-shifting test trials demonstrated that they anticipated the correct visual perspective for seeing into the different containers. As Fig. 4 shows, subjects’ first look was almost always the correct one for the type of container used. For unseen trials only, exact Friedman tests for each container type were all statistically significant (Fr ≥ 10.41, df = 2, n = 10, P ≤ 0.003 in all cases). Pairwise tests revealed that the correct response for each container type occurred first in a significantly greater proportion of trials than either of the two incorrect responses (Wilcoxon exact tests: T+ ≥ 43.00, n ≥ 9 [ties ≤ 1], P ≤ 0.016 in all cases), whereas there were no significant differences in frequency of the two incorrect responses (T+ ≤ 3.00, n ≤ 2 [ties ≥ 8], P ≥ 0.50 in all cases). Additionally, when subjects did look, they rarely looked from more than one different perspective. Of the 184 total test trials in which subjects looked, there were only 11 in which they performed more than one type of look.
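To make the Friedman statistic concrete, the sketch below computes it from a subjects-by-responses table of proportions, using midranks and the standard tie correction. Whether the authors' software applied the same correction is an assumption, the exact P values reported in the text are not reproduced here, and the example data are hypothetical rather than taken from the study.

```python
from collections import Counter


def midranks(row):
    """Rank values in ascending order, giving tied values their average rank."""
    ordered = sorted(row)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in row]


def friedman_statistic(data):
    """Tie-corrected Friedman statistic for n subjects (rows) by k conditions."""
    n, k = len(data), len(data[0])
    rank_rows = [midranks(row) for row in data]
    rank_sums = [sum(ranks[j] for ranks in rank_rows) for j in range(k)]
    uncorrected = (12 / (n * k * (k + 1)) * sum(s ** 2 for s in rank_sums)
                   - 3 * n * (k + 1))
    # Each group of t tied values within a row contributes t**3 - t.
    tie_term = sum(t ** 3 - t for row in data for t in Counter(row).values())
    correction = 1 - tie_term / (n * k * (k ** 2 - 1))
    if correction == 0:
        raise ValueError("statistic is undefined when every row is fully tied")
    return uncorrected / correction


# Hypothetical data: ten subjects whose proportions of OPPOSITE, SIDE, and UP
# first looks are ordered identically (UP always highest).
perfect_agreement = [[0.0, 0.1, 0.9]] * 10
fr_max = friedman_statistic(perfect_agreement)
```

With n = 10 and three responses, identical orderings across all subjects give rank sums of 10, 20, and 30 and the maximum possible value of 20, which provides a scale for the reported values of Fr ≥ 10.41.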

Fig. 4

Mean proportion of trials (±SE) in which subjects’ first look was from the UP, SIDE, or OPPOSITE (OPP) perspective across all three container types, in both the (a) seen baiting and (b) unseen baiting conditions. **P < 0.01, ***P < 0.001

For seen trials only, the pattern of responding was generally the same, except that the Friedman test was not significant for the cylinders (cylinders: Fr = 4.57, P = 0.14; triangles: Fr = 16.20, P < 0.001; trapezoids: Fr = 12.00, P = 0.004; df = 2, n = 10 in all cases). For the triangles and trapezoids, the correct perspective was adopted in a significantly greater proportion of trials than either of the two incorrect perspectives (T+ ≥ 21.00, n ≥ 6 [ties ≤ 4], P ≤ 0.031 in all cases), and the two incorrect responses were not significantly different (T+ ≤ 1.50, n ≤ 2 [ties ≥ 8], P = 1.00 in both cases). In the baseline trials, there were no significant differences in the incidence of different types of response. Collapsing across the pre- and post-test and across the seen and unseen conditions (all of which were not significantly different from one another: all P’s ≥ 0.125), subjects looked from the UP, SIDE and OPPOSITE perspectives in the same average proportion of trials (Fr = 4.34, df = 2, n = 10, P = 0.12).

To investigate whether subjects learned over time to perform the correct response for each type of container, we compared their performance in the first versus the second half of trials in both the seen and unseen conditions. There was no detectable sign of learning, as subjects looked just as often in early trials as in later ones (seen: T+ = 31.50, n = 10 [0 ties], P = 0.71; unseen: T+ = 23.00, n = 8 [2 ties], P = 0.52). And when subjects did look, it was from the correct perspective just as often in the first half as in the second half of trials (seen: T+ = 6.00, n = 3 [5 ties], P = 0.25; unseen: T+ = 18.00, n = 6 [4 ties], P = 0.16). Further, chimpanzees did not succeed by simply learning the correct responses during seen trials and then transferring them over to the unseen trials. Of the 30 trials in which subjects experienced a given container for the very first time, 13 were unseen trials. Examining just those trials, we found that chimpanzees visually searched for the food in 8 of them, and in every case their first look was from the correct perspective.

First move

First, we examined the first move chimpanzees made after the food became potentially visible. The results were consistent with the first look measure (see Fig. 5). In unseen trials, subjects’ first move was most often the correct one (Fr ≥ 9.92, df = 2, n = 10, P ≤ 0.005 in all cases). And again, the correct response occurred more often than either of the two incorrect responses (T+ ≥ 26.50, n ≥ 7 [ties ≤ 3], P ≤ 0.047 in all cases), while the two incorrect responses occurred with equal frequency (T+ ≤ 12.50, n ≤ 5 [ties ≥ 5], P ≥ 0.25 in all cases). Moving in seen trials was less systematic than in unseen trials, as only the Friedman test for the triangles reached significance (cylinders: Fr = 1.13, P = 0.60; triangles: Fr = 11.21, P = 0.002; trapezoids: Fr = 4.79, P = 0.11; df = 2, n = 10 in all cases). In the triangles condition, the SIDE versus UP comparison was significantly different (T+ = 36.00, n = 8 [2 ties], P = 0.008) but the other two paired comparisons were not (T+ ≤ 21.00, n ≤ 7 [ties ≥ 3], P ≥ 0.125 in both cases). These results make sense if subjects were moving in seen trials not in an attempt to find the food but for other reasons such as boredom or frustration at the long wait. Finally, we considered the possibility that coding from the moment the food became visible in unseen trials could have biased the moving results, because subjects might have simply moved to a window at which a screen was currently being removed. To address this possibility, we reran the analyses using our second coding that began from the moment E finished the baiting, before any screens were removed. The pattern of results did not change (Fr ≥ 9.75, df = 2, n = 10, P ≤ 0.007 in all cases).

Fig. 5

Mean proportion of trials (±SE) in which subjects’ first move (independent of looking) was UP, SIDE, or OPPOSITE (OPP) across all three container types, in both the (a) seen baiting and (b) unseen baiting conditions. **P < 0.01, ***P < 0.001

Discussion

In experiment 1, subjects searched more often when they had not seen the baiting, but only when there was a real possibility of seeing the food (i.e., when the containers had openings in them). Searching was systematic: subjects’ most common response by far was to adopt the correct viewing perspective for the particular containers used, look into the containers, and then return to the home window to choose a container. The results were robust across different measures (first look or first move) and across different starting points for coding (from the moment the food was potentially visible, or from the moment E finished baiting). In the tubes trials, looking was at ceiling levels and was much higher than in Call and Carpenter (2001). This can be explained by changes we made to their procedure to make it more comparable to our perspective-shifting test. First, our tubes were 50 cm above floor level, making them easier to look into than Call and Carpenter’s (2001) tubes, which were at a height of 35 cm. Second, we had a 20-s delay between baiting and letting subjects choose, whereas Call and Carpenter (2001) had a 5-s delay or no delay at all. Our subjects, therefore, had far more time to look into the tubes than theirs, and they may have done so because they became bored or forgot where the food was, or simply to double-check since no other response was possible at that time. If we exclude any looks that happened beyond the first 5 s, subjects looked, on average, in 60% of seen trials and 72% of unseen trials (T+ = 29.50, n = 9 [1 tie], P = 0.22, one-tailed), which is more similar to Call and Carpenter’s 5-s delay condition (2001, experiment 1) in which subjects looked in approximately 52% of seen trials and 87% of unseen trials. In our perspective-shifting test, looking frequency was lower than in both our own and Call and Carpenter’s (2001) tubes test, probably owing to the greater effort required to search in the cylinders, triangles and trapezoids than in the tubes.

Our findings strongly suggest that our subjects anticipated which visual perspectives would be effective in gaining visual access to the hidden food and which would not, depending on the type of container. We considered, however, an alternative explanation. When subjects were seated at the home window, the openings in the cylinders and triangles were visible to them, and the shape of the trapezoids was suggestive of an opening at the back. Subjects could have therefore solved our task by aligning themselves with visible openings when they knew food to be hidden somewhere among an array of containers. In other words, they could have simply responded to a general tendency to look into visible openings. We tested this alternative explanation in experiment 2.

Experiment 2: UP versus OPPOSITE

In this experiment, we examined whether subjects preferred to search in a visible opening for hidden food when there was also a non-visible opening available. Subjects chose between two identical cylinders open on one end. One cylinder was oriented vertically and the other was oriented horizontally with its open end turned away from subjects. If chimpanzees simply searched in visible openings, they should look more often into the vertical cylinder. Otherwise, they should look into both cylinders with equal frequency. Note that if subjects looked into one cylinder and did not find the food, they could infer by exclusion that it must be in the other cylinder, and so looking in either cylinder was an effective response. Call (2004) and Call and Carpenter (2001) found that some of their chimpanzees were capable of making such inferences by exclusion.

Method

Subjects

The same ten chimpanzees as in experiment 1 participated. Five months elapsed between experiments.

Design

Each subject had 48 trials. The reward in the first 32 trials was a grape or a small paper cup of yoghurt. Because of waning motivation, in the last 16 trials we used a more desirable reward: a cup of yoghurt perched atop a banana slice and surrounded by several grapes (the cornucopia trials). Within each block of 16 trials, there were equal numbers of seen and unseen trials, the visible opening was on the right or left an equal number of times, and the food was on the right or left an equal number of times. The various combinations of these variables were presented in random order with the constraint that none of the factors could repeat for more than three consecutive trials. Subjects were given a variable number of trials per session (mode = 8).
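
The constrained random ordering described above can be illustrated with a short Python sketch. This is not the procedure the authors used to build their trial lists. It assumes that the three two-level factors (condition, side of the visible opening, side of the food) were fully crossed within each 16-trial block, which the text does not state, and it uses simple rejection sampling. All names (FACTORS, longest_run, make_block) are hypothetical.

```python
import itertools
import random

# Hypothetical factor levels; the text says only that each factor was
# balanced within a block of 16 trials.
FACTORS = {
    "condition": ("seen", "unseen"),
    "opening_side": ("left", "right"),
    "food_side": ("left", "right"),
}


def longest_run(values):
    """Return the length of the longest run of identical consecutive values."""
    best = run = 1
    for prev, cur in zip(values, values[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def make_block(rng, max_run=3, max_tries=100_000):
    """Return 16 trials (each factor combination twice) in which no factor
    keeps the same level for more than max_run consecutive trials."""
    combos = list(itertools.product(*FACTORS.values())) * 2  # 8 combos x 2
    for _ in range(max_tries):
        rng.shuffle(combos)
        columns = zip(*combos)  # one tuple of levels per factor
        if all(longest_run(col) <= max_run for col in columns):
            return [dict(zip(FACTORS, trial)) for trial in combos]
    raise RuntimeError("no valid order found within max_tries")


if __name__ == "__main__":
    for trial in make_block(random.Random(0)):
        print(trial)
```

Rejection sampling is used here only because it is easy to verify: every accepted block satisfies the balance and run-length constraints by construction.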

Experimental set-up

The set-up was as for experiment 1 except that the opposite window was occluded during baiting and the center window remained occluded throughout the entire trial.

Stimuli

Containers were two identical 7.5-cm diameter cylinders similar to those used in experiment 1 except that they were 16.5 cm in height and had a removable cap on one end. We used slightly taller cylinders than for experiment 1 because we wanted the effort of moving up to look into them from above to be comparable to the effort of moving over to the opposite window to look into them from the other end. A plastic rectangle attached to the side of each cylinder kept it from rolling when horizontal.

Procedure

The procedure was as in experiment 1 with two changes: (1) In seen trials, to minimize the possibility that subjects would simply look into openings where they saw food disappear a moment before, we baited the cylinders through the end opposite the opening. E removed the cap from the bottom of the vertical cylinder or the front of the horizontal cylinder, placed the food inside, and then replaced the cap; and (2) We shortened the trials to 10 s because 20 s had proven to be unnecessarily long in experiment 1 (subjects typically responded within the first few seconds).

Measures and coding

Dependent measures were as for experiment 1, except that first moves were no longer coded. This measure was now unnecessary because subjects had to climb above the level of the metal frame (because the cylinders were taller) or move all the way over to the opposite window (because the center window was occluded) to see into the containers. These UP and OPPOSITE responses, respectively, were almost invariably accompanied by extremely clear attempts to look into the containers, so that first looks and first moves amounted to essentially the same measure.

Reliability

A naïve coder blind to hypotheses and experimental condition coded 20% of randomly chosen seen and unseen trials for choice, looking versus not looking, number of different looks, and first look. Kappas were 1.00, 0.90, 0.89, and 0.90, respectively (all P’s ≤ 0.001).
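
For readers unfamiliar with the statistic, the following Python sketch shows how an agreement coefficient of this kind can be computed. It assumes Cohen's kappa for two coders and nominal categories, a variant the text does not specify. The function name and the example codes are hypothetical and are not data from this study.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assigned one label per trial."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of trials")
    n = len(coder_a)
    # Proportion of trials on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's marginal frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:  # both coders used a single identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Made-up first-look codes for four trials, for illustration only.
    original = ["UP", "UP", "OPP", "OPP"]
    second = ["UP", "UP", "OPP", "UP"]
    print(cohens_kappa(original, second))  # 0.5
```

A kappa of 1.00 corresponds to perfect agreement, and 0 to agreement no better than expected by chance.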

Results and discussion

As in experiment 1, subjects looked significantly more often in unseen than in seen trials (T + = 45.00, n = 9 [1 tie], P = 0.004), and they chose correctly significantly more often than the chance probability (0.50) in unseen trials only when they looked (looked: T + = 45.00, n = 9 [0 ties], P = 0.004; did not look: T + = 35.50, n = 10 [0 ties], P = 0.45). Overall, looking frequency was lower than in experiment 1, possibly because looking required more effort. Additionally, there were only two containers in this experiment, so subjects had a 50% chance of guessing correctly without looking. The greater effort of looking, combined with the lower cost of not looking, may have caused reduced motivation to search. To see if the reward’s desirability made a difference, we compared looking in the first 32 trials (grape/yoghurt reward) with the last 16 trials (cornucopia reward). Subjects looked in a significantly greater proportion of cornucopia trials than grape/yoghurt trials in the unseen condition (T + = 34.50, n = 8 [2 ties], P = 0.023) and results approached significance in the seen condition (T + = 15.00, n = 5 [5 ties], P = 0.063). Increasing the desirability of the reward thus boosted subjects’ motivation to search, especially when they did not know the reward’s location.
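
As a worked illustration of how such P values can arise, the Python sketch below computes the exact null distribution of the Wilcoxon signed-rank statistic. It assumes a two-tailed exact test on untied ranks, which the text does not state explicitly; the function names are hypothetical. Under these assumptions, a T + of 45.00 with nine non-tied subjects yields P ≈ 0.004, and a T + of 15.00 with five yields P ≈ 0.063, consistent with the values reported above. Fractional statistics such as 34.50 imply tied ranks, which this sketch does not handle.

```python
from fractions import Fraction


def signed_rank_counts(n):
    """counts[t] = number of ways to pick ranks from 1..n whose sum is t,
    i.e. the null distribution of T+ (times 2**n) when there are no ties."""
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        # Iterate downward so each rank is used at most once.
        for total in range(max_sum, rank - 1, -1):
            counts[total] += counts[total - rank]
    return counts


def exact_two_tailed_p(t_plus, n):
    """Exact two-tailed P for an integer T+ with n non-tied differences."""
    counts = signed_rank_counts(n)
    max_sum = n * (n + 1) // 2
    # The null distribution is symmetric, so double the more extreme tail.
    upper = max(t_plus, max_sum - t_plus)
    tail = sum(counts[upper:])
    return min(Fraction(2 * tail, 2 ** n), Fraction(1))


if __name__ == "__main__":
    print(float(exact_two_tailed_p(45, 9)))  # 0.00390625
    print(float(exact_two_tailed_p(15, 5)))  # 0.0625
```

With nine non-tied subjects, 0.004 is the smallest two-tailed P value this test can produce, because it is reached only when every subject differs in the same direction.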

Most important was whether subjects showed evidence of preferring the visible to the non-visible opening. In the seen and unseen conditions, subjects moved OPPOSITE just as often as UP (seen: T + = 6.00, n = 4 [6 ties], P = 1.00; unseen: T + = 34.00, n = 9 [1 tie], P = 0.20). Thus, there was no clear evidence that subjects preferred to look into a visible opening when looking into a non-visible one was just as effective (Fig. 6). Our subjects’ success in experiment 1 therefore cannot be explained by a general tendency for chimpanzees to selectively search in visible openings for hidden food. The evidence would be more convincing, however, if subjects actually demonstrated a preference for non-visible over visible openings in some situations. We tested this in experiment 3.

Fig. 6

Mean proportion of trials (±SE) in which subjects’ first look was from the UP or OPPOSITE (OPP) perspective

Experiment 3: UP versus DOWN

In this experiment, we tested whether subjects would prefer non-visible openings to visible ones when looking into the non-visible ones was the less effortful response. If so, this would provide stronger evidence against the hypothesis that chimpanzees simply look into the first visible opening they see when searching for food. We presented chimpanzees with a situation in which both visible and non-visible openings were present at the same time, as in experiment 2. In this case, however, the non-visible openings required far less effort to look into than the visible ones. To create visible and non-visible openings, we stood cylinders open on both ends on top of a raised transparent platform. Subjects could climb upward to look into the visible openings at the top of the cylinders, or they could crouch downward to look into the non-visible openings at the bottom. Because the apparently reduced motivation to search in experiment 2 may have been partly caused by having only two containers, for this experiment we increased the number of containers used in each trial to four.

Method

Subjects

Nine of the same ten chimpanzees participated (one adult female failed the pretest training criterion). Three months elapsed between experiments 2 and 3.

Design

Subjects received a variable number of training sessions as required and then 24 test trials across two sessions (all unseen). At the beginning of each test session subjects were given two refresher trials exactly like the training trials.

Experimental set-up

The set-up was similar to the earlier one, with the following changes: (1) The door between the enclosures was closed; (2) Because there were four containers, we replaced the three-holed Plexiglas panel with a steel mesh panel through which subjects could poke their fingers to choose a container; (3) A third testing room with a different window height (192 cm) was also used; and (4) Instead of placing containers directly onto the table, we placed them onto a transparent, 1/2-cm-thick Plexiglas platform (75 × 30 cm) that stood on top of the table and was 8–10 cm high, depending on the height of the subject.

Stimuli

Training cups

For training we used four white opaque plastic drinking cups, 13 cm tall with an 8-cm diameter opening.

Test containers

We again used 7.5-cm diameter cylinders similar to those in experiments 1 and 2, but with different heights. There were three identical tall cylinders (16.5–23.5 cm high, depending on the height of the window) and three identical short cylinders (11 cm high). The tall cylinders were open on both ends but the short cylinders were only open on the bottom.

Procedure

In experiments 1 and 2, subjects had gained experience climbing upward to look into containers from above (the UP response), but they had never encountered a raised transparent platform that provided visual access from underneath. Thus, we needed to be sure that subjects could also crouch downward to look into containers from below (the DOWN response). Once we were confident they had both responses in their repertoire we could present subjects with the test, in which there were both visible and non-visible openings present at the same time.

Pretraining

Just before training, E showed subjects that the platform was transparent by holding it vertically in front of her face and also holding up a grape and an empty cup behind it (and she repeated this after the first two training trials). E then gave subjects two warm-up trials in which she placed four cups onto the platform, all upright in one trial and all upside-down in the other, with order determined randomly. She put a grape into one of the cups as the subject watched and then slid the table forward so the subject could choose.

Training

E showed subjects that all four cups were empty and positioned a screen to block their view. She positioned the cups all upright or all upside-down in a row, hid a grape in one of them, and removed the screen. E waited until the subject made the appropriate response (UP for upright cups and DOWN for upside-down cups) or for 2 min, whichever came first. She then slid the table forward so the subject could choose.

Training was administered in blocks of four trials. Within each block, the cups were upright in two trials and upside-down in two trials and the food was at each location once, with order determined randomly. The subject had to perform the appropriate response (not necessarily first) in all four trials to proceed to testing; otherwise another block was run. After each block, if subjects had not reached criterion then E provided training tailored to individual needs. For subjects who were not moving downward, she tilted the platform upward until the grape was visible from below. For subjects who were not moving upward, she tilted the cups forward until the grape was visible from above. If subjects continued having trouble E demonstrated the responses. One or two blocks a day were administered until subjects reached criterion. Three subjects did so within two blocks, three within four blocks, and three within five blocks. One subject did not reach criterion within six blocks (she would not move upward) and was therefore dropped from the experiment. In summary, our goal for training was to ensure that subjects could make both types of response, UP and DOWN, so that they could choose between these responses in the test trials.

Test trials

For the test, there were again four cylinders on the transparent platform in each trial. We ran two conditions: (1) the three-tall condition included three tall cylinders plus one short one, and (2) the three-short condition included three short cylinders plus one tall one. The fourth, odd cylinder was always positioned on the far left or right side of the others. It was never baited and subjects knew this, for reasons that will become clear below. The height of the three identical cylinders (tall or short), the position of the fourth cylinder (left or right), and the location of the food (left, right or center) were all counterbalanced and randomly ordered, with the stipulation that none of these variables could repeat for more than three consecutive trials.

We included the fourth, odd cylinder in the array so that we could run the strongest test we could think of for whether chimpanzees prefer to look into visible openings. Namely, if there were three non-visible openings and one visible opening present at the same time and the visible opening had clearly not been baited (i.e., the three-short condition), would subjects nevertheless prefer the visible opening to the non-visible ones? If they did so, this would indicate an undeniable preference for visible openings. Given the results of experiment 2, we expected that, in the three-short condition, subjects would have little trouble ignoring the visible opening in the tall cylinder. We predicted that they would instead choose the only effective response in that condition, which was to move DOWN and look into the cylinders from underneath. The three-tall condition, in which either the UP or DOWN response would be effective in locating the food, was therefore of more interest to us. In the three-tall condition, subjects could choose to either climb UP and look into the cylinders from above or crouch DOWN and look into them from below. While both responses were effective, the UP response required significantly more effort than the DOWN response.

The procedure for all trials was as follows. E positioned the three identical cylinders in a vertical line in the center of the raised platform, one in front of the other, and let subjects watch as she baited one of them. She also placed the fourth cylinder onto the platform, apart from the others so it was clear she did not bait it. Next, E blocked the subject’s view of the three identical cylinders and positioned them in a horizontal row across the platform, with the fourth cylinder at the far left or far right of the row and still visible to the subject. (When the fourth cylinder was a tall one, E also placed a lid on it so that subjects could not look down into it during this time.) Note that although the subjects saw the baiting, they did not see the positioning of the containers and so did not know where the food was located. All trials were thus unseen. When E had finished positioning the containers she removed the screen blocking the subject’s view (and the lid from the fourth, tall cylinder, when relevant), sat for 10 s looking at her stopwatch, and then pushed the containers within reach. E did not leave the window during the delay because she was not obstructing subjects’ view by remaining there, and also because the metal grating made it difficult to see the subjects’ faces on video and it was easier to code the responses live.

Measures and coding

E coded live whether subjects chose the correct container and also whether they performed either of two responses: moving UP beyond the level of the metal frame and looking into the containers from above, or crouching DOWN and looking into them from below.

Reliability

A naïve coder judged 20% of randomly selected test trials for container chosen, first look, and whether the subject looked from more than one perspective. Kappas were 1.00, 0.94, and 0.94, respectively (all P’s ≤ 0.001).

Results and discussion

Looking was at ceiling levels, with every subject looking into the containers in every trial. It was also very effective, as there was only a single trial in which a subject chose incorrectly. Most interestingly, subjects almost always crouched downward to look into the containers from below rather than climbing upward to look into them from above (Fig. 7), and the difference was highly significant (T + = 45.00, n = 9 [0 ties], P = 0.004). There were only 11 test trials out of 216 in which a subject performed more than one type of response, either moving UP to look after previously moving DOWN (2 trials) or moving DOWN to look after previously moving UP (9 trials).

Fig. 7

Mean proportion of trials (±SE) in which subjects’ first look was from the UP or DOWN perspective (all trials unseen). **P < 0.01

It is very likely that subjects preferred the DOWN response because it was the far less effortful of the two responses. Supporting this view is the fact that when subjects did climb upward as their first response, it was almost always (except in two cases) within the first six test trials. Subjects thus quickly learned that the DOWN response was less effortful than the UP response and they came to rely on this response almost exclusively as time went on. Furthermore, there were only 12 trials in which a subject’s first or only response was to climb upward. Of these 12 trials, only two were in the three-short condition, in which looking into the one visible opening (at the top of the fourth, tall cylinder) could not have possibly been effective in locating the food because that container was never baited. Thus, as predicted, we found that subjects did not show a preference for visible openings. Instead, they clearly preferred to look into non-visible openings when this was the easier of the two options. Together with the results of experiment 2, this finding allows us to rule out the possibility that our subjects succeeded in experiment 1 by acting on a general tendency to look into visible openings.

General discussion

Chimpanzees visually searched significantly less often when they had witnessed the hiding of a food reward than when they had not, replicating findings from previous studies (Call 2005; Call and Carpenter 2001; Hampton et al. 2004). More notably, we found that chimpanzees immediately recognized which one of various viewing perspectives they needed to adopt to spy a reward hidden inside a container. These results, together with those from studies in which chimpanzees recognize what other individuals can or cannot see, suggest that chimpanzees’ abilities regarding both themselves and others are grounded in a more general appreciation of visual perspective. It is impossible to know from this study alone whether chimpanzees’ perspective-taking abilities involve explicit reasoning of any kind. However, in the current study, the fact that they often readily left the home window (the only place where food could be physically accessed) with the clear intention of visually locating the food before making a choice suggests a deliberateness to their responses that goes far beyond any reflex foraging reaction.

Let us further elaborate on our findings and their relation to tests of chimpanzees’ understanding of visual perspective in others. In experiment 1, chimpanzees were presented with a situation in which they could move around an enclosure to visually search for food hidden among three identical open containers. The correct perspective for seeing into the containers depended on the location of the openings, which varied across trials. In an overwhelming proportion of cases, subjects’ first response was to go immediately to the appropriate location for gaining visual access to the hidden food. Experiments 2 and 3 ruled out the possibility that our subjects succeeded in experiment 1 because of a general tendency to align themselves with visible openings. In experiment 2 they looked into visible and non-visible openings equally often when both responses required similar levels of effort, and in experiment 3 they even preferred non-visible openings to visible ones when looking into the non-visible openings was the easier option. In short, our subjects demonstrated that they could anticipate the effect that specific shifts in their own perspective would have on their ability to see the food. Further, it is important to note that the chimpanzees did not move in an attempt to acquire the food. They were quite aware of the fact that the food could only be acquired at the home window, as evidenced by their returning to the home window to choose a container after they had spotted the food.

One common criticism of visual perspective taking research with animals is that subjects could solve the tasks presented to them by simply responding to the observable behavior of others. For example, instead of understanding that a social partner with its back turned cannot see them, chimpanzees might learn from experience that others oriented away from them are typically unresponsive to their gestures (for an extended and ongoing debate on this issue see Heyes 1998; Penn and Povinelli 2007; Povinelli and Vonk 2003, 2004; Tomasello et al. 2003a, b). Our task involved no social partners, so there were no social cues available for subjects to make use of. Furthermore, subjects did not learn (either through previous experience with similar situations or within the context of this study) where they needed to position themselves to see into the different types of containers. From the beginning, they almost invariably adopted the correct perspective for seeing into the particular containers they were confronted with in any given trial, despite the fact that they had never encountered these containers before the start of this study. While it is true the cylinders were somewhat similar to drinking cups, which some of the chimpanzees may have had previous experience with, the triangles and trapezoids were entirely novel to them, as was the transparent platform in experiment 3. Their immediate mastering of the task thus suggests that they were putting their knowledge of the containers’ openings (gained during the familiarization phase) together with their current perceptions of how the containers were positioned in front of them to judge where they needed to move. Although numerous species search for occluded objects and find appropriate viewing (and approach) angles during their search (e.g., Chapuis 1987; Regolin et al. 1995; Zucca et al. 2005), chimpanzees were capable of anticipating the appropriate viewing angles. 
Future studies could investigate whether anticipating correct viewing angles is as widespread an ability in other species as is simply finding correct viewing angles.

We can also be confident that the Clever Hans effect, whereby animals learn to read subtle, unintentional experimenter-given cues to solve the task, was not at work here. Subjects did not choose the correct container at greater than chance levels in unseen trials when they failed to look before choosing. Furthermore, the experimenter could not have acted differently when subjects had not looked than when they had because (in experiments 1 and 2, at least) she turned her back to the subject after finishing baiting. In most cases, she therefore did not know until viewing the videotapes whether or not the subject had looked into the containers.

We conclude that chimpanzees clearly demonstrated excellent perspective taking abilities with regard to themselves. The precise mechanism by which they were able to solve our task remains an open question. That they did not appear to rely on learned associations between configurations of cues suggests they were exercising a more general understanding of visual perspective. Such an understanding would likely involve a sense of how the relative spatial relations between oneself and other objects change as one moves through space or as the objects move. Positive evidence for such capacities in chimpanzees comes from spatial rotation studies, in which subjects see food placed in one of an array of containers and then must either wait as the array is rotated (Beran and Minahan 2000; Call 2003), or walk around the array 180 degrees (Hoffman and Beran 2006) before choosing. We should note, however, an important difference between the spatial rotation task and our perspective-shifting task. Subjects could arguably solve the former by tracking the correct container throughout the movement; however, in our task subjects did not know which container held the food in the unseen trials and so they could not track the correct container. Instead, their responses suggested that they were exercising a more abstract knowledge of what could be seen from where in repositioning themselves to spy the reward located inside the container. Conceivably, our chimpanzees drew upon a representation of the containers’ features (including the position of the openings) and the relative spatial positions of the containers and themselves to anticipate the proper viewing angle. Although further work is needed to elucidate the precise nature of this mechanism, it would certainly seem to require at least some implicit knowledge of spatial relationships and visual perspective. Future studies could be devoted to charting the ability reported here across various taxa.

It is currently unclear whether chimpanzees might also use a similar mechanism in judging others’ visual access to occluded objects. We can only speculate at this point. Chimpanzees could, for example, project themselves into the perspective of the other and, putting this together with a more general understanding of what can be seen from where, judge when others are or are not able to see something. Such a process would fit with the so-called ‘simulation’ view of theory of mind, which suggests that individuals judge the content of others’ minds and predict their behavior by using themselves as a model [see, for example, edited collections by Carruthers and Smith (1996), Davies and Stone (1995a, b) and also a recent debate among Saxe (2005), Goldman and Sebanz (2005), Gordon (2005), Mitchell (2005)]. Simulation theory, at least in some varieties, predicts that because one’s own perceptual and mental experiences are used to understand others’ experiences, abilities regarding the self should necessarily precede abilities regarding others. On the other hand, if judging one’s own perceptual access is a fundamentally different kind of process altogether from judging others’ (for example, if the latter relies on simply learning to associate configurations of cues with behavior patterns), then we might expect to find particular situations in which chimpanzees show abilities for others but not for the self. Bräuer et al. (2004), for example, found no evidence that dogs recognized when they themselves had or had not seen food being hidden, even though dogs typically perform well in visual perspective taking tasks involving others (Miklósi 2008). Note, however, that we do not mean to conclude on the basis of one study that dogs do not understand anything about their own perceptual states.

So far, the small amount of available evidence for chimpanzees suggests similar capacities with regard to self and others, although many more studies looking at abilities regarding the self are needed. Most useful would be investigations of chimpanzees’ abilities regarding the self and others across comparable tasks. For example, a social version of the task used in the current study could be instructive: would chimpanzees understand that a social partner who looked down onto the trapezoids from above could not see the hidden food, but that one who moved opposite to look into them from the back could see it? If not, this would suggest that their ability to judge visual access in others is limited to situations involving simpler kinds of occluding objects, as in the studies involving barriers described earlier. This would in turn support the view that chimpanzees may indeed be relying on configurations of observable cues to predict others’ behavior without truly understanding what others can or cannot see. Future research involving such direct comparisons of abilities for the self and others across similar tasks will shed more light on just how much chimpanzees do or do not understand about both their own and others’ perceptual and mental experiences.