Subjects and housing
Nine great green macaws (Ara ambiguus) and eight blue-throated macaws (Ara glaucogularis) were tested. Their age and sex are shown in Table 1. All the birds were hand-raised and group-reared by the Loro Parque Foundation in Tenerife, Spain, and all were housed in the Comparative Cognition Research Station, within the Loro Parque zoo in Puerto de la Cruz, Tenerife. The birds were housed in groups of two to six individuals, according to species and age, in seven aviaries. Six of these aviaries were 1.8 × 6 × 3 m (width × length × height) and one was 2 × 6 × 3 m. Windows of 1 × 1 m could be opened between the aviaries to connect them. One half of each aviary was outside, so the birds followed the natural heat and light schedule of Tenerife. The half inside the research station was lit with Arcadia Zoo Bars (Arcadia 54W freshwater Pro and Arcadia 54W D3 Reptile Lamp) that followed the natural light schedule.
A large variety of fresh fruit and vegetables were provided first thing in the morning and again in the afternoon. In the afternoon, each individual received a portion of Versele Laga Ara seed-mix that was modified based on their daily weight. The parrots were never starved, and portions of nuts and seeds were controlled to prevent overeating and obesity. Testing took place at least an hour after the parrots had breakfast to ensure there was sufficient motivation to get food rewards. Unless otherwise specified, walnut halves were used as rewards. These were highly prized by all subjects, and they were able to obtain them on a daily basis through voluntary testing.
Experiments took place in separate testing rooms away from the aviaries. All birds had been previously trained to enter them. These rooms were 1.5 × 1.5 × 1.5 m and also lit with Arcadia Zoo Bars to cover the birds’ full visual range. One wall of each one of the experimental rooms had a 50 × 25 cm window through which an experimenter could place apparatuses into the testing room from a neighbouring chamber. This window could be occluded with a white board so that the experimenter could hide anything they were doing from the subjects’ view, such as re-setting apparatuses between trials. The experimenter always wore mirrored sunglasses during experiments to prevent their gaze being a cue for the parrots. A second wall was made of sound-proofed one-way glass so that zoo visitors could observe experiments without disturbing the subjects.
One individual, Hannibal, was tested in a separate cage connected to his home aviary as he was not comfortable going to the experimental rooms. Each aviary had a separate, removable cage attached to it, which was used to transport subjects between their home aviaries and the experimental rooms. However, the cages were also large enough for testing (1 × 1 × 1 m). They had sliding doors that could separate an individual from the rest of their aviary, and were also on wheels, which meant they could be moved to an area occluded from the other individuals in their aviaries. Finally, they also had a second door through which testing apparatuses could be placed into the cage with a test subject. In this way, Hannibal was able to complete the test in a similar manner to the other subjects.
This task required subjects to push a reward out of a horizontal transparent tube, which was 30 cm long with a diameter of 6 cm (Fig. 1a and b). The tube was fixed in an elevated horizontal position at a height of 10 cm by a metal stand at either end (see Supplementary Fig. 7 (OSM) for picture). The stands were attached to a solid wooden board (45 × 30 × 2.5 cm) and the central tube was wrapped in wire mesh (1 × 2 cm) to show the solidity of the transparent acrylic. The reward was placed in the middle of the horizontal tube, thus the only way to obtain the reward was by pushing it to one of the two edges with the aid of a tool. There was an additional 5-cm tube (vertically positioned) of the same diameter attached to the centre of the top-side of the main tube (see Supplementary Fig. 7 (OSM)). This additional tube was non-functional as no tool could interact with the reward from here. Its purpose was to provide a place for the subjects to put the stone tools onto the top of the apparatus. This was so that the subjects’ tendency and persistence for placing tools in an opening on top of the apparatus could be assessed, even when it had no effect on the outcome of the task.
For each test, the subjects were provided with a selection of ten stones. All were natural volcanic rocks taken from the beach in Puerto de la Cruz, Tenerife (they were returned to the beach after testing was finished). They all fit into the tube, had a diameter of 4.5–5.5 cm and weighed between 60 and 100 g. Three stones had to be placed into the tube, one behind the other, before the reward moved. At least four stones had to be inserted to place the reward within reach. This meant that the subject received no perceptual feedback of the reward moving until at least three specifically directed actions were made.
The experiment consisted of three test phases (pre-test, critical test 1 and critical test 2), three different types of experience phase (pre-inserted stick, short tube and pre-inserted stone; the short tube phase could be given twice, as short tube 1 and short tube 2) and two transfer tasks (parallel tubes and blocked tube). An overview of the order in which these phases took place is shown in Fig. 2. Not every subject took part in every phase: some skipped phases because of early success in an experimental phase, and others did not take part because of failure in an experience phase.
In essence, in the pre-test the subjects were given an opportunity to spontaneously solve the task in order to examine if the experiment could be solved without any mechanistic experience, just by exploration behaviours characteristic of the parrot subjects. The experience phases presented the subjects with progressively more specific direct mechanistic experience of how the experiment could be solved: the first mechanistic experience (pre-inserted stick) provided them with knowledge of the basic mechanism that the food could be pushed from contact with a stick, but without cueing them towards stones in particular. The second mechanistic experience (short tube) allowed the subjects to move the reward by just inserting a single stone, thus reinforcing subjects that showed exploratory stone-inserting behaviour by letting them experience that their behaviour can move the reward and lead to success. The third mechanistic experience further helped subjects that would not insert stones (pre-inserted stone) as it showed that just pushing stones already pre-inserted in the tube would move the reward, hence cueing them both towards stones and the opening of the tube. The critical test phases, after each of these mechanistic experience phases, were implemented to test if the subjects could generalise and extrapolate from these progressively more specific mechanistic experiences to produce the necessary actions to solve the task. Finally, the subjects were offered the parallel and blocked tubes tasks as follow-up tasks to investigate if they could flexibly solve slightly modified tasks, verifying whether they indeed attended to the functional properties of the task or not.
Within the protocol descriptions, the term ‘session’ describes a single period of testing in the experimental chamber. Typically, subjects took part in a single session per day. Within each testing session (pre-test, both critical test phases, and the parallel tubes and blocked tube phases), subjects could be given a maximum of three trials (i.e. chances to solve the task and obtain a reward). Within each session in the mechanistic experience phases (pre-inserted stick, short tube and pre-inserted stone), subjects could be given a maximum of six trials. Fewer trials were given in test sessions because the successful method in the test sessions took slightly longer and we did not want subjects to lose motivation. If subjects failed a trial, the whole session was stopped and subjects were brought back to their group aviary. A trial was considered failed if subjects did not solve the task within 10 min of the apparatus being placed into the experimental room with them. If subjects succeeded in a trial, then the apparatus was removed, re-baited and placed back into the subject’s experimental room for another trial (described below), unless it was the last trial of a session, in which case the session ended. All subjects finished testing, from habituation to final test trial, within 2 months. All testing took place between November 2017 and May 2018.
Every subject was first habituated to the apparatus without the presence of the stones. These sessions took place in the experimental rooms, with an experimenter sitting in the neighbouring chamber. This was necessary for re-baiting if further trials were needed in the session and it also served to create a more relaxing environment for the subjects to work in. Once the subject was in the experimental room, the experimenter introduced the apparatus through the window in between them. Subjects had 5 min to approach the apparatus and take a piece of walnut placed at the base of the apparatus. If they took it, the experimenter waited for 30 s then removed the apparatus. Next, the experimenter, out of sight of the subject, re-baited the apparatus and reintroduced it 1 min after removing it. The subjects had up to six of these trials per session and if they took six walnut pieces in a row within a single session, they moved onto testing.
The subjects had already been habituated with the stone tools in a previous experiment and all were eager to interact with them, so they did not need habituating to them again. We state that the parrots were ‘eager’ to interact with the stones because they would frequently pick up the stones and just manipulate them between their feet and beaks with no apparent purpose. This was typical for these macaws for any object that was a similar size and shape to these stones.
Pre-test and critical test procedures
The pre-test and both critical test procedures were identical. The difference was that subjects had been given different mechanistic experience phases after the pre-test and between critical tests 1 and 2, which provided functional information about how the task could be solved. The order of these tests and mechanistic experience phases can be seen in Fig. 2.
Subjects were given a minimum of six 10-min sessions for both the pre-test and the two critical tests. Each session had between one and three trials, i.e. subjects were given more trials in a session if they succeeded, until the maximum of three trials was reached. The apparatus was first prepared out of sight of the subjects. A half walnut reward was placed inside the middle of the tube and ten of the stones were placed on the board beneath it. The whole apparatus was pushed into the subject’s chamber, in the centre of the testing area; thus the subjects were able to access it from all sides. The subjects were given a maximum of 10 min to interact with the apparatus. If they failed to retrieve the reward in this time, then the trial and the session ended, the apparatus was removed, and the subject was returned to its social group. If, on the other hand, the subject managed to successfully retrieve the reward by pushing multiple stones into the side of the apparatus, then the subject was given further trials within that testing session. To do this, the experimenter waited for 30 s after the subject had retrieved the reward and then removed the apparatus. It was re-baited with another half piece of walnut out of sight of the subject, ensuring that there were ten stones available underneath the apparatus. The apparatus was then placed back into the room with the subject for another trial exactly 1 min after it was removed from the subject’s chamber. Subjects needed to complete 12 successful trials to be considered consistently successful within this experimental phase (although Captain and Enya, see Table 1, progressed from the second critical test phase with 11 successful trials due to an experimenter’s error). This repetition was to validate that subjects had not accidentally succeeded in a task but could replicate the method they used to get the reward.
Additionally, if subjects had six failed trials in a condition, they moved onto the next mechanistic experience phase (or stopped testing, depending on which condition they had ‘failed’; see Fig. 2).
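The session and phase rules for the pre-test and critical tests can be summarised in a short sketch. It is illustrative only: the names `run_session` and `phase_outcome` are hypothetical, and it assumes that successes and failures are counted cumulatively across sessions and that the success criterion is checked before the failure criterion, neither of which is stated explicitly above.

```python
from typing import Iterable, List

MAX_TRIALS_PER_SESSION = 3   # test sessions (pre-test, critical tests)
SUCCESS_CRITERION = 12       # successful trials needed to pass the phase
FAILURE_CRITERION = 6        # failed trials after which the phase is left


def run_session(outcomes: Iterable[bool],
                max_trials: int = MAX_TRIALS_PER_SESSION) -> List[bool]:
    """Return the trials actually run in one session.

    A session ends after the first failed trial or once max_trials
    trials have been run, whichever comes first.
    """
    run: List[bool] = []
    for solved in outcomes:
        if len(run) == max_trials:
            break
        run.append(solved)
        if not solved:
            break
    return run


def phase_outcome(sessions: Iterable[Iterable[bool]]) -> str:
    """Classify a test phase as 'passed', 'failed' or 'ongoing'.

    Assumption: counts are cumulative over sessions (not consecutive).
    """
    successes = failures = 0
    for session in sessions:
        for solved in run_session(session):
            if solved:
                successes += 1
            else:
                failures += 1
            if successes >= SUCCESS_CRITERION:
                return "passed"
            if failures >= FAILURE_CRITERION:
                return "failed"
    return "ongoing"
```

Because a failed trial ends its session, six failed trials necessarily span six separate sessions under these rules.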
Sessions were only counted as valid if the subjects approached and touched the apparatus. Thus unsuccessful trials were ones in which subjects had touched the stone tools or the apparatus but had not been able to retrieve the reward. Subjects that were able to succeed in the pre-test or either of the critical test phases moved onto the follow-up tasks (parallel tubes and blocked tube).
Pre-inserted stick experience
In the first mechanistic experience phase, called pre-inserted stick experience, a stick tool was used to show how the reward could be pushed out of the tube (Fig. 1c and d). In the first stage of this phase, the stick tool was attached to the tube so that it could only be pushed into the tube, i.e. it could not be pulled out. We assumed the parrots would be likely to pull the stick tool in their initial exploratory interactions with it, so attaching it in this way made sure they could only experience the ‘rewarding’ outcome of pushing the stick first. The experimenter placed a half walnut reward in the centre of the apparatus with the stick tool inserted flush against it. The subjects were given multiple 10-min sessions to push the stick through explorative behaviour alone. If they succeeded within a session, the experimenter gave the subjects more trials. They removed the apparatus, re-baited it out of sight, ensured the stick tool was correctly positioned and gave it back to the subject. If subjects successfully obtained the reward six times within a session, they moved onto the next stage of this mechanistic experience phase.
In this following stage, the procedure and setup were identical, except the stick was no longer attached to the apparatus, so it was possible for the birds to pull the stick out of the apparatus. We did this to ensure that the subjects recognised that the stick specifically needed to be pushed to get the reward, and that it was not just exploratory interaction with the stick that led them to obtain the reward. If subjects pulled and thus removed the stick at this stage, the experimenter would remove the apparatus and the stick, reposition the stick inside the apparatus out of sight of the subjects, and then give it back. There was no limit to the number of times the stick could be re-positioned like this (but there was still a 10-min time limit per trial). To pass this final stage of the experience phase, the subjects had to push out the reward 12 times in two consecutive sessions (six trials per session). All subjects completed the first stage of the pre-inserted stick experience in a maximum of six sessions and the second stage in a maximum of three sessions. They then moved onto the first critical test.
Short tube experience
If the subjects failed the first critical test, they were given a second mechanistic experience phase. In this mechanistic experience phase, called the short-tube experience, a modified apparatus was used (Fig. 1e). A 15-cm long tube was mounted on a separate wooden board at the same height as the original apparatus. It did not have the additional vertical tube in the centre. With this shorter tube, the subjects were still unable to reach a reward placed in the middle, but now they only needed to insert one stone to push the reward out of the tube. Some subjects had inserted one or more stones into the apparatus in the initial critical test, but not a sufficient number to move the reward (see Results section). If they repeated this behaviour with a shorter tube it would be enough to move the reward.
The short tube was given with the same procedure as the critical test, i.e. the subjects were given the short-tube apparatus with ten stones and they then had 10 min to solve the task by placing one stone into the tube and pushing the reward out. After successful trials, the apparatus was removed, re-baited, and placed back into the subject’s room. However, because the solution to the short-tube experience phase was potentially very rapid (the subjects only needed to insert a single stone), subjects were given up to six trials per session to speed up the testing process. To reach the criterion in the short-tube experience, and in turn proceed to critical test 2, they had to succeed 12 times consecutively over two sessions. If subjects failed a trial in a session, i.e. they did not obtain the reward within 10 min, then the trial and session ended, the apparatus was removed, and the subjects were taken back to their social group. If subjects failed six trials in the short-tube experience, then they were given another mechanistic experience phase, the pre-inserted stone phase.
If the subjects succeeded in the pre-inserted stone experience, then they still had to succeed in the short-tube experience before they could proceed to critical test 2. However, the subjects only had one opportunity to succeed in the pre-inserted stone experience, i.e. subjects moved from the short-tube experience to the pre-inserted stone experience only once, not repeatedly until they succeeded.
Pre-inserted stone experience
The pre-inserted stone mechanistic experience used the same short tube apparatus (Fig. 1f). However, now one of the stone tools was already placed inside the tube next to the reward so the subjects only needed to push the stone to obtain the reward. The purpose of this was to cue the subjects to the correct positioning of the stone and to show them that interacting with the stone in this position would make the reward move. If subjects succeeded in this stage, then they were given another chance at the short-tube experience, in which they had to succeed in order to proceed to critical test 2.
In this experience phase the experimenter placed one of the stones in the tube, flush against the reward, before giving it to the subject. This setup was done out of sight of the subjects, so they did not receive a cue from the experimenter to specifically insert the stones into the tube; they only saw the apparatus with a stone already inserted. For each trial, the subjects had up to 10 min to obtain the reward. If the reward was obtained, the apparatus was removed, re-baited, and given back to the subject. If subjects could repeat this behaviour six times in one session, they were given another round in the short-tube experience phase, exactly as described above. If they passed criterion in the latter, then they were given critical test 2, but if they failed, their participation in the test ended.
All of the subjects that were given the pre-inserted stone phase succeeded (nine subjects), but only one of these subjects (Rita) managed to succeed in the following short tube 2 experience phase (see Table 1).
Parallel tube and blocked tube transfer tasks
Two follow-up tasks were devised for subjects that managed to reliably succeed in one of the critical test phases. The purpose of these was to inform on how the subjects had achieved their successful methods in the critical test phase.
Specifically, the parallel tubes test (Fig. 1g) was used to see if subjects were inserting stones into the tubes to move the reward or whether they had just learnt that inserting stones into the tubes was rewarding. In this task, two identical versions of the horizontal tube were attached to a single board, parallel to each other. The key difference was that only one of the two tubes was baited with a walnut reward; the testing protocol was otherwise the same as in the critical test phases. The subjects were still provided with only ten stones and were still given 10 min per trial to try to obtain the reward. If they obtained the reward, the apparatus was removed and re-baited; however, the reward was randomly placed in either of the two tubes across trials in a counterbalanced schedule, i.e. rewards were not just swapped between the tubes on different trials but were placed in each of the two tubes an equal number of times over all trials. A more stringent success criterion was used in this phase. Subjects had to obtain the reward 12 times in a row in four sessions (three times per session), i.e. without unsuccessful trials between the successful trials. This was to ensure that random placement of stones would not lead to subjects reaching the success criterion in this phase. If they succeeded, they moved onto the blocked tube task, but if they failed, they stopped testing.
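One way to produce a randomised but counterbalanced baiting order, and to check the 12-in-a-row criterion, is sketched below. This is not the authors' procedure: the function names, the tube labels and the shuffle-based method are assumptions made for illustration.

```python
import random
from typing import Iterable, List, Optional, Sequence


def counterbalanced_schedule(positions: Sequence[str], n_trials: int,
                             rng: Optional[random.Random] = None) -> List[str]:
    """Random order in which every position occurs equally often."""
    if n_trials % len(positions) != 0:
        raise ValueError("n_trials must be a multiple of the number of positions")
    rng = rng or random.Random()
    # Build an exactly balanced list, then shuffle it in place.
    schedule = list(positions) * (n_trials // len(positions))
    rng.shuffle(schedule)
    return schedule


def meets_strict_criterion(outcomes: Iterable[bool], needed: int = 12) -> bool:
    """True if the outcomes contain `needed` successes in a row."""
    run = 0
    for solved in outcomes:
        run = run + 1 if solved else 0  # any failure resets the run
        if run >= needed:
            return True
    return False


# Hypothetical labels for the two parallel tubes.
baited_tube = counterbalanced_schedule(["tube A", "tube B"], 12)
```

Passing a seeded `random.Random` makes a schedule reproducible. The same helper would also cover a task with more than two positions, provided the number of trials is a multiple of the number of positions.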
The blocked tube task used the same apparatus as the parallel tubes, but a ‘bung’ attachment was created to block one end of one of the tubes (Fig. 1h). This transfer task was used to see if the subjects could recognise they needed to push the reward to a location that was then accessible to themselves, not that the rewards just needed to be pushed. In this task, both tubes were now baited, but the bung would prevent the subjects from pushing the reward out of the blocked tube. Between every trial the bung was randomly repositioned in one of the four available positions, again in a counterbalanced fashion. The success criterion would have been the same as the parallel tubes task, but no subject succeeded here. This was the last available task in the experiment.
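The progression between phases can be written as a small lookup table. The sketch below is a reading of the text rather than a copy of Fig. 2: the names `TRANSITIONS` and `phases_visited` are hypothetical, and the entries marked as assumptions (testing ends after failing the pre-inserted stick, pre-inserted stone or critical test 2 phases) are inferred rather than stated outright.

```python
from typing import Dict, List, Optional, Tuple

# (phase, passed) -> next phase; None means testing ends.
TRANSITIONS: Dict[Tuple[str, bool], Optional[str]] = {
    ("pre-test", True): "parallel tubes",
    ("pre-test", False): "pre-inserted stick",
    ("pre-inserted stick", True): "critical test 1",
    ("pre-inserted stick", False): None,      # assumption
    ("critical test 1", True): "parallel tubes",
    ("critical test 1", False): "short tube 1",
    ("short tube 1", True): "critical test 2",
    ("short tube 1", False): "pre-inserted stone",
    ("pre-inserted stone", True): "short tube 2",
    ("pre-inserted stone", False): None,      # assumption
    ("short tube 2", True): "critical test 2",
    ("short tube 2", False): None,
    ("critical test 2", True): "parallel tubes",
    ("critical test 2", False): None,         # assumption
    ("parallel tubes", True): "blocked tube",
    ("parallel tubes", False): None,
    ("blocked tube", True): None,             # last available task
    ("blocked tube", False): None,
}


def phases_visited(results: Dict[str, bool]) -> List[str]:
    """List the phases a subject goes through, given pass/fail per phase."""
    phase: Optional[str] = "pre-test"
    visited: List[str] = []
    while phase is not None:
        visited.append(phase)
        if phase not in results:
            break  # no result recorded yet for this phase
        phase = TRANSITIONS[(phase, results[phase])]
    return visited
```

For example, a subject that fails the pre-test, critical test 1 and short tube 1, passes the pre-inserted stone phase and then fails short tube 2 ends testing without reaching critical test 2.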
All experiments were recorded on four static CCTV cameras. These covered all angles of the testing room. Two recordings from separate cameras were saved for each experimental session, but more recordings were saved if it was necessary for specific trials, for example if there was partial occlusion of a camera view from the subject standing in the way. Behaviours were scored from the videos using Solomon coder (András Péter, solomon.andraspeter.com).
Firstly, the location of the placement of the stone tools was recorded. For every trial in the ‘pre-test’, ‘critical test 1’ and ‘critical test 2’ the number of stones that subjects placed in each side of the tube was counted. The apparatus was always placed into the testing rooms in exactly the same position and orientation, so the stone locations we recorded were ‘absolute’, i.e. the left side of the tubes from one specific camera angle was considered ‘side 1’ and the right side was ‘side 2’. We did not call these ‘left’ and ‘right’ because a second camera angle was occasionally consulted to confirm stone insertions, and as the other cameras were pointed from different angles, ‘left’ and ‘right’ could have caused confusion during coding. A stone was classified as inserted into the tube if the subjects placed it inside the tube, released it from their beak, and then it stayed inside the tube without falling out. Sometimes, subjects removed stones from the tubes and then placed them back inside the tube, either on the same or on the other side. If the stone was picked up from inside the tube, removed from within the tube entirely, then placed back inside the tube and released, it counted as another insertion. In this manner, it was possible for more than ten stone insertions to be counted per trial even though there were only ten stones available for each trial. In the parallel tubes and blocked tube transfer tasks, the number of stones inserted on each side of both tubes was counted, i.e. there were four possible places where the stones could be inserted. For both of these tasks, one tube was the ‘correct tube’ and the other was the ‘wrong tube’, as rewards could not be obtained from the ‘wrong tube’.
In all of the test trials, from the pre-test and critical tests to the parallel tubes and blocked tube transfer trials, we also counted the number of times subjects placed stones into the vertical tube on top of the horizontal tube. In some trials, subjects again removed and re-inserted stones on multiple occasions into these tubes. The same rule as above was followed: if the stone was picked up, removed entirely from the tube, then re-inserted and released, it was counted as another stone insertion. Hence, in some trials many more than two ‘stone-in-top’ insertions were counted even though the vertical tube could only hold two stones at once. In the parallel tubes and blocked tube trials, there was no discrimination between subjects placing stones in the top of the ‘correct’ or the ‘wrong’ tube.
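The insertion-counting rule can be illustrated with a minimal sketch. The `Release` record and the `count_insertions` helper are hypothetical and do not reflect how Solomon Coder stores events; the sketch assumes the coder logs one record each time a stone that was entirely outside a tube is placed in an opening and released.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Release:
    """One coded event: a stone released into a tube opening."""
    location: str  # e.g. "side 1", "side 2" or "top"
    stayed: bool   # True if the stone stayed in the tube after release


def count_insertions(releases: Iterable[Release]) -> Counter:
    """Count insertions per location.

    Only stones that stayed in the tube count. A stone that is removed
    entirely and released into a tube again produces a new Release, so
    it is counted again.
    """
    return Counter(r.location for r in releases if r.stayed)


# Hypothetical trial: one stone falls out of side 1, and a stone is
# removed from the top tube and re-inserted twice.
trial = [
    Release("side 1", stayed=False),
    Release("side 1", stayed=True),
    Release("top", stayed=True),
    Release("top", stayed=True),
    Release("top", stayed=True),
]
counts = count_insertions(trial)
```

In this hypothetical trial the ‘top’ count is three, which exceeds the two stones the vertical tube can hold at once, as described above.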
Ten percent of the videos (28 sessions, 38 trials; these were randomly selected from all of the pre-test, critical tests and parallel/blocked tubes tests) were then coded by a second observer to check inter-observer reliability on the number of stones inserted into both the top and the sides of the tube. There was ‘excellent’ reliability between the two observers (Intraclass correlation = 93.9% consistency, R-package “irr”).
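For readers without R, a consistency-type intraclass correlation can be computed directly from the mean squares. The sketch below assumes a two-way model with single-measure units, ICC = (MSR − MSE) / (MSR + (k − 1) MSE); the text above reports only the package and the ‘consistency’ type, so the model is an assumption, and the ratings shown are invented for illustration.

```python
from typing import List, Sequence


def icc_consistency(ratings: Sequence[Sequence[float]]) -> float:
    """Two-way, single-measure, consistency ICC.

    `ratings` has one row per coded trial and one column per observer.
    """
    n = len(ratings)       # number of trials
    k = len(ratings[0])    # number of observers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means: List[float] = [sum(row) / k for row in ratings]
    col_means: List[float] = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Invented stone counts for three trials scored by two observers.
example = [[1, 2], [2, 1], [3, 3]]
icc = icc_consistency(example)
```

Because this is a consistency measure, a constant offset between observers (for example, one observer always counting one stone more) does not lower the coefficient.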
Finally, we also coded subjects’ exploration of the stones and the apparatus. This included the latency until subjects touched the stones and the apparatus in each trial as well as the amount of time the subjects touched the stones and apparatus in each trial. The amount of time the subjects spent touching the apparatus with the stones (combining the objects together) was also recorded. These data were not used in the final analysis, but a summary is provided in Supplementary Table 2 (OSM), and the complete data are available with the raw data on figshare (O’Neill & Bayern, 2020).