Subjects, housing and experimental history
We tested six adult, captive-born and hand-reared Goffin’s cockatoos (five males, one female). All subjects participated in the Colour test and in Size tests I and II. Four males participated in the Shape test (one subject lost motivation and one developed a sudden aversion to inserting paper). Subjects are permanently housed together in a large, enriched aviary with indoor and outdoor areas (ca. 200 m² ground space, up to 6 m high) at the Goffin Lab, associated with the University of Veterinary Medicine (Vienna, Austria). All parrots are kept on an ad libitum diet (boiled and raw seeds, fresh and dried fruits, boiled vegetables, fresh water) and participate in the experiments on a voluntary basis. As all experiments were appetitive, non-invasive and based exclusively on behavioural tests, they are not classified as animal experiments under the Austrian Animal Experiments Act (§ 2, Federal Law Gazette No. 501/1989). All animals had CITES certificates and were registered at the district’s administrative animal welfare bureau (Bezirkshauptmannschaft St. Pölten, Schmiedgasse 4–6, A-3100 St. Pölten, Austria). These housing conditions comply with the Austrian Federal Act on the Protection of Animals (Animal Protection Act, § 24 Abs. 1 Z 1 and 2; § 25 Abs. 3 TSchG, BGBl. I Nr. 118/2004 Art. 2).
Prior to this experiment, all birds had participated in a study on social transmission of tool use and manufacture (only Figaro, Dolittle and Kiwi had sculpted stick-tools out of larch wood; Pipin had used but not made tools; for details see Auersperg et al. 2014). Four individuals had been tested in a study on tool manufacture (only Figaro and Dolittle made stick-tools out of cardboard; for detailed results see Auersperg et al. 2016), and all subjects had participated in a hook-bending experiment (of the six subjects tested in the present study, only Fini manufactured hook tools and was consistently successful, whereas Figaro was occasionally successful; see Laumer et al. 2017). For detailed information on the individual subjects see SI, section A, Table S1.
Apparatus and insertion training
Subjects were habituated to the apparatus, which consisted of a grey insertion tube (length 7 cm; diameter 6 cm) and a shorter tube used as a feeding tray, in which the experimenter placed the rewards (see Fig. 1). All birds had used cardboard strips as tools before (Auersperg et al. 2016; 2017); note, however, that in the present context the cockatoos were required to use the cardboard as a token rather than as a foraging tool. The Goffins were first trained to insert strips of white paper (5 cm in length) into the tube to receive a food reward (a small piece of cashew). The experimenter encouraged them to insert the strips using a combination of prompts that included tapping on the tube with a fingertip (no secondary reinforcers other than food were used in later training phases). Once a bird had successfully inserted the strip into the tube in 30 consecutive trials, it entered the card-ripping test (see below).
Pretest: card-ripping test
Before the actual experiment, subjects received a card-ripping test to examine whether they would spontaneously rip a strip of paper out of a white paper card and insert it. They first received three trials in which they had to insert a strip of white paper (5 cm; the same length as in the insertion training). The experimenter (IBL) then placed a 10 × 10 cm piece of white paper card on the testing table. If a subject carved a strip and dropped it into the tube, it was given a food reward (a small piece of cashew). Subjects had to complete five further such trials to move on to the next stage of the experiment (Colour test). If a subject did not manufacture a strip of paper, it received three insertion trials and the card-ripping trial was repeated on the subsequent testing day.
Colour learning and colour test
Subjects first learned that only one of two differently coloured paper strips was rewarded (Fig. 1a). The Goffins received a varying number of trials per session depending on their individual motivation (usually between 10 and 30), in which they had to choose between two differently coloured paper strips (5 cm in length). Subjects sat on the back of a chair while the experimenter placed the two items on the testing table. The side of the correctly coloured strip was semi-randomly counterbalanced across sessions. During the entire testing period the experimenter wore mirrored sunglasses, avoided any head movements and did not speak to the animal. Directly after placing the strips, she signalled the bird to wait by extending her right arm with the palm facing the subject (wait signal). After three seconds she removed her hand, allowing the Goffin to leave its starting position. If the Goffin chose the wrong colour, it was immediately placed in a cage directly next to the testing table for 30 s. Because Goffin cockatoos are generally highly motivated to participate in the experiments, this time-out, although short, served as a mild but effective signal that the choice had been incorrect. Time-outs were used only during training and never in any of the test conditions (Colour, Size and Shape tests).
As soon as a bird selected the strip of the correct colour in at least ten consecutive trials, it received two discrimination test sessions per day, conducted one directly after the other. In each session, eight strips of the rewarded colour and eight strips of the unrewarded colour were placed in random positions on the table. To control for stimulus enhancement, the experimenter touched all 16 items at once with spread hands before allowing the bird to leave its starting position. If the Goffin selected and dropped all eight correctly coloured strips in a row in both test sessions, it passed the criterion and entered the Colour test, in which it had to manufacture a strip on its own. If not, it was tested again on the subsequent test day.
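The two learning criteria can be expressed as simple checks on a bird's sequence of choices. The sketch below is a hypothetical Python illustration, not the authors' procedure (choices were scored live by the experimenter); the function names are invented, and it assumes each choice is recorded as True (rewarded colour) or False, in the order in which the strips were dropped.

```python
from __future__ import annotations


def longest_correct_run(choices: list[bool]) -> int:
    """Length of the longest streak of consecutive correct choices."""
    best = current = 0
    for correct in choices:
        current = current + 1 if correct else 0
        best = max(best, current)
    return best


def passed_training(choices: list[bool], run_needed: int = 10) -> bool:
    """Training criterion: at least `run_needed` correct choices in a row."""
    return longest_correct_run(choices) >= run_needed


def passed_discrimination(sessions: list[list[bool]],
                          n_rewarded_items: int = 8,
                          n_sessions: int = 2) -> bool:
    """Discrimination criterion: in each of the `n_sessions` sessions of a day,
    the first `n_rewarded_items` strips dropped were all of the rewarded colour."""
    return len(sessions) == n_sessions and all(
        len(s) >= n_rewarded_items and all(s[:n_rewarded_items])
        for s in sessions
    )


# Hypothetical example: one error early in the second session fails the day.
day = [[True] * 8, [True, True, False] + [True] * 6]
print(passed_discrimination(day))  # False
```

The same checks apply to the Size criterion below if `n_sessions` is set to five.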
In the Colour test (see movie S1), two 10 × 10 cm paper squares, one in the rewarded and one in the unrewarded colour, were placed simultaneously on the table in each trial. Only strips made from the square of the rewarded colour were rewarded. Subjects received two sessions of 12 trials each.
Size learning and size test I
The Goffins learnt that only paper strips of a specific size (either short or long) were rewarded (Fig. 1b). The paper had a different colour from that used in the Colour test. We randomly divided the subjects into two groups: one group was rewarded for inserting short strips, the other for inserting long strips. Subjects received a varying number of trials per session depending on their individual motivation (usually between 10 and 30), in which they had to choose between a short (2 cm) and a long (8 cm) piece of paper. As in the Colour test, the subject sat on the back of the chair while the experimenter placed the two items on the table, and was then allowed to leave its starting position. The side of the correctly sized strip was semi-randomly counterbalanced across sessions. During training, a correct choice was rewarded immediately; after an incorrect choice the bird was placed in the cage for 30 s.
To be admitted to the Size test, the Goffins had to pass five error-free criterion sessions in a row. In each of these sessions, eight short and eight long paper strips were placed in random positions on the testing table. Again, the experimenter touched all 16 items at once to control for stimulus enhancement before the subject was allowed to leave its starting position. If the cockatoo made a mistake, testing continued on the subsequent testing day. Subjects received up to three sessions per test day.
In the Size test (see movie S1), subjects received 10 test trials per session. Before the first Size test trial and after every second Size test trial, subjects received a reminder phase in which two short and two long strips were placed on the table; only the previously learned size was rewarded. After correct insertion of the two reminder strips, subjects received two Size test trials, in which the experimenter placed a 10 × 10 cm paper card on the table. The Goffins could then manufacture a strip of paper from the square and insert it into the tube. To prevent trial-and-error learning, the Goffins were rewarded at random on 50% of the Size test trials, regardless of the size of the strip they inserted. The Goffins received two sessions of 10 Size test trials each.
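The session layout (a reminder phase before the first test trial and after every second one, with rewards on half of the test trials) can be sketched as a schedule generator. This is a hypothetical Python illustration, not the authors' procedure; `build_test_session` is an invented name, and the sketch assumes that exactly 5 of the 10 test trials per session were rewarded and that the rewarded trials were drawn at random in advance, which the text implies but does not state.

```python
from __future__ import annotations

import random


def build_test_session(n_test_trials: int = 10, p_reward: float = 0.5,
                       seed: int | None = None) -> list[tuple[str, bool | None]]:
    """Return the ordered events of one test session.

    ("reminder", None) marks a reminder phase (two short + two long strips);
    ("test", True/False) marks a test trial and whether it will be rewarded.
    """
    rng = random.Random(seed)
    # Assumption: exactly round(n * p) test trials per session are rewarded,
    # drawn at random before the session, so reward cannot depend on strip size.
    rewarded = set(rng.sample(range(n_test_trials), round(n_test_trials * p_reward)))
    events: list[tuple[str, bool | None]] = []
    for trial in range(n_test_trials):
        if trial % 2 == 0:  # before trial 1 and after every second test trial
            events.append(("reminder", None))
        events.append(("test", trial in rewarded))
    return events


session = build_test_session(seed=1)
print(sum(kind == "reminder" for kind, _ in session))   # 5 reminder phases
print(sum(r for kind, r in session if kind == "test"))  # 5 rewarded test trials
```

Fixing the rewarded trials before the session starts is one simple way to guarantee that reward is independent of what the bird produces.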
Reversal size learning and size test II
Once birds had completed their first Size test condition (long or short templates rewarded), they were tested again with the other template size. The same size learning and Size test procedures were repeated, now rewarding the alternative size, and a new colour of paper was used to draw the birds’ attention to the new task affordances.
Shape learning and shape test
The cockatoos were first trained to select L-shaped paper objects (see SI, Figure S3; length of each side 4.3 cm, width 1.4 cm) over straight ones (6 cm in length, width 1.4 cm). They were rewarded for correct choices and received a time-out for incorrect choices, in training only. To be tested in the Shape test, subjects had to complete two out of three consecutive criterion sessions without error (one session consisted of 16 items placed on the testing table: 8 L-shapes and 8 straight strips).
In the Shape test (see movie S1), subjects received 10 test trials per session. Before the first Shape test trial and after every second Shape test trial, subjects received a reminder phase in which two L-shaped paper objects and two straight ones were placed on the table; only the L-shaped pieces were rewarded. After correct insertion of the reminder pieces, subjects received two Shape test trials, in which the experimenter placed a 10 × 10 cm paper card on the table.
Because of the cockatoos’ manufacturing technique (each strip was carved by a large number of bite marks along the edge of the paper), an L-shaped object could only be produced by carving the L-shape around a corner of the paper square. Subjects received four sessions of 10 test trials each and were rewarded on only 50% of test trials, regardless of the shape of the carved strip.
Additionally, we investigated whether subjects could apply the previously learned shape concept to another material (wire) and spontaneously form an L-shape out of a straight wire without any additional wire-bending training (all subjects had participated in a hook-bending/unbending experiment using the same material; see Laumer et al. 2017). Subjects that had previously been successful in selecting L-shaped paper objects were now confronted with straight and L-shaped templates made of wire. In the test, the birds received a straight piece of wire (length 10 cm) and were tested in exactly the same way as in the previous Shape test.
All trials were videotaped (JVC camcorder) and coded in situ. All inserted and discarded pieces were collected and stored for measurement. In contrast to New Caledonian crows (Jelbert et al. 2018), the Goffin cockatoos did not rip the paper; instead, each strip was carved by a large number of bite marks along the edge of the paper until, after a certain length, the bird cut away from the edge in a curve. We therefore measured the length of each strip (Colour and Size tests).
For the shape analysis, the width at each bite mark along each carved piece was measured for the Shape test (40 pieces) and the Size tests (both Size tests combined, 40 pieces), and the two sets were compared (curved-out ends shorter than 1 cm were excluded from the analysis).
Analysis colour test
To evaluate choices in the Colour test, we conducted binomial tests for each individual and used a Generalized Linear Mixed Model (GLMM; Baayen 2008) with binomial error structure and logit link function (McCullagh and Nelder 1989) to evaluate group performance. We included a random intercept of subject.
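For an individual bird, the binomial test compares the number of trials in which it carved from the rewarded colour against chance. The Python sketch below is an illustration only (the analysis itself was run in R); it assumes a chance level of 0.5 (two squares per trial) and a two-sided test, and the count of 20 correct out of 24 trials is hypothetical, not a reported result.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_test_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided p-value: sum the probabilities of all outcomes that are
    no more likely than the observed one (essentially the rule of R's binom.test)."""
    observed = binom_pmf(k, n, p)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    p_value = sum(binom_pmf(i, n, p) for i in range(n + 1)
                  if binom_pmf(i, n, p) <= observed * tolerance)
    return min(1.0, p_value)


# Hypothetical bird: rewarded colour chosen in 20 of its 24 Colour test trials.
print(round(binomial_test_two_sided(20, 24), 5))  # 0.00154
```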
Analysis size test
To test whether the length of the carved-out pieces differed with the presented template size in the Size test, we used a GLMM and controlled for group (short or long template first), session (1–4) and trial per condition (1–20). As length is a continuous variable with a maximum of 10 cm (the side length of the paper card), we used a beta error distribution and logit link function (Cribari-Neto and Zeileis 2010). Prior to fitting the model, we transformed length to lie strictly between 0 and 1, as recommended by Smithson and Verkuilen (2006). Additionally, we z-transformed trial and session to a mean of zero and a standard deviation of one to obtain more easily interpretable estimates (Schielzeth 2010). We included a random intercept of subject to avoid pseudo-replication. Furthermore, the model included random slopes (Schielzeth and Forstmeier 2009; Barr et al. 2013) of session, trial and template size (manually dummy coded and centred) within subject. To account for daily differences in the length produced by each subject, we included a random intercept for the combination of subject and session (Sub.Sess) with a random slope of trial. As Jelbert et al. (2018) found an interaction between template size and trial number, we initially included this interaction term in the model; because it did not have a significant effect, we subsequently removed it to obtain estimates for the main effects. After fitting the model, we confirmed that collinearity was not an issue (maximum Variance Inflation Factor: 1.176; assessed for a model lacking the random intercepts and slopes). Model stability was assessed by comparing the estimates obtained from the model based on all data with estimates obtained from models in which the levels of the random effects were excluded one at a time (Nieuwenhuis et al. 2012; function kindly provided by Roger Mundry). All estimates proved to be fairly stable (see Table S2 for model estimates).
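The two preprocessing steps can be written out explicitly. A beta distribution is only defined on the open interval (0, 1), so lengths are first divided by the 10 cm maximum and then squeezed with Smithson and Verkuilen's (2006) formula y' = (y(N − 1) + 0.5)/N, where N is the number of observations. The Python sketch below is illustrative only (the model was fitted in R); it assumes N is the total number of measured pieces, and the function names are hypothetical.

```python
from __future__ import annotations

from statistics import mean, stdev


def squeeze_to_open_unit(lengths_cm: list[float], max_cm: float = 10.0) -> list[float]:
    """Map lengths in [0, max_cm] into the open interval (0, 1).

    Step 1 rescales to [0, 1]; step 2 applies y' = (y * (N - 1) + 0.5) / N,
    so that pieces of exactly 0 or 10 cm remain valid for a beta model.
    """
    n = len(lengths_cm)
    scaled = [x / max_cm for x in lengths_cm]
    return [(y * (n - 1) + 0.5) / n for y in scaled]


def z_transform(values: list[float]) -> list[float]:
    """Centre to mean 0 and scale to SD 1 (sample SD, as R's scale() does)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


print(squeeze_to_open_unit([0.0, 5.0, 10.0]))  # approx. [0.167, 0.5, 0.833]
print(z_transform([1, 2, 3]))                   # [-1.0, 0.0, 1.0]
```

The squeeze pulls values slightly toward 0.5, and the shift shrinks as N grows.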
We first compared the full model with a null model lacking the main predictors (template size and trial) to avoid ‘cryptic multiple testing’ (Forstmeier and Schielzeth 2011) and then tested all predictors by single deletion, using likelihood ratio tests (Dobson 2002).
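A likelihood ratio test compares twice the difference in log-likelihood between two nested models with a chi-square distribution whose degrees of freedom equal the number of parameters removed (two for the full-null comparison, assuming template size and trial each contribute one fixed-effect parameter; one for each single deletion). The Python sketch below shows only this final step; it uses the closed-form chi-square tail for integer degrees of freedom, and the log-likelihood values are hypothetical.

```python
from __future__ import annotations

from math import erfc, exp, factorial, pi, sqrt


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df,
    using the closed forms for even and odd degrees of freedom."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        half = x / 2
        return exp(-half) * sum(half**k / factorial(k) for k in range(df // 2))
    # Odd df: 2 * (1 - Phi(sqrt(x))) plus 2 * phi(sqrt(x)) times a finite series
    # with terms x^((2j - 1) / 2) / (1 * 3 * ... * (2j - 1)), j = 1..(df - 1) / 2.
    series, term = 0.0, sqrt(x)
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        series += term
    return erfc(sqrt(x / 2)) + 2 * exp(-x / 2) / sqrt(2 * pi) * series


def likelihood_ratio_test(loglik_full: float, loglik_reduced: float,
                          df_diff: int) -> tuple[float, float]:
    """Return (chi-square statistic, p-value) for two nested models."""
    statistic = 2 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, df_diff)


# Hypothetical log-likelihoods; the null model drops template size and trial (df = 2).
print(likelihood_ratio_test(-120.0, -124.5, 2))
```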
We further fitted the same model for each subject separately, without the random intercept of subject (as each model contained only one subject) and without session (because of collinearity issues). Although we did not find a significant interaction between template size and trial in our model, we additionally fitted separate models for each template size to compare our results with the previous study on New Caledonian crows (Jelbert et al. 2018). These models included trial and group as fixed effects, random intercepts for subject and for the subject–session combination (Sub.Sess), and random slopes of trial within both.
Analysis shape test
To test whether the shapes of the strips differed between conditions, we first determined the maximum width of each strip and then compared it between conditions (Size test and Shape test) using a general linear model (i.e., assuming normally distributed and homogeneous residuals). To control for the possibility that individuals differed in the general width of their strips or responded differently to the two conditions, we further included individual and its interaction with condition in the model. Finally, we included session and trial number to control for potential learning or fatigue effects. As an overall test of the effect of condition (i.e., a full-null model comparison; Forstmeier and Schielzeth 2011), we compared this full model with a null model lacking condition and its interaction with individual.
To check for normality and homogeneity of the residuals, we inspected a qq-plot of the residuals and a plot of residuals against fitted values (Quinn and Keough 2002; Field 2005), neither of which revealed strong deviations from these assumptions. We estimated model stability by means of DFBeta (Field 2005), which revealed the model to be stable. Collinearity was not an issue (maximum Variance Inflation Factor: 1.333; assessed for a model lacking the interaction).
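The Variance Inflation Factor of a predictor is VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on all other predictors; equivalently, it is the j-th diagonal element of the inverse of the predictors' correlation matrix. The Python sketch below uses that identity for numeric (or single-column dummy-coded) predictors only. It does not reproduce the generalized VIF that `car::vif` reports for multi-level factors, and the predictor values are hypothetical.

```python
from __future__ import annotations

from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def invert(matrix: list[list[float]]) -> list[list[float]]:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(predictors: dict[str, list[float]]) -> dict[str, float]:
    """VIF of each predictor = diagonal of the inverse correlation matrix."""
    names = list(predictors)
    corr = [[pearson(predictors[a], predictors[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


# Hypothetical predictors correlated at r = 0.8, so each VIF is 1 / (1 - 0.64).
example = {"trial": [1, 2, 3, 4, 5], "session": [2, 1, 4, 3, 5]}
print({k: round(v, 3) for k, v in vif(example).items()})  # both about 2.778
```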
To test whether the shapes the individuals produced resembled the templates, we correlated their widths with the relative width of the template. We used a relative measure because subjects might have produced a similar shape that was smaller in size. To do this, we first measured the length of each piece and calculated the width relative to it (according to the original template). We then aligned the length of the template with the length of each piece produced (after aligning their bases). Note that when carving the strips, the cockatoos placed a large number of bite marks along the edge of the paper, sometimes producing variation in width. Since a potential bump in the width of a produced shape could lie at either end, we correlated each shape with the template once in each of the template’s two possible orientations and chose the larger of the two resulting correlation coefficients (Fig. 2). We used Spearman's rank correlation coefficient as an estimate of the degree of similarity between the template and the shape produced. We used the L-shaped template both for shapes produced in the Shape test and for those produced in the Size test condition. If the Goffins indeed matched the shapes they produced to the L-shape provided in the Shape test condition, we would expect the correlation coefficients to be higher in this condition than in the Size test condition.
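The similarity score described above can be sketched as follows: stretch the template's width profile to the number of width measurements along a piece, compute Spearman's ρ against the template in both orientations, and keep the larger value. This Python sketch is a hypothetical illustration, not the authors' R code; it assumes widths are sampled at equally spaced positions from the aligned base, uses linear interpolation to align lengths (the text does not specify the method), and assumes that neither profile is constant.

```python
from __future__ import annotations

from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # assumes neither profile is constant


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def resample(profile: list[float], n: int) -> list[float]:
    """Stretch or shrink a width profile to n equally spaced points."""
    if n == 1:
        return [profile[0]]
    m = len(profile)
    out = []
    for i in range(n):
        pos = i * (m - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        out.append(profile[lo] * (1 - frac) + profile[hi] * frac)
    return out


def template_similarity(piece: list[float], template: list[float]) -> float:
    """Larger Spearman rho over the two orientations of the aligned template."""
    aligned = resample(template, len(piece))
    return max(spearman(piece, aligned), spearman(piece, aligned[::-1]))


template = [4.3, 4.3, 1.4, 1.4, 1.4, 1.4]  # hypothetical coarse L-shaped width profile (cm)
piece = [1.2, 1.3, 1.1, 1.5, 2.9, 3.4]     # hypothetical piece with a bump at the far end
print(round(template_similarity(piece, template), 3))  # 0.828
```

Taking the maximum over both orientations makes the score independent of which end of a piece was measured first.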
All statistical tests were conducted in R (R Core Team 2018; versions 3.6.1 and 3.5.1) using RStudio (RStudio Team 2016; version 1.1.453). We used the packages ‘car’ (Fox and Weisberg 2019; version 3.0-3) to assess collinearity (function vif) and ‘glmmTMB’ (Brooks et al. 2017; version 0.2.3) to fit the beta mixed model (function glmmTMB). Further, we used the function lm to fit the general linear model and lmer to fit the binomial model. Plots were drawn with ‘ggplot2’ (Wickham 2016; version 3.2.1) and base R.