Animal Cognition, Volume 15, Issue 4, pp 711–717

Further evidence for addition and numerical competence by a Grey parrot (Psittacus erithacus)

Authors

    • I. M. Pepperberg, Department of Psychology, Harvard University
Short Communication

DOI: 10.1007/s10071-012-0470-5

Cite this article as:
Pepperberg, I.M. Anim Cogn (2012) 15: 711. doi:10.1007/s10071-012-0470-5

Abstract

A Grey parrot (Psittacus erithacus), able to quantify sets of eight or fewer items (including heterogeneous subsets), to sum two sequentially presented sets of 0–6 items (up to 6), and to identify and serially order Arabic numerals (1–8), all by using English labels (Pepperberg in J Comp Psychol 108:36–44, 1994; J Comp Psychol 120:1–11, 2006a; J Comp Psychol 120:205–216, 2006b; Pepperberg and Carey submitted), was tested on addition of two Arabic numerals or three sequentially presented collections (e.g., of variously sized jelly beans or nuts). He was, without explicit training and in the absence of the previously viewed addends, asked, “How many total?” and required to answer with a vocal English number label. In a few trials on the Arabic numeral addition, he was also shown variously colored Arabic numerals while the addends were hidden and asked “What color number (is the) total?” Although his death precluded testing on all possible arrays, his accuracy was statistically significant and suggested addition abilities comparable with those of nonhuman primates.

Keywords

Parrot cognition · Parrot numerical competence · Nonhuman addition · Avian cognition

Introduction

Many nonhuman species exhibit various forms of number-related abilities (see citations and review in Beran 2011; Dehaene 2011), but only those trained to represent quantity symbolically with Arabic and/or vocal numerals (notably, apes and a Grey parrot) appear to exactly map such numerals to precise cardinal values of sets. Some apes and the parrot spontaneously transferred to novel arrays, demonstrated comparable levels of competence in comprehension and production, understood ordinality, and exhibited abilities extending well beyond levels possibly explained by non-exact strategies (e.g., subitizing, estimating, analog magnitude, or object file representations: Biro and Matsuzawa 2001; Boysen 1993; Boysen and Berntson 1989; Boysen and Hallberg 2000; Boysen et al. 1993; Matsuzawa 1985; Matsuzawa et al. 1991; Murofushi 1997; Pepperberg 1987, 1994, 2006a, b; Pepperberg and Gordon 2005; note Carey 2009).

Interestingly, only two nonhumans, Boysen’s chimpanzee Sheba (Boysen and Berntson 1989) and the Grey parrot Alex (Pepperberg 2006a), could label the quantity of a summed set, an ability once thought to be uniquely human and based on language skills (Spelke and Tsivkin 2001). Sheba could sum arrays of 0–4 food items (to a total of 4) that were placed in two of three possible sites; Alex could sum arrays of 0–6 variously shaped and randomly sized food or nonfood items (to a total of 6) that were placed in two sites. In both cases, the nonhumans were asked about the sums when the addends were no longer visible; that is, they had to remember the separate addends, perform the summation, and then either touch (Sheba) or produce vocally (Alex) the label that represented the exact total quantity. Unlike subjects in other studies involving addition (or subtraction) of objects (e.g., review in Beran 2011; Rugani et al. 2009), neither Sheba nor Alex was making judgments based on relative quantity between or among sets; rather, each was exactly mapping numerals to the cardinal value of the total number of objects present. Their results were therefore not subject to issues such as Weber’s Law (i.e., a discrimination between two quantities determined by the ratio of those quantities, not their absolute difference). Sheba, furthermore, transferred, without training, to summing the Arabic numerals themselves, a unique behavior among numerically trained nonhumans, and one demonstrating further knowledge of the representational nature of the numerals.

The present studies were designed to determine whether Alex could demonstrate similar levels of numerical competence. Success would demonstrate that such abilities are not limited to humans and nonhuman primates, but may also be available to other nonhuman, nonprimate, nonmammalian species with training in exact symbolic representation of number.

Two experiments were thus initiated. One study tested whether Alex could duplicate Sheba’s ability to sum Arabic numerals: He was sequentially shown two Arabic numerals and, in their subsequent absence, was asked to vocally produce a label to indicate their sum. In a separate small set of trials, he was shown the same stimuli in the same manner, but was also presented with various Arabic numerals of different colors, and asked for the color of the numeral representing the sum; colors changed on each trial. The second set of trials ensured that Alex could not learn a particular pattern over time (e.g., “if I see X + Y, I say Z”); that is, had he lived longer, this procedure, with its additional step, would have allowed testing the same sums many more times without training him to produce a specific response, unlike tasks given other nonhuman subjects. A second study tested whether Alex could remember and sum sets of objects in three separate locations under the same constraints as had been in place for two locations (Pepperberg 2006a); that is, could he maintain numerical accuracy under what could be an additional memory load, in an experiment that required two updates in memory rather than one? Neither study was completed because of Alex’s premature death; however, preliminary data, consisting of first and rarely second trial results—and thus lacking any possibility of training—were statistically significant and suggest his ability to perform these more difficult tasks.

Methods

Subject

Alex, a 30-year-old male Grey parrot (Psittacus erithacus), had been the subject of cognitive and communicative studies for 29 years, including many involving numerical competence (Pepperberg 1987, 1994, 2006a, b; Pepperberg and Gordon 2005). Testing locations and living conditions when neither testing nor training was in progress were described in Pepperberg and Wilkes (2004). Food and water were available at his vocal request at all times during testing. In this study, he used his previously documented ability to use English speech to referentially label wooden and plastic Arabic numerals, their colors, and sets of quantities up to and including eight (Pepperberg 1994, 2006a, b; Pepperberg and Carey submitted).

Apparatus

Testing involved familiar objects. The tray that formed the substrate for all trials had been used for previous studies on label comprehension (Pepperberg 1990, 1992), object permanence (Pepperberg and Kozak 1986; Pepperberg et al. 1997), optical illusions (Pepperberg et al. 2008), and other number capacities (Pepperberg 1987, 1994, 2006a, b; Pepperberg and Gordon 2005). All trainers also concurrently used the same tray for experiments on spatial concepts, phoneme recognition, and optical illusions (e.g., Pepperberg 2007; Pepperberg and Cavanagh unpublished), so the tray was not a cue for addition. Plastic cups used to cover items to be added were also familiar, having been the subject of queries on color and on the concepts of same versus different and relative size, and used as containers for object permanence. Items to be summed in the first task were two Arabic numerals (of the same color and equal height, either 2 or 3 cm), each placed under a separate cup. The values and labels of these numerals were also familiar from a previous study on ordinality (Pepperberg 2006b). In three trials, Arabic numerals of the same size, representing possible sums, were placed on the tray along with those under the cups, randomly ordered with respect to their value (Fig S1). The color of the Arabic numerals under the cups matched that of the cups and differed from any of the possible choices for a response. In the second task, items were familiar treats (e.g., variously sized jelly beans, candy hearts, pieces of nuts, crackers, or pasta). Such items were used rather than nonfood objects because of their small size and extreme interest to the parrot. All food items both across and within trials were differently sized, so that each trial in the task involved objects of different mass and contour. The same system was used as in Pepperberg (2006a), except that here three cups were on the tray in every trial.

Procedure

In the absence of any prior training, an addition trial began when an experimenter, out of Alex’s line of sight, placed objects, counterbalancing number sets under cups across trials, onto the surface of the tray and covered the items with plastic cups. Depending upon the experiment, items were either sets of objects or wooden or plastic Arabic numerals. When multiple objects were placed under a single cup, each object was spaced less than 1 cm from other nearest items, and generally, the distance was less. In the Arabic numeral task, the experimenter brought the tray up to Alex’s face, lifted the cup on Alex’s left, showed him the numeral under the cup for about 5 s (Pepperberg 2006a), and then replaced the cup over the numeral; the procedure was replicated for the cup on Alex’s right. In the second, three-cup task, the experimenter showed Alex the items under the left, middle, and right cups in that order, always covering the items under a given cup before uncovering the items in the next cup. In all trials, the experimenter then made eye contact with Alex, who was asked, vocally, and without any training, to respond to the question “How many total?” or “What color number (is the) total?” None of the objects that had been under the cups were visible during questioning. Alex had previously shown he could respond vocally to summation queries for which a numerical answer was appropriate, but these had involved two sets of objects (Pepperberg 2006a). The trials that required him to view many Arabic numerals as possible responses and to produce the label for the color of the correct numerical response were novel. For all tasks, to respond correctly, he had to remember the numeral—and the quantity it represented—or the quantity under each cup, perform some combinatorial process, and then produce the label relevant to the total amount. He was not given any time limit in which to respond, but if he did not answer within about 5 s, the question was restated. 
Given that his time to respond was generally correlated with his current interest in the items being used in the task, rather than the task itself (Pepperberg 1987), latency to respond was not recorded.

If Alex produced the appropriate label, he received praise and the items to which the query referred, or could request an alternative reward. No further presentations of the same material then occurred; that is, there was only a single, “first trial” response. After an incorrect or indistinct response, the examiner removed the tray of objects, turned his/her head, and emphatically said “No!” The tray was then re-presented to Alex, but the materials under the cups were kept hidden, and the question was repeated. The procedure penalized both a “win-stay” strategy and noncompliance (see Footnote 1); presentation continued until a correct response was made or four attempts occurred; errors were recorded. Four attempts were sometimes required in order to separate factual errors from (1) errors involving noncompliance, when he might toss all items to the ground, ignore us and start preening, request treats he then discarded, etc. (see Pepperberg and Gordon 2005); (2) errors based on perceptual issues, that is, confusion stemming from parrots’ splitting color categories, such as orange, somewhat differently than do humans (sometimes labeling it “rose” or “yellow”; Bowmaker et al. 1994, 1996; Pepperberg 1994, 2006a, 2006b; see Footnote 2); (3) errors in which the numerical response was indeed correct but we had asked for the color label of the Arabic numeral; and (4) occasions when Alex might initially repeat part of a query before responding appropriately; such behavior was not discouraged, so as to maintain vocal interaction, but the query had to be repeated until he gave an actual answer. Test sessions, involving only a single numerical array, occurred intermittently between October 2006 and August 2007 for the Arabic numeral tasks (other studies in progress had precedence, Pepperberg and Cavanagh, unpublished), and 2–3 times per week from July to August 2007 for the three-cup task, with breaks for student vacations and semester intersessions.
For the Arabic numeral task, the complete set of all possible addends summing from one to eight had been constructed and randomized, but only the first 12 trials requiring a numerical response and the first three trials requiring a color response were administered before Alex’s death in September 2007. For the three-cup task, the complete set of all possible addends summing from one to eight had also been constructed and randomized, but only the first 10 trials were administered before testing ended with Alex’s death. In the three-cup task, the unfortunate preponderance of trials with small quantities and trials with zero addends occurred by chance.

Test questions were presented intermittently either during free periods (when birds were requesting various foods or interactions) or during sessions on current (and thus unrelated) topics (e.g., using Alex to assist in training another parrot on color labels, studies on optical illusions) to avoid expectation cuing: A tester posing a series of similar queries may come to expect a particular answer and unconsciously accept an indistinct (and by our criteria incorrect) response of, for example, “gree” (a mix of “green”/“three”) for “green.” Likewise, the subject could quickly learn to concentrate on a small set of responses, thus simplifying the task. Given that Alex’s responses had to be chosen from his entire repertoire (~100 different vocalizations) and from among numerous possible topics during each session, with each session containing only one number array, chances for such cuing were minimized. Details of test procedures, including descriptions of precautions against inadvertent and other forms of cuing, can be found in Pepperberg (1981, 1990, 1994, 2006a, b) and will not be repeated here. Videotaping a percentage of trials for interobserver reliability, usually begun about one-third of the way through an experiment (our standard procedure, so that the testing has become routine and is not disturbed by filming), was not possible because of Alex’s death. Nevertheless, as is standard procedure, in all trials a second experimenter, blind to the exemplars, had to repeat Alex’s response before it would be classified as correct or incorrect in order to avoid cuing (see Pepperberg 1981, 1990, 1992, 2006a, b).

Scoring

Alex’s test scores were calculated in two ways. Because test procedures required that, if Alex erred, a query could be repeated for various reasons (see above), both first trial and all trial responses were scored. First trial results were the percentage of correct responses on first trials. The overall test score (results for all trials) was obtained by dividing the total number of correct identifications (i.e., the predetermined number of collections) by the total number of presentations required to obtain correct responses. Statistics were performed on first trials only. Note, again, that addends were not shown a second time before a query was repeated; thus, after an error Alex had to remember values of the addends for an even longer time than usual, presumably increasing the difficulty of the task.
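The two scores described above can be illustrated with a minimal sketch. The function name `score_task` and the list encoding (number of presentations needed to reach a correct answer on each trial) are assumptions introduced here for illustration; the attempt counts are read from Tables 1 and 2 and from the description of the color-response trials, and the sketch assumes every trial eventually ended with a correct response, as those tables indicate.

```python
from fractions import Fraction


def score_task(attempts_per_trial):
    """Return (first-trial score, all-trials score) for one task.

    attempts_per_trial holds, for each trial, the number of presentations
    needed before a correct response (1 = correct on the first trial).
    This encoding is a hypothetical convenience, not the author's notation.
    """
    n_trials = len(attempts_per_trial)
    first_correct = sum(1 for attempts in attempts_per_trial if attempts == 1)
    total_presentations = sum(attempts_per_trial)
    # First-trial score: correct first responses / number of trials.
    first_trial = Fraction(first_correct, n_trials)
    # All-trials score: number of collections / presentations required.
    all_trials = Fraction(n_trials, total_presentations)
    return first_trial, all_trials


# Attempt counts taken from Table 1, Table 2, and the color-response trials.
tasks = {
    "Arabic numerals (numerical response)": [1, 1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1],
    "Arabic numerals (color response)": [2, 1, 1],
    "Three-cup task": [1, 2, 1, 1, 1, 1, 1, 1, 2, 1],
}

for name, attempts in tasks.items():
    first_trial, all_trials = score_task(attempts)
    print(f"{name}: first trial {first_trial}, all trials {all_trials}")
```

Because a repeated query adds a presentation to the denominator without adding a collection to the numerator, the all-trials score can never fall below the first-trial score under this scheme when every trial ends with a correct response.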

Binomial tests were used to determine whether Alex’s results were statistically significant. For the Arabic numeral task requiring a numeral as a response, chance could be based on the number of overall labels relevant to the task, that is, 1/8, as if Alex were randomly guessing among all number labels after hearing “How many total?” A second, more conservative test used a larger value (1/3), as though Alex were choosing to respond with one of the two viewed numerals (the addends) without summing, as well as the possible answer. For the trials in which a color was the answer, chance was 1/6, using the number of choices presented. For the three-cup task, although eight number labels were indeed available for Alex to produce, sums only reached to 6 (by chance, not design; see above); thus a value of 1/6 was used, as if Alex were randomly guessing among all relevant number labels after hearing “How many total?” (NB: 1/6 was more conservative than 1/8.) Here, a second, even more conservative test used a 1/4 value of chance, as though Alex were choosing to respond with one of the three viewed quantities (the addends) without summing, as well as the possible answer. All calculations assumed that Alex would always (p = 1) attend and respond correctly to the “How many… ?” question (i.e., not provide a random label that had no connection to the task at hand); if he were totally random, probability would include the chance of his producing any of the approximately 100 labels in his repertoire.
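As a rough illustration of how such a binomial test works, the sketch below computes the probability of obtaining at least the observed number of correct first-trial responses if each response were a guess at the stated chance level. The helper name `binomial_tail` is hypothetical, and treating the test as one-sided and exact is an assumption; the text does not specify which variant was used.

```python
from math import comb


def binomial_tail(successes, trials, chance):
    """Probability of at least `successes` correct responses in `trials`
    independent attempts, each correct with probability `chance`.

    Assumption: a one-sided exact binomial test; the source does not
    state which form of the test was applied.
    """
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# First-trial counts as reported in the Results; chance levels as defined above.
cases = [
    ("Arabic numerals, chance 1/3", 9, 12, 1 / 3),
    ("Arabic numerals, chance 1/8", 9, 12, 1 / 8),
    ("Color response, chance 1/6", 2, 3, 1 / 6),
    ("Three cups, chance 1/4", 8, 10, 1 / 4),
    ("Three cups, chance 1/6", 8, 10, 1 / 6),
]

for label, correct, n, chance in cases:
    print(f"{label}: p = {binomial_tail(correct, n, chance):.4f}")
```

Under the one-sided assumption, these tail probabilities come out close to the p values given in the Results, which suggests, but does not confirm, that a test of this form was used.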

Results

For the Arabic numeral task requiring a numerical response, Alex’s first trial score was 9/12 correct (75%), p = 0.004 (chance of 1/3; p < 0.001 for chance of 1/8). Trials are reported in Table 1. His all trials score was 12/15 (80%). He had only three trials on queries requiring a color response (Table S1); his first trial score, 2/3 (67%), was too low for statistical significance (p = 0.07), but the small number of trials precludes real statistical power. His all trials score was 3/4 (75%). For the three-cup task, Alex’s first trial score was 8/10 correct (80%), p < 0.001 (binomial test, chance of either 1/4 or 1/6). For all trials, his score was 10/12 correct (83.3%). Even if only those trials are considered in which all three cups contained items, Alex’s first trial score was 4/5 correct, p = 0.015 (chance of 1/4; for chance of 1/6, p = 0.003); his all trials score was 5/6 or, again, 83.3%. Trials are reported in Table 2.

Table 1

Results for summation of Arabic numerals

Question (a)    Response    Comment
2G + 1G         3
3O + 4O         7
4P + 2P         6
3O + 2O         8           Replied “5” on second try
2Y + 2Y         8           Replied “4” on second try
4P + 4P         8
1G + 1G         2
3P + 1G         4
5R + 3O         7           Replied “8” on second try
2B + 3R         5
1G + “0”        1           NB: “0” represented nothing under one cup
5P + 1G         6

(a) Letters represented colors of the numerals: O orange, R red, Y yellow, B blue, P purple, G green

Table 2

Results for summation of three addends

Question     Material        Response    Comment
2 + 2 + 2    Hearts          6
2 + 1 + 1    Cracker bits    5           Replied “4” on second try
3 + 0 + 2    Pasta pieces    5
1 + 1 + 1    Cracker bits    3
1 + 0 + 1    Cracker bits    2
2 + 0 + 0    Cracker bits    2
0 + 1 + 0    Pasta pieces    1
1 + 1 + 2    Jelly beans     4
3 + 0 + 0    Jelly beans     2           Replied “3” on second try
1 + 4 + 1    Hearts          6

Discussion

Alex demonstrated some competence in summing two Arabic numerals, each representing quantities less than or equal to 5, to a total of 1–8, and three quantities of differently shaped and sized items of quantities less than or equal to 4 to a total of 1–6. In both studies, the numbers of correct trials requiring a response of a vocal numerical value reached statistical significance. Although neither study contained enough trials to test all possible sums and combinations of addends or to repeat most queries, the Arabic numeral study contained at least one trial for each sum from 1 to 8, and the three-cup study contained at least one trial for each sum from 1 to 6. Alex made few errors overall. The lack of replication of the various sums over trials, however, emphasizes the first trial nature of the results and shows that no training could have been involved.

Even the data for the few trials requiring him to respond with the color of the Arabic numeral representing the sum suggest a capacity for exact number representation. Conceivably, his one error, on the first trial, may have represented a misunderstanding of the task: He might have responded to the number of objects rather than their values, given that no training of any sort had preceded questioning on this novel task. Note, however, that he did not persist in this response but was correct when asked a second time and responded appropriately on the next two trials.

Given that Alex died before finishing either experiment, that is, before being subject to complete sets of questions involving all possible types of sums, no pattern could be distinguished in his responses (correct or incorrect) to suggest his use of a particular mechanism. Conceivably, he may have used different mechanisms for the different tasks, or different mechanisms depending upon the numbers of items involved. Learning about the correct responses for a given array, however, can be discounted, given that almost all queries are first trial responses for each array. Also, given that all objects in the three-cup task were always of different shapes and sizes within and between trials, Alex’s responses were not likely to have had a nonnumerical basis (e.g., mass, contour, density, etc.).

Clearly, just as for the chimpanzee Sheba, all of the addends in the three-cup task were within subitizing range (Pepperberg 2006a); thus, he could easily have tracked these without specifically counting. However, he, again like Sheba, still would have needed to remember the values under each of the three cups, for several seconds for each cup, and update his memory after seeing what was under each cup, even if nothing was present. In contrast to Sheba, however, the possible total sum tested reached 6.

Possibly, Alex may have summed sets sequentially, rather than representing three addends separately before summing—that is, summed the first two, kept that sum in memory, then added the third, or, after seeing the first set, “counted up” for the next two (see Boysen and Berntson 1989). Such procedures are not unlike those used by children (Fuson 1988) and argue for somewhat different, but still advanced, number representations. In the first case, Alex would have had to retain numerical representations longer than if he were summing only two sets; that is, add the first two sets, update memory, then add the third and again update memory, all in ~15 s. In the second case, he would have had to understand the number line—that each successive number denoted one more item than the previous number—a fairly complex notion (see Pepperberg and Carey, submitted).

As discussed in Pepperberg (2006a), other mechanisms were possible but unlikely. For example, had Alex used a nonverbal accumulator, he would have had to visually partition and scan individual items in each addendum at a constant rate (an event for each pulse) and not reset his accumulator for the next addenda. The system is inherently inexact because of the variability in scanning sets of static items (i.e., the rate of pulses; see Mix et al. 2002); it produces errors normally distributed around the correct response and shows increasing errors with increasing set size, specifically above three; it would not provide the exact numerosities Alex’s task requires. Similarly, an object file represents number only implicitly (i.e., with no mental symbols for cardinal values per se). Mental models in working memory are created for small sets of individuals, with one symbol for each individual, and although these models also support computations of numerical equivalence and order, and addition and subtraction, based on 1-1 correspondence, the system cannot capture, even implicitly, any number beyond working memory limits (~4; see Carey 2009, Pepperberg 2006a).

In the Arabic numeral task, Alex, like Sheba, had had no training on summing the Arabic numerals, and, like Sheba, spontaneously transferred from summing items to summing symbols. He was significantly above chance on the task that asked him to produce vocally the label for the sum in the absence of any visible numeral. His data on the related task—although extremely limited—which was somewhat more like that of Sheba’s, in that possible responses were available from which to choose, tended toward significance. In contrast to Sheba, however, he had to indicate the label not just for the sum but for the color of the numeral that represented the correct numerical sum (an additional step), and the total summed quantity on which he was tested could reach 8.

Specifically, despite the small number of trials, several aspects of the data support, although cannot prove, Alex’s competence. (1) Alex was not simply using his biggest number as a default response when sums were large. Interestingly, when he erred with two Arabic numerals whose sum was large (trial 9, using the numerals 5 and 3) his error was to state “seven”, not “eight”. (2) He proficiently summed the contents of all the cups, whether the sum involved specific objects or numerals that represented sets of objects. The only times he produced a label for only one of the addends in the three-cup task was when such a response was appropriate, that is, in trials when nothing was under the other cups. In the Arabic numeral task, the only time he made such an error was on the first trial of the color response task, which, as noted above, could have been a different type of error. Nor was he merely avoiding using the label of an addend as a response, as multiple other options were available. (3) If, in the Arabic numeral summation task, the numerals had only approximate meanings, Alex’s errors would likely have exhibited a range close to the correct response. In contrast, such was the case only once; his other two errors appeared to involve some fixation on responding “eight”, even though trials were held weeks apart; thus, his data surpassed what would be expected if he were using the kinds of systems employed by most nonhumans or preverbal infants—for example, analog magnitude systems or object files, which cannot represent any positive integer above 4 exactly (see Carey 2009, for a review). In the three-cup task, his two errors occurred within the subitizing range, where he should have been exact. Interestingly, Shimomura and Kumada (2011) report increased errors for humans within the subitizing range when memory load is increased (as is the case in the three-cup task), but possibly because of encoding errors and not from working spatial memory per se.

Taken together with previous studies on addition (Pepperberg 2006a), number comprehension (Pepperberg and Gordon 2005), and ordinality (Pepperberg 2006b), these data provide additional evidence that Alex likely understood the cardinal representation of his Arabic numerals, vocal and graphic; that his capacities with respect to addition were, like those of Sheba, spontaneously transferred from object sets to symbolic representations of the sets; and that these capacities involved sums slightly beyond those of chimpanzees (e.g., Boysen and Berntson 1989). The reported data, despite involving only partial results, support previous studies on the number competence of a Grey parrot, suggesting that such abilities are not limited to humans and nonhuman primates but may also be available to other species that have training in exact symbolic representation of number.

Footnotes
1. With respect to noncompliance: In earlier studies (e.g., Pepperberg 1992; Pepperberg and Gordon 2005; Pepperberg and Lynn 2000), Alex, caring little for any reward, realized he could quickly finish a test no matter what he said. After learning his trainers would persevere, he realized he had to respond correctly for a test to end.

2. Wooden numerals, the initial stimuli when he was taught Arabic numerals, were always standardized with nontoxic paints for which Alex learned to use a specific color label; however, he chewed these numerals as his reward and they could not be replaced. We thus had to use magnetized plastic “refrigerator letters” for much of this study, and color errors occurred with these numerals. Paint that sticks to plastic is toxic and, because Alex was given the numeral to chew as his initial reward, could not be used. Note that he did not always err on such items and would answer correctly when asked a second time, suggesting that the color clearly was on a boundary (e.g., if not yellow or red, it had to be orange).

Acknowledgments

This study was supported by many donors to The Alex Foundation. I thank Harrison’s Bird Diet and Fowl Play for food and treats, Avian Adventures for Alex’s cage, Carol D’Arezzo for Alex’s perch, and the several students who assisted in testing. This manuscript was written under the support of donors to The Alex Foundation (particularly Anita Keefe, the Michael Haas Foundation, and the Sterner family) and NSF grant BCS-0920878 (to Ken Nakayama). The study procedures comply with the current laws of the country under which they were performed.

Supplementary material

10071_2012_470_MOESM1_ESM.doc (30 kb)
Supplementary material 1 (DOC 28 kb)
10071_2012_470_MOESM2_ESM.doc (26 kb)
Supplementary material 2 (DOC 27 kb)

Copyright information

© Springer-Verlag 2012