We found a wide range of scores in our assessment of the quality of MSR studies in gorillas. Most of the studies with no mark test reported self-directed behaviour, while just over half of the mark test studies reported both self-directed and mark-directed behaviour. Over time, studies—with or without the mark test—have become methodologically more rigorous; however, this has not led to more positive outcomes. We found no link between when studies were conducted and either outcome or total scores (methodology and findings). Our prediction that studies would obtain progressively higher total scores as procedures and behavioural coding methods improved was not supported. However, when looking at the methodological criteria alone, the prediction was supported, as scores for methodological rigour did increase over time. While methodological rigour is clearly important, improvements in methods do not guarantee stronger evidence of self-recognition in gorillas. This lack of association could be taken as evidence that, at the species level, gorillas do not show compelling evidence of MSR. Alternatively, it may reflect wide intra-species variability. Like many studies on various aspects of cognition, most gorilla MSR studies have small sample sizes. Much remains unknown about how other factors, such as rearing, experience and setting, interact with basic individual differences in self-recognition propensity.
Awarding additional points for positive instances of both self-directed and mark-directed responses revealed that studies with no such responses received a low score, even if the method score was high, a trend reflected in the negative but non-significant correlation between methods score and outcome. Looking only at the methods totals (the 15 criteria), it is clear that the reference study (Suarez and Gallup 1981) scores the highest (10 out of 15 points), along with Shillito et al.’s (1999) Experiment 3. As methodologically stronger studies do not appear to yield more evidence of self-recognition in gorillas, procedural details seem unlikely to explain why positive evidence is so modest (de Veer and van den Bos 1999), although some authors have criticized use of a ‘chimpanzee standard’ to investigate MSR across species (Shumaker and Swartz 2002). Here, the argument is that the frequent failure of gorillas to pass the mark test may be due to as yet unidentified limitations of the mark test for revealing self-recognition in this species.
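As an illustration only, the kind of rank correlation referred to above (methods score against outcome score) can be sketched as follows. This is a minimal sketch, not the analysis actually run for this review: the choice of Spearman's coefficient is an assumption, the function names are hypothetical, and the scores in the example are invented placeholders, not values from the studies reviewed.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# HYPOTHETICAL scores for six imaginary studies (not data from this review):
# methods score out of 15, and an outcome score for self-/mark-directed responses.
methods_scores = [10, 9, 7, 6, 4, 3]
outcome_scores = [0, 2, 0, 1, 2, 1]

rho = spearman(methods_scores, outcome_scores)
print(f"Spearman rho = {rho:.2f}")  # negative for these placeholder values
```

With so few studies, a coefficient of this kind would need to be accompanied by a significance test before any conclusion is drawn, which is consistent with the cautious reading of the non-significant correlation above.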
Notwithstanding criticism of using a chimpanzee standard to investigate MSR in gorillas, it is important to examine the factors associated with positive responses in this species. Mark-directed responses occurred in studies involving visually inaccessible marks, tactile and olfactory controls, subjects of at least 5 years of age, and a clear distinction between responses in front of versus away from the mirror. These are clearly important features that future studies of mirror self-recognition in gorillas should seek to incorporate. Although gorillas often fail to respond to marks on their faces that can only be seen in a mirror, they do show an avid interest in comparable control marks on their wrists (Suarez and Gallup 1981). The results of studies that use dyes, stickers, or lasers, as in the trained monkey studies (Chang et al. 2017), have reduced validity whenever the marks may provide olfactory, tactile, or irritant cues. Using a training paradigm involving stickers and lasers, Shumaker and Swartz (2002) claimed to have found evidence of MSR in an individual gorilla who had previously failed the test (Shillito et al. 1999). According to these authors, their training procedures provided the necessary motivation for the gorilla to reveal his true ability. It is important to bear in mind, however, that trained positive outcomes are not the same as spontaneous ones (Gallup and Suarez 1986). Some other MSR studies with gorillas have included specific experimental manipulations designed to facilitate self-recognition, including the use of angled mirrors, but without success (Shillito et al. 1999).
Additional important quality-related features of studies reporting mirror-guided self-directed responses include video-recorded tests, more than one subject, subjects with adequate social rearing, post-marking observations with mirror absent, and mirror exposure in a social versus individual setting. It is noteworthy that three gorillas reported to pass the mark test (Patterson and Cohn 1994; Swartz and Evans 1994) were raised in enculturated, enriched environments with extensive human contact, possibly resulting in a latent capacity for self-recognition being “switched back on” (Povinelli 1993). However, these results must be viewed as tenuous because of the lack of public availability of the relevant video evidence.
Gorilla MSR studies often involve removing subjects from their group for mirror exposure (e.g. Swartz and Evans 1994). This separation may negatively affect both those left behind in the group and the separated individual, particularly if the individuals concerned are immature. The emotional response to the separation, coupled with a lack of experience in cognitive studies, may create attentional and emotional barriers to optimal performance in the test. Allen and Schwartz (2008) suggested that, as their single gorilla ‘passed the test’ without showing prior mirror-guided or contingent behaviours, these behaviours may not be prerequisites for passing. But contingency testing is open to alternative interpretations; for example, the subject may simply be trying to get the other individual in the mirror to reciprocate and respond normally instead of only mimicking the behaviour of the subject (Gallup and Anderson 2020). Moreover, in Allen and Schwartz’s (2008) report, the timings of multiple sham and test trials, and whether or not the mirror was present, are often unclear, so assigning scores was not always easy. To facilitate future evaluations, we recommend that due attention be paid to details when describing methods and observations. These details should include responses in front of versus away from the mirror, and post-marking observations with the mirror absent.
It is also important to acknowledge that the applicability of our evaluation criteria has changed over time. For example, few early studies included video recordings. Now that video is widely available, we hope that more researchers will be open to sharing footage in response to reasonable requests. Finally, studies should include not just ratings by “blind” observers, but also reports of inter-rater reliability.
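One common way to report inter-rater reliability for categorical behavioural codes is Cohen's kappa, which corrects the raw agreement between two observers for the agreement expected by chance. The sketch below is illustrative only: the category labels, the function name, and the codes assigned by the two observers are hypothetical and are not taken from any study reviewed here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each coded the same sequence of events."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of events.")
    n = len(rater_a)
    # Proportion of events on which the two raters assigned the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal code frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / n**2
    if expected == 1:
        raise ValueError("Kappa is undefined when both raters use a single code.")
    return (observed - expected) / (1 - expected)


# HYPOTHETICAL codes from two blind observers scoring the same eight video clips.
observer_1 = ["self", "self", "mark", "other", "self", "mark", "other", "other"]
observer_2 = ["self", "mark", "mark", "other", "self", "mark", "other", "self"]

kappa = cohens_kappa(observer_1, observer_2)
print(f"Cohen's kappa = {kappa:.2f}")
```

Other indices (for example, percentage agreement or an intraclass correlation for continuous measures) may be more appropriate depending on how behaviours are coded; kappa is used here only as a familiar example.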
In conclusion, we tried to scrutinize every published paper addressing the question of mirror self-recognition in gorillas, examining methodological details both alone and in combination with reported occurrences of self-directed and mark-directed responses. We hope that researchers might heed the criteria used here, particularly those highlighted in Table 4, to optimize the quality of future studies of the self-recognition abilities of gorillas as well as other species.