Kano and Hirata (2015) recently published an article suggesting that great apes anticipate upcoming actions when viewing short videotaped films. Such a finding is groundbreaking, in that it demonstrates how researchers might assess the understanding of future events in nonverbal, nonhuman animals. This goal has proven to be a challenging one, only recently undertaken and with somewhat controversial results (Eacott & Easton, 2012). In Kano and Hirata’s attempt, the authors made use of eyetracking software to record apes’ anticipatory looks. Such software has become increasingly popular and allows researchers, noninvasively and with reasonable precision, to determine a primate’s focus of attention during complex visual tasks. Kano and Hirata’s research was the first in which the technology was applied to provide insight into a nonhuman’s expectations of future events. In their study, an infrared tracker imaged the apes’ eyes as they accessed juice through a nozzle; no head restraints were needed.

While their gaze was tracked, apes were shown brief video clips in which an unexpected, interesting event occurred. In this case, a human dressed as a gorilla suddenly appeared through one of two doors. After 24 h, the apes watched the same video again, and the eyetracking software showed that the apes now spent more time looking at the door through which the “gorilla” might be expected to appear, relative to a distractor door. At first read, it was unclear how the authors’ analysis showed that the effects were due to anticipatory looks rather than simply to a preference for the target location on the second day. I wondered whether the preference for the interesting location on the second day might merely reflect a reversal of the preference from the first day. That is, would the apes prefer to look somewhere different from where they had looked previously when viewing the same film for the second time? Given this possibility, it would be useful to conduct the same experiment with a control condition in which there was no salient or interesting event, to see whether the same switch in preferences would be observed. However, a closer read revealed that the authors’ interpretation of the results was based on the finding that apes looked longer at the door in question only immediately prior to the critical event (i.e., the appearance of the gorilla), but did not show a preference for looking at that particular location throughout the entire video. Thus, the preferential looking appears to have been tied to the timing of the interesting event in question, rather than to an overall increase in interest in the location where a previously interesting event had occurred, or merely to a change in preference from the previous viewing.
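The distinction drawn above, between a preference confined to the window just before the critical event and a preference spread across the whole video, can be made concrete with a small sketch. This is not Kano and Hirata's analysis code. The `Fixation` record, the area-of-interest labels, the window boundaries, and the fixation data are all hypothetical and chosen only for illustration. The sketch computes the proportion of target looking (out of target plus distractor looking) inside any time window, so that an anticipatory window can be compared with the period preceding it.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    """One fixation from a (hypothetical) eyetracking export."""
    start_ms: float
    end_ms: float
    aoi: str  # area of interest: "target", "distractor", or "other"


def looking_time(fixations, aoi, window_start_ms, window_end_ms):
    """Total time (ms) spent on `aoi`, clipped to the given window."""
    total = 0.0
    for f in fixations:
        if f.aoi != aoi:
            continue
        # Keep only the part of the fixation that overlaps the window.
        overlap = min(f.end_ms, window_end_ms) - max(f.start_ms, window_start_ms)
        if overlap > 0:
            total += overlap
    return total


def target_preference(fixations, window_start_ms, window_end_ms):
    """Target looking as a proportion of target + distractor looking.

    Returns None when neither door was fixated in the window.
    """
    target = looking_time(fixations, "target", window_start_ms, window_end_ms)
    distractor = looking_time(fixations, "distractor", window_start_ms, window_end_ms)
    if target + distractor == 0:
        return None
    return target / (target + distractor)


# Invented Day 2 data: the critical event is assumed to occur at 10,000 ms,
# and the anticipatory window is assumed to be the 2,000 ms before it.
day2 = [
    Fixation(0, 2000, "distractor"),
    Fixation(2000, 4000, "target"),
    Fixation(4000, 6000, "distractor"),
    Fixation(6000, 8000, "target"),
    Fixation(8000, 8500, "distractor"),
    Fixation(8500, 10000, "target"),
]

earlier = target_preference(day2, 0, 8000)           # before the anticipatory window
anticipatory = target_preference(day2, 8000, 10000)  # just before the event
print(f"earlier: {earlier}, anticipatory: {anticipatory}")
```

In this invented example the two doors are fixated equally early in the video, and the target preference emerges only in the window just before the event. That is the pattern that would favor an anticipation account over a general shift in location preference.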

In Kano and Hirata’s (2015) Experiment 1, the increase in looking to the target door, rather than the distractor door, continued past the anticipatory period on the second day. This continued increase in looking to the target could have arisen because the event had occurred, so there was still something interesting going on. That is, looks might simply have followed interesting events as they unfolded, with a priming effect initially attracting attention to the target door. The argument that looks were simply focused on where the action was occurring is rebutted by Experiment 2, in which the apes stopped looking at the target tool once it was chosen. In this follow-up study, the authors modified the well-known habituation paradigm commonly used in infant studies, a literature that is perhaps the best “go-to” repository for comparative research ideas (and vice versa). In this experiment, on Day 1, an actor in the movie grabbed one of two distinctive objects with which to attack the “gorilla.” On Day 2, the locations of the two objects were reversed. Apes looked more to the target object even before the human had made any movements toward it, and despite the fact that its location had changed. This additional finding shows that anticipation of events can be tied to specific objects involved in the events, and not simply to the spatial location where the event occurred. Thus, the apes encoded content as well as location.

The authors thus provided evidence of encoding for two pieces of information: content and location. However, to further extend the findings consistent with previous discussions of episodic memory as representing the binding of “what, where, and when” information, the authors might consider manipulating more details of the film, whereby a particular object could exist in a particular location only at a given time period, to determine whether the information was encoded together in a meaningful, coherent representation. Such a demonstration might be more informative with regard to the nature of the underlying representation and the extent to which it is unitary and meaningful.

The authors should be commended for acknowledging that the results of their experiment do not imply the use of explicit memory processes. That is, they found no evidence that apes could report on their own expectations or that they were consciously reflecting on the future. As others have acknowledged, any test of future planning that relies on conscious experience cannot be readily applied to nonhumans, given that conscious experience is not yet verifiable (Eacott & Easton, 2012). Researchers have long recognized the limitations imposed by such constraints, but they have not yet developed experimental paradigms that can distinguish between implicit and explicit memory processes in nonhumans. I know of only two attempts (Basile & Hampton, 2013; Tu, Hampton, & Murray, 2011). Developing such paradigms should be a focus of future research efforts. Kano and Hirata (2015) have presented another stepping stone on this long climb.

Furthermore, I think the work would be more revealing about the nature of the animals’ underlying memory representation if the authors could indicate how the finding goes beyond previous demonstrations of priming in animals (Brodbeck, 1997). That is, we already know that presenting a stimulus at a prior time can influence the manner in which later stimuli are processed, even if the earlier presentation occurred outside of conscious awareness. To move beyond such demonstrations and provide new evidence of conscious or episodic memory, one would need to know more about the accessible representation of the memory and how it might direct future actions. For instance, do animals encode the meaning of the scene rather than merely a sequence of objects and locations? Change blindness paradigms can begin to address this question: Animals might focus more on changes to the film that alter the meaning of the scene than on changes that are merely aesthetic. Thus, I think the authors have introduced a technique that can be applied to further elucidate the mysterious workings of the nonverbal nonhuman mind.