1 Introduction

Reference is a basic feature of human language. As human beings, we use linguistic items such as words and sentences to refer to things. It is due to our referential capacity that linguistic items have a meaning, representing or symbolizing the things to which we refer. It is thanks to our capacity for reference that we can communicate with others about ourselves and the world.

A much debated question in the scholarship on animal communication and language evolution is whether traces of the human capacity for reference can be found in animals too. Do animals refer to things with their signals as humans do? Or is reference something that is unique to human communication?

Answers to these questions have shifted significantly over the years and remain contentious in the light of recent empirical and theoretical research. Initially, scholars tended to agree that there is no referentiality in animal signalling. They widely held the so-called ‘motivational view’, according to which animal signals are merely automatic emotional expressions, lacking the symbolic meaning of human words. The discovery of vervet monkeys’ alarm calls by Seyfarth et al. (1980), however, represented a crucial breakthrough in the study of reference in language evolution. Seyfarth et al. gather empirical evidence showing that vervet monkeys’ alarm calls do not seem to be mere reflexive externalisations of internal states but seem to refer to different types of predators. This leads Seyfarth et al. (1980, p. 803) to conclude that vervet monkeys’ alarm calls may be regarded as “rudimentary semantic signals”, that is, as evolutionary precursors of human linguistic reference. Since the publication of this study, there has been a lively debate about animals’ capacity for reference and its evolutionary continuities with human communication.

In this paper, I begin by reconstructing and critically analysing some influential positions in this debate. I survey, in particular, three influential characterisations of animal signalling as an evolutionary precursor of linguistic reference: the theory of functional reference (Marler et al., 1992; Sect. 1), Wheeler and Fischer (2012)’s meaning attribution framework (Sect. 2) and Scarantino (2013)’s revised definition of functional reference (Sect. 3). I show that functional reference, in both its traditional and its revised version, and the meaning attribution framework fail to adequately characterise animal reference as an evolutionary precursor of linguistic reference (Sects. 4 and 5). A major problem with these characterisations, I suggest, is that they all overlook at least some aspects of the psychology of animal signalling, especially the psychological processes of signal production (Sect. 6). By drawing on Crockford et al. (2012, 2017)’s study of chimpanzees’ alert hoos, in Sect. 7 I show that it is possible to plausibly interpret at least some instances of animal signalling as cases of intentional, human-like animal reference.

The existence of psychological similarities in animal and human acts of reference, I argue, provides us with valuable insight into the evolution of linguistic reference, and paves the way for a framework that goes beyond functional reference and meaning attribution to isolate phenomena that are more plausibly evolutionarily connected to linguistic reference, and more apt to shed light on its evolutionary history.

2 The theory of functional reference

Marler et al. (1992) originally introduce the theory of functional reference to challenge the classic ‘motivational view’ of animal communication, which dates back to Darwin (2013). According to this view, animal signals are reactive, involuntary outputs of the signallers’ internal states that lack the symbolic meaning of human words (i.e. they do not refer to things), being comparable to humans’ involuntary laughs or cries. In contrast to the motivational view, Marler et al. (1992) draw on Seyfarth et al. (1980)’s empirical study of vervet monkeys’ alarm calls, arguing that certain animal signals may in fact be regarded as evolutionary precursors of human words since they seem to have the referential capacity typical of human linguistic signs. The acoustic structure of these signals seems to encode information about objects. On the one hand, vervet monkeys produce different types of calls in response to the detection of three different types of predators: leopards (leopard alarm calls), eagles (eagle alarm calls) and snakes (snake alarm calls). On the other, recipients respond to these calls in a way that is adaptive to each type of predator’s hunting strategy—even without seeing the predator itself: upon hearing leopard alarm calls, vervet monkeys run up into trees; when they hear eagle alarm calls, they look up and run into bushes; when they hear snake alarm calls, they look down and stand bipedally.

The theory of functional reference establishes two empirical criteria for assessing the referential capacity of animal signals. These are the ‘production criterion’ and the ‘perception criterion’ (see Macedonia & Evans, 1993, pp. 179–180). In brief, the production criterion requires that signals are produced in response to, and thus correlate with, a narrow class of objects or events in the environment, so that one can say that they refer to these well-identifiable objects or events (stimulus-specificity). The perception criterion requires that signals refer to the same thing in different contexts (context-independence). This last criterion is aimed at ascertaining that the information which is communicated is communicated by virtue of the signals themselves—and not because of other sources of information. Whenever an animal communicative signal proves to be stimulus-specific and context-independent, it reaches, for functional referentialists, the so-called ‘threshold’ for functional referentiality (see also Marler et al., 1992).

Functional referentialists, however, qualify non-human referentiality as ‘functional’. In the theory of functional reference, the concept of referentiality is functional in the sense of being “neutral about the underlying mental processes” (Marler et al., 1992, p. 67), that is, of being silent on the actual psychological processes governing animal communicative behaviours. The point raised by these theorists is that empirical evidence of referentiality does not support any conclusion about the psychological mechanisms underlying animal communicative interactions. More specifically, they argue that, despite the superficial similarity between animal referential signals and human words (i.e. their ability to context-independently encode information about objects), it may still be possible that animal communicative interactions are underpinned by different psychological mechanisms from those of humans. As Hauser (1996) clarifies, functionally referential animal signals ‘function’ like human words, but they are not necessarily the same as human words: like human words, functionally referential signals are taken by their recipients to stand for objects or events in the environment (i.e. animal recipients respond to these signals as if they perceive the object itself). This does not imply, though, that they refer to objects or events like human words, i.e. that they are produced according to the same psychological mechanisms as linguistic reference.

3 Wheeler and Fischer’s meaning attribution framework

The theory of functional reference has been seriously challenged by Wheeler and Fischer (2012), who consider it inadequate for capturing the continuities between animal and human communicative capacities. They propose to abandon the theory of functional reference in the study of language evolution and to replace it with their receiver-centred framework that focuses on the mechanisms of signal comprehension or meaning attribution.

Let me begin by considering Wheeler and Fischer’s objections to functional reference. From an empirical standpoint, Wheeler and Fischer argue that the phenomenon of functional reference, if existent, is extremely marginal in animal communication. In fact, empirical evidence suggests that the majority of animal signals are neither stimulus-specific (i.e. produced by animals in response to specific objects or events in the environment) nor context-independent (i.e. capable of eliciting appropriate responses in the recipients in the absence of contextual clues), failing to satisfy both criteria for functional reference—the ‘production criterion’ and the ‘perception criterion’, respectively.

With regard to the ‘production criterion’, it is Wheeler and Fischer’s contention that the majority of animal signals are not stimulus-specific: the same types of animal signals tend to be produced by animals in a diverse range of circumstances. For example, primates’ terrestrial alarm calls occur during aggressive interactions with their conspecifics, too. Some overlap across contexts has also been reported in the case of vervet monkeys, the paradigmatic example of functional reference (Price et al., 2015).

With regard to the ‘perception criterion’, Wheeler and Fischer contend that the majority of animal signals are not context-independent, since the same types of animal signals tend to elicit different behavioural responses in their recipients depending on the context of production. Animals take contextual clues into account when interpreting communicative signals. As Seyfarth and Cheney (2017) note, precisely because the same types of signals are produced by senders in a variety of situations (non-stimulus-specificity), receivers systematically need to rely on contextual interpretation in order to identify the signals’ eliciting stimuli, i.e. to determine the signals’ informational contents (context-dependency). An example of this is the interpretation of eagle alarm calls, or hacks, by Putty-nosed monkeys. Putty-nosed monkeys, as Arnold and Zuberbühler (2013) report, produce hacks in response to eagles, but also to falling trees and breaking branches. When they hear the sound of a falling tree, Putty-nosed monkeys do not respond to eagle alarm calls as they would in the case of a predatory risk; that is, they do not enact characteristic flee behaviours. The contextual clue, the sound of a falling tree, has a key disambiguating function in Putty-nosed monkeys’ signal interpretation.

Moreover, Wheeler and Fischer (2012) raise a theoretical objection to accounts of functional reference. According to Wheeler and Fischer, the theory of functional reference is inadequate to capture continuities between animal and human capacities to refer to things in communication, because, as a functional framework, it is neutral about underlying psychological processes. In the scholarly literature, linguistic reference is widely understood as a psychological phenomenon with a Gricean intentional structure (Grice, 1957; see also Sperber & Wilson, 1995; Carston, 2002; Tomasello, 2008; Scott-Phillips, 2014; Moore, 2017c): (i) an utterer produces a signal with the intention to direct a receiver’s attention to an object; (ii) an utterer produces a signal with the intention that their receiver recognises their intention in (i). On this, Wheeler and Fischer are in agreement with the majority of scholars. Given that linguistic reference is a psychological phenomenon, and that functional reference is neutral about psychology, Wheeler and Fischer conclude that there is no direct, meaningful connection between functional reference and linguistic reference. As they claim, functionally referential animal signals could be regarded as precursors of linguistic reference only if they are produced by similar psychological mechanisms.

However, Wheeler and Fischer maintain that there is too great a difference in the psychological mechanisms of signal production between animals and humans for this to be plausible. In their view, not only do humans have the psychological capacity to communicate with Gricean intentions that animals lack (see also Fischer & Price, 2017), but animal signal production is also significantly less flexible or even “inflexible” in comparison with human language. This view, for Wheeler and Fischer, is supported by neurological evidence showing that primates lack the neural system which is responsible for voluntary vocal control in humans. Thus, they stress, while human words are learned in ontogeny, animal signals are for the most part hardwired both in their forms and in their functions (see also Fischer, 2017). It is because of this psychological gap that Wheeler and Fischer (2012, p. 195) suggest that we should completely abandon the theory of functional reference in the study of language evolution.

On the other hand, drawing on Seyfarth and Cheney (2003), Wheeler and Fischer suggest that there is much more psychological continuity between animals and humans with respect to signal comprehension. Thus, they propose to focus on the psychology of the receivers, especially on their capacity for ‘meaning attribution’. As Wheeler and Fischer conceive of it, meaning attribution is a combination of two basic capacities: first, the capacity to integrate contextual information in signal comprehension and, second, the capacity to psychologically represent the signals’ informational contents.

Firstly, Wheeler and Fischer position themselves in clear contrast to functional reference by insisting on the relevance of contextual interpretation for identifying the continuities between animal and human communication (see Wheeler & Fischer, 2012, p. 203). According to Wheeler and Fischer, context-dependent signal interpretation should be regarded as a point of connection between animal and human communication, because it is much more cognitively complex—and thus potentially closer or more similar to human language—than context-independent signal interpretation (Wheeler & Fischer, 2012, pp. 201–202).

Secondly, by focusing on meaning attribution, Wheeler and Fischer draw attention to animals’ capacity to psychologically represent the ‘meaning’ or information carried by signals. They find evidence for this psychological mediation in the fact that animals learn their behavioural responses during ontogeny (see e.g. Seyfarth & Cheney, 1986), and in the fact that, when animals receive communicative signals, they typically look for additional contextual clues, e.g. by looking towards the signaller or by scanning the surrounding area (Arnold & Zuberbühler, 2013). As we will see in the following sections, in contrast to contextual interpretation, the focus on the psychological mediation distinguishes Wheeler and Fischer’s approach from Scarantino’s.

4 Scarantino’s revised definition of functional reference

Scarantino (2013) responds to Wheeler and Fischer (2012)’s objections against the theory of functional reference, rejecting their claim that we should abandon functional reference in the study of language evolution. According to Scarantino (2013, p. 1007), “the current definition of functional reference is indeed conceptually flawed”, but we should resist “calls for getting rid of the theoretical construct [of functional reference] altogether”. Scarantino thus agrees that there are some problems with the traditional formulation. Like Wheeler and Fischer, Scarantino acknowledges that from an empirical standpoint the phenomenon of functional reference is rare, or even non-existent, in animal communication. Moreover, Scarantino agrees that the traditional definition of functional reference does not adequately capture the continuities between animal and human referential capacities. Not only do animal signals very rarely meet the standard criteria for functional reference (i.e. the “production criterion” and the “perception criterion”), but linguistic reference (in humans) can also be non-stimulus-specific and context-dependent. For example, indexical expressions such as I, you, today, this, that, etc. are not stimulus-specific: in fact, they are produced in, and they correlate with, a variety of situations (e.g. a variety of utterers in the case of the indexical I). They are also context-dependent, because their interpretation, too, requires considering contextual clues (e.g. who is the utterer in the case of the indexical I). And yet, even though indexical expressions are neither stimulus-specific nor context-independent, they do refer. The existence of non-stimulus-specific and context-dependent linguistic items shows that linguistic reference does not essentially depend on the stimulus-specificity and context-independence of signals. Thus, for Scarantino, the characterisation of reference first given by Marler et al. (1992) is inadequate.

Despite its theoretical limitations, Scarantino believes that functional reference remains a useful tool (Scarantino, 2013, p. 1017). Here is where Scarantino significantly deviates from Wheeler and Fischer. In contrast to Wheeler and Fischer, Scarantino thinks that it is possible to productively connect linguistic and non-linguistic reference in abstraction from psychological continuities. All that is required for Scarantino (2013, p. 1007) is simply a “better definition” of functional reference, one that applies to signals that are stimulus-specific and context-independent (e.g. the majority of human words), as well as to signals that are not stimulus-specific and not context-independent (e.g. most, if not all, animal signals, and human linguistic expressions such as indexicals) (see also Scarantino & Clay, 2015).

In the revised account proposed by Scarantino (2013, p. 1012), “[s]ignals can functionally refer by virtue of contextual cues and in the absence of a strong correlation with their referents”. Macedonia and Evans (1993)’s “production criterion” is replaced by a weaker “[c]ontextual information criterion”. While the production criterion requires that signals are produced in response to a stable class of objects or events in the environment (strong correlation), the contextual information criterion admits weaker correlations between signals and referents: X carries information about Y iff “Xs are correlated with Ys (weakly or strongly)” (Scarantino, 2013, p. 1014).

Macedonia and Evans’s “perception criterion” is replaced by Scarantino (2013, p. 1016) with a context-sensitive “[c]ontextual perception criterion”: “X’s presentations in context C reliably cause responses adaptive to Ys in the absence of Ys”. While Macedonia and Evans’ “perception criterion” requires that signals elicit appropriate behavioural responses from the recipients in the absence of contextual clues, the “contextual perception criterion” allows for reference to be context dependent.
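The two revised criteria can be made concrete with a toy sketch. The code below is only an illustration under stated assumptions, not Scarantino's own formalisation: it reads "correlated (weakly or strongly)" as the signal raising the probability of the referent within a context, and it reads "reliably" as a hypothetical threshold of 0.8. The observation records, context labels and function names are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    context: str             # label for the context C (hypothetical)
    signal: bool             # was the signal X presented?
    referent: bool           # was the referent Y (here: an eagle) present?
    adaptive_response: bool  # did the recipient respond in a way adaptive to Y?


def carries_information(obs, context):
    """Toy contextual information criterion (assumed reading):
    within the context, P(Y | X) > P(Y)."""
    in_ctx = [o for o in obs if o.context == context]
    with_signal = [o for o in in_ctx if o.signal]
    if not in_ctx or not with_signal:
        return False
    p_y = sum(o.referent for o in in_ctx) / len(in_ctx)
    p_y_given_x = sum(o.referent for o in with_signal) / len(with_signal)
    return p_y_given_x > p_y


def elicits_adaptive_responses(obs, context, threshold=0.8):
    """Toy contextual perception criterion (assumed reading): when X is
    presented in the context and Y is absent, the share of responses
    adaptive to Y is at least `threshold` (a hypothetical cut-off)."""
    trials = [o for o in obs
              if o.context == context and o.signal and not o.referent]
    if not trials:
        return False
    return sum(o.adaptive_response for o in trials) / len(trials) >= threshold


def functionally_refers(obs, context, threshold=0.8):
    """Both toy criteria must hold in the given context."""
    return (carries_information(obs, context)
            and elicits_adaptive_responses(obs, context, threshold))


# Invented data: hacks heard in a quiet forest versus during a tree fall.
observations = (
    [Observation("quiet_forest", True, True, True)] * 3
    + [Observation("quiet_forest", True, False, True)] * 2
    + [Observation("quiet_forest", False, False, False)] * 5
    + [Observation("falling_tree", True, False, False)] * 4
    + [Observation("falling_tree", False, False, False)] * 2
)

print(functionally_refers(observations, "quiet_forest"))  # True
print(functionally_refers(observations, "falling_tree"))  # False
```

On this invented data the same signal counts as functionally referring to eagles in one context and not in the other, which is the kind of context-sensitivity the revised criteria are meant to allow. How permissive the verdict is depends entirely on the assumed reading of "correlated" and on the chosen threshold.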

5 The limitations of Scarantino’s revised definition of functional reference

Scarantino’s revised definition of functional reference comes under criticism from Wheeler and Fischer (2015). Wheeler and Fischer (2015, p. e11) argue that on Scarantino’s account all signals become ‘functionally referential’. Since all signals can potentially meet Scarantino’s criteria (i.e. the “contextual information criterion” and the “contextual perception criterion”), according to Wheeler and Fischer’s criticism, his revised definition of functional reference is not useful for capturing signals that are actually relevant to an account of language evolution.

First of all, according to Scarantino’s criteria, all animal communicative signals are potentially “functionally referential”. It is possible to argue that virtually all animal communicative signals correlate (either weakly or strongly) with states of affairs (“contextual information criterion”) in a context (“contextual perception criterion”): in a context C, for example, the alarm call of a Putty-nosed monkey correlates with falling trees and can provide suitable recipients with information about falling trees (Arnold & Zuberbühler, 2013). A duckling’s distress call can correlate with, and convey information about, hunger or fear depending on the circumstances (Abraham, 1974). As we saw in Sect. 1, the original aim of functional reference was to identify animal signals that can be closer to human linguistic reference (Macedonia & Evans, 1993; Marler et al., 1992). As Wheeler and Fischer (2015, p. e9) rightly point out, once the functional reference category becomes coextensive with animal communication in general, “it is no longer productive to distinguish between functionally referential communication and communication more generally”.

In fact, I want to further argue, Scarantino’s revised definition of functional reference is even broader, encompassing phenomena that fall even outside what we normally refer to as ‘animal communication’. Not only does it apply, potentially, to all animal communicative signals, but it also extends to “cues” and to a vast range of other phenomena that are not strictly speaking communicative, at least not according to our intuitions of what (human-like) communication is. (A way to narrow down the range of application of Scarantino’s revised definition of functional reference is by appealing to an independent set of criteria that isolate specifically ‘communicative’ behaviours. This move, however, seems to be incompatible with functional reference, especially with the claim—which Scarantino seems to implicitly accept—that the “contextual information criterion” and the “contextual perception criterion” are the necessary and sufficient conditions for a signal to be functionally referential.)

Let me begin by examining the case of cues. In the scientific literature on animal communication, a “cue” is technically a behaviour or a structure of an animal that ‘happens’ to influence, in a non-communicative way, the behaviour of another (Maynard-Smith & Harper, 2003; Scott-Phillips, 2008). An example of a cue is the size of an animal. All animal cues, such as the size of an animal, can meet the requirements for Scarantino’s definition of functional reference. For example, the size of a red deer statistically correlates with good or bad fighting abilities, or with the tendency to win or lose in a contest. This satisfies Scarantino’s “contextual information criterion”. A recipient, moreover, may well be in a position to respond to such correlated phenomena adaptively and within a context, as required by Scarantino’s “contextual perception criterion”. For example, a recipient may be predisposed to, or learn to, cautiously avoid conflict with bigger, and thus stronger, deer. Thus, Scarantino’s revised definition is unable to differentiate between non-communicative cues and communicative signals. It applies equally to the size of a red deer and to its roars.

Moreover, Scarantino’s revised concept of functional reference indiscriminately subsumes a vast range of other non-communicative phenomena in the environment that are not produced by biological entities. Like animal communicative signals, environmental phenomena such as smoke and black clouds can be regarded as signals, since they correlate, weakly or strongly, with states of affairs such as fire and rain (“contextual information criterion”). The ‘referents’ of these environmental signals may also vary with context: for example, black clouds may indicate rain when the temperature is above 0 °C, and snow when it is below 0 °C (“contextual perception criterion”).

The broadness of Scarantino’s theory thus raises concerns about its practical utility. As noted by Wheeler and Fischer (2015), if all signals can potentially be treated as functionally referential, it becomes unclear how keeping the category of functional reference can help to narrow down possible precursors of linguistic reference.

Another issue with Scarantino's approach to language evolution emerges when we consider the following. Like Marler et al. (1992), Scarantino identifies the phenomenon of reference by its function, that is, from the point of view of the selective effect it achieves, namely, providing recipients with information about objects. This point of view is often referred to as ‘ultimate’, and is commonly contrasted with proximate explanations of behaviours, which focus, instead, on illustrating the mechanisms by which functions are achieved. According to Mayr (1961)’s original formulation of the ultimate-proximate distinction, ultimate explanations are concerned with why a behaviour exists; proximate explanations with how a behaviour works—they look at the internal and external factors that immediately generate the behaviour (see also Scott-Phillips et al., 2011).

The fact that Scarantino’s account, like other functional approaches to reference, is unconcerned with the mechanisms by which referential behaviours achieve their function has one crucial implication: the signals covered by his definition can achieve their function through quite different cognitive mechanisms. We already saw in Sect. 1 that functional frameworks are neutral about psychology. Scarantino himself acknowledges the possibility of significant differences in the ways in which signals functionally refer when he points out:

“The exploration of the semantic aspects of animal signalling should resist the identification of non-linguistic signals with words, but it should also account for the fact that words and signals ‘stand for’ external objects in their own, distinctive ways. The label “functional reference” usefully hints at the existence of both analogies and differences between linguistic and nonlinguistic reference” (2013, p. 1011).

As noted by Scott-Phillips and Heintz (2023a, 2023b), features that are superficially similar and accomplish similar functions may or may not be evolutionarily related. Put another way, they may be analogous rather than homologous: while homologous features are similar in form because they derive from a common ancestral trait, analogous features are similar in form because they fulfil similar functions, but have evolved independently (Scott-Phillips & Heintz, 2023a, 2023b, p. 94). When two features are underpinned by different cognitive mechanisms, Scott-Phillips and Heintz (2023a, 2023b) go on to argue, it is less plausible that they result from a common ancestry. By incorporating signals that rely on quite distinct cognitive mechanisms, the paradigm of functional reference encompasses phenomena that are not plausibly precursors of linguistic reference: they are more likely analogous than homologous. Particularly problematic, I stress, is the case of signs such as smoke and black clouds, which can be said to functionally refer (see above), but which are not even part of evolutionary trajectories.

The failure of the functional reference category to distinguish between analogous and homologous traits is at least problematic within the context of a theoretical framework that aims to shed light on language evolution (Scarantino, 2013, p. 1017; Scarantino & Clay, 2015, p. e6; see also Moore, 2017b). In addition, as I will show in Sect. 7, empirical evidence points towards the existence of genuine cases of reference in the animal kingdom: there are some animal communicative acts whose psychological underpinnings appear to be in important respects comparable to those of linguistic reference. This suggests a plausible homology between these mechanisms and those at work in humans, and thus the possibility for an account of reference evolution to effectively narrow down its focus of investigation by introducing some specific psychological constraints on signal production.

Scarantino’s framework has its drawbacks as an account of language evolution, but it remains useful for the study of animal interactions in general. It is particularly useful when we acknowledge, for example, the existence of animal communication systems that might not be psychologically mediated. While a psychological account of reference would exclude those systems, these are included in Scarantino’s paradigm thanks to its lack of constraints on signal production. In this sense, Scarantino’s all-encompassing approach enables us to explore the various ways in which signals can provide recipients with information about objects. The broadness of Scarantino’s framework is a strength when it comes to studying animal communication systems in their own right, but it constitutes a limitation in the context of language evolution. In the study of language evolution, our focus lies not in the diversity of animal communicative systems but rather in phenomena that are closer to linguistic communication and are more plausible as its precursors.

6 The limitations of Wheeler and Fischer’s meaning attribution framework

Turning now to Wheeler and Fischer, their framework, too, encounters challenges when considered as an account of the evolution of linguistic reference. As we saw in Sect. 2, Wheeler and Fischer do not think there are significant continuities between animal signalling and linguistic reference at the level of signal production, but rather at the level of signal comprehension. They see the point of connection between animal and human communication in the capacity for “meaning-attribution”—that is, the psychological process of signal comprehension, which involves the capacity to contextually interpret and mentally represent the informational contents of signals.

First of all, one fundamental problem with Wheeler and Fischer’s framework is that it does not seem to be able to isolate phenomena that are relevant to an account of the evolution of linguistic reference: the focus of the meaning-attribution framework is on the receiver; reference, by contrast, is something that is done by speakers. In referring, it is speakers who use expressions to refer interlocutors to objects (see Sect. 7; see also Sievers & Gruber, 2016). In situations where acts of reference do not succeed, and there is no interpretive process on the receiver’s part, they still qualify as acts of reference. Moreover, even regarding the comprehension of signals, the psychological processes involved in meaning attribution appear to be too far removed from the specific psychological processes that underpin language interpretation to be deemed significant for language evolution. As argued by Bar-On (2021, p. 6), the meaning attribution framework “sets the explanatory bar too low” in studies of language evolution. Meaning attribution, Bar-On (2021, p. 6) argues, marks continuity between animal and human communication, but this continuity “seems hardly sufficient by itself to illuminate the origins of distinctively human communication”.

One primary reason for this discrepancy lies in the meta-psychological character of utterance interpretation. As argued by Gricean scholars, linguistic interpretation consists in inferring the mental states of the speakers, that is, in grasping what the utterers intend to communicate by uttering (Grice, 1957; Tomasello, 2008). Consider this: by the same act of pointing towards a bicycle parked in a specific location S at a time T in the presence of a friend, I can mean a variety of different things such as “I would fancy that kind of bicycle”, “Here is my bicycle”, “Take your bicycle”, “See, people always park their bicycles here”, “Look at how beautiful the colour of that bicycle is”, and so on. What my act of reference will ultimately mean is determined by my intentions in producing it. The receiver will have to grasp the intentions with which I produced this utterance in order to understand it. In this sense, the receiver’s interpretive process is meta-psychological.

In contrast, in the context of Wheeler and Fischer’s framework, animals “attribute meaning” to signals in light of what they have experienced co-occurring with the signal, and by integrating pieces of contextual information (Wheeler & Fischer, 2012, p. 200). For example, a Putty-nosed monkey who attributes the meaning ‘eagles’ to a series of conspecifics’ eagle alarm calls or hacks (Arnold & Zuberbühler, 2013) psychologically represents the presence of an eagle on the basis of previously experienced correlations between hacks and eagles, plus other contextual clues (e.g. the sight of the eagle itself). Beyond the capacity to mentally represent signals’ informational contents and the capacity to interpret communicative signals in different contexts, meaning attribution requires only that there are learnable statistical correlations between states of affairs. This, however, is a very loose requirement: learnable statistical correlations can involve communicative signals, as in the case of the Putty-nosed monkeys’ hacks, but also non-communicative signals (see Sect. 4).

Consequently, the point of connection identified by Wheeler and Fischer between animal and human signal comprehension is a capacity (i.e. context-dependent representation of correlational information) that is not specific to communicative interactions, but is involved in the cognitive apprehension of the world in general, including phenomena that are distinctively non-psychological (e.g. smoke as a sign of fire). From this it follows that meaning attribution is a psychological capacity of the receiver to interpret mere states of affairs, not mental states. Since language interpretation is a meta-psychological process (i.e. is a matter of grasping mental states), meaning attribution, as it stands, cannot be seen as a relevant precursor (see e.g. Bar-On & Moore, 2017; Bar-On, 2021).

As I will argue in Sect. 7, not only can animal signal production mechanisms shed light on the evolution of linguistic reference, but the existence of forms of genuine reference in animal behaviours also raises the possibility that receivers' comprehension mechanisms may be more similar to those of humans (i.e. be meta-psychological). This would be consistent with existing evidence of mindreading capacities in animals (see e.g. Call & Tomasello, 2008). More on this in Sect. 7.

Despite its limitations, I argue, Wheeler and Fischer’s meaning attribution framework remains a better candidate for providing an account of language evolution than Scarantino’s revised definition of functional reference. This is because human language, of which some forms of animal communication could be evolutionary precursors, is based on psychological processes, and meaning attribution is similarly a psychological process of signal reception. In contrast, functional reference, even in its revised form, implies distinctively, and deliberately, non-psychological mechanisms of signal reception. As stressed by Carazo and Font (2010, p. 667), “the adoption of a functional definition of information does not entail any commitment about the degree to which senders and receivers are aware of the information being conveyed” (italics added). In Scarantino’s revised definition, it is of no consequence whether information is psychologically represented by animals or not. For example, functional reference is also compatible with responses that are causally elicited by the physical structure of signals, without any psychological mediation (see e.g. Rendall et al., 2009). This is because, as stressed by Kalkman (2017), in Scarantino’s framework, information is an ultimate, and not necessarily a proximate, explanatory construct: it accounts for the evolution of recipients’ responses, but it does not necessarily play a role in recipients’ real-time decision-making. When information plays an ultimate explanatory role, the fact that signals correlate with, and carry information about, determinate states of affairs explains why determinate responses have been selected, but the correlations need not be psychologically represented by individual animal recipients. In contrast, in Wheeler and Fischer’s framework, information or meaning is consistently psychologically represented by animals. This makes their framework better suited than Scarantino’s for capturing the psychological continuities between animal and human capacities for signal comprehension.

7 Animal signals and natural meaning

Despite their differences, Scarantino and Wheeler and Fischer’s approaches are united in one important respect: they both aim to provide an account of language evolution in abstraction from the psychological processes underpinning signal production in animals. While Scarantino’s functional reference is completely neutral about psychology, Wheeler and Fischer’s framework exclusively focuses on the psychology of signal comprehension, excluding signal production. The reason for this exclusion lies primarily in the fact that, echoing a long-standing tradition (see e.g. Seyfarth & Cheney, 2003; Tomasello, 2008), Wheeler and Fischer and Scarantino hold the view that the mechanisms of call production in non-human animals are radically different from those at work in human language and thus irrelevant for understanding its evolution.

In a broad sense, linguistic reference is a form of intentional action. Acts of reference are at least in part under voluntary control. They are different from automatic, involuntary reactions to stimuli. For example, if during a hike with a friend one utters “That’s a snake!”, this action is intended, in a manner that an involuntary reaction such as blushing is not. In referring, utterers intend to refer their interlocutors to an object to which they are attending (Bach, 2008; see also Campbell, 2004). Acts of reference, then, succeed when the audience attends to the intended object. In Gricean accounts of communication, senders also intend that the audience recognises their intention that they attend to a certain object (among other things). The intentional structure of referential acts in Gricean accounts comprises, then, (at least) two key components: (i) an utterer produces a signal with the intention to direct a receiver’s attention to an object; (ii) an utterer produces a signal with the intention that their receiver recognises their intention in (i) (see Grice, 1957; see also Neale, 1992; Sperber & Wilson, 1995; Carston, 2002; Tomasello, 2008; Scott-Phillips, 2014; Moore, 2017c). On Gricean conceptions of human communication, receivers fully grasp (Gricean) referential communicative acts when they think of the object in the manner intended by the speaker.

On the one hand, Wheeler and Fischer and Scarantino endorse the Gricean view of linguistic reference. On the other, they believe that the mechanisms at work in animal signal production are too unsophisticated to be plausible evolutionary precursors. In particular, as anticipated in Sect. 2, Wheeler and Fischer (2012) regard animal signal production as largely inflexible and list “the inflexibility of signal producers” in animals as one of the reasons for focusing on animal signal reception instead. Animal inflexibility, according to Wheeler and Fischer, can be found in the fact that animal signals are largely hardwired due to a limited neurological capacity to voluntarily modulate the structure of their calls.

Given their view that there is discontinuity in the mechanisms of call production, Wheeler and Fischer and Scarantino remove from their frameworks any reference to such mechanisms. In both Wheeler and Fischer’s and Scarantino’s frameworks, signals are attributed a loose version of what Grice called “natural meaning”, a nonintentional concept of meaning that Grice (1957) distinguishes from the nonnatural meaning typical of human communication. As Grice originally conceptualizes it, natural meaning is the property of an entailment relationship: a signal p naturally means q iff p entails q. In the looser formulation employed by Wheeler and Fischer and Scarantino, a signal p naturally means q iff it correlates statistically with q; thus, iff the occurrence of p raises the probability of q in the eyes of a recipient (see also Scarantino, 2015). Both in Grice’s original concept and in this looser formulation, the signals’ capacity to convey information is independent of what psychological processes underpin their production.
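
The contrast between the strict and the loose notion of natural meaning can be put schematically. What follows is only an illustrative formalisation of the two conditions just stated, not a formula given by Grice, Wheeler and Fischer, or Scarantino; the probability function P, read here as a recipient's credence, is an assumption introduced for the sketch.

```latex
\begin{align*}
\text{Strict (entailment):}\quad
  & p \text{ naturally means } q \iff (p \Rightarrow q) \\
\text{Loose (correlational):}\quad
  & p \text{ naturally means } q \iff P(q \mid p) > P(q)
\end{align*}
```

Neither condition mentions the sender's intentions or any other psychological state involved in producing p, which is the feature of both formulations at issue here.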

In contrast, as we will see in Sect. 7, the psychological processes involved in signal production should be incorporated into an account of the evolution of linguistic reference.

8 Building a case for (genuine) animal reference

With regard to the inflexibility argument raised by Wheeler and Fischer, it is important to consider that causes can operate in two different ways. To use Dretske (1988)’s famous distinction, there are structuring causes and triggering causes. Structuring causes are those that constrain the behaviour of a system. In the case of monkey vocalisations, for example, structuring causes can be said to be factors shaping the vocalisation form. Triggering causes, on the other hand, are those that control whether the behaviour is produced or not (e.g. whether monkeys produce their vocalisations). Now, the fact that, as Wheeler and Fischer point out, structuring causes operate on monkey vocalisations (i.e. the forms of monkey vocalisations are largely hardwired) does not imply that the production of these behaviours is equally causally determined (i.e. that monkeys do not have control over whether to produce their calls and over other aspects of their use). Something similar has been recently argued by Bar-On (2021, p. 10) in relation to primate vocalisations (see also Armstrong, 2023).

In addition, recent empirical evidence suggests that at least some animal signals are not used in an inflexible way (e.g. Graham et al., 2019; Townsend et al., 2017). Most crucially, there are at least some animal communicative acts exhibiting strong psychological similarities with linguistic reference. These findings carry profound implications for our thinking about the evolution of linguistic reference, and prompt us to reconsider the importance of animal signal production for understanding its evolution.

To appreciate this, let me consider Crockford et al. (2012, 2017)’s field experiments on chimpanzees’ alert hoos. Alert hoos are types of alarm calls produced by chimpanzees upon encountering potential threats, such as snakes, that are not perceived as very dangerous. Crockford et al. designed a series of experiments to understand what contextual variables influence the production of these calls, and whether these calls are produced intentionally by chimpanzees to target ignorant recipients. In these experiments, Crockford et al. place a model resembling a sleeping Gaboon viper on the expected forest path of chimpanzees at the Budongo Forest in Uganda. [Footnote 3] They observe chimpanzees’ calling behaviour in response to the model in two different conditions: (1) Knowledgeable condition: the receiver of the call is knowledgeable of the threat (e.g. the receiver has either seen the snake, has previously heard an alert hoo, or has produced an alert hoo itself). (2) Ignorant condition: the receiver is ignorant about the threat (e.g. the receiver has not seen the snake, has not previously heard an alert hoo, or has produced a non-snake-related rest hoo instead of a snake-related alert hoo right before).

The data collected by Crockford et al. show that chimpanzees are more likely to produce alarm calls in the receiver-ignorant condition than in the receiver-knowledgeable condition. These findings are consistent with the hypothesis that chimps can voluntarily control the production of alert hoos, and that they produce alert hoos in consideration of the knowledge states of their recipients. Other empirical observations support the hypothesis of intentional signal production in chimpanzees. As Crockford et al. report, in addition to being sensitive to the receivers’ knowledge states, chimpanzees’ signalling behaviours meet some other crucial markers for intentionality, such as “gaze alternation”, “audience checking” and “persistence” (Townsend et al., 2017). The fact that chimps meet more criteria for intentionality makes it more probable that their signals are voluntarily produced (Townsend et al., 2017; see also Schel et al., 2013). On the basis of this evidence, it is less likely, for example, that alert hoos are merely arousal responses (e.g. automatic expressions of fear). The (non-intentional) hypothesis that alert hoos are mere responses to arousal in chimpanzees also appears less plausible in light of the observation by Crockford et al. that chimpanzees stop calling when the recipient sees the snake, but not if the recipient accidentally moves away from the snake. In addition, the signaller’s and the receiver’s absolute distance from the snake, as well as the number of receivers potentially at risk, do not seem to affect whether chimpanzees produce alert hoos or not.

The fact that chimpanzees may volitionally control the production of their alert hoos on the basis of their recipient’s knowledge states allows us to plausibly suppose that chimpanzees intentionally use their alert hoos to draw their recipient’s attention to an object. If chimpanzees produce alert hoos more frequently when the receivers aren’t aware of the snake, this is a positive indication that, similarly to human acts of reference, they intend to refer their interlocutor to the snake by their signals. [Footnote 4] This interpretation, furthermore, is reinforced by the observation that chimpanzees cease their calling only once the recipient has seen the snake, a pattern that mirrors the success of a typical act of reference: acts of reference, as mentioned, succeed when interlocutors effectively attend to the object.

Thus, as Crockford et al.’s study suggests, it seems possible to conceptualise at least some instances of animal referentiality not just as functionally referential, but as cases of human-like intentional reference: like humans, chimps appear to produce alert hoos intending to draw a recipient’s attention to an object (i.e. the snake). These findings contrast with prevailing theories in the study of the evolution of reference (i.e. functional reference and meaning attribution), which take the mechanisms governing signal production in animals to differ radically from those of linguistic reference (see Sect. 6).

This psychological continuity, I contend, encourages us to consider the possibility that genuine reference may indeed exist in chimpanzees, and that the mechanisms for reference may be homologous in chimps and humans. In works such as Scott-Phillips and Heintz (2023a, 2023b), cognitive dissimilarity is used as an argument against homology: for Scott-Phillips and Heintz (2023a, 2023b, p. 98), the fact that great ape vocalisations are not under volitional control suggests that they are homologous not so much to language as to spontaneous expressions such as laughter. Conversely, the existence of psychological continuity between chimpanzees’ alert hoo production and linguistic reference provides at least a strong basis for entertaining the hypothesis that they may be homologous features, that is, not just superficially similar in their form and function, but also evolutionarily related. The fact that chimps and humans are close on the evolutionary scale adds weight to the argument that reference mechanisms may be homologous between these two species. As argued by Sober (2005, 2012), it’s cladistically more parsimonious to suppose that similar behaviours observed in closely related clades have evolved once in a shared ancestral lineage, rather than arising independently in later lineages, and that they are supported by common underlying mechanisms. Furthermore, this hypothesis is reinforced by similar findings in bonobos (Girard-Buttoz et al., 2020).

While reference mechanisms may serve as points of evolutionary continuity with humans, it’s important to acknowledge, nevertheless, that other aspects of a speaker's psychology might still exhibit different evolutionary trajectories. The psychological continuity registered between chimps’ production of alert hoos and linguistic reference suggests that the mechanisms for reference are homologous between these two species, but it doesn’t provide insight into other psychological mechanisms which could also be either homologous or analogous.

Recent work on primate gestural communication, which is under voluntary control, has already led some scholars to propose psychological accounts of animal communication (see Moore, 2017a; Bar-On, 2021; Warren & Call, 2022; Armstrong, 2023; Scott-Phillips & Heintz, 2023a, 2023b). These accounts make different claims about the extent to which human and animal signalling are continuous. (However, perhaps because primate gestural communication is often dyadic rather than triadic (Tomasello, 2008), these views have not had much to say about the issue of reference.)

For the purposes of this paper, I take no stand on the broader psychological framework with which chimpanzee referential communication should be characterised, or on the extent to which such communication is continuous or discontinuous with human communication. Numerous candidates for the psychological structure of great ape communication exist, including minimally Gricean communication (Moore, 2017a), intermediary pragmatics (Bar-On, 2021), representational coordination (Armstrong, 2023), or Ladyginian communication (Scott-Phillips & Heintz, 2023a, 2023b). Scott-Phillips and Heintz (2023b), for example, have recently argued that great apes and humans have different ways of manifesting intentions in communication. This would imply that humans and great apes convey reference differently. Moore (2017a, 2017c), instead, denies that there is a substantial distinction to be drawn. Evaluating these competing hypotheses is beyond the scope of this paper. However, even if one thinks that a full account of the psychology of human and animal signal production involves some discontinuous psychological states, the case of the chimpanzee alert hoo suggests that the psychological mechanisms of reference may be homologous.

The existence of animal acts of communication that appear to be psychologically similar to human acts of reference creates a theoretical opportunity to approach language evolution also from the perspective of the psychological processes underlying animal signalling behaviours. In particular, it sets the stage for an account of animal reference, and, by extension, of the evolution of reference, under the following constraint on signal production: (i) an utterer produces a signal with the intention to direct a receiver’s attention to an object. I maintain that an account of animal reference that centres on instances of animal signalling satisfying (i), in which senders intentionally use signals to direct their recipients’ attention toward an object, is of particular importance in the field of language evolution research. This account focuses on phenomena that are psychologically similar to linguistic reference. As a result, it isolates phenomena that are more plausible as precursors, and thus more apt to shed light on its evolutionary history (cf. Scarantino, Sect. 4). The framework I propose is also better equipped than Wheeler and Fischer’s to isolate phenomena that are evolutionarily connected to linguistic reference. Unlike Wheeler and Fischer’s framework, this account finds continuity in the psychology of the sender, the individual performing the act of reference (see also Sect. 5).

Thus, while functional reference and meaning attribution can prove quite fruitful in the study of animal communication systems in their own right (they allow us to explore the diverse mechanisms through which signals from different species can provide recipients with information about objects; see also Sect. 4), an account of voluntary, psychologically mediated reference proves more valuable in the context of language evolution. It is in fact more productive for an account of reference evolution to pinpoint cases of reference in the animal kingdom that are more likely to share an evolutionary connection with linguistic reference, i.e. that are similarly psychologically mediated and exhibit a similar intentional structure. Of course, which species are capable of this form of intentional communication remains an empirical question. It may be that monkey vocalisations are involuntary, as suggested by Wheeler and Fischer (2012). Should this be the case, other species (e.g. chimpanzees) could still demonstrate capacities for intentional reference. We should then focus on these species to investigate the evolution of linguistic reference.

Before I conclude, there is one last thing I would like to note. The existence of forms of intentional reference in animals opens up a theoretical avenue for delving deeper into the specific psychological processes involved in interpreting these behaviours. It is possible that these processes are meta-psychological, that is, that they involve inferences about mental states: if chimps produce alert hoos with the intention of directing their recipients’ attention to a snake, recipients who grasp alert hoos may be grasping the sender’s intention that they see the snake. This would be consistent with current evidence of mindreading capacities in animals. Animals appear to be able to track goals, among other things (see e.g. Call & Tomasello, 2008). Based on recent studies, some scholars, such as Moore (2017a) and Warren and Call (2022), have already proposed to construe animals’ interpretive processes as inferences about mental states, analogous to how we would frame humans’ interpretation of communicative utterances. Providing a meta-psychological account of signal interpretation in animals would yield valuable insights into the study of language evolution, particularly the evolution of the meta-psychological processes of linguistic comprehension. The development of this line of inquiry can be pursued further elsewhere.

9 Conclusion

In this paper, I have reviewed three influential characterisations of animal signalling as an evolutionary precursor of linguistic reference: the theory of functional reference (Marler et al., 1992), Wheeler and Fischer (2012)’s meaning attribution framework and Scarantino (2013)’s revised definition of functional reference. I have shown that these characterisations attempt to build an evolutionary theory of reference without relying on the psychological processes underpinning signal production in animals. The theory of functional reference and Scarantino’s revised definition of functional reference, as functional frameworks, are unconcerned with the psychology of animal communicative behaviours. Wheeler and Fischer’s meaning attribution framework focuses solely on the psychological processes of receivers, excluding the psychological processes of signallers or producers.

By drawing on Crockford et al. (2012, 2017), in this paper I have shown that there are at least some instances of animal signalling exhibiting strong psychological similarity with linguistic reference. Like humans, chimps appear to intentionally produce their alert hoos to draw their recipient’s attention to an object. This line of inquiry has rarely been pursued, partly because studies on animal signalling as an evolutionary precursor of linguistic reference and studies on intentional animal communication have largely developed as separate fields of study. The existence of animal acts of communication that are psychologically similar to human acts of reference, I have argued, creates a theoretical opportunity to approach language evolution also from the perspective of the psychological processes underpinning animal signalling behaviours. In particular, it sets the stage for an account of animal reference, and, by extension, of the evolution of reference, on which reference involves an utterer producing a signal with the intention to direct a receiver’s attention to an object. I contend that an account of voluntary, psychologically mediated reference proves more valuable in the context of reference evolution than functional reference and meaning attribution. In contrast to Scarantino’s approach, this account isolates phenomena that, since they are cognitively more similar to linguistic reference, are more plausible as evolutionary precursors. Unlike Wheeler and Fischer’s framework, this account finds continuity in the psychology of the sender, the individual performing the act of reference.