Scientists have long believed that a fundamental difference between human language and animal communication lies in the ability to refer to entities within the environment (Darwin 1871; Lorenz 1952; Smith 1977). Humans can refer to external objects and events using words, and receivers can understand and perceive what the words refer to (Smith 1977; Hurford 2007). In contrast, since Darwin’s time, animal signals have been assumed simply to reflect the internal states (or arousal levels) of the senders (Darwin 1871; Lorenz 1952; Smith 1977) and not to refer to objects or events in the environment.

In the 1980s, a series of field studies on primate vocal communication challenged this assumption. An earlier study had revealed that vervet monkeys (Chlorocebus pygerythrus), an Old World species that inhabits Southern Africa, produce acoustically discrete calls for different predators such as leopards, eagles, and snakes (Struhsaker 1967). Seyfarth et al. (1980a) hypothesized that these calls notify group members of different predator types and tested this hypothesis by field experiment. Playbacks of discrete alarm calls elicited qualitatively different behaviors, as if actual predators were present nearby: monkeys ran up a tree for leopard alarms, looked up in the air for eagle alarms, and stood up bipedally for snake alarms (Seyfarth et al. 1980a, b). These responses would be adaptive for the monkeys to evade leopards and detect the exact location of eagles or snakes (Seyfarth et al. 1980a, b). This was the first evidence that vocalizations of non-human primates can refer to external objects and convey this information to receivers. Here, I treat the term “information” as a reduction of uncertainty in the receivers, as this terminology does not contradict the mathematical models of information theory (Shannon 1948) and is consistent with the general usage of this term in disciplines that study animal behavior such as behavioral ecology and animal psychology (Font and Carazo 2010; Seyfarth et al. 2010).
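The definition of information as a reduction of uncertainty can be made concrete with a small calculation. The following is a minimal sketch, not a method taken from any of the studies cited here: it assumes a table of co-occurrence counts between call types and predator types and computes how many bits of uncertainty about the predator a receiver would lose, on average, after hearing a call. The function names and all counts are hypothetical and chosen only for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(counts):
    """Average reduction in uncertainty about the referent after hearing a call.

    counts[call][referent] is the number of times a call type co-occurred
    with a referent. Returns H(referent) - H(referent | call) in bits.
    """
    total = sum(sum(row.values()) for row in counts.values())
    referents = sorted({r for row in counts.values() for r in row})

    # Uncertainty about the referent before any call is heard.
    prior = [sum(row.get(r, 0) for row in counts.values()) / total
             for r in referents]
    h_prior = entropy(prior)

    # Expected uncertainty that remains after a call has been heard.
    h_posterior = 0.0
    for row in counts.values():
        n_call = sum(row.values())
        if n_call == 0:
            continue
        h_posterior += (n_call / total) * entropy(
            [c / n_call for c in row.values()])

    return h_prior - h_posterior


# Hypothetical counts, not data from any study cited in this review.
specific_calls = {
    "call_1": {"leopard": 30, "eagle": 0, "snake": 0},
    "call_2": {"leopard": 0, "eagle": 30, "snake": 0},
    "call_3": {"leopard": 0, "eagle": 0, "snake": 30},
}
unspecific_calls = {
    "call_1": {"leopard": 10, "eagle": 10, "snake": 10},
    "call_2": {"leopard": 10, "eagle": 10, "snake": 10},
    "call_3": {"leopard": 10, "eagle": 10, "snake": 10},
}

print(information_gain(specific_calls))
print(information_gain(unspecific_calls))
```

In this toy example, calls that map one-to-one onto predator types remove all of the receiver's uncertainty about which of three equally likely predators is present, whereas calls given indiscriminately remove essentially none.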

The theoretical framework for the study of animal communication developed remarkably in the 1990s, when the concept of “functional reference” was introduced (Macedonia and Evans 1993; Evans 1997). Although the underlying mental processes of signal production and perception were unclear, researchers suggested the idea of “functional reference”, which provisionally referred to the signals that appear to convey information about external referents rather than the change in internal states of the senders. Macedonia and Evans (1993) proposed two key criteria to classify animal signals as functionally referential: (i) production specificity, which considers the degree to which a signal is associated with a specific external stimulus, and (ii) perception specificity, which considers the degree to which a signal elicits a specific response in the receivers (Fig. 1). To explore production specificity, researchers have tried to show different stimuli (such as different predators and food) to a focal individual and analyzed the association between the external stimuli and the acoustic variation in vocalizations (Evans 1997; Manser 2009; Zuberbühler 2009). To test whether the variation in certain calls is linked to perception specificity, researchers have conducted playback experiments in the absence of actual stimuli (i.e., predators or food) (Evans 1997; Manser 2009; Zuberbühler 2009). If these calls contained sufficient information about the external entities, receivers were expected to exhibit appropriate responses to the playback of calls as if the eliciting stimuli (e.g., predators or food) were nearby.

Fig. 1

Frameworks for the study of semantic communication about external objects or events. The functionally referential framework (a) only considers the association between production specificity and perception specificity as information transmission, whereas the cognitive framework (b) assumes mental representations of external entities in information processing.

Over the past three decades, researchers have explored functionally referential communication in many animal taxa including non-human primates (Zuberbühler 2009; Manser 2013), ground-dwelling mammals (Townsend and Manser 2013), and birds (Gill and Bierema 2013). Similar to vervet monkeys (Struhsaker 1967; Seyfarth et al. 1980a), these animals often produce different alarm calls for a variety of predators, and some of them can transmit many different types of information to receivers. Despite the increasing number of studies that demonstrated evidence for functionally referential signals in animals, there remain unanswered questions and controversy about the semantics of animal signals (Rendall et al. 2009; Wheeler and Fischer 2012).

First, although functionally referential signals seem to convey information about the environmental entities of the senders beyond their internal states, in most cases, it remains controversial whether such communication can be considered an evolutionary precursor of linguistic reference (Rendall et al. 2009; Wheeler and Fischer 2012). Even when a sender produces different calls for different stimuli and receivers respond to them with qualitatively dissimilar reactions, such behavioral cascades could occur simply as a result of associative learning without any construction of mental representations (Wheeler and Fischer 2012). In humans, senders encode information about external entities into words or phrases (i.e., messages, Smith 1977) through representational ideation, i.e., the formation of mental images or concepts about the external entities. Receivers derive this information (or meaning, Smith 1977) through the creation of mental representations, i.e., they retrieve stored information about the external entities from the signals (Rendall et al. 2009; Font and Carazo 2010; Fig. 1). However, to date, only a few studies have examined the cognitive processes that underlie signal perception and information processing by receivers (e.g., Zuberbühler et al. 1999; Evans and Evans 2007), and no known study has investigated representational ideation in senders.

Second, it may be problematic that previous studies bias researchers to consider a call as functionally referential only when it is acoustically distinctive from other call types. Animal communication signals could have a range of acoustic variations, including both discrete and graded variations (e.g., number of calls and length of calls), and call combinations (Zuberbühler 2009; Manser 2013). Thus far, it is unclear whether acoustic variations other than discrete variations could provide referential information to receivers. Recent theoretical work indicates that, even when the signal structure is not highly specific to external objects, receivers may still be able to derive “referential” information from the signals by virtue of contextual cues (i.e., pragmatics, Scott-Phillips 2010; Scarantino and Clay 2015). Therefore, graded and combinatorial variations in acoustic structure may also contribute to the transmission of information about external entities and could be considered functionally referential. However, the classical framework of functional reference has led researchers to neglect these acoustic variations and the pragmatic aspects of animal communication (Scarantino and Clay 2015).

Third, although functionally referential communication has been intensively investigated in non-human primates and other mammalian species such as meerkats and ground-dwelling rodents (Manser 2013; Townsend and Manser 2013), there have been only a few systematic investigations of referential communication in other animals such as birds. Thus, we are still far from a comprehensive understanding of how extensively functionally referential signals are involved in animal communication systems and which selective pressures drive the evolution of semantic communication in animals. Animal communication signals can evolve only when the senders benefit from producing signals that influence the behavior of the receivers (Dawkins and Krebs 1978). Although many studies have investigated the complexity of acoustic structure and the specificity of receivers’ responses to communication signals, it is also crucial to investigate how senders and receivers benefit from functionally referential communication.

In consideration of these three issues, I explore the evidence for communication about external entities in wild birds. Over the past two decades, a number of field studies have been conducted to investigate the information content of avian alarm calls. Birds often live in environments with multiple types of risks such as predators (Caro 2005), nest predators (Martin 1993), and brood parasites (Rothstein 1990). In addition, the risk posed by a single type of predator may also vary with its behavior and distance (Lima 2002; Stankowich and Blumstein 2005; Brilot et al. 2012). To survive predation hazards that vary based on predator attributes, birds have evolved a sophisticated communication system using alarm calls that may show discrete variations (different call types), graded variations (number of sound elements such as note number and calling rate, or finer acoustic features of an element such as call length, frequency/pitch, and relative amplitude), and combinatorial variations (combination of notes or calls). First, I summarize the relationship between acoustic variation and information content of alarm calls. Then, I explore which signal variations could be used to refer to external entities and how researchers can investigate cognitive processes that underlie signal perception and information processing. The goal of this review is to synthesize a wide spectrum of previous studies on the semantics of avian alarm calls and to advance our understanding of the ecological significance, cognitive processes, and evolution of semantic communication.

Discrete variation in alarm calls

Classical example

The first experimental study on functional reference in birds was conducted in domestic chickens, golden Sebright bantams (Gallus gallus domesticus), which have the same call structure and repertoire size as wild red jungle fowls (G. gallus). Evans et al. (1993) examined the vocal responses of male chickens to two types of predators (flying raptors and raccoons) presented on a video monitor and found that they produce acoustically discrete types of alarm calls for these two predatory stimuli. The researchers further revealed that playbacks of these calls elicit discrete types of anti-predator responses in hens (Evans et al. 1993). Therefore, these two types of alarm calls seem to provide information about predator type to conspecifics and meet the criteria of functional reference (Macedonia and Evans 1993). However, in natural situations, fowls produce aerial alarm calls in response to a variety of non-threatening stimuli, such as flying insects and harmless birds, and often mix those two types of alarm calls in an anti-predator context (Gyger et al. 1987). Thus, the alarm-calling system of fowls seems to lack production specificity and is considered to fail the first criterion of functionally referential signals (Macedonia and Evans 1993; Evans 1997). This might be because of the so-called “better safe than sorry” strategy (Haftorn 2000), which increases the likelihood of avoiding predator attacks by reacting to any possible threat, even if it ends up being harmless.

Evidence from the wild

Several observations have revealed cases in which wild birds use discrete types of alarm calls in a similar manner to that of domestic chickens. For example, Marler (1955) described that many small bird species produce high-pitched “seeet” or “zi” calls when they detect a flying hawk in the air. Two years later, Marler (1957) found that birds produce loud, repetitive calls when approaching and harassing perched owls (i.e., mobbing, Curio 1978). The acoustic structures of aerial and mobbing alarm calls are quite different within a species and might therefore be used to communicate information about the behavior and/or spatial position of the predator. These observations have been anecdotal for a long time, as no experimental studies have been conducted to test whether senders vocally discriminate between those different threats and how receivers respond to those calls.

Over the past two decades, an increasing number of studies have applied the experimental approaches established by Evans (1997) to examine the information content of alarm calls in wild birds. Because of the rarity of observing natural predator encounters, researchers have used experimental presentations of mounted predator specimens to record alarm calls. Predator presentations are typically conducted around feeders and near nests, where individuals readily detect the predator models and researchers can easily record their behaviors. Griesser (2008) presented a number of predator models to Siberian jays (Perisoreus infaustus) at artificial feeding stations to elicit alarm calls. He also manipulated the movements of hawk models using a remote-control device. The field experiment revealed that Siberian jays produce different types of alarm calls and that the use of call types depends on the predator’s behavior; they produce three discrete types of alarm calls for perched, prey-searching, and attacking hawks. Playbacks of these calls elicit discrete adaptive responses in their flock members (Griesser 2008). Immediate risk of predation depends on the predator’s behavior (Lima 2002; Stankowich and Blumstein 2005; Brilot et al. 2012) and the adjustment of anti-predator behaviors would therefore improve the survival of the jays. This was one of the first experimental demonstrations of functionally referential alarm calls in wild birds.

Such sophistication in communication may also evolve in the context of brood defense. The Japanese great tit (Parus minor) provides an interesting example; these tits are cavity-nesting birds, but their nestlings face a variety of nest predator species such as jungle crows (Corvus macrorhynchos) and Japanese rat snakes (Elaphe climacophora). The adults use two discrete types of alarm calls when mobbing different predators that approach the nests; they produce “chicka” calls for crows and “jar” calls for snakes (Fig. 2) (Suzuki 2011). These calls are specifically produced in response to the two predator types, and they elicit different reactions in nestlings. Nestlings crouch down inside the nest cavity in response to the “chicka” calls; whereas they jump out of the nest on hearing the “jar” calls (Suzuki 2011). The two responses help the nestlings evade the corresponding predators, because crows snatch nestlings through the nest entrances using their beaks, whereas snakes invade the nest. Within the nest cavities, nestlings have no opportunities to learn to associate call types with predator-specific escaping behaviors, suggesting that functionally referential communication can evolve without previous experience of the receivers. Further investigations revealed that these two types of alarm calls also convey information about the predator type to other receivers such as their mates (Suzuki 2012a, 2015) and heterospecific individuals breeding around the tits’ nests (Suzuki 2016), and elicit appropriate anti-predator responses. Thus, tits use a sophisticated system of functionally referential communication that warns multiple receivers about predatory threats and thereby effectively reduces the risk of nest predation.

Fig. 2

Japanese great tits (a) produce two types of alarm calls for different nest predators; “chicka” calls (b) for jungle crows (c) and “jar” calls (d) for Japanese rat snakes (e). Note that those two alarm calls are acoustically discrete, but both of them have graded variation in note repetition number. The “chicka” calls also vary in combinations of notes within a call

Discrete alarm calls may also be used to discriminate between predators and brood parasites. For example, male yellow warblers (Setophaga petechia) produce “seet” calls when they detect a brood-parasitic cowbird during the egg-laying period (Gill and Sealy 1996), and their mates (females) react with a unique response that prevents brood parasitism, i.e., they rush back to and sit tightly on their nests (Gill and Sealy 2004). Since other types of alarm calls do not elicit such nest protection behaviors in females (Gill and Sealy 1996, 2004), the “seet” calls are considered functionally referential signals that warn females of the presence of a cowbird. Interestingly, warblers that are allopatric with cowbirds do not vocally discriminate between cowbirds and nest predators and mix different calls into a single mobbing calling bout (Gill and Sealy 2004), suggesting the importance of learning in the usage of discrete call types. Brood-parasite-specific alarm calls have been documented for several discrete lineages of host species such as Parulidae (Gill and Sealy 1996), Maluridae (Langmore et al. 2012), and Phylloscopidae (Wheatcroft and Price 2015), but this does not hold true for all host species (Welbergen and Davies 2008). Because such specific alarm calls may enhance recognition of brood parasites by hosts through social learning (Davies and Welbergen 2009; Feeney and Langmore 2013), brood parasites might show counter-adaptation to reduce the likelihood of effective learning by the hosts. Future comparative work may provide an ideal model for investigating the evolution of functionally referential communication through the process of an evolutionary arms race.

The need for discrete reactions in receivers seems to be a major driving force of the evolution of discrete variation in alarm calls (Macedonia and Evans 1993; Brilot et al. 2012). In the case of Siberian jays, an individual’s decision to escape from or mob predators depends on the behavior of the predators rather than on the type of predator (Griesser 2008, 2009). In the case of Japanese great tits, nestlings are usually within the nest cavities and therefore use two contrasting behaviors to avoid snakes and other predators; that is, they either leave the nest or remain in the nest (Suzuki 2011). The methods by which predators approach the nests are also different depending on the predator species, which might also be a selective force for the evolution of discrete alarm calls (Suzuki 2012a, 2014, 2015). In the case of yellow warblers, females may need to rush back and protect their eggs against brood parasites (Gill and Sealy 2004).

Cognitive processes underlying signal perception

To date, at least eight species of birds are known to produce more than one type of alarm call to discriminate between different risks (Table 1). They vocally discriminate between different enemy types, spatial positions, or behaviors. In all of these cases, discrete types of alarm calls elicit qualitatively different reactions in receivers (Table 1); therefore, these calls seem to refer to external objects.

Table 1 Summary of studies on acoustic variation and information content of alarm calls in birds

Although discrete types of alarm calls are considered functionally referential, almost nothing is known about the cognitive processes that underlie the production and perception of these calls. To determine whether these calls are truly analogous to context-specific human referential words, researchers should investigate whether senders form a mental image or concept and whether receivers retrieve a mental representation of the external referent on hearing the signals (Rendall et al. 2009; Wheeler and Fischer 2012; Manser 2013; Townsend and Manser 2013). However, thus far, only a few studies have been conducted to identify such cognitive processes, and all have focused on the receiver.

Zuberbühler et al. (1999) applied a prime–probe technique to test whether Diana monkeys (Cercopithecus diana) associate their functionally referential alarm calls (leopard and eagle alarms) with the vocalizations of these predators. First, Diana monkeys were habituated to either leopard or eagle alarms (prime). Second, they were exposed to a predator’s call, either leopard growls or eagle shrieks (probe). Diana monkeys showed a reduction in alarm-calling response to the probe when the prime and probe stimuli were associated with the same predator type. A similar pattern was also observed when monkeys received predator calls as a prime stimulus and alarm calls as a probe. These results indicate that Diana monkeys did not habituate to the acoustic characteristics of stimuli but to the information content of the stimuli. Similarly, using a habituation–dishabituation technique, Cheney and Seyfarth (1988) showed that vervet monkeys also extract referential information from alarm calls and do not simply respond to their acoustic structure (Seyfarth and Cheney 1990).
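The logic of the prime–probe design can be written out as a pair of competing predictions. The sketch below is only an illustration of that reasoning under simplifying assumptions: each stimulus is reduced to two hypothetical attributes (the predator it is associated with and its acoustic class), and the two hypotheses are labeled "semantic" and "acoustic" for convenience. None of these names comes from the original studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stimulus:
    referent: str  # predator associated with the stimulus, e.g. "eagle"
    kind: str      # acoustic class: "alarm_call" or "predator_call"


def predicts_reduced_response(prime, probe, hypothesis):
    """Return True if the hypothesis predicts a weakened response to the probe.

    "semantic": habituation transfers whenever prime and probe are associated
                with the same predator, regardless of how they sound.
    "acoustic": habituation transfers only between stimuli that share both
                the predator and the acoustic class (a simplification).
    """
    same_referent = prime.referent == probe.referent
    if hypothesis == "semantic":
        return same_referent
    if hypothesis == "acoustic":
        return same_referent and prime.kind == probe.kind
    raise ValueError(f"unknown hypothesis: {hypothesis}")


eagle_alarm = Stimulus("eagle", "alarm_call")
leopard_alarm = Stimulus("leopard", "alarm_call")
eagle_shriek = Stimulus("eagle", "predator_call")
leopard_growl = Stimulus("leopard", "predator_call")

for prime in (eagle_alarm, leopard_alarm):
    for probe in (eagle_shriek, leopard_growl):
        for hypothesis in ("semantic", "acoustic"):
            reduced = predicts_reduced_response(prime, probe, hypothesis)
            print(prime.referent, prime.kind, "->",
                  probe.referent, probe.kind, hypothesis, reduced)
```

The informative condition is the one in which prime and probe share a predator but differ acoustically (for example, eagle alarm followed by eagle shrieks). Only the "semantic" hypothesis predicts a reduced response there, which is the pattern reported for Diana monkeys.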

There has been a lack of studies on the mental processes that underlie alarm call recognition in birds. However, one study tested whether food calls of male chickens in captivity evoke selective retrieval of foraging opportunities in hens (Evans and Evans 2007). Applying a prime–probe technique, the researchers manipulated prior experience of hens with the presence of food items. Food calls induced food-searching behavior, only if hens had not recently discovered and consumed food. However, this method does not preclude the explanation that hens may adjust their responses to food calls based on their hunger level, and not necessarily according to prior information about the presence of food.

Although these studies did not clearly demonstrate that animals retrieve mental imagery of external entities from alarm or food calls (Adams and Beighley 2013), they indicate that receivers could associate a certain call type (alarm or food call) with a related stimulus (predator call or food) to produce an appropriate response. Additional experiments using other techniques are necessary to clarify whether referential calls truly evoke mental representations of external entities. One candidate for this is the cross-modal matching technique, in which subjects are tested in their response to the sequential presentations of two stimuli through different modalities. This method has been commonly used in the study of cross-modal matching of individual recognition. For example, subjects were first presented with a visual stimulus from a given individual and then exposed to vocalizations of the same or a different individual (Proops et al. 2009; Kondo et al. 2012; Kulahci et al. 2014). A similar approach can be used to study semantic communication by testing the response of individuals to predator-specific alarm calls after exposure to visual cues associated with different predators.

To examine the process of representational ideation by senders, it is crucial to investigate how individuals classify a novel threat into existing threat categories. Several studies showed that birds are able to socially learn to associate a novel animal with a particular type of threat (Davies and Welbergen 2009; Feeney and Langmore 2013). It may be interesting to test whether birds can learn to produce different call types for novel objects, and if these associations can change based on previous experience. Even in a natural setting, learning opportunity can differ according to social environments (Griesser and Suzuki 2016), and this may lead to the differential formation of mental imagery or concepts.

Graded variation in alarm calls

Repetition rate

Birds may alter the repetition rate of a single type of call or note to convey different types of information. For example, many species of chickadees produce “chick-a-dee” mobbing calls in response to a variety of predators. Templeton et al. (2005) revealed that black-capped chickadees (Poecile atricapillus) change the repetition rate of D (or “dee”) notes within their “chick-a-dee” calls to convey different types of information to flock mates. These chickadees produce a greater number of “chick-a-dee” calls that contain more D notes when mobbing smaller predators and produce fewer calls with fewer D notes for larger predators. Moreover, chickadees showed more intense responses to playback that contained more D notes and more calls. Because smaller predators are more maneuverable than larger predators, chickadees may adjust their intensity of mobbing based on the degree of danger posed by the predator. Alteration of the repetition rate is also observed when birds warn conspecifics about aerial predators to elicit escape behaviors. For example, white-browed scrubwrens (Sericornis frontalis) alter the number of alarm notes based on their distance from aerial predators; closer predators cause a greater number of elements that elicit increased fleeing responses in conspecifics (Leavesley and Magrath 2005). In all of the cases reported in both mobbing and fleeing contexts (Table 1), high-urgency alarm calls have a higher repetition rate, whereas low-urgency calls have a lower repetition rate. However, whether this is a general rule for animal communication remains unclear, because a reverse relationship was found in alarm calls of a primate species (Murphy et al. 2013).

Other variations

Some species of birds also show graded variation in other acoustic parameters of alarm calls. Birds are known to alter call length, inter-call intervals, and amplitude according to the size, speed, and distance of predators (Table 1). The call length and inter-call intervals are shorter for high- than for low-urgency alarm calls. Similarly, high-urgency alarm calls tend to have higher amplitude than low-urgency calls. Interestingly, this rule seems to be widespread in a variety of avian taxa, including Paridae (Templeton et al. 2005), Phasianidae (Wilson and Evans 2012), and Corvidae (Ellis 2008; Yorzinski and Vehrencamp 2009).

Functional reference in graded alarm calls

Although graded variation in alarm calls has been described in a number of avian species (Table 1), graded alarm calls have never been interpreted as functionally referential. In most cases, graded variation shows a low degree of specificity and is used in many different contexts. Therefore, graded alarm calls may not meet the first criterion of functional reference outlined by Macedonia and Evans (1993), production specificity, which provides the basis for the receivers to expect the external referent. In addition, receivers typically respond to graded variation by adjusting the degree of their response, but rarely by qualitatively different reactions (Table 1). Thus, the second criterion of functional reference (Macedonia and Evans 1993), perception specificity, may not be met. It is likely therefore that graded variation can be used to refer to urgency in the response or change in the arousal level of the sender rather than its external world (Marler et al. 1992; Hauser 1996).

If this is true, how should researchers interpret the information content of graded signals? It is possible that graded alarm calls simply reflect response urgency or the internal state of the senders (i.e., a change in arousal level), but they may also provide information about the external referent (although the reliability of this remains low). If birds alter the repetition rate of alarm notes according to the predator’s distance, do the receivers derive information about the arousal level of the sender or the predator’s distance from the sender? To test these two hypotheses, researchers should measure the specificity of signal production. If birds can make similar alterations to alarm calls in more than one context, the signals should be considered purely to convey information about the degree of danger or urgency level as perceived by the sender. However, if they are specific for many different attributes of predators (e.g., predator distance and size), the calls might have the potential to encode information about external entities (Scarantino and Clay 2015).

Nevertheless, there are some critical constraints for graded signals to be functionally referential. For variations in the repetition rate, receivers are required to discriminate between different numbers of notes or calls. However, previous studies have shown that many animals are limited in their ability to quantify the number of objects, and can only quantify fewer than four objects (e.g., Hunt et al. 2008; White et al. 2009). In the case of “chick-a-dee” mobbing calls, playback experiments suggested that receivers simply respond to the rate of D notes produced in a given time (i.e., duty cycle) rather than the number of notes within a call (Wilson and Mennill 2011). Similarly, playback experiments suggested that birds may not be able to discriminate the subtle variations in call length and amplitude (Randler and Förschler 2011). Adoption of prime–probe, habituation–dishabituation, violation of expectation procedures, or cross-modal matching procedures would help to determine whether graded signals could be used to convey information about external referents to receivers.
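The distinction between the number of notes within a call and the rate of notes per unit time can be illustrated with two playback stimuli that differ in one measure but not the other. This is a minimal sketch with hypothetical stimuli and function names; it does not reproduce the design of any cited experiment.

```python
def notes_per_call(calls):
    """Mean number of D notes per call; `calls` lists the note count of each call."""
    return sum(calls) / len(calls)


def note_rate(calls, duration_s):
    """D notes produced per second over the whole playback."""
    return sum(calls) / duration_s


# Hypothetical 60-s playbacks: each list entry is the D-note count of one call.
stimulus_a = [2] * 10  # ten calls with two D notes each
stimulus_b = [4] * 5   # five calls with four D notes each
duration_s = 60.0

print(notes_per_call(stimulus_a), note_rate(stimulus_a, duration_s))
print(notes_per_call(stimulus_b), note_rate(stimulus_b, duration_s))
```

The two stimuli contain the same number of D notes per unit time but differ twofold in notes per call. If receivers attend to note rate, both should elicit similar responses; if they count notes within calls, the second should elicit the stronger response.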

Signal combinations

Examples from non-human primates

Human language is based on two types of combinatorial structure: one that combines meaningless elements (e.g., phonemes) into meaningful sounds (phonology), and another that combines meaningful units (e.g., words) into more complex expressions (syntax) (Chomsky 1965; Hauser et al. 2002, 2007; Hurford 2007, 2011). It has long been assumed that phonology and syntax are unique features of human language (Chomsky 1965; Hauser et al. 2002, 2007); however, recent studies suggest that non-human primates and other mammals may also have the ability to combine different signals to provide different types of information (Collier et al. 2014). For example, putty-nosed monkeys (Cercopithecus nictitans) produce the acoustically discrete loud calls “pyow” and “hack” in a range of contexts, but combine the two calls when instigating group movements (Arnold and Zuberbühler 2006a, 2008). Similarly, some other Old World monkeys (Cercopithecinae) (Ouattara et al. 2009; Candiotti et al. 2012), New World monkeys (Pitheciidae) (Cäsar et al. 2013), and apes (Hominidae) (Crockford and Boesch 2005; Clay and Zuberbühler 2009, 2011) can combine different types of calls into a variety of sequences and use these combinations in different environmental contexts, such as the discovery of food sources or encounters with predators.

Examples from birds

Combinations of different call or note types have also been documented for communication in birds (Lucas and Freeberg 2007; Wheatcroft and Price 2013; Suzuki 2014). Remarkably, birds within the family Paridae are known to produce complex combinatorial calls. Parids (chickadees and tits) produce structurally complex vocalizations (“chicka” or “chick-a-dee” calls) that are composed of different types of notes (e.g., A, B, C, and D notes) (Hailman et al. 1985; Lucas and Freeberg 2007; Freeberg and Lucas 2012). They use these calls in a range of contexts such as finding food sources (Mahurin and Freeberg 2009; Suzuki 2012b), approaching and mobbing a predator (Templeton et al. 2005; Soard and Ritchison 2009; Courter and Ritchison 2010; Suzuki 2014), and maintaining social cohesion with conspecifics (Nowicki 1983).

Previous studies revealed that parids may use different combinations of notes to convey different types of information. In Japanese great tits, adults produce “chicka” calls when mobbing predators near their nests, but the combinations of notes within the calls differ between predator types (Suzuki 2014); they produce AKG, AD, A, and D combinations for crows, whereas they produce D, AC, A, and ACE combinations for martens. In the case of Carolina chickadees (Poecile carolinensis), note combinations might vary depending on both attributes of the predator and the behavior of the caller (Freeberg 2008). Despite these observations, no known studies have investigated whether receivers can recognize combinations of different types of notes as a source of predator information.

Because of a lack of playback experiments (Table 1), it is not yet clear whether combinatorial variation in alarm calls provides functionally referential information to other birds. However, several previous playback studies suggest that parids may be able to derive particular types of information from different note combinations. Carolina chickadees respond differently to playbacks of different note combinations in “chicka” calls (Freeberg and Lucas 2002). Similarly, willow tits (Poecile montanus) alter their note combinations of contact calls based on the presence or absence of food (Suzuki 2012b); the playback of food-associated contact calls facilitates the formation of mixed-species foraging flocks, unlike the contact calls recorded in a non-food context (Suzuki 2012c). Since both food and non-food calls have been recorded from solitary individuals at times when enhanced social cohesion was needed, the receivers might be able to derive information about the presence of food through the recognition of note combinations.

Aside from communication about the environment (i.e., predator or food), birds also use combinatorial signals to communicate with conspecifics. For example, songs of many passerine birds are composed of multiple syllables that apparently do not have independent meanings (Catchpole and Slater 2003). In general, songs have the dual function of mate attraction and territory announcement, and syllable composition may signal male quality (Catchpole and Slater 2003). In another example, chestnut-crowned babblers (Pomatostomus ruficeps) combine two types of notes (A and B) into two sequences (AB or BAB) and use these sequences in different contexts (Engesser et al. 2015): they produce AB calls during flight and BAB calls when provisioning nestlings. Experiments with captive babblers showed that playbacks of AB calls made them look out of the aviary, whereas playbacks of BAB calls made them look at the nest within the aviary.

Functional reference in combinatorial signals

Do animals use combinations of signals to refer to external objects? In humans, phonology is a pivotal feature that generates words composed of meaningless sounds, which refer to external objects. This feature may also be found in non-human primates. White-handed gibbons (Hylobates lar) produce songs that are composed of multiple sound elements and use these sounds in two distinct contexts (Clarke et al. 2006). One context includes repelling conspecific intruders, advertising pair bonds, and attracting mates; the other is an encounter with a predator. The gibbons alter the composition of song notes according to these two contexts, apparently eliciting different responses in group members (Clarke et al. 2006). Thus, predator-induced song sequences could be considered functionally referential. However, thus far, phonology in birds has never been shown to refer to specific external entities.

In humans, syntax plays an important role in referential communication. For example, we can connect an adjective and a noun to provide detailed information about the external referent (the noun). Although the evidence for syntax remains ambiguous in birds and mammals, it might be possible that syntax serves to modify information of referential signals. Support for this idea comes from a report on communication in the Campbell’s monkey (Cercopithecus campbelli), which produces discrete alarm calls for leopards and eagles (Ouattara et al. 2009). These monkeys can modify alarm calls by adding “-oo” at the end, thereby transforming either a leopard alarm call to a general disturbance call or an eagle alarm to an arboreal alert call. However, it is still unclear whether birds use similar syntactic modifications in their communication (Collier et al. 2014), as there is no evidence that they combine discrete functional signals in response to specific external entities. Further studies are required to reveal whether such sophistication in communication has a unique evolutionary origin in the primate lineage.

Conclusions and future directions

In this review, I explored the evidence for semantic communication in wild birds. Field research over the past two decades has revealed a high degree of communicative sophistication in birds, which may rival the complexity of communication in non-human primates. First, like several non-human primates, several species of birds appear to communicate about external entities (e.g., predator type) using discrete variations in alarm calls. Second, birds can modify a single type of alarm call by altering the repetition rate of the vocal production or by subtly modifying the acoustic structure. Such graded variation is also present in primate communication (Fischer et al. 2001; Keenan et al. 2013). In some bird species, graded variation is combined with discrete variation (Wilson and Evans 2012; Suzuki 2014), which broadens the variety of information that individuals can transmit or provides information about the internal states of the senders (Marler et al. 1992; Manser 2001, 2009). Finally, like several Cercopithecus monkeys (Arnold and Zuberbühler 2006b), several species of birds can combine different sounds into more highly structured sequences in a context-dependent manner. Birds face a variety of enemies such as predators (Caro 2005), nest predators (Martin 1993), and brood parasites (Rothstein 1990). In addition, many birds live in a complex social system that involves both cooperators and competitors, conspecific as well as heterospecific, which might lead to complexity in their vocal systems (Krams et al. 2012). Therefore, studying alarm-calling systems of wild birds would provide an ideal opportunity to investigate ways in which socio-ecological factors drive the evolution of sophisticated communication systems and their underlying cognitive processes.

It is worth mentioning that food calls may provide another model system to examine the evidence for semantic communication in birds. As in studies of alarm-calling systems, researchers can test the information content of food calls by controlling the nature of external entities (e.g., food type and amount) and the internal states of the senders (e.g., hunger level). Moreover, compared with alarm-calling systems, it might be easier to observe the behavior of senders and their audience, as they may stay in a foraging patch for a while. Despite these advantages, food calls have received much less attention; there are only a few studies on wild birds (house sparrows, Elgar 1986; common ravens, Heinrich and Marzluff 1991; Carolina chickadees, Mahurin and Freeberg 2009; and willow tits, Suzuki 2012b, c). Studies on alarm and food calls may complement each other, which might help to determine the general features of semantic communication in animals.

One interesting question is how birds acquire the ability to derive information from the variation in alarm calls (Hollén and Radford 2009). Like vervet monkeys (Seyfarth and Cheney 1986) and meerkats (Hollén and Manser 2006), birds may be able to learn to associate a particular call type with a particular environmental event through social interactions. This idea is supported by the fact that many birds can eavesdrop on the alarm calls produced by heterospecifics (Templeton and Greene 2007; Suzuki 2016), and learning may help establish adaptive responses to these calls (Magrath and Bennett 2012; Magrath et al. 2015). However, the cognitive capacity for discrimination of call types could be inherited, because some birds seem to have an innate ability to respond differently to different vocalizations. For example, nestlings of the Japanese great tit can discriminate between different types of alarm calls, although they may not be able to learn to associate call types with predator types within their nest cavities (Suzuki 2011). Such differences in the process of deriving information may be strongly correlated with ecological factors such as opportunities for social learning (Hollén and Radford 2009) and rapid changes in the risks posed by enemies (Davies and Welbergen 2009). Further studies are required to reveal the ecological factors that drive the developmental mechanisms of alarm call response, which would provide new insight into the ontogeny and evolution of semantic communication.

One of the most important frontiers in studies of animal communication entails understanding the cognitive processes that underlie the production and perception of referential signals. Although several species of birds have evolved functionally referential alarm calls, it is not yet clear whether these signals, like human language, are produced through the formation of mental images or concepts and whether they evoke mental representations of external entities in the receivers (Rendall et al. 2009; Wheeler and Fischer 2012). It is also unclear whether graded and combinatorial variations in signals can refer to external entities. Future studies, especially those from the perspective of the cognitive sciences, are required to better understand the similarities and differences between human language and animal communication signals. In addition, it would be worthwhile to apply cross-modal matching experiments to determine the cognitive processes involved (Proops et al. 2009; Kondo et al. 2012; Kulahci et al. 2014). We are presently at a transition point in animal communication research from the functionally referential framework to the cognitive framework (Fig. 1). Building on previous fruitful research on the information content of animal signals, we can move on to the next step of exploring the socio-ecological factors that drive the evolution of the cognitive sophistication that mediates communication. I would also like to encourage further naturalistic observation of animal communication as well as experimental research, and I hope that this review will help uncover the evolution of semantic communication.