Vocal exchanges between conspecifics are important to coordinate social interactions between group members (intragroup vocal exchanges) or groups (intergroup vocal exchanges). They are very flexible and time-efficient, function over long distances, and do not require the physical proximity of the caller. Whereas intergroup vocal exchanges mainly function to signal territoriality (Kitchen, 2006; Ramanankirahina et al., 2016), intragroup vocal exchanges can signal information about the social relationship between the individuals involved (Arlet et al., 2015; Kulahci et al., 2015; Levrero et al., 2019; Wittig et al., 2007). For example, in pair- and group-living primates, vocal exchanges function to signal the social relationship between group members (e.g., pair-bonding: Mendez-Cardenas & Zimmermann, 2009, mother-infant: Scheumann et al., 2017, affiliative relationships: Arlet et al., 2015; Kulahci et al., 2015; Levrero et al., 2019, dominance relationships: Wittig et al., 2007). Thereby, exchanges of contact calls occur more frequently between group members who groom each other more frequently, indicating strong affiliative bonds between group members (e.g., lemurs: Kulahci et al., 2015, Japanese macaques: Arlet et al., 2015, bonobos: Levrero et al., 2019). Vocal exchanges also can occur between opponents in agonistic interactions, signaling the social roles of the conflict partners (Mercier et al., 2019; Slocombe & Zuberbühler, 2005) and attracting potential third-party helpers to intervene in dyadic conflicts (Slocombe et al., 2009; Slocombe et al., 2010).

In a few primate species, third-party interventions are accompanied by vocalizations in response to aggressive calls by the opponents (Roeder et al., 2002; Wittig et al., 2007). Vocal third-party intervention is defined as a triadic social interaction where the third party intervenes in a conflict by solely producing vocal displays. Vocal third-party interventions can be associated with three different conflict intervention types: (1) aggressive interventions, i.e., the intervener supports either the aggressor or the aggressee with aggressive displays; (2) neutral interventions, i.e., the intervener interposes itself between the opponents without specific behavioral displays; or (3) peaceful interventions, i.e., the intervener displays affiliative behaviors (e.g., grooming) toward the aggressor to end the conflict. In female chacma baboons (Papio hamadryas ursinus), vocal interventions supporting kin-group members were more common than active aggressive interventions, suggesting an influence of social bonds (Wittig et al., 2007). The authors argued that vocal signals may have evolved as efficient ritualized signals to replace physical punishment and thereby reduce costs and the risk of life-threatening injuries. However, vocal interventions are also observed in neutral third-party interventions. For example, in brown lemurs (Eulemur fulvus), the intervener uttered a series of grunt vocalizations, also observed in the greeting context, while interposing itself between the opponents (Roeder et al., 2002). Moreover, in red-fronted lemurs (Eulemur rufifrons), approaches by other group members accompanied by grunts resulted in more affiliative interactions (e.g., grooming, huddling) than approaches without grunts (Pflüger & Fichtel, 2012). Thus, in neutral and peaceful interventions, vocal signals might reduce group tension and function as an appeasement signal.
This may be a behavioral strategy for intervening in conflicts within a group while minimizing costs of aggression (Petit & Thierry, 1994; Roeder et al., 2002; Wittig et al., 2007). Such vocal third-party interventions are rarely reported in primates. However, in proboscis monkeys (Nasalis larvatus), it was anecdotally described that the loud call of the adult male of a one-male/multifemale group, the bray call, seems to occur more often after agonistic calls, termed shrieks, uttered during intragroup conflicts (Kawabe & Mano, 1972; Kern, 1964). Based on these observations, Kawabe and Mano (1972) and Kern (1964) hypothesized that the bray of the adult male of a one-male/multifemale group might function “quieting the troop confusion” (Kawabe & Mano, 1972, p. 220), suggesting that the bray is a promising candidate for a vocal third-party intervention signal in proboscis monkeys.

The proboscis monkey is an endemic and endangered primate restricted to mangroves, riverine forests, and peat swamps on the island of Borneo (Feilen & Marshall, 2014). It is well known for the sexual dimorphism of its nose (Koda et al., 2018). In contrast to the females, males have a large, elongated nose. Proboscis monkeys live in a multilevel society where the smallest units are one-male/multifemale groups and all-male groups (Bennett & Sebastian, 1988; Matsuda et al., 2012a, b). One-male/multifemale groups consist of an adult male with adult females and their offspring, whereas all-male groups consist only of males. Our knowledge of vocal communication in proboscis monkeys is limited. To date, five call types have been described acoustically (Röper et al., 2014): (1) Bray—a low-frequency vocalization given by adult males; (2) Shriek—a single or series of high-pitched vocalizations mainly given by offspring and females in aggressive or alarm contexts (Srivathasan & Meier, 2011); (3) Honk—a short, low-frequency call, which often is uttered in a series; (4) Roar—a short low-frequency call; and (5) Chorus—consisting of a mixture of shrieks, honks, and brays uttered by several group members at the same time. As mentioned above, Kawabe and Mano (1972) and Kern (1964) anecdotally described that brays occur more often after shrieks, suggesting that the bray might function to “quieting confusion of the troop members” (Kawabe & Mano, 1972, p. 218), but this was not empirically tested. To substantiate this anecdotal observation and to investigate the hypothesis that the bray vocalization functions as a nonagonistic vocal display to intervene in intragroup conflicts, we empirically studied the vocal responses of adult males in one-male/multifemale groups (bray) to agonistic vocalizations by group members (shrieks).

We conducted two studies. In the first study, we recorded the vocalizations of free-ranging proboscis monkeys at the Lower Kinabatangan Wildlife Sanctuary (LKWS) in Sabah, Malaysian Borneo. Using field data has the advantage of observing the animals in their natural habitats with limited human influence on their behavior. We used the LKWS data set to test whether brays, which were mainly uttered by adult males of one-male/multifemale groups, function as vocal third-party intervention signals to reduce the troop tension. We tested the following predictions: (1) Brays, but no other call types, occur significantly more often after agonistic calls (shrieks) than expected based on overall call occurrence; and (2) Brays terminate vocal responses toward shrieks significantly more often than other call types do. Because it was not possible to assign vocalizations reliably to a sender and behavioral context in the field, we performed a second study at Labuk Bay Proboscis Monkey Sanctuary (LBPMS). The animals were fed on artificial feeding platforms, which allowed close video and audio recordings and the identification of groups and individual monkeys. Using the LBPMS data set, we investigated which kind of conflict dyads evoked the most vocal support by adult males of one-male/multifemale groups.


Study sites, study subjects, and data collection

Lower Kinabatangan Wildlife Sanctuary

We observed 17 free-living proboscis monkey groups at the Lower Kinabatangan Wildlife Sanctuary (LKWS; Sabah, Malaysian Borneo) close to Danau Girang Field Centre (N5.41646, E118.03441) from August to November 2010. Proboscis monkeys live along Sabah’s longest river, the Kinabatangan River, where on average one group per kilometer can be observed (Matsuda et al., 2020). We conducted observations from 7.7 km downstream (5.4 km linear distance) to 5.9 km upstream (3.4 km linear distance) of the field center.

Because the proboscis monkeys live in the high canopy of mangrove forests, we performed observations from an outboard motorboat along the Kinabatangan River. We conducted observations in the evening (from 16:45 to 19:40) and in the morning (from 05:00 to 07:30). In the evening, the observers searched for a group occupying a sleeping tree, audio-recorded the group until all monkeys fell asleep using all-occurrence sampling (Altmann, 1974) and noted the GPS data of the sleeping tree. In the morning, the observers came back to the sleeping tree and audio-recorded the same group until they disappeared into the forest. We could not identify individuals due to the distance and the dense canopy, so we cannot rule out the possibility that we resampled groups. However, we assumed that each paired observation session involved a different proboscis monkey group based on the different locations and differences in group composition. Thus, we treated each paired observation as an independent sample group following Röper et al. (2014).

Labuk Bay Proboscis Monkey Sanctuary

Labuk Bay Proboscis Monkey Sanctuary (LBPMS) was founded in 1994 in a mangrove forest close to the city of Sandakan (Samawang village, Sabah, Malaysia) where proboscis monkeys occur naturally. LBPMS has been open to tourists since 2001, and the animals are habituated to humans allowing close video and audio recordings during feeding sessions. Animals were fed four times per day on two platforms (A and B), where they could be best observed and which they visited regularly.

We observed two all-male groups and three one-male/multifemale groups. Local guides reliably identified the adult male after which each group was named. We observed the one-male/multifemale group Sasokih (30-35 animals) and the all-male group Putut (20 animals) on platform A. We observed the one-male/multifemale groups Leo Messi (32 animals) and Romano (18-22 animals) and the all-male group Canon (18-20 animals) at platform B.

We made observations from September to October 2012 using all-occurrence group sampling (Altmann, 1974). We recorded five times at 09:30 at platform A and six times at 16:30 at platform B, resulting in 241 min of audio and video data. The observer spoke the following information on the video recordings: name of the group identified by the local guides, information on the identity of the sender, and information on heterospecifics (e.g., humans, cats, macaques, tree shrews), which approached the group.

At both locations, we recorded vocalizations using a Sennheiser microphone (MKE6, directional microphone, frequency range: 40-20,000 Hz) equipped with a windshield (Rycote Kit 295) linked to a Marantz solid state recorder (PMD 661; sampling frequency: 44.1 kHz). In LBPMS, we made video recordings using a digital camcorder (Sony DCR-SR210E).

Data analyses

Audio data from the Lower Kinabatangan Wildlife Sanctuary

Using the LKWS data set, we scanned 30 h of audio recordings using Audacity software (Carnegie Mellon University, Pittsburgh, PA, USA). We based our analysis on the sequential occurrence of call types to empirically test the anecdotal observations that brays occurred more often after shrieks uttered during intragroup conflicts (Kawabe & Mano, 1972; Kern, 1964). Based on spectrographic displays, we categorized calls following Röper et al. (2014) into three call type classes: bray, shriek, and other (e.g., honk and roar). For each shriek, we noted the time at which it occurred and calculated the intercall interval to the following call. We defined a latency of 30 s as the response interval (Supplementary information S1). Based on this interval, we defined a call combination as a shriek followed by another call within 30 s and noted the call type of the following call. Thus, we defined three types of call combinations: shriek-bray, shriek-shriek, and shriek-other (Fig. 1a). For each call combination, we also noted whether a further call followed. If we recorded no third call within 30 s of the last call, we counted the sequence as terminated (Fig. 1b).
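The sequence coding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study, where recordings were scanned by hand. The `Call` tuple layout, the function name `code_combinations`, and the example time stamps are hypothetical. The sketch assumes that calls are sorted by onset time, that intervals are measured between onsets, and that "within 30 s" includes exactly 30 s.

```python
from typing import List, Tuple

RESPONSE_INTERVAL_S = 30.0  # response interval defined in the text

# Hypothetical record layout: (onset in seconds, call type),
# where call type is "bray", "shriek", or "other".
Call = Tuple[float, str]


def code_combinations(calls: List[Call]) -> List[Tuple[str, bool]]:
    """Return (combination, terminated) for each shriek that got a response.

    Assumes `calls` is sorted by onset time.
    """
    results = []
    for i, (onset, call_type) in enumerate(calls):
        if call_type != "shriek" or i + 1 >= len(calls):
            continue
        next_onset, next_type = calls[i + 1]
        if next_onset - onset > RESPONSE_INTERVAL_S:
            continue  # no call within the response interval: no combination
        # Terminated if no third call follows within 30 s of the response.
        if i + 2 < len(calls):
            terminated = calls[i + 2][0] - next_onset > RESPONSE_INTERVAL_S
        else:
            terminated = True
        results.append((f"shriek-{next_type}", terminated))
    return results


# Hypothetical time stamps, for illustration only.
example = [(0.0, "shriek"), (5.0, "shriek"), (12.0, "bray"), (100.0, "other")]
combos = code_combinations(example)
```

In this made-up sequence the first shriek is answered by a second shriek and the sequence continues, whereas the second shriek is answered by a bray after which no call follows within 30 s, so that combination counts as terminated.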

Fig. 1

Schematic description of analyses. a Predictions for three different types of call combinations; b Definition of termination of a call sequence. ICI, intercall–interval, OCP, overall call occurrence.

Video data from Labuk Bay Proboscis Monkey Sanctuary

Using the LBPMS data, we analyzed 241 min of video data recorded during feeding sessions using VLC player. Focusing on shrieks and brays, we noted the sender and the context of each call. Because we could not identify all group members individually (except the dominant males), we classified senders according to their sex and body size into adult males, adult females, immatures, and infants. We identified adult males based on their elongated nose and external genitals. They were either larger than or similar in size to adult females. Adult females were either smaller than or of similar size to adult males, had a smaller nose, and elongated nipples. Immatures were smaller than adults but larger than infants and had reddish-brown fur. Infants were smaller than immatures and had blackish fur. Furthermore, we noted the context in which the sender uttered the call using the following context categories: (1) response to shrieks: the sender uttered a call after a shriek; (2) response to group conflicts without shrieks: the sender uttered a call during an intragroup conflict where no shrieks were uttered; (3) response to other vocalizations: the sender uttered a call after a honk, roar, or chorus call; (4) response to approaching group member: the sender uttered a call while another group member was approaching; (5) response to heterospecifics: the sender uttered a call while a heterospecific (e.g., human, macaque, cat) was approaching; (6) resting and feeding: the sender uttered a call during resting and feeding activities when none of the other context categories applied.

We further analyzed intragroup conflicts accompanied by vocalizations. We defined intragroup conflicts as aggressive interactions between two group members. These included either physical conflicts or threatening gestures of the aggressor to the aggressee from closer than two body lengths (open mouth display: Matsuda et al., 2008; Yeager, 1992). We defined the aggressor as the individual who threatened, attacked, or chased the aggressee. We defined the aggressee as the animal that ran away or avoided the aggressor. For all intragroup conflicts, we noted the aggressor and the aggressee and whether the aggressor or the aggressee uttered vocalizations during the conflict. We only considered conflicts for further analysis when one or more animals vocalized during the conflict. For these conflicts, we further noted whether a bray from the dominant male followed the conflict.

Statistical analysis

We calculated statistics using SPSS (IBM Corp. Released 2013. IBM SPSS Statistics for Windows, Version 22.0. Armonk, NY: IBM Corp.). We set the significance level at p ≤ 0.05.

Lower Kinabatangan Wildlife Sanctuary data set

First, we calculated the percentage of overall call occurrence per call type by dividing the number of calls of each call type (e.g., bray) by the total number of calls for each group (N = 17 groups). Then we calculated the mean across groups. We used this mean overall call occurrence as the chance level for further analyses (Supplementary information S2).
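As an illustration of this two-step calculation, the following Python sketch first converts the call counts of each group into percentages and then averages those percentages across groups. The function name `overall_call_occurrence`, the dictionary layout, and the counts are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean
from typing import Dict

CALL_TYPES = ("bray", "shriek", "other")


def overall_call_occurrence(
    counts_per_group: Dict[str, Dict[str, int]]
) -> Dict[str, float]:
    """Mean across groups of the per-group percentage of each call type."""
    per_group = []
    for counts in counts_per_group.values():
        total = sum(counts.get(ct, 0) for ct in CALL_TYPES)
        if total == 0:
            continue  # a group without calls contributes no percentage
        per_group.append(
            {ct: 100.0 * counts.get(ct, 0) / total for ct in CALL_TYPES}
        )
    return {ct: mean(g[ct] for g in per_group) for ct in CALL_TYPES}


# Hypothetical counts for two groups, for illustration only.
hypothetical_counts = {
    "group_A": {"bray": 10, "shriek": 10, "other": 20},
    "group_B": {"bray": 30, "shriek": 10, "other": 60},
}
chance_levels = overall_call_occurrence(hypothetical_counts)
```

Averaging per-group percentages weights every group equally, so the result can differ from the pooled percentage of all calls. In the hypothetical data the mean bray percentage is 27.5%, whereas pooling all calls would give 40 of 140, or about 28.6%.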

To investigate whether brays occurred significantly more often after shrieks than expected based on the overall call occurrence (Fig. 1a), we calculated the percentage of call combinations. For each call combination and group, we divided the number of each call combination (e.g., shriek-bray) by the total number of call combinations. Then, for each call combination, we tested whether the percentage of call combinations differed from the level of overall call occurrence using a one-sample t-test (Fig. 1a).
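The test statistic of such a one-sample t-test can be sketched as follows. The analysis in this study was run in SPSS; the function `one_sample_t` and the per-group percentages below are hypothetical and only show how the statistic and its degrees of freedom are obtained. The p-value would then be read from a t distribution with the returned degrees of freedom, which this sketch does not compute.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def one_sample_t(values: Sequence[float], chance_level: float) -> Tuple[float, int]:
    """Return the t statistic and degrees of freedom of a one-sample t-test."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two groups")
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    standard_error = stdev(values) / sqrt(n)
    t = (mean(values) - chance_level) / standard_error
    return t, n - 1


# Hypothetical per-group percentages of one call combination,
# tested against a hypothetical chance level of 40%.
hypothetical_percentages = [50.0, 60.0, 70.0]
t_stat, df = one_sample_t(hypothetical_percentages, chance_level=40.0)
```

With these made-up values the mean is 60%, the standard deviation is 10%, and the statistic is about 3.46 with 2 degrees of freedom.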

We calculated the percentage of terminations to investigate whether brays in response to shrieks calm the group (Fig. 1b). First, we divided the number of terminations by the total number of call combinations for each call combination and each group. For example, we recorded nine shriek-bray combinations for Group 2. For one shriek-bray combination, we recorded a further shriek, whereas for the remaining eight shriek-bray combinations, we recorded no further shrieks, suggesting that the sequence was terminated. To calculate the percentage of terminations, we divided the eight terminations by the total number of shriek-bray combinations (n = 9) and multiplied by 100, giving 89% terminations. We compared the percentage of terminations between shriek–bray, shriek–shriek, and shriek–other combinations using the Friedman test and the Wilcoxon signed-rank test for pairwise comparisons. To control for multiple testing at α = 0.05, we applied a Bonferroni correction based on the number of pairwise comparisons. We calculated the corrected p-value (pcorr) as pcorr = p-value × number of pairwise comparisons.
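The arithmetic of the Group 2 example and of the Bonferroni correction can be written out as a short Python sketch. The function names are hypothetical. Capping the corrected p-value at 1 is a common convention added here as an assumption; it is not part of the formula given above.

```python
def percent_terminations(n_terminated: int, n_combinations: int) -> float:
    """Percentage of call combinations after which the sequence terminated."""
    if n_combinations <= 0:
        raise ValueError("need at least one call combination")
    return 100.0 * n_terminated / n_combinations


def bonferroni(p_value: float, n_comparisons: int) -> float:
    """Corrected p-value: p multiplied by the number of pairwise comparisons.

    The cap at 1.0 is an added assumption, not part of the stated formula.
    """
    return min(1.0, p_value * n_comparisons)


# Group 2 example from the text: 8 of 9 shriek-bray combinations terminated.
group2_percent = percent_terminations(8, 9)

# Three call combinations give three pairwise comparisons.
p_corr = bonferroni(0.005, 3)
```

Here `group2_percent` is about 88.9, which rounds to the 89% given in the text, and an uncorrected p-value of 0.005 across three pairwise comparisons yields a corrected value of 0.015.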

Labuk Bay Proboscis Monkey Sanctuary data set

First, we calculated the call rate per group for brays and shrieks by dividing the number of calls by each group’s observation time. Then, we calculated the mean for one-male/multifemale and all-male groups. To investigate the sender and context of shrieks and brays, we calculated the percentage of calls that were uttered for each sender and context category per call type. Thus, we divided for each call type the number of calls for each sender/context category by the total number of calls, and multiplied by 100. Because members of all-male groups rarely uttered shrieks, we excluded both all-male groups from further analyses. For the one-male/multifemale groups, we compared the number of shrieks to which the male responded between adult females, infants, and immatures using the Chi-square test. Additionally, we compared the number of conflicts that were followed by brays between female–female conflicts and offspring–offspring conflicts (immature–immature, immature–infant) using the Chi-square test.

Ethical note

The research was approved by the Economic Planning Unit Malaysia and the Sabah Wildlife Department and complied with Malaysian laws on foreign research. The authors declare that they have no conflict of interest.

Data availability

Audio and video files analyzed during the current study are available from the corresponding author on reasonable request. All raw data used in this manuscript are in the electronic supplementary material.


Vocal responses to vocal exchanges within the groups at LKWS

We recorded 1,811 calls from the 17 groups. A mean of 24% of these calls were brays (SD = 12%, N = 459 calls), 23% were shrieks (SD = 10%, N = 468 calls), and 54% were other call types (SD = 16%, N = 884 calls; Supplementary information S2). Of the 387 shrieks that received a response, 228 (N = 14 groups) were followed by a bray, 94 (N = 12 groups) by a shriek, and 65 (N = 15 groups) by other call types. The occurrence of a bray following a shriek (shriek–bray associations) was significantly higher (mean = 55%; SD = 35%; Fig. 2) than expected based on the overall occurrence of brays (one-sample t-test: t = 3.62, df = 16, p = 0.002; Fig. 3a). In contrast, the occurrence of a shriek following a shriek (shriek–shriek associations) did not differ from the overall occurrence of shrieks (mean = 17%; SD = 18%; t = −1.28, df = 16, p = 0.219), and the occurrence of shriek–other associations was significantly lower than the overall occurrence of other vocalizations (mean = 28%, SD = 28%; t = −3.75, df = 16, p = 0.002; Fig. 3a, Supplementary information S3).

Fig. 2

Sonograms of a call combination of shrieks followed by a bray call by the adult male of a one-male/multifemale group (photos: Elke Zimmermann); the broad lines at 3.5 kHz and 6.5-7 kHz are insect sounds; black lines mark the duration of the shrieks and bray.

Fig. 3

Mean and standard deviation of a percentage of call combinations and b percentage of terminations after call combinations for 17 proboscis monkey groups recorded at the Lower Kinabatangan Wildlife Sanctuary (Sabah, Malaysian Borneo) from August to November 2010.

Terminations occurred significantly more often after shriek–bray combinations (mean = 65%, SD = 25%) than after shriek–shriek (mean = 25%, SD = 39%) and shriek–other combinations (mean = 14%, SD = 19%; Friedman test: χ2 = 11.53, df = 2, N = 9, p = 0.003; Wilcoxon signed-rank test: shriek–bray versus shriek–shriek: T = 0, N = n = 10, p = 0.005, pcorr = 0.015; shriek–bray versus shriek–other: T = 1, N = n = 12, p = 0.003, pcorr = 0.009, Fig. 3b). In contrast, we found no significant differences in the percentage of terminations between shriek–shriek and shriek–other associations (T = 16, N = 11, n = 9, p = 0.441; Fig. 3b; Supplementary information S4).

Function and context of shriek and bray vocalizations at LBPMS

We very rarely observed shrieks in all-male groups (mean = 2.2 calls/hour, SD = 3.1 calls/hour) in contrast to one-male/multifemale groups, where they were much more frequent (mean = 30.7 calls/hour, SD = 17.1 calls/hour; Supplementary information S5). In line with this finding, shrieks were mainly uttered by immatures (49%), infants (30%), and adult females (12%) during agonistic interactions between group members or in response to heterospecific animals (e.g., macaques). In contrast, most brays were made by the adult males of one-male/multifemale groups (83%), whereas in only few cases brays were uttered by other males (3%). The remaining 14% of brays were uttered by adult females, which differed in their acoustic structure and could easily be discriminated acoustically from male vocalizations. In contrast to males, females uttered this vocalization together as a chorus and the vocalizations were of lower amplitude and higher in frequency. The dominant male responded with brays significantly more often after shrieks by adult females (71%) than after shrieks by infants (21%) and immatures (20%; Chi-square test: χ2 = 16.03, df = 2, p < 0.001; pairwise comparisons: female vs. immatures: χ2 = 13.94, df = 1, p < 0.001, pcorr < 0.001; female vs. infant: χ2 = 11.21, df = 1, p = 0.001, pcorr = 0.003; Supplementary information S6).

Of 63 observed intragroup conflicts, most occurred between adult females (N = 14) or offspring (immatures and infants; N = 20, Supplementary information S7). Brays emitted by the dominant male occurred significantly more often after female–female conflicts (N = 6) than after offspring–offspring conflicts (N = 2; Chi-square test: χ2 = 4.94, df = 1, p = 0.026). We never observed the adult male punishing one of the opponents physically.
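The chi-square statistic for this comparison can be retraced from the counts given above. The following Python sketch assumes a 2 × 2 table of conflict type (female–female, offspring–offspring) by outcome (followed by a bray, not followed by a bray) and a Pearson chi-square without continuity correction. Neither the table layout nor the absence of a correction is stated in the text, so both are assumptions, and the function name `chi_square_2x2` is hypothetical.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Counts taken from the text: 6 of 14 female-female conflicts and
# 2 of 20 offspring-offspring conflicts were followed by a bray.
chi2 = chi_square_2x2(6, 14 - 6, 2, 20 - 2)
```

Under these assumptions the statistic is about 4.94 with one degree of freedom, which is consistent with the reported value.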


Brays occurred significantly more often after agonistic shrieks than expected from their overall occurrence at LKWS. Furthermore, vocal conflicts were terminated significantly more often after shriek–bray than after shriek–shriek or shriek–other call combinations. Thus, the two predictions concerning the vocal intervention hypothesis were supported, suggesting that the bray functions as a third-party vocal intervention signal. Moreover, our results from LBPMS showed that vocal interventions were mainly directed at female–female conflicts, although infants and immatures uttered the majority of shrieks.

We observed naturally occurring vocal conflicts in LKWS, with little human influence, but could not identify the sender of these vocalizations due to the dense canopy. In LBPMS, where sender identification was possible, we found that all shrieks were uttered by infants, immatures, and/or females during conflict situations. Moreover, we very rarely recorded shrieks in all-male groups, where no females and infants were present. Most brays were attributed to the dominant adult male (83%) and only in a few cases to other adult males (3%). Our observations at LBPMS agree with reports that infants, immatures, and adult females uttered mainly shrieks, whereas the adult males of one-male/multifemale groups uttered brays in both captive and free-living proboscis monkeys (Kawabe & Mano, 1972; Kern, 1964; Röper et al., 2014; Srivathasan & Meier, 2011). Thus, we can assume that the brays in LKWS are also mainly uttered by the adult male, whereas infants, immatures, and females mainly utter shrieks. Interestingly, in a few cases we also recorded bray-like vocalizations from females at LBPMS (14% of all bray vocalizations). However, we did not record these kinds of vocalizations at LKWS. This could be explained either by the lower amplitude of female brays or by their lower call rate compared with male brays.

Vocal intervention displays may be a useful tool for the adult male of a one-male/multifemale group to intervene in conflicts. Proboscis monkeys can live in a dense forest environment, sleeping and moving in the high canopies (Feilen & Marshall, 2014). Most observed intragroup conflicts among adult females and immatures are displacements at sleeping sites (Matsuda et al., 2012a, b). In such a situation, vegetation often constrains visual contact among group members, including the adult male. Because the adult male is much heavier than other group members, he sits on thicker branches than the lighter females and offspring. Under these conditions, it often is not possible for the male to intervene physically in a conflict, and a vocal intervention display to signal the presence of the adult male may be beneficial.

We could not systematically test whether the bray functions as a ritualized vocal display of physical intervention or as an appeasement signal. However, we observed no physical punishment by adult males of one-male/multifemale groups during conflicts among group members at LBPMS, although the artificial setting (visual and spatial access to conflict partners due to stable platforms) would allow males to intervene physically in a conflict easily. Moreover, in most cases, the dominant male uttered a bray but showed no other reaction to the conflict. The females also showed no obvious reaction to the male (e.g., looking) but they ended the conflict. Thus, we suggest that it is unlikely that the bray is a ritualized signal of physical punishment. We cannot rule out the possibility that punishment happens, but we did not observe it during our short observation time. However, agonistic behaviors occur infrequently in proboscis monkeys (Matsuda et al., 2012a, b; Yeager, 1992), and vocal displays were observed instead of physical attacks (Murai et al., 2007). In one case, an adult male of a one-male/multifemale group called its adult female back when she tried to shift the group, and herding sounds were observed from the adult male when she returned, with no physical attack (Murai et al., 2007). No conflicts among males were observed when females joined all-male groups, even when a female copulated with a group member (Murai, 2004). In our study, brays were produced in response to group conflicts and in response to approaching heterospecifics. Adults and immatures uttered shrieks in response to approaching humans, macaques, or domestic cats. Even in these cases, shrieks were answered by a bray from the dominant male, presumably to signal the presence of the male, who plays an important role in defending the group from predators (Matsuda et al., 2008).
Because we did not observe aggressive and affiliative displays associated with the bray, we suggest that brays are vocal displays used in neutral interventions (Roeder et al., 2002). We hypothesize that brays are appeasement signals to reduce tension and signal the adult male’s presence. Further studies are needed to test this hypothesis systematically and to investigate whether similar vocal intervention signals also occur in other primate species living in the canopies of tropical rain forests.

The call rate at LBPMS was double that at LKWS. This finding indicates that the vocal behavior may reflect the different socioecological constraints of wild versus semi-wild populations (e.g., feeding access, predation risk). Artificial feeding at LBPMS might influence the natural behavior of the animals and provoke a higher rate of agonistic vocalizations compared with LKWS. In addition, other factors, such as the presence of tourists, different predation risk, and different group composition, might affect vocal behavior. Investigating the impact of socioecological constraints on vocal behavior might help to develop a bioacoustic monitoring system to track the abundance of proboscis monkeys and categorize group composition, social relationships, and predation risk.

In conclusion, we found that vocal third-party intervention occurred in a nonhuman primate living in a complex social system and dense vegetation, limiting the visibility of group members. Vocalizations were used for neutral intervention and ended 65% of agonistic vocal exchanges. These vocal supports might signal affiliative bonds between the adult male and its group members similar to ring-tailed lemurs, which signal affiliative bonds between group members through vocal networks (Kulahci et al., 2015). Thus, further studies that include social network analysis are needed to investigate whether the vocal support of the adult male in one-male/multifemale groups depends on the social status of females or immatures.