Introduction

Nonhuman great apes’ (hereafter great apes) use of 60–80 gestures in intentional communication remains the only broad system of communication outside of human language in which there is evidence for widespread and flexible use of goal-directed signals to communicate language-like meaning in everyday social interaction (Graham et al.2018; Hobaiter and Byrne 2014, 2017; Moore 2014). Great apes employ diverse repertoires of vocal, gestural, and facial signals to communicate a wide range of nuanced information (e.g., bonobos, Pan paniscus: Graham et al.2017; Pika et al.2005; chimpanzees, Pan troglodytes: Bard et al.2017; Fröhlich et al.2016a, 2016b; Hobaiter and Byrne 2011a, 2011b, 2014; Plooij 1978; Roberts et al.2012; Tomasello et al.1985, 1989; gorillas, Gorilla gorilla: Genty et al.2009; Perlman et al.2012; Pika et al.2003; Tanner and Byrne 1999; orang-utans, Pongo: Liebal et al.2006; cross-species: Bard and Vauclair 1984; Pollick and de Waal 2007). Recently, researchers have started to describe the way in which apes combine their signals; with gestures, vocalizations, and facial expressions, among other signals, employed in a single communicative system. These studies have highlighted the importance of considering communication holistically (Hobaiter et al.2017; Liebal et al.2011; Wilke et al.2017), as well as blurring the boundary between signal categories (e.g., orang-utans’ use of their hands to modify the acoustic properties of some vocalizations: Lameira et al.2013; Peters 2001).

Vocal behavior has been studied across great ape species in both captive and wild populations (e.g., chimpanzees: Crockford and Boesch 2005; Goodall 1986; Hauser and Wrangham 1987; bonobos: Bermejo and Omedes 1999; de Waal 1988; gorillas: Salmi et al.2013; orang-utans: Lameira et al.2015; Wich et al.2012). However, gestural research efforts have focused mostly on the African great apes, with detailed repertoires described for the gesturing of both captive and wild chimpanzees (e.g., Bard et al.2014; Hobaiter and Byrne 2011a; Pollick and de Waal 2007; Roberts et al.2012; Tomasello et al.1985, 1989), bonobos (e.g., Genty et al.2014, 2015; Graham et al.2017; Halina et al.2013; Pika et al.2005), and gorillas (Byrne and Tanner 2006; Genty et al.2009; Pika et al.2003; Tanner et al.2006), while gestural research on orang-utans is limited to a few captive studies (Cartmill 2008; Cartmill and Byrne 2007, 2010; Liebal et al.2006; although cf. Mackinnon 1974 for descriptions of wild orang-utan signals, Bard 1992 for gesture use in free-ranging reintroduced individuals, and Waller et al.2015 for facial displays).

Great apes show substantial flexibility in the production of their signals, for example showing nuanced audience effects in their vocalizations (audience composition: Crockford et al.2012; Schel et al.2013a, 2013b; Slocombe and Zuberbühler 2007) and facial expressions (audience attention: Waller et al.2015). Unlike either vocal or facial signal repertoires, gestural repertoires contain signals with a range of information modalities (e.g., in the case of gesture: audible, visual, and tactile). All vocal and facial signals contain, respectively, either audiovisual or visual information. However, gestural signals offer signalers flexibility in signal selection relative to the visual attention of their recipient. Great apes can adjust their selection of gesture types to take into account other individuals’ ability to receive the information (e.g., Cartmill and Byrne 2007; Cudmore and Galdikas 2012; Hobaiter and Byrne 2011a; Liebal et al.2004a, 2004b; Poss et al.2006; Tomasello et al.1994), including potential eavesdroppers (Hobaiter et al.2017).

Studies of communication in captive great apes have provided unique insights into their cognitive capacities, for example, selecting the appropriate modality for their audience’s visual attention, showing understanding of the physical basis of gestural communication (e.g., Tomasello et al.1985, 1989), and showing understanding of recipients’ knowledge states (Cartmill and Byrne 2007). However, a captive environment impacts the expression of species-typical behavior (e.g., Hobaiter and Byrne 2011a; Seyfarth and Cheney 2017), and can lead to the regular spontaneous production of behavior rarely, if ever, seen in the wild (chimpanzee pointing: Hobaiter et al. 2013; Leavens et al.2005b; tool use in gorillas: Fontaine et al.1995; Lonsdorf et al.2009).

The study of gestural communication in wild orang-utans provides a general point of broader comparison for an exploration of the evolutionary origins of gestural communication. Current estimates place the last common ancestor of the African apes and orang-utans at around 17 million yr. ago (Pozzi et al.2014). However, and of more interest, orang-utans also occupy a very different socioecological niche from that of their African cousins. Communication, like other behaviors, is adapted to the niche of the species employing it (Cheney and Seyfarth 2018). The semisolitary and arboreal niche occupied by wild orang-utans provides a unique point of comparison for the communicative usage of their gestures, as compared to African apes. Chimpanzees and bonobos live in large multimale–multifemale groups of between 20 and > 200 individuals, in a fission–fusion social structure (Aureli et al.2008; Nishida 1968). While gorilla groups are typically much smaller, they also typically include multiple adult males and females, as well as immature individuals (Robbins and Robbins 2018). Play, sex, and display are among the most prolific contexts for gestural communication in wild or captive African apes (Fröhlich et al.2016a; Genty and Zuberbühler 2014; Goodall 1986; Graham et al.2017; Hobaiter et al.2017; Plooij 1978; Schneider et al.2012; Tomasello et al.1997). In contrast, the most regular social partners in wild orang-utans are mothers and their offspring (Mitani et al.1991; van Schaik 1999). While siblings do interact, the long interbirth intervals of 6–8 yr. (Wich et al.2004) limit sibling interactions to the first few years of life, with a large age difference. With no known long-distance audible gestures (cf. chimpanzee drumming: Arcadi et al.1998, 2004), orang-utan signaling outside of mother–offspring is typically vocal (e.g., male long calls: Delgado et al.2009; Mitani 1985). 
Recent studies have highlighted the importance of early socialization on African ape communication (Bard et al.2014; Fröhlich et al.2016a, 2017; Laporte and Zuberbühler 2011), but the impact of the social environment on the development of wild orang-utan gesture remains unknown.

Their ecological niche presents physical and cognitive challenges that also distinguish them from African apes. Their diet requires the acquisition of complex food-processing techniques (Galdikas 1988; Jaeggi et al.2008; van Adrichem et al.2006; Van Noordwijk and Van Schaik 2005; Wich et al.2004) and their lifestyle is more arboreal (Thorpe and Crompton 2006). As a result, orang-utan hands and feet are employed simultaneously in locomotion more often than seen in African apes (chimpanzees and gorillas: Gebo 1992), perhaps limiting their availability for gesturing.

The socioecological environment of captive orang-utans differs strikingly from that of orang-utans in the wild. They are often housed socially with several adult individuals and offspring (Price and Stoinski 2007). As well as differences in the social partners and behavioral contexts available, the physical environment impacts signal selection and transmission (Hobaiter et al. 2017; Mitani et al. 1999). Captive habitats are limited not only in size but also in form, with typically open enclosures that allow for longer lines of sight and more terrestrial behavior (Hebert and Bard 2000; Manduell et al. 2011). As a result, the expression of gestural communication in wild orang-utans may differ substantially from that recorded in captive groups (Cartmill 2008; Cartmill and Byrne 2010; Liebal et al. 2006). Given increasing anthropogenic pressures, many populations of wild primates are decreasing in numbers (Estrada et al. 2017). As all three species of orang-utan are considered Critically Endangered in the wild (Ancrenaz et al. 2016; Goossens et al. 2006; Nowak et al. 2017), there is also an urgent need for both research and conservation in their natural habitat.

Here, we provide an initial description of the gestural and vocal signal repertoire used in mother–offspring pairs of wild Southwest Bornean orang-utans. The largely arboreal lifestyle of wild orang-utans may impact both the expression of gestural forms, for example limb use, and the range of gesture types available in a particular interaction. We examine the selection of signals and their adjustment to recipient attention. We further explore the responsiveness of orang-utans to gestural communications and describe the range of goals for which gestures are employed between mother–offspring pairs.

Methods

We collected data during focal follows (Altmann 1974) of mothers and dependent offspring orang-utans within the Sabangau peat-swamp forest in Borneo, Indonesia, at the Borneo Nature Foundation (BNF) research site in conjunction with the Centre for International Co-operation in Management of Tropical Peatland (CIMTROP). The study site is a 500-km2 protected area known as the Natural Laboratory of Peat Swamp Forest (NLPSF), which is managed by the University of Palangkaraya for the purposes of scientific research. A base camp is located at 2°19′S and 114°00′E, 20 km SW of Palangkaraya. Unlike most forests in Kalimantan, the Sabangau forest is not impacted by high levels of fragmentation, making it one of the largest continuous areas of peat-swamp forests left on the island (Morrogh-Bernard et al.2003). This forest supports the largest population of orang-utans in the world at a density of two or three individuals per km2 (Husson et al.2009; Morrogh-Bernard et al.2003; Singleton et al.2004; Wich et al.2008).

Subjects

We followed 16 orang-utans over the course of the study; these included 7 mother-dependent offspring pairs (Table I) and 2 older semiindependent offspring (Georgia, the maternal sibling of Gretel, and Isabella, the maternal sibling of Indy and Ima) that were occasionally encountered when interacting with the focal pairs. Following Rijksen (1978) and Morrogh-Bernard et al. (2002), we define age/sex groups as follows: infants (0–3 yr), juveniles (3–6.5 yr), adolescents (6.5–10 yr), adult females (females with young), unflanged males (adult), and flanged males (adult). Using these categories we included 7 adult females, 2 adolescents (both female), 6 juveniles (4 males and 2 females), and 1 female infant. Six of the adult females had home ranges within the study site at the NLPSF. We first encountered the seventh female and her offspring in spring of 2016, and believe that she was new to the area because of potential displacement from the nearby forest fires that occurred in September and October of 2015.

Table I Focal orang-utan mother–offspring pairs followed

Data Collection

Standardized behavior and video data collection, based on the field data collection procedures by the Leakey Foundation Orang-utans Compared Workshop in San Anselmo, CA (Morrogh-Bernard et al. 2002), started in May 2014. We collected data from May 2014 to July 2016, yielding a total of 681 h of recorded footage. We followed orang-utan pairs from nest to nest with a team of two or three observers for a minimum of 5 days per month if encountered and not lost. One or two people recorded primary behavioral, proximity, and vocalization data on both mother and offspring, while the remaining observer took video recordings. The majority of video data recorded were from orang-utans in an arboreal context. All ground observations were at a minimum distance of 10 m. The minimum distance for arboreal observations was 5 m, but 10–20 m was more typical. Where any orang-utan behavior appeared to be directed toward observers on approach or while moving to find an observation location, we stopped and/or increased our observation distance. Variation in observation distance and conditions impacts our ability to observe signals; for example, quieter vocalizations or subtle movements may be missed at greater distances. As a result, we employ ad libitum rather than continuous sampling of signals (Altmann 1974).

We recorded video and audio data using a Canon Powershot sx50 HS or Panasonic DMC FZ-1000 video camera and a Velbon up-400 monopod. The use of the built-in microphone on the video camera has significant limitations for the accurate collection of vocal signals, particularly within a noisy arboreal rainforest environment. As a result our vocal data are biased toward calls that were either louder or more acoustically distinct than the surrounding environment. Despite the limitations on the number and type of calls that could be coded, they still represented almost double the number of gestural signals recorded. (The coding of gestural signals typically excludes 20–40% of potential cases where they do not meet the criteria for intentional use; see, e.g., Genty et al. 2009; Kersken et al. 2018.) As a result we felt that it was important to include the vocal data to highlight the importance of vocal signals in orang-utan communication, but we are cautious in our analysis and interpretation of them.

Video Analysis

Following Genty et al. (2009), we scanned video clips for “potentially communicative” episodes before coding. Essentially this meant we isolated any circumstance in which at least two individuals were present and at least one individual was not occupied in a solitary activity such as self-grooming or sleep, resulting in 52 h of footage (hours of footage per individual: range = 0.33–15.94, mean = 5.2 ± SD 4.5; see Electronic Supplementary Material [ESM] Table SI). We coded all vocal and gestural signals used to initiate social interaction. Facial gestures were included here; however, facial expressions could not be coded consistently given visibility in the arboreal habitat. We coded facial expressions ad libitum where possible but they were not included in subsequent analyses. Vocal signals originate from the mouth or throat, and can be altered by the use of hands or foreign objects such as leaves (Hardus et al. 2009; van Schaik 2003). As specific calls used by orang-utans vary by location, known as “call cultures” (Wich et al. 2012), we classified calls using a condensed compiled ethogram adapted from BNF protocols and previous studies of captive and wild orang-utans (Table II). We defined gestural signals as discrete, mechanically ineffective physical movements of the body observed during periods of intentional communication (Cartmill and Byrne 2010; Hobaiter and Byrne 2011a). Discrete movements have a clear start and end point, typically distinguished by a pause or change in speed or direction of movement (Kita et al. 1997). We initially classified gestures following the repertoire described in Byrne et al. (2017; updated in Hobaiter and Byrne 2017), which included gestures previously seen in all four great apes in both captivity and in the wild.
Example videos are available at http://www.greatapedictionary.com. Given the high level of facial muscle control found in orang-utans (Caeiro et al. 2013), we then extended the gesture list to include previously described orang-utan facial displays that were distinguished from facial expressions by the evidence for their intentional use (Cartmill 2008).

Table II Orang-utan vocalizations recorded

The exploration of intentional communication in either human or nonhuman primates is challenging, as it requires decoding a signaler’s intention: an invisible cognitive state, from the signaler’s observable behavior. The criteria for doing so were adapted from early explorations of language development in young children. Bates and colleagues (Bates et al.1975) distinguished illocutory acts, in which an infant employed a conventionalized signal toward a recognizable goal, from perlocutory acts, in which a signal changed a recipient’s behavior, but without any evidence that this effect was intended by the signaler. Tomasello and colleagues (Tomasello et al.1985) adapted Bates’ criteria for use with nonhuman apes, and subsequent studies of intentional communication in nonhuman animals have employed similar criteria.

We define intentional communication as including at least one of three criteria: 1) The signaler orients its body and gaze toward the recipient (Call and Tomasello 2007; Cartmill and Byrne 2010); 2) the signaler waits for a response from the recipient followed by repeating the gesture if the desired response is not obtained (Call and Tomasello 2007; Leavens et al.2005a; Tomasello et al.1994); and 3) in the absence of a response that in other cases is satisfactory, the signaler employs persistence toward a goal, such as modifying the gesture depending on recipient response, or lack thereof, or using the gesture in conjunction with other gestures or communicative behavior (Cartmill and Byrne 2007; Leavens et al. 2005; Tomasello et al.1994). We require each case of potential gesture use to meet at least one of these criteria to be considered a case of intentional gesture.

Coding of gaze direction is challenging in a natural setting, particularly from arboreal subjects. Following Hobaiter and Byrne (2011a, 2011b), we included an individual as directing its gaze toward a recipient where gaze was visible, or where head movements indicated that it was tracking recipient movements (in the way, for example, that gaze direction can be inferred while standing behind someone watching a tennis match, where the person’s head movements track the ball’s). Further details of the coding are provided in the text that follows.

Gestural signals are typically categorized by modality into three groups corresponding to silent-visual, audible, or contact (e.g., Hobaiter and Byrne 2011a). All gestures include a visual component, audible gestures always included an audible component as a result of the action (cf. silent-visual gestures that occasionally make “accidental” contact with a surface, such as an arm “Swing” gesture that contacts leaves), and contact gestures always make physical contact with the recipient and may also include an audible component. After video coding, but before analysis, we collapsed the categories of modality into visual and tactile. Visual included both silent and audible visible gestures. We combined these because reliably discriminating audible from silent-visual gestures was challenging in the arboreal habitat, with leaf and branch noises accompanying most movements. Although some gestures did appear to employ sound purposefully, such as “Stomp” and “Shake object,” we saw these in very low frequencies and therefore combined them with all other visual gestures.

For both the signaler and recipient we coded individual identity and age, the behavioral context immediately before and after signaling (Affiliating, Agonistic, Display, Feeding–individual, Feeding–food sharing, Grooming, Nesting, Nursing, Play–social, Play–solitary, Resting, Sex, Moving, Traveling, Other, Unknown; see Table SII for definitions), and the estimated distance between the signaler and recipient (<1 m, 1–2 m, 2–3 m, 3–5 m, 5–10 m, >10 m; distances estimated using body size as a point of reference; Cant 1992; Oishi et al. 2009). We recorded the state of the recipient’s visual attention at the time that the signal was initiated as one of five states: attending (recipient had eye contact with the signaler or showed tracking of the signaler’s behavior through head or body movements); head in direction (recipient located in front of the signaler with the head in an arc of up to 45° in either side of the direction the signaler is facing); partial view (the recipient is in the signaler’s peripheral view, with the head at 45–90° to either side); out of sight (recipient is not in a position to see any physical movement made by the signaler); or out of sight but in body contact (as out of sight, but recipient is in physical contact with the signaler). Signal combinations may be produced as a planned combination of signals, or because of the addition of another signal after the failure of an earlier signal (Genty and Byrne 2010; Hobaiter and Byrne 2011b; Liebal et al. 2004b). We followed Hobaiter and Byrne (2011b) in distinguishing these two types of signal combination. Sequences are two or more signals that are overlapping or separated by <1 s. Bouts are two or more individual signals or sequences of signals that are produced with ≥1 s of response waiting between them.
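The sequence definition above can be illustrated with a minimal Python sketch. The `Signal` record, its field names, and the function name are hypothetical, and the sketch assumes that onset and offset times in seconds are available for each coded signal; it is an illustration of the <1 s rule, not the coding procedure used in the study.

```python
from dataclasses import dataclass
from typing import List

SEQUENCE_GAP_S = 1.0  # signals overlapping or closer than this share a sequence


@dataclass
class Signal:
    label: str     # hypothetical field: gesture or call type
    onset: float   # seconds from the start of the clip
    offset: float  # seconds from the start of the clip


def group_into_sequences(signals: List[Signal]) -> List[List[Signal]]:
    """Group signals that overlap or are separated by less than 1 s."""
    sequences: List[List[Signal]] = []
    latest_offset = 0.0
    for sig in sorted(signals, key=lambda s: s.onset):
        if sequences and sig.onset - latest_offset < SEQUENCE_GAP_S:
            # Gap to the latest offset seen so far is < 1 s: same sequence.
            sequences[-1].append(sig)
            latest_offset = max(latest_offset, sig.offset)
        else:
            # Gap of >= 1 s (possible response waiting): start a new unit.
            sequences.append([sig])
            latest_offset = sig.offset
    return sequences


example = [
    Signal("Reach", 0.0, 0.5),
    Signal("Swing", 1.2, 1.8),    # 0.7 s after Reach ends: same sequence
    Signal("Present", 3.0, 3.4),  # 1.2 s after Swing ends: new unit
]
grouped = group_into_sequences(example)
print([[s.label for s in seq] for seq in grouped])
```

Under these assumptions, units separated by ≥1 s that are directed at the same recipient within one interaction would be candidate elements of a bout; deciding whether the gap reflects response waiting still requires the behavioral coding described above.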

Interobserver Reliability

We code video data across ape gestural studies employing the same methodology and coding protocol independently of the hypotheses tested or study population. AK and EH were trained by experienced gesture coder CH, and each then coded 55% of all video footage. We assessed reliability both between and within coders. We used an overlapping 10% of coded footage to assess interobserver reliability. We evaluated intraobserver reliability by coding a separate 7.5% of the total video footage twice, but ≥72 h apart. We selected videos for reliability testing using a random number generator, and measured the degree of concordance for specific coding categories between the ratings using both percentage agreement and Cohen’s κ (Altman 1991). The results of the inter- and intraobserver reliability testing showed 76–95% overlap and “moderate” to “very good” agreement for 10 of the 11 variables (Table III), suggesting that coefficients exceed chance for coded behavior (Bakeman and Gottman 1997; McHugh 2012). The interrater agreement on signaler persistence was 76% but achieved only a “weak” degree of agreement κ score (0.49).
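For readers unfamiliar with the two reliability measures, the following minimal Python sketch computes percentage agreement and Cohen’s κ for two sets of categorical codes. The function names and the toy codes are illustrative assumptions, not the study data or the software used by the authors.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Share of items (0-100) on which the two codings agree."""
    if len(a) != len(b) or not a:
        raise ValueError("codings must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(a)
    p_observed = percent_agreement(a, b) / 100.0
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from each coder's marginal category frequencies.
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy example: modality codes ("v" = visual, "t" = tactile) from two coders.
coder_1 = ["v", "v", "t", "t"]
coder_2 = ["v", "t", "t", "t"]
print(percent_agreement(coder_1, coder_2), cohens_kappa(coder_1, coder_2))
```

The toy example shows how a variable can have fairly high raw agreement (75%) but a lower κ (0.5) once chance agreement is removed, which is the pattern reported for signaler persistence.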

Table III Results of interobserver and intraobserver reliability testing, by coding category

Analysis

In describing the repertoire of wild chimpanzees, Hobaiter and Byrne (2011a) required at least two instances of gesture use by an individual to include it in an individual repertoire, and use by at least two individuals to include it in possible species repertoires. However, as our dataset was relatively small and research has indicated that repertoire size is closely correlated to the quantity of data recorded in smaller datasets (Hobaiter and Byrne 2011a), we describe all potential gesture types used and provide the number of instances of gesture use. We calculated the number of gesture types identified relative to the number of gesture instances (an individual example of gesture use) coded for the total dataset, and individually for both the adult and offspring datasets. We graphed these for visual inspection to assess whether the repertoires reached asymptote. To address any effect of pseudoreplication from the use of ad libitum sampling, we converted data to means for each individual before analyses.
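The cumulative repertoire curve described above (gesture types identified against gesture instances coded) can be sketched in a few lines of Python. The function name and the toy sequence of gesture labels are assumptions for illustration only; the order of instances is taken to be the order in which they were coded.

```python
from typing import Hashable, Iterable, List


def cumulative_types(instances: Iterable[Hashable]) -> List[int]:
    """Number of distinct gesture types seen after each coded instance."""
    seen = set()
    curve = []
    for gesture_type in instances:
        seen.add(gesture_type)
        curve.append(len(seen))
    return curve


# Toy sequence of coded gesture instances (labels are illustrative).
coded = ["Reach", "Present", "Reach", "Swing", "Present", "Reach"]
curve = cumulative_types(coded)
print(curve)  # [1, 2, 2, 3, 3, 3]
```

Plotting the curve against the instance index and checking whether it flattens is the visual asymptote check; a curve that is still rising at the final instance suggests that further types remain to be recorded.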

In analyses of signal choice we excluded any signals where the recipient’s attention state was unclear, as well as any signals apparently directed toward observers in order to restrict our analyses to signal use between orang-utans. We conducted analyses of signal choice with recipient attention state by fitting generalized linear mixed effect models (GLMM) using a binomial error distribution and logarithmic link function in RStudio 1.0.136 running R version 3.3.1 (2016-06-21). We fitted models using the lme4 package for R. We included only single signals, or the first signal in a rapid sequence (signals separated by 1 s or less; following Hobaiter and Byrne 2011b), in analyses of attention state, and excluded signals from an individual with fewer than five communicative interactions (communications). The GLMM included only intentional gestures and used gesture modality as the response to recipient attention. In addition, we included the social relationship (mother–infant, other; typically mother–infant), signaler age class (adult, immature), and signaler location (ground, tree) in the model as control effects. The GLMM included signaler identity (N = 12), recipient identity (N = 13), signaler context before communication (N = 16 levels; see Table SII), and recipient context before communication (N = 16 levels; see Table SII) as random effects. These factors have the potential to influence the choice of intentional signal; however, we have insufficient data to fully explore these in this analysis and the observations recorded represent a small and random sample of the possible levels in each factor. We include them as random effects in order to take into account their impact. We report the influence of recipient attentional state and the three control factors (Bolker et al.2008). 
We applied a likelihood ratio test using χ2 tests of independence to assess the potential correlation between the attention state of the recipient and the intentionality and modality of the following communicative signal.

To further quantify the use of gesture modality with recipient attention, we calculated the variation in usage (following Hobaiter and Byrne 2011a). First, we calculated the proportion of signal usage by modality across the complete corpus by individual. Next, we calculated the percentage deviation from this baseline use for each state of recipient attention with the following formula: Deviation = (β / α – 1) * 100, where β = proportion of signals of a given modality within each attention state and α = proportion of signals of that modality in the overall corpus. We then analyzed the resulting deviations, indicative of adjustments made by the signaler based on recipient attention state, using planned t-tests, and reported them with the mean ± standard deviation.
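As a worked illustration of the deviation formula, the Python sketch below computes the percentage deviation for one hypothetical individual. The function name and the example proportions are assumptions, not values from the dataset.

```python
def percentage_deviation(beta: float, alpha: float) -> float:
    """Deviation = (beta / alpha - 1) * 100.

    beta:  proportion of a modality within one recipient attention state
    alpha: proportion of that modality in the individual's overall corpus
    """
    if alpha <= 0:
        raise ValueError("baseline proportion alpha must be positive")
    return (beta / alpha - 1.0) * 100.0


# Hypothetical individual: 50% of all gestures are visual (alpha = 0.5),
# rising to 75% when the recipient is attending and falling to 25% when
# the recipient is out of sight.
attending = percentage_deviation(0.75, 0.5)
out_of_sight = percentage_deviation(0.25, 0.5)
print(attending, out_of_sight)  # 50.0 -50.0
```

A positive value means the modality is used more than the individual’s baseline in that attention state, and a negative value means it is used less; a value of −100 would mean the modality was never used in that state.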

We did not have sufficient cases of successful gesture use per individual to explore whether specific goals were associated with each gesture type. However, as research has shown that individual signaler identity did not impact signal meaning (Graham et al.2018; Hobaiter and Byrne 2014), we present a preliminary investigation here in which gesture use was combined across signalers. After a gesture was employed, the reaction that caused the signaler to stop signaling was deemed to be the apparently satisfactory outcome, or goal, of the gesture.

Data Availability

The datasets analysed during the current study are available in the figshare repository, https://figshare.com/articles/DATA_Orang-utan_Signalling/8132159.

Ethical Note

This was an observational study that did not contain any interventions. All research adhered to the ethical ASAB/ABS Guidelines for the Use of Animals in Research and followed the IPS Code of Best Practices for Field Primatology. Permission for the study was granted by RISTEK. The authors declare they have no conflict of interest.

Results

Video coding yielded 1299 communicative signals: 858 vocal signals and 441 gestural signals. The majority of signals in our dataset were produced while in the canopy (signaler location canopy: N = 1267, ground: N = 17, unclear N = 15). Where a recipient could be identified, signaling was typically between mothers and infants (N = 412). Signaling between other individuals in our dataset (e.g., siblings, unrelated individuals) was relatively rare (N = 19; see Table SIII). Signals apparently directed toward human observers (N = 295), unknown recipients (N = 530), and other species (N = 41) were excluded from all further analyses.

We found no difference in the proportion of signal types used between adult (N = 7; mean proportion of gestural signals = 32.4% ± SD 22.5, range: 0–58.9) and juvenile signalers (N = 6; mean proportion of gestural signals = 24.3% ± SD 12.8, range: 13.3–49.3; t-test, t = 0.778; df = 11, P = 0.453). There were too few infant or adolescent signalers to compare the use of signal types in these age groups.

Signal Choice in Response to Recipient Visual Attention

Orang-utans also varied their selection of gesture modality (visual or tactile) with the recipient’s visual attention (N = 335 gestures; GLMM: χ2 (4, N = 335) = 24.56, P < 0.001; see Tables SIV and SV for full model information). The distribution of residuals from the full model was normal, supporting the model fit (Fig. S1). Orang-utans increased their use of visual gestures when the recipient showed visual attention, and decreased it when the recipient was out of sight (attending: N = 10 mean = 47.3 ± SD 50.1; not attending N = 10, mean = −57.6 ± SD 39.3; t = 5.21, df = 17.03, P < 0.001). In contrast, they decreased their use of tactile gestures when the recipient showed visual attention, and increased it when the recipient was out of sight (attending: N = 10 mean = −23.2 ± SD 29.9; not attending N = 10 mean = 35.9 ± SD 33.5; t = −4.16, df = 17.77, P < 0.001). There was an effect of signaler location (ground, tree) on signal choice (Table SIV); however, only 17 of the 335 gestures were produced while on the ground (all in a feeding context) so further statistical exploration was not possible at this time.
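The t and df values reported above can be checked from the summary statistics alone. The sketch below assumes an unequal-variance (Welch) two-sample t-test; that assumption is ours, inferred from the fractional degrees of freedom, and the function name is illustrative.

```python
import math
from typing import Tuple


def welch_t(mean1: float, sd1: float, n1: int,
            mean2: float, sd2: float, n2: int) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Visual gestures: attending vs. not attending (deviations in %).
t_visual, df_visual = welch_t(47.3, 50.1, 10, -57.6, 39.3, 10)
# Tactile gestures: attending vs. not attending.
t_tactile, df_tactile = welch_t(-23.2, 29.9, 10, 35.9, 33.5, 10)
print(round(t_visual, 2), round(df_visual, 2))
print(round(t_tactile, 2), round(df_tactile, 2))
```

Under this assumption the summary statistics reproduce the reported values (t ≈ 5.21, df ≈ 17.03 for visual gestures; t ≈ −4.16, df ≈ 17.77 for tactile gestures).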

Vocal Communication

Our identification of vocal signals was limited to those that could be discriminated within our video data, and likely underrepresents signal frequency, and biases signal types to those that are more distinct within the surrounding acoustic environment. Of the 858 vocalizations recorded we identified 11 types, all of which had been previously recorded in Bornean orang-utans (Table II). Four call types were used across age groups: “Grumph” (N = 3 recorded instances of use), “Kiss squeak” (N = 321), “Kiss squeak + hands” (N = 29), and “Raspberry” (N = 14). “Complex calls” (N = 5), “Gorkums” (N = 52), and “Kiss-squeak grumphs” (N = 266) were recorded only from adult females; and “Cries and screams” (N = 5), “Frustration screams” (N = 6), “Grunts” (N = 2), and “Soft hoot whimpers” (N = 147) were recorded only from immature orang-utans.

Gestural Communication

We recorded 441 instances of gestural signals that met criteria for intentional use. Gestures were produced individually (N = 411), and in sequence with other gestures (N = 30 instances; N = 15 sequences). Twenty-six distinct gesture types were provisionally identified; however, five of these were recorded on only a single occasion, and so do not, so far, meet the criteria for inclusion in individual or species repertoires (Table IV). When plotted cumulatively against the total number of gesture tokens coded, neither the total, nor the individual adult or offspring repertoires reached asymptote (Fig. 1), suggesting that further gesture types remain to be identified. Fourteen gesture types were used by adult female orang-utans, and 22 gesture types by their immature offspring. The gestures “Push,” “Protrude lower lip,” “Beckon,” and “Tap” were observed being used only by the adult females; and the gestures “Swing arm,” “Swing leg,” “Swing object,” “Hand on,” “Dangle,” “Reach palm,” “Throw object,” “Hit object/ground,” “Fling,” “Stomp,” “Object move,” and “Object in mouth” were observed being used only by immature individuals. Signalers were more likely to gesture when the recipient was in close proximity: 84% of all gestures were executed when the recipient was <1 m from the signaler. Gestures carried out when the recipient was ≥2 m away were exclusively silent-visual “Present” gestures.

Table IV Wild Bornean orang-utan gesture types recorded in this study
Fig. 1

The total number of intentional gesture types in the repertoire (solid circles) for all observed Bornean orang-utans (N = 14 individuals) is plotted against the cumulative number of coded gestures. Data were collected in the Sabangau peat-swamp forest in Borneo, Indonesia (2014–2016). The graph also includes the same plot with the data separated for immature orang-utans (N = 8; crosses) and adults (N = 6; hollow diamonds).

Adults deployed a similar proportion of tactile gestures (N = 6; mean proportion = 54.3% ± SD 23.9, range: 33.3–85.7) as visual (audible or silent: mean proportion = 45.7% ± SD 23.9, range: 14.3–66.7; paired t-test: t = 0.440, df = 5, SE = 0.195, P = 0.679), whereas juveniles employed fewer tactile gestures (N = 6; mean proportion: 13.9% ± SD 16.3, range: 0–37.7) than visual (audible or silent: mean proportion = 86.1% ± SD 16.3, range: 62.3–100; paired t-test: t = 5.44, df = 5, SE = 0.133, P = 0.003). There were too few infant or adolescent signalers to compare the use of modalities in these age groups.

Gesture–Vocal Combinations

Our sample size of signal combinations is small, and interrater reliability testing indicated that there was only weak consensus on the coding of persistence, which distinguishes sequences and bouts. As a result we provide only a basic description of signal combination use. Gesture–vocal combinations within a sequence (<1 s separation between signals or overlapping) were rare (N = 15), and typically involved Kiss-squeak vocalizations (N = 14) together with a range of gesture types (“Objects shake,” “Swing with object,” arm “Swing,” “Dangle,” and “Throw object”). Individual vocalizations or vocal-only sequences were employed in the same communicative bouts (individual signals or sequences separated by ≥1 s of response waiting) as gestures or gesture-only sequences (N = 13) but here the signals from the two modalities were produced in succession, separated by periods of response waiting.

Limb Use in Gestural Communication

Many gesture forms can be produced with either the hand/arm or foot/leg, for example, Reach and Swing. We compared the relative frequency of hand/arm and foot/leg forms of these gestures produced by orang-utans with those in a dataset of chimpanzee communication (N = 4221 instances of gesture use). Thirty of the chimpanzee gesture types could be produced with either the hand/arm or foot/leg; within these gesture types there was a strong bias toward hand/arm production in chimpanzees (N = 2239 gesture instances, N = 2185 hand/arm, N = 54 leg/foot; binomial test: P < 0.0001). There was a similar bias toward production of gesture types with the hand/arm in orang-utan gestures (N = 18 gesture types, N = 258 gesture instances, N = 214 hand/arm, N = 34 leg/foot; binomial test: P < 0.001); however, orang-utans were more likely than chimpanzees to use the leg/foot forms (Chi-square: χ2 = 83.5, P < 0.001) and also produced gesture forms that involved use of both a hand/arm and a leg/foot at the same time (N = 10).
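
The limb-use comparison can be reproduced from the counts given above. The sketch below makes several assumptions that the text does not state: the binomial tests are two-sided against an expected proportion of 0.5, the chi-square is an uncorrected Pearson test on the 2 × 2 table of species by limb, and the 10 orang-utan gestures that used both limbs at once are excluded. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def binomial_two_sided_p(successes, n):
    """Exact two-sided binomial P against p = 0.5 (double the smaller tail, capped at 1)."""
    smaller = min(successes, n - successes)
    tail = sum(comb(n, i) for i in range(smaller + 1))
    return min(Fraction(1), Fraction(2 * tail, 2 ** n))


def pearson_chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Counts as reported in the text.
chimp_hand, chimp_foot = 2185, 54
orang_hand, orang_foot = 214, 34

chimp_p = binomial_two_sided_p(chimp_hand, chimp_hand + chimp_foot)
orang_p = binomial_two_sided_p(orang_hand, orang_hand + orang_foot)
chi2 = pearson_chi_square_2x2(chimp_hand, chimp_foot, orang_hand, orang_foot)

print(chimp_p < Fraction(1, 10000), orang_p < Fraction(1, 1000))  # True True
print(round(chi2, 1))  # 83.5
```

Exact fractions are used for the binomial P values because the chimpanzee tail probability is far too small to represent as a floating-point number. Under the stated assumptions the chi-square statistic agrees with the reported value of 83.5.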

Responsiveness to Gestural Requests

Orang-utans were generally very responsive to gestural requests; 80% (N = 205/255) of communications including intentional gestures were successful in achieving a satisfactory behavioral response. There was relatively little variation in responsiveness between mother–offspring pairs (pairs with three or more communications, N = 7; mean = 79% ± SD 9; range: 67–91%). Where gestural communications were successful, orang-utans frequently responded before the end of the final signal—overlapping response (pairs with three or more communications, N = 6; mean = 35% ± SD 26)—or within 1 s of the final signal (pairs with three or more communications, N = 6; mean = 55% ± SD 25), although this varied strikingly between mother–offspring pairs (overlapping responses: range: 0–67%; within 1 s responses: range: 20–83%).

Goals Associated with Gestural Communication

Orang-utans used N = 237 gestures in N = 205 communications to successfully achieve at least one of eight goals: Acquire object; Climb on me; Climb on you; Climb over; Move away; Play change: decrease intensity; Resume play; Stop that (Table V). Twelve gesture types were used successfully on three or more occasions (11 of these 12 were used successfully by more than one individual; range: 1–8 signalers per gesture type, mean = 3.8 ± SD 2.1). These gestures were used to achieve a mean of 2.8 ± SD 1.4 goals (range: 1–6 successful goals per gesture type); however, 4 of the 12 gestures were used toward one or more play goals. If we consider only nonplay goals, gestures were used to achieve a mean of 1.9 ± SD 0.9 goals (range: 1–4 successful goals per gesture type).

Table V The goals for which gestures were employed

Discussion

Twenty-one gesture types and 11 call types were identified in the signaling of wild mother and offspring orang-utans in the Sabangau peat-swamp forest in Borneo. All call types had previously been identified, but our ability to discriminate call types was limited by our use of video data and by our focus on mother–offspring communication, and as a result we likely underestimate the call repertoire. In addition to the 21 gestures included in the repertoire, an additional five gesture types were observed on a single occasion. Three of these have been described in the gestural repertoires of captive populations, and the two gesture types new to descriptions of orang-utan gesture use have been described in other ape repertoires (see Table IV and Byrne et al.2017). Our criteria for inclusion as a case of gesture are strict, and can lead to the exclusion of 20–40% of potential cases (e.g., Genty et al.2009; Kersken et al.2018). Given our small dataset we suggest that these gesture types, and others that remained unobserved, would likely meet criteria for inclusion with additional observations, also extending the total gestural repertoire size. Of the gesture types observed, 3 were previously unreported in orang-utans, and 20 were previously unrecorded in wild orang-utans. Eleven of these gesture types were employed with a higher degree of physical versatility than in previous descriptions, for example, the same action being used with the foot or leg, where previously only the hand or arm had been described (e.g., Byrne et al.2017). Combinations of gestural and vocal signals occurred as both simultaneous and sequential combinations, but both forms were rare.

Orang-utans were very responsive to gestural requests, with ca. 90% of all successful communications receiving either an overlapping or an immediate (<1 s) response. This fluid “style” of communication more closely resembles that described for bonobos (Fröhlich et al. 2016c) than that described for chimpanzees, in which delayed responses (≥2 s) were observed as frequently as more responsive behavior (Fröhlich et al. 2016c). Previous studies from both captive and wild groups of apes suggest that apes’ use of gesture is impacted by their early social experience (e.g., Bard et al. 2014; Fröhlich et al. 2016a, 2016b, 2017; Hobaiter and Byrne 2011b; Schneider et al. 2012; Tomasello et al. 1989). The extended period of dependency, necessary for orang-utans to occupy a cognitively demanding ecological niche (Galdikas 1988; Jaeggi et al. 2008; van Adrichem et al. 2006; Van Noordwijk and Van Schaik 2005; Wich et al. 2004), as well as their very restricted number of social partners as compared to African apes (Mitani et al. 1991; van Schaik 1999), likely influence their understanding of the communicative intentions of their partners, perhaps increasing their responsiveness.

Orang-utans are highly flexible in their use of limbs in order to exploit their arboreal niche (Manduell et al. 2011; Thorpe and Crompton 2005, 2006). In comparison with chimpanzee gesturing, we found that orang-utans were more likely to employ leg/foot forms of gestures that had both a hand/arm and leg/foot variant, and that orang-utans were more likely to use hand/foot or leg/arm combinations in gesture forms that involved the use of two or more limbs. As a result, where only the limb use varied from existing descriptions of gesture types in other apes, we were conservative in listing them as variants of the same gesture type, rather than as novel gesture types. While we provide an initial description of gesture types employed by wild orang-utans, our analyses show no asymptote in the repertoire, strongly suggesting that additional gesture types remain to be described. Similarly, while we found low levels of overlap in the gestural repertoires of mothers and their offspring (10 of 26 gesture types), no age-class repertoire reached asymptote, and so further study is needed to properly describe any variation in the use of different gesture types within the repertoire across development. We found that the proportion of tactile to visual gesture types varied between adult and juvenile signalers, with a bias toward visual gestural signal use in younger signalers. Previous studies have found either similar proportions of use (Schneider et al. 2012), or a bias toward the use of tactile gestures in infant apes (Fröhlich et al. 2016a); however, the use of tactile signals decreased with increasing independence from the mother (Fröhlich et al. 2016b; Plooij 1978), and so the variation in our data may reflect the comparison of juvenile, rather than infant, to adult signalers.
Furthermore, as our data were collected across a wide range of behavioral contexts, and specific gesture types are associated with specific signaler goals (Cartmill and Byrne 2010; Hobaiter and Byrne 2014), the variation in signal selection may reflect a variation in adult and juvenile signaler goals, rather than a specific shift in modality of gesturing with age. The semisolitary social environment of wild orang-utans not only limits the frequency of social interactions in general, but particularly limits the frequency of contexts, such as play or sexual solicitation, that are among the most prolific for gesture use in wild African apes and captive orang-utans (Cartmill 2008; Fröhlich et al.2016a; Genty and Zuberbühler 2014; Goodall 1986; Graham et al.2017; Hobaiter et al.2017; Plooij 1978; Schneider et al.2012; Tomasello et al.1997). As a result, gestures employed only rarely and gestures associated with goals typically expressed in play or sexual contexts are likely absent from our data.

Supporting previous findings from captive groups (Cartmill and Byrne 2007; Liebal et al.2004a; Poss et al.2006; Waller et al.2015) and across other ape species (e.g., Hobaiter and Byrne 2011a; Leavens et al.2005a; Tomasello et al.1994), orang-utan signalers vary their production of intentional gestures depending on the visual attention of their audience, adjusting the modality of their signal (visual or tactile) to match the visual attention of their recipient (attending or not attending). These findings suggest that apes understand the physical basis of their gestural signals and that the gaze of the recipient is important in successfully deploying a visual gesture. Adjusting signaler behavior to match recipients’ visual attention can be achieved either by moving position so that the visual gesture is produced within the recipient’s line of sight, or by selecting a gesture type that includes information encoded in the auditory or tactile modalities. While evidence for both methods has been found in captive signalers (Liebal et al.2004a, 2004b), we very rarely observed wild orang-utan signalers adjusting the position from which they signaled, instead finding that they employed tactile gestures toward out-of-sight recipients. This bias may reflect the constraints of their arboreal niche, in which repositioning may be more costly than in more open terrestrial habitat typical of captive environments (Hebert and Bard 2000; Manduell et al.2011).

While a much larger dataset is needed to systematically define the gestural meanings of specific gesture types, we could describe the goals for which the orang-utans used their gestures and the relative flexibility of gesture types. Gestures were employed to achieve eight distinct goals, including six positive requests (Acquire object; Climb on me; Climb on you; Climb over; Play change; Play continue) and two negations (Move away; Stop behavior). As found in chimpanzees and bonobos (Graham et al.2018; Hobaiter and Byrne 2014), gestures were employed flexibly toward multiple goals, but the majority of gesture types were used to achieve at least one play-related goal; thus, flexibility decreased (to an average of two goals per gesture type) when play data were removed. These findings suggest that orang-utans also show means–ends dissociation in their gesturing, with individual gesture types used to achieve several distinct goals and several gestures employed to express an individual goal. While there is increasing evidence for signaler control over and intentional use of some ape vocalizations (Crockford et al.2012, 2015, 2017; Schel et al.2013a, 2013b), these tend to be one or two signals each tightly associated with specific information, such as danger or food. Ape gestural communication remains distinct, outside of human language, in the scope and flexibility of the gestural signals and their meanings.

The exploration of gestural communication in wild orang-utans is not straightforward. Orang-utans experience fewer social interactions than other nonhuman ape species (Delgado Jr. and van Schaik 2000; Galdikas 1985; van Schaik 1999). In combination with their arboreal lifestyle in a dense, high-canopy habitat (Thorpe and Crompton 2006), this makes video data collection of communicative signals particularly challenging: almost 700 h of video footage were required to provide even a modest first description. Nevertheless, within this dataset we could confirm gesture types never previously recorded for orang-utans, and others not previously recorded in wild populations, and show that some aspects of their gesturing (for example, an apparent difference in the diversity of limb use) may reflect specific adaptations to their natural habitat. Given the rapid decline in orang-utan populations across all species (Davis et al. 2013; Goossens et al. 2006; Meijaard et al. 2010, 2011; Nowak et al. 2017), and the occurrence of cultural variation between groups in their signaling (Lameira et al. 2013), further research on understudied areas of their behavior, such as their gestural communication, is urgently needed.