Testing Hypotheses for the Emergence of Gestural Communication in Great and Small Apes (Pan troglodytes, Pongo abelii, Symphalangus syndactylus)

Gestural communication is crucial for primates. However, little is known about how gestural repertoires emerge through development. We conducted behavioural observations on captive apes, including 18 siamangs (Symphalangus syndactylus), 16 Sumatran orangutans (Pongo abelii), and 19 chimpanzees (Pan troglodytes), to test different hypotheses for the emergence of gestures (i.e., Phylogenetic Ritualization, Ontogenetic Ritualization, Social Negotiation, and Social Transmission hypotheses). Our results showed little variation in individual gestural repertoires, and only one idiosyncratic gesture. Moreover, across subjects (N = 53), repertoire size did not increase with age or social centrality. When comparing repertoires across all possible combinations of conspecifics, including apes in different groups (N = 273), for the four groups of siamangs and the two groups of orangutans, repertoire similarity was higher in dyads of the same group than in dyads of different groups, but it also increased with more observational effort and lower age difference between group members. Finally, when comparing repertoires across all dyads of conspecifics in the same group (N = 260), we found no differences in repertoire similarity depending on dyadic relationship quality. Overall, these results provide support for the Phylogenetic Ritualization hypothesis, according to which individuals are endowed with complete gestural repertoires from birth. These repertoires are largely similar across individuals and groups, although they may be partially refined through social experiences.


Introduction
In recent decades, researchers have extensively investigated the communication systems of nonhuman primates (hereafter, primates). Gestural communication, in particular, has been the focus of abundant research and is thought to have played a central role in the evolution of human language (Arbib et al., 2008; Fitch, 2010; Prieur et al., 2020). By studying captive and wild individuals with a variety of observational and experimental methods, researchers have revealed many features that primate communication systems share with human language, including the intentional and flexible use of gestures (Anderson et al., 2010; Call & Tomasello, 2007; Roberts et al., 2014), the ability to elaborate signals (Bard et al., 2019; Leavens et al., 2005; Roberts et al., 2012), and the ability to combine them to convey novel meanings.
Despite the importance of gestures in primate communication systems (Call & Tomasello, 2007; Cartmill et al., 2012; Pika & Liebal, 2012), how gestural communication emerges through development is still highly debated. In particular, although many studies suggest that the majority of gesture types are largely genetically channeled, researchers differ in the importance they attribute to social experience in modifying the communicative and contextual use of gestures. Researchers have proposed various hypotheses to explain the emergence of gestural communication, including the Phylogenetic Ritualization hypothesis, the Ontogenetic Ritualization hypothesis, the Social Transmission hypothesis, and the Social Negotiation hypothesis (Bard et al., 2019; Hobaiter & Byrne, 2011a, b; Liebal et al., 2013, 2018; Pika & Fröhlich, 2019; Tomasello & Call, 2018; Tomasello et al., 1989). These hypotheses are not necessarily mutually exclusive, because multiple mechanisms of gesture acquisition may coexist, or they might play different roles in the acquisition of different types of gestures or at different stages during ontogeny (Bard et al., 2014; Halina et al., 2013; Liebal et al., 2018). However, these hypotheses make testable predictions about the characteristics of gestural repertoires that are more likely to occur under different scenarios (Pika & Fröhlich, 2019) and are thus important for drawing inferences about the role that social factors and social experience might play in the emergence of gestural communication.
According to the Phylogenetic Ritualization hypothesis, gestures are largely innate, and social experience plays only a minor role in shaping individual gestural repertoires (Hobaiter & Byrne, 2011a, b). Gestural repertoires therefore are predicted to be very similar across individuals and groups (and even species), with no individual- or group-specific gesture types (Hobaiter & Byrne, 2011a, b). Under this hypothesis, interindividual differences in gestural repertoires can be explained in two ways. First, given that individual repertoire size increases with higher observational effort, differences in individual repertoires might simply emerge when observations are limited and repertoires fail to reach asymptote, suggesting that an increase in observational effort would likely allow the detection of more gesture types (Genty et al., 2009; Hobaiter & Byrne, 2011b). Second, if individual repertoires are innate, they should be identical at birth across individuals; however, they might gradually contract through repeated interactions, as individuals identify gestures that are more effective and discard others, reducing their repertoire size (Hobaiter & Byrne, 2011a). Therefore, although the gesture types available to each individual are largely genetically determined, social experience may affect the way in which they are used through development (Fröhlich & Hobaiter, 2018), in line with the idea that individuals refine their communication systems with age and become more proficient.
Other researchers, in contrast, suggest that social interactions and social experience play a more active role than genetics do in gestural acquisition. According to the Ontogenetic Ritualization hypothesis, for instance, gestures are created by two individuals who reciprocally adjust their behaviour during repeated social interactions, so that noncommunicative actions gradually acquire a communicative function through reciprocal anticipation (Plooij, 1978; Tomasello et al., 1985). Being acquired by individuals in a dyadic context, gestural repertoires are expected to differ strongly across individuals and groups and to include gestures that are only used by specific individuals (i.e., idiosyncratic gestures; Call & Tomasello, 2007; Liebal et al., 2006; Pika et al., 2005). Several studies of captive apes have found evidence of significant differences in gestural repertoires between groups and of idiosyncratic gesture types (Halina et al., 2013; Liebal et al., 2006; Tomasello et al., 1994). In contrast, studies in the wild have provided no evidence of group-specific or idiosyncratic gestures but have found significant overlap in gestural repertoires across individuals and groups of the same species (Genty et al., 2009; Hobaiter & Byrne, 2011a), in line with the Phylogenetic Ritualization hypothesis, according to which gestures are largely genetically predisposed and differences in their repertoires are an artifact of observational periods that are too short (Byrne et al., 2017; Hobaiter & Byrne, 2011a, b).
The idea that social interactions are crucial for the emergence of gestural communication also underlies the Social Transmission hypothesis, as individuals are expected to acquire gestural signals by first understanding their communicative function, then gradually learning to produce them through social learning processes, such as role reversal imitation (Pika, 2008; Tomasello et al., 1994). Through social transmission, repertoires become similar across individuals of the same group, but not across conspecific groups, and may contain several group-specific gestures, but no idiosyncratic ones (Call & Tomasello, 2007). In line with this hypothesis, some studies have found evidence of group-specific gestures in captive great apes (Liebal et al., 2006; Pika et al., 2003; Tanner & Byrne, 1996; Tomasello et al., 1989), although this appears to be a rather sporadic phenomenon (see Pika & Fröhlich, 2019, for a review). Moreover, if gestures are mostly acquired through social learning, gestural repertoires should be more similar in dyads with closer social bonds, such as mother-offspring dyads, where opportunities for social learning are more frequent. However, the gestural repertoires of infant bonobos (Pan paniscus) and chimpanzees (Pan troglodytes) appear to be more similar across individuals of the same age, rather than within mother-offspring dyads (Schneider et al., 2012), providing indirect evidence that gestural repertoires are likely not acquired through imitation.

Finally, social interactions are important for the acquisition of gestural communication systems under the revised Social Negotiation hypothesis (Fröhlich et al., 2016;Pika & Fröhlich, 2019). According to this hypothesis, gestures are continuously shaped within dyads by the physical and social context in which they take place, starting from complete actions that are mutually understood as having a specific meaning in certain contexts (Bard et al., 2014;Fröhlich et al., 2016;Pika & Fröhlich, 2019). In line with the Phylogenetic Ritualization hypothesis, these gesture forms may be innate, explaining similarities across populations and species. However, in contrast to the Phylogenetic Ritualization hypothesis, individuals learn their context-dependent usage (i.e., the circumstances in which the communicative event occurs, including the recipient's affordances and the communicative scenario), and the only limit to the number of possible gesture types that can be produced lies in the anatomical constraints of the species, so that large variation is expected in the gestural repertoires of individuals (Fröhlich & Hobaiter, 2018;Pika & Fröhlich, 2019). Moreover, the gestures acquired in a dyad can be used directly when interacting with other partners, because they do not have to be negotiated within each dyad, so that primates are expected to show no group-specific gestures when they experience similar demographic and ecological conditions (Fröhlich et al., 2016;Pika & Fröhlich, 2019). This is in contrast with the Ontogenetic Ritualization hypothesis, according to which any action can be ritualized into a gesture, potentially leading to differences even across dyads of the same group (Pika & Fröhlich, 2019).
In this study, we compared the gestural repertoires of several ape species (i.e., siamangs, Symphalangus syndactylus, chimpanzees, and Sumatran orangutans, Pongo abelii) to test the four different hypotheses for the emergence of gestures that we described above. In all study species, individuals live in groups and engage with each other in different forms of interactions, including gestural communication, making them good models for the purpose of this study. According to the Phylogenetic Ritualization hypothesis, gestural repertoires should be very similar across individuals and groups, and repertoire size should not increase with social experience, as individuals are endowed with complete gestural repertoires from birth (although there might be some variation across individuals, as some gestures may be used only in specific contexts that might be common only in certain groups or at certain ages; Fröhlich & Hobaiter, 2018; Hobaiter & Byrne, 2011a, b; Table I). According to the Ontogenetic Ritualization hypothesis, in contrast, gestures are largely learned by individuals in a social context (so that repertoire size should increase with social experience), and similarity in gestural repertoires should be generally low across individuals and groups (Call & Tomasello, 2007; Liebal et al., 2006; Pika et al., 2005; Table I). According to the Social Transmission hypothesis, gestures are mainly acquired through social learning (so that repertoire size should increase with social experience), and similarity in gestural repertoires should be high across individuals of the same group, especially if they have better relationship quality and more opportunities for social learning, but not across conspecific groups (Pika et al., 2003; Tanner & Byrne, 1996; Tomasello et al., 1989; Table I).
According to the Social Negotiation hypothesis, finally, gestures are continuously shaped through social interactions (so that repertoire size should decrease with increasing social experience, as individuals refine their repertoires), and they might be used by the same individual with different partners, so that variation in gestural repertoires should be higher across individuals than across groups in similar demographic and ecological conditions (Bard et al., 2014; Fröhlich et al., 2016; Pika & Fröhlich, 2019; Table I). Social experience is a complex construct, which includes different aspects that are not necessarily easy to measure. We operationalized it in terms of age and social centrality (as a measure of integration in the social network, i.e., the sum of the centralities of an individual's neighbours; Farine, 2017; Farine & Whitehead, 2015). Although age and social centrality are only two aspects of social experience and do not reflect all the inherent complexity of this construct, we considered that they provide a good proxy of social experience, because older and more central individuals should have had more opportunities to observe and interact with other group members compared with younger and less central individuals. Finally, given that our study sample included subjects belonging to different species, we also explored interspecific variation in gestural communication. However, we refrained from formulating specific hypotheses and only provide post-hoc interpretations in the discussion, because our study species differ in several socioecological characteristics that may affect important aspects of communication (e.g., fission-fusion levels: Aureli et al., 2008; dominance style: Maestripieri, 1997, 1999).

Ethical Note
The study was approved by all the facilities in which the observations took place (i.e., Zoo Krefeld and Leipzig Zoo, Germany; Zurich Zoo, Switzerland; Howletts Wild Animal Park, Bekesbourne, United Kingdom; and Yerkes Regional Primate Research Center, Atlanta, GA, USA). The study adhered to all the national regulations of the countries in which it was conducted. As the study was purely observational, we did not require ethical approval from an institutional board. During the study, we used no invasive procedures and did not alter the daily routines of the study subjects. We never separated individuals from the other group members, and we never water- or food-deprived them for the study. The observer never interacted with the study subjects.
Table I Hypotheses, predictions, and models used to test them in a study testing hypotheses for the emergence of gestural communication in great and small apes. Asterisks mark where data supported the predictions. We operationalized social experience in terms of age and social centrality (a measure of subjects' importance as "social hubs"; Farine, 2017; Farine & Whitehead, 2015), and relationship quality as dyadic proximity scores following Silk and colleagues (Silk et al., 2009).

Study Subjects
Our study subjects included 53 captive apes, belonging to (i) four groups of siamangs (N = 18), two at Zoo Krefeld (Germany) and two at Howletts Wild Animal Park in Bekesbourne (United Kingdom), (ii) two groups of Sumatran orangutans (N = 16), one at Zürich Zoo (Switzerland) and one at Leipzig Zoo (Germany), and (iii) one group of chimpanzees (N = 19) at Yerkes Regional Primate Research Center (Field Station) in Atlanta, GA (Table II). All siamang groups were housed in external enclosures with adjacent sleeping rooms, except for Group 1 at Zoo Krefeld, which lived in an indoor enclosure. Both orangutan groups were housed in indoor and outdoor enclosures, whereas the chimpanzees lived in an outdoor enclosure with an adjacent indoor enclosure, including sleeping rooms and rooms that they could spontaneously enter to participate in noninvasive experimental tasks. All groups had various structures and objects as enrichment in their enclosures, such as trees, ropes, and platforms. According to the STRANGE framework (Webster & Rutz, 2020), our study sample had several limitations, but we considered it to be relatively representative for the purpose of studying gestural communication:
1. The study subjects lived in groups that offered them a range of social experiences (e.g., opportunities to interact socially with other group members, including communication, and learn from others). Individuals had different dominance ranks, and we observed them during natural interactions in their groups.
2. We observed all the individuals in the group (except for one siamang group, see below), with no systematic bias in participation. However, the total number of individuals observed remained relatively low, and in the case of chimpanzees it only included one group.
3. All subjects lived in groups that received regular enrichment activities and that partly resembled the ones in the wild in terms of group size and composition (e.g., chimpanzees lived in larger groups, siamangs in smaller groups). However, our sample also had several limitations: most subjects were born in captivity (N = 50), and captive individuals may not be good representatives of their wild counterparts (Boesch, 2007). Wild chimpanzees and orangutans, for instance, live in groups with high levels of fission-fusion dynamics, which are very difficult to approximate in a captive setting. Their group size and composition also can vary greatly in the wild, providing different social opportunities and challenges in terms of communication, meaning that the inclusion of more groups would be essential to ensure a representative sample. Moreover, captive settings may reduce the variety of activities in which individuals can engage (e.g., predatory …), and thus the variety of signals that they need to use, reducing their repertoire size.
4. All individuals were habituated to the presence of human observers.
5. We accounted for variation during individual development, although a longitudinal approach would have better captured variation in gestural communication through development (Bard et al., 2014).
6. Although study subjects were captive, they included both males and females who did not belong to a specific genetic line.
Finally, although some study subjects had already participated in behavioural and cognitive experiments, it is unlikely that these previous experiments affected the natural occurrence of gestural communication in the study groups.

Data Collection
We collected data between April and December 1999 for the chimpanzees, between May and July 2001 for the orangutans at Leipzig Zoo, between February and March … (Altmann, 1974), for a total of 10 h per individual (except for group A at Howletts Wild Animal Park, where we only observed the youngest individual). We selected focal animals in a pseudo-randomized order, between 7.30 a.m. and 6 p.m. on every weekday, distributing observation times equally between mornings and afternoons. If a subject moved outside the range of the observer's vision and did not return within 5 min, we stopped the recording and started another session with a new focal animal. For chimpanzees, we collected 5-min focal animal samples. We conducted most observations between 8 a.m. and noon, three to four times per week, but we also conducted some observations in the afternoon. For each session, we randomly selected the order of the individuals and moved to the next session when all the individuals had been recorded once. We collected 42 h of focal-animal samples (i.e., 26 to 27 5-min focal samples per individual). Although ideally the observational effort should be similar across species (e.g., to avoid the repertoire size being lower in some species simply because of lower observational effort), we consider that this is not a major problem in our study, for two main reasons. First, the gestural repertoires of each species reached asymptote (see below). Second, our observational effort allowed us to include a comparable number of gestures for all species (mean number of gestures observed for each individual, N = 80 in siamangs, N = 59 in orangutans, N = 63 in chimpanzees), because chimpanzees produced gestures with a higher frequency. Part of this dataset has already been analyzed in other studies to address different research questions (Liebal et al., 2004a, b, 2006).

Coding
We coded the videos with Adobe Premiere and VLC media player to extract information about (i) the gestures produced by each individual (to assess repertoire size and similarity); and (ii) the social relationships among the study subjects (to assess Eigenvector centrality as a measure of social experience, and proximity scores as a measure of relationship quality). We defined gestures as any expressive movement of the head or limbs, as well as body postures (excluding complete body actions) that were directed to a specific recipient and showed some intentionality (e.g., persistence, response-waiting, means-end dissociation; Tomasello et al., 1985, 1994; Liebal et al., 2004b). We categorized gestures in line with the literature (Liebal et al., 2004a, b, 2006) after removing whole body actions (i.e., 18 gesture types for chimpanzees, 17 for orangutans, and 14 for siamangs; Table III). We avoided finer-grained distinctions (Hobaiter & Byrne, 2011a), because the categorization of gestures in different types is highly controversial (Bard et al., 2019; Fröhlich & Hobaiter, 2018). Whenever we detected a gesture in the video, we coded the gesture type produced, the identity of the individual gesturing, and the identity of the individual to whom the gesture was directed. Interobserver reliability was good and was assessed in previous publications (Liebal …).
We assessed social relationships among group members from the videos using scans (Altmann, 1974). We conducted up to one scan every 10 min, noting all individuals within 2 m of the focal subject (except for group A at Howletts Wild Animal Park, where we conducted the scan on the first visible individual from a pseudo-randomized list). This resulted in 35 scans for each study subject, except for the chimpanzee group (where we conducted a total of 163 scans) and for group A at Howletts Wild Animal Park (where we conducted 10 scans for each study subject).
We constructed an undirected weighted matrix for each group based on these proximity measures and used the packages vegan (version 2.5-3; Oksanen et al., 2018), asnipe (version 1.1.10; Farine, 2018) and igraph (version 1.2.1; Csardi & Nepusz, 2006) to run social network analyses and assess individual Eigenvector centralities (Farine, 2017;Farine & Whitehead, 2015), which we used as a proxy of individual social experience. We used dyadic proximity scores as a proxy of relationship quality, following Silk and colleagues (Silk et al. 2009). For each dyad, we divided the number of observations in which we saw the two individuals within 2 m by the total number of times we observed them (separately or within 2 m of each other), obtaining values that could range between 0 and 1 (with 0 meaning that the two individuals never spent time in proximity, and 1 meaning that they were always seen in proximity to each other). We then calculated the mean of these proximity values for the whole group and divided all the dyadic values by this mean value to obtain dyadic proximity scores that ranged between 0 and 8.66 (with proximity scores lower than 1 representing weaker than average social relationships, and those higher than 1 representing stronger than average social relationships; Silk et al., 2009).
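The proximity score described above can be illustrated with a short Python sketch. It is not the code used in the study (the analyses were run in R); the function name, the dictionary layout, and the counts for the three individuals A, B, and C are hypothetical and serve only to show the two steps: a per-dyad proportion of scans in proximity, then division by the group mean.

```python
def dyadic_proximity_scores(scans_together, scans_observed):
    """Return a proximity score per dyad.

    scans_together: dict mapping a dyad (tuple of two IDs) to the number of
        scans in which both individuals were within 2 m of each other.
    scans_observed: dict mapping the same dyads to the total number of scans
        in which the two individuals were observed.
    Both structures are assumptions made for this illustration.
    """
    if not scans_observed:
        raise ValueError("at least one dyad is required")
    # Step 1: proportion of scans spent in proximity, between 0 and 1.
    proportions = {}
    for dyad, n_observed in scans_observed.items():
        if n_observed <= 0:
            raise ValueError(f"dyad {dyad} was never observed")
        proportions[dyad] = scans_together.get(dyad, 0) / n_observed
    # Step 2: divide every proportion by the group mean, so that scores
    # above 1 indicate stronger-than-average relationships.
    group_mean = sum(proportions.values()) / len(proportions)
    if group_mean == 0:
        raise ValueError("no dyad was ever seen in proximity")
    return {dyad: p / group_mean for dyad, p in proportions.items()}


# Hypothetical counts for one small group.
scans_together = {("A", "B"): 6, ("A", "C"): 3, ("B", "C"): 0}
scans_observed = {("A", "B"): 30, ("A", "C"): 30, ("B", "C"): 30}
scores = dyadic_proximity_scores(scans_together, scans_observed)
for dyad, score in scores.items():
    print(dyad, round(score, 2))
```

By construction the scores of a group average to 1, which is why values below and above 1 can be read as weaker and stronger than average relationships.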

Statistical Analyses
We first assessed whether gestural repertoires reached asymptote by plotting the cumulative number of gesture types observed in each group against the total number of gestures observed (Fig. 1). We used the number of gestures observed (instead of the time spent observing the individuals), because this allowed us to account more effectively for intergroup differences in the frequency with which gestures are observed. Visual inspection of the figure suggests that, for all study groups, we needed approximately 200-400 gestures to reach asymptote.
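As an illustration of the asymptote check, the following Python sketch builds the cumulative curve of gesture types against the number of gestures observed. It is a minimal example under stated assumptions, not the procedure used to produce Fig. 1: the function names and the short sequence of gesture labels are hypothetical.

```python
def cumulative_gesture_types(gesture_sequence):
    """Return, for each observed gesture in order, the number of distinct
    gesture types seen so far (the y-values of an accumulation curve)."""
    seen = set()
    curve = []
    for gesture_type in gesture_sequence:
        seen.add(gesture_type)
        curve.append(len(seen))
    return curve


def gestures_needed_for_full_repertoire(curve):
    """Return how many gestures had been observed when the curve first
    reached its final value (a crude indicator of where it flattens)."""
    if not curve:
        raise ValueError("the curve is empty")
    return curve.index(curve[-1]) + 1


# Hypothetical sequence of coded gestures for one group, in observation order.
observed = ["touch", "touch", "slap", "touch", "embrace", "slap"]
curve = cumulative_gesture_types(observed)
print(curve)
print(gestures_needed_for_full_repertoire(curve))
```

A curve that keeps rising at the end of the observation period would indicate that further gesture types are likely to be detected with more effort; a long flat tail is what is meant by reaching asymptote.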
Given that every individual was included in multiple dyads, we used the brms package (version 2.16.3; Bürkner, 2021) in R (R Core Team, 2020), which allows the implementation of multimembership models. These models account for the fact that the same individual identities can appear in both variables (individual 1 or individual 2 in each dyad) and for the lack of independence in these data points. We ran three models. In M1, we tested whether gestural repertoire size increased with individuals' age and social centrality (Table I). For this purpose, we entered one line for each study subject (N = 53). Our response was the ratio between the number of gesture types produced by the subject and the number of gesture types for the species (i.e., 18 for chimpanzees, 17 for orangutans, and 14 for siamangs), which we modelled with a binomial distribution, as commonly done with proportions of discrete variables. As test predictors, we included the subject's age (in years), social centrality, and species, and we controlled for the subject's sex and observation effort (i.e., the total number of gestures the subject produced during the study).
In M2, we tested whether gestural repertoires differed across conspecific groups and, in particular, whether repertoires were more similar in dyads that belonged to the same group compared with dyads that belonged to different groups (Table I). We only included orangutans and siamangs, for which we tested more groups than for other species, and we entered a line for each possible combination of conspecifics (N = 273 dyads). In this way, we compared repertoire across all possible pairs of individuals, within and across conspecific groups. In line with previous studies (Halina et al., 2013), we measured repertoire similarity with the Dice coefficient (Dice, 1945), as the ratio between twice the number of gesture types common to two individuals and the sum of gesture types in the repertoire of each of the two individuals (so that a value of 0 means that two individuals have no gesture types in common, and a value of 1 means that they have identical gestural repertoires). For modelling purposes, we used a binomial distribution to jointly model the numerator and the denominator of the Dice coefficient, as commonly done with proportions of discrete variables. As test predictors, we included whether individuals in the dyad belonged to the same group (binomial variable) and species. As controls, we included the dyadic observational effort (i.e., the sum of the number of gestures produced by each individual in the dyad), the age difference (as absolute difference, in days), and the sex combination (i.e., female-female, female-male, male-male). Finally, we included both individuals' identities as random factors, using the mm function of the brms package.
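The Dice coefficient defined above is simple enough to show in a few lines of Python. This is an illustrative sketch only: the function names and the gesture labels are hypothetical, and the second function merely returns the numerator and denominator that a binomial model would treat as successes and trials.

```python
def dice_counts(repertoire_a, repertoire_b):
    """Return (numerator, denominator) of the Dice coefficient:
    twice the number of shared gesture types, and the summed sizes
    of the two repertoires."""
    a, b = set(repertoire_a), set(repertoire_b)
    return 2 * len(a & b), len(a) + len(b)


def dice_coefficient(repertoire_a, repertoire_b):
    """Return the Dice similarity between two repertoires (0 = no gesture
    types in common, 1 = identical repertoires)."""
    numerator, denominator = dice_counts(repertoire_a, repertoire_b)
    if denominator == 0:
        raise ValueError("both repertoires are empty")
    return numerator / denominator


# Hypothetical repertoires of two conspecifics.
individual_1 = {"touch", "slap", "reach"}
individual_2 = {"touch", "slap", "embrace", "offer"}
print(dice_counts(individual_1, individual_2))
print(round(dice_coefficient(individual_1, individual_2), 3))
```

Because the denominator is the sum of both repertoire sizes, the coefficient penalizes gesture types that only one member of the dyad was seen to use, which is one reason observational effort has to be controlled for when comparing dyads.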
In M3, we tested whether repertoire similarity varied across dyads depending on their relationship quality and, in particular, whether repertoires were more similar in dyads that spent more time in proximity or in maternal kin (i.e., mother-offspring or maternal-sibling dyads; Table I). In this dataset, we included all species, entering a line for each dyad of conspecifics in the same group (N = 260). In contrast to M2, therefore, M3 did not include dyads of individuals belonging to different groups (for which no measure of relationship quality was possible). As in M2, we modelled repertoire similarity with a binomial distribution. As test predictors, we included the dyadic proximity score, whether the dyad were maternal kin (binomial variable) and species. As controls, we included the dyadic observation effort, the age difference, and their sex combination, as above. Finally, we included both individuals' identities as random factors, with the mm function of the brms package.
Before running the models, we z-transformed continuous predictors and controls (i.e., age, centrality, observational effort, age difference, proximity score) to avoid convergence issues and increase comparability of predictor estimates. We compared each full model (containing test predictors, controls, and random factors, as described above) to a corresponding null model (containing controls and random factors only). For this purpose, we used the approximate leave-one-out (loo) crossvalidation in the loo package (Vehtari et al., 2020) and selected the best model using the difference and standard error between the expected log pointwise predictive densities of the full and null models (Vehtari et al., 2017). We ran all models using flat priors, 4 chains in parallel (to increase the number of independent samples and increase inference accuracy) and 2,000 iterations per chain, half of which were warm-up samples to enhance sampling efficiency (McElreath, 2016). We conducted posterior predictive checks using the bayesplot package (Gabry et al., 2019). All Pareto k estimates were below 0.7, and convergence was suggested by Rhat estimates of 1.00 (and 1.01 in M3) and a high effective number of samples in our models (McElreath, 2016). We found no collinearity issues (maximum VIFs = 1.37).
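For readers unfamiliar with the z-transformation mentioned above, a minimal Python sketch follows. It assumes the conventional definition (subtract the mean, divide by the sample standard deviation with an n − 1 denominator); the text does not state which denominator was used, and the function name and example ages are hypothetical.

```python
from statistics import mean, stdev


def z_transform(values):
    """Centre values on their mean and scale them by the sample standard
    deviation (n - 1 denominator, an assumption of this sketch)."""
    if len(values) < 2:
        raise ValueError("at least two values are required")
    centre = mean(values)
    spread = stdev(values)
    if spread == 0:
        raise ValueError("values are constant and cannot be scaled")
    return [(value - centre) / spread for value in values]


# Hypothetical ages (in years) of three subjects.
ages = [2, 4, 6]
print(z_transform(ages))
```

After this transformation, each estimate refers to a change of one standard deviation in the predictor, which is what makes estimates for variables on different scales (e.g., years of age and number of gestures) comparable.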

Individual Repertoire Size
Individual repertoire size (mean ± SD) was 9.4 ± 3.5 in chimpanzees, 9.1 ± 3.1 in orangutans, and 7.2 ± 2.5 in siamangs. There were no idiosyncratic gestures in our study, except for arm raise in orangutans, which we only observed once in a 4-year-old female, Padana. In all species, all the other gesture types were produced by at least two individuals and, on average, by around half of the conspecifics (i.e., 9.9 ± 5.4 of 19 chimpanzees, 8.5 ± 5.2 of 16 orangutans, and 9.2 ± 5.8 of 18 siamangs). For M1, the difference between the expected log pointwise predictive densities of the full and the null models was −0.9 ± 2.1, suggesting that the null model better fit the data, and that individuals' age and social centrality were not linked to their repertoire size. In the null model, moreover, only observational effort reliably predicted individual repertoire size (Table IV).

Repertoire Similarity
Overall, repertoire similarity within dyads varied between 0.23 ± 0.28 (mean ± SD) for chimpanzees (with the lowest mean level of observational effort per dyad), and 0.75 ± 0.16 for siamangs in Krefeld 1 (with an intermediate observational effort per dyad). For M2, the difference between the expected log pointwise predictive densities of the full and the null models was −3.9 ± 2.7, suggesting that the full model provided a better fit to the data. In particular, repertoire similarity was higher between individuals of the same group (β = 0.24, lower-upper 95% confidence interval [CI] = 0.09-0.39; Fig. 2), but it also increased with more observational effort (β = 0.40, lower-upper 95% CI = 0.14-0.69; Fig. 3; Table IV). For M3, the null model fitted the data better than the full model (−0.8 ± 1.9). In the null model, both higher observational effort and lower age difference reliably predicted higher repertoire similarity in dyads (Table IV).
Boxplots show the data distribution for repertoire similarity from a generalized linear mixed model (Model 2, after standardizing for species and sex combination, so that their effect is not visible in the figure). Horizontal ends of the box represent the 75% and 25% quantiles, ends of the whiskers represent the 97.5% and 2.5% quantiles, central lines represent the model estimates. Grey circles represent dyadic data points for dyads belonging to different groups, and black crosses represent dyadic data points for dyads belonging to the same group.

Discussion
Our study found little difference across individual gestural repertoires in apes. We only found one idiosyncratic gesture (in orangutans), and repertoire size did not increase with individuals' age or social centrality (M1). Moreover, repertoire similarity was higher for dyads that belonged to the same group, rather than to different groups (M2). However, repertoire similarity was overall relatively low, increased with more observational effort (M2-M3) and did not vary depending on relationship quality (M3).
In terms of individual gestural repertoires, we found little variation across individuals and very little evidence of idiosyncratic gestures, because all gesture types but one were used by at least two conspecifics, and on average gesture types were used by around half of the individuals. These findings are in line with both the Phylogenetic Ritualization and the Social Transmission hypotheses (Table I), which predict little variation across individuals, either because gestural repertoires are largely innate and thus similar (Phylogenetic Ritualization hypothesis: Hobaiter & Byrne, 2011a, b) or because gestural repertoires become similar across individuals of the same group through social learning processes (Social Transmission hypothesis: Pika, 2008; Tomasello et al., 1994). However, we found no evidence that repertoire size increased with social experience. In M1, in particular, the model including age and social centrality did not provide a better fit to the data than the model without these variables, and both age (−0.11) and centrality (−0.05) had negative estimates in the full model, suggesting that, if anything, repertoire size decreases with increasing age and social centrality. These results are therefore fully in line with the Phylogenetic Ritualization and Social Negotiation hypotheses (Table I), according to which individuals are endowed with complete gestural repertoires from birth, which are refined during development through social processes (Fröhlich & Hobaiter, 2018; Hobaiter & Byrne, 2011a, b). In contrast, these findings provide no support for the other hypotheses that we tested (Table I), which posit a more active role of social experience in the acquisition of gestural repertoires (Call & Tomasello, 2007; Liebal et al., 2006; Tanner & Byrne, 1996).
Although the results above provide general support for the Phylogenetic Ritualization hypothesis, repertoire size may vary strongly within species depending on the way in which gestures are operationalized and, in particular, on how fine-grained the distinctions between different gestural categories are (Bard et al., 2019; Fröhlich & Hobaiter, 2018). In chimpanzees, for example, the species repertoire size can vary from fewer than 30 (Pika et al., 2005) to more than 100 gesture types (Roberts et al., 2014), depending on how gesture types are defined and how fine-grained the level of analysis is. We used relatively broad categories for each gesture type, but finer-grained distinctions and/or bottom-up approaches that better assess variation in the form of gesture types (Bard et al., 2019) might reveal a much stronger role of social experience permeating the subtle forms in which gestures are performed by different individuals, rather than their general occurrence.
Furthermore, our study found that repertoire similarity within dyads was higher when individuals belonged to the same group than to different groups. These results are in line with the Social Transmission hypothesis (Table I), according to which individuals acquire gestures through social learning processes and similarity is higher across individuals of the same group, as they can learn from each other (Pika, 2008; Tomasello et al., 1994). However, although repertoire similarity was higher between individuals of the same group, it was relatively low overall, with individuals sharing between 23 and 75% of their repertoires, depending on the species. Such relatively low repertoire similarity supports the Ontogenetic Ritualization hypothesis (Table I). However, our study also showed that repertoire similarity increased with higher observational effort. Therefore, our results can be better explained by the Phylogenetic Ritualization hypothesis, according to which gestures are largely innate, but repertoire size and similarity may increase with observational effort, because more infrequent gestures can also be observed (Genty et al., 2009; Hobaiter & Byrne, 2011a, b).
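The sampling argument in this paragraph can be illustrated with a small simulation: two individuals that share exactly the same underlying repertoire will appear less similar when each is observed only a few times, because rare gesture types are missed. This is a sketch under stated assumptions, not a reanalysis of the study data. The skewed (Zipf-like) gesture frequencies, the Jaccard index, the number of gesture types, and all function names are hypothetical.

```python
import random


def observed_repertoire(rng, weights, n_observations):
    """Gesture types seen in n_observations draws from a shared repertoire."""
    types = range(len(weights))
    return set(rng.choices(types, weights=weights, k=n_observations))


def jaccard(a, b):
    """Shared types divided by all types observed in either individual."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_similarity(n_observations, n_types=20, n_replicates=500, seed=1):
    """Average observed similarity of two individuals with identical repertoires."""
    rng = random.Random(seed)
    # Assumption: gesture type frequencies are skewed, so some types are rare.
    weights = [1 / (rank + 1) for rank in range(n_types)]
    total = 0.0
    for _ in range(n_replicates):
        a = observed_repertoire(rng, weights, n_observations)
        b = observed_repertoire(rng, weights, n_observations)
        total += jaccard(a, b)
    return total / n_replicates


low_effort = mean_similarity(10)
high_effort = mean_similarity(500)
print(f"mean similarity with 10 observations each:  {low_effort:.2f}")
print(f"mean similarity with 500 observations each: {high_effort:.2f}")
```

Under these assumptions, observed similarity rises with observational effort even though the true repertoires never differ, which is why observational effort needs to be controlled for before low similarity is read as evidence for individually learned gestures.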
At first sight, the finding that repertoire similarity is higher between conspecifics belonging to the same group might appear to contrast with the Phylogenetic Ritualization hypothesis, because if gestures are mostly innate, gestural repertoires should also be very similar across conspecific groups. However, the Phylogenetic Ritualization hypothesis does not exclude the possibility that, as they age, individuals prune the larger innate repertoires that they were born with, reducing them to a subset of gestures that are more effective for the specific context in which they live (Byrne et al., 2017; Genty et al., 2009; Hobaiter & Byrne, 2011a). Therefore, this hypothesis does not necessarily exclude higher repertoire similarity within groups than across groups, because individuals in the same group may face more similar challenges that require specific subsets of gestures, leading to higher repertoire similarity between individuals from the same groups. Similarly, the fact that repertoire similarity was relatively low, especially for chimpanzees, is not in contrast to the Phylogenetic Ritualization hypothesis. Our study subjects mostly included adult individuals (mean age ± SD across species: 14 ± 10 years), whose repertoires might have already experienced extensive pruning. In the future, longitudinal studies will be crucial to monitor how individual repertoires change through development and whether pruning really explains the relatively low repertoire similarity in our study.
In line with our interpretation, relationship quality did not predict repertoire similarity across dyads; we found no evidence that maternal kin or dyads with stronger social bonds had more similar gestural repertoires than nonmaternal kin or dyads with weaker social bonds. If the Social Transmission hypothesis were true and social learning processes shaped individual repertoires, leading them to gradually converge through repeated interactions, one would expect these processes to happen more frequently in dyads with better relationships, which should have higher repertoire similarity, but this was not the case. However, there are methodological reasons that might explain why we failed to find a link between repertoire similarity and relationship quality across dyads. First, we operationalized dyadic relationship quality based on maternal kinship and matrices of spatial proximity. However, these two measures might not capture the complexity of ape relationships. Some species of primate, for instance, can reliably discriminate paternal kin and may preferentially affiliate with paternal half-sisters over non-kin (Smith et al., 2003; Widdig et al., 2001). Moreover, the intensity of dyadic social relationships is often assessed with composite indexes, in which multiple affiliative measures (e.g., grooming, proximity) are combined into a single score to obtain a more comprehensive evaluation of relationship quality (Silk et al., 2006, 2009). Having only used proximity measures, our study might have failed to properly capture relationship quality across our study dyads. Including better measures of social relationships and taking into account paternal relationships might provide different results. In our study, this was unfortunately not possible, because we could not determine paternity for the chimpanzee group, and we did not have enough data to assess composite indexes for all study groups.
Moreover, as discussed above, repertoire size and repertoire similarity may vary strongly within species depending on how gestures are operationalized, so that finer-grained distinctions might provide different results. However, finer-grained distinctions are likely to provide even lower levels of repertoire similarity across dyads, in contrast to the Social Transmission hypothesis. Also in line with the Phylogenetic Ritualization hypothesis, and with the idea that larger innate repertoires may be partially refined with age depending on the individuals' needs and experiences (Fröhlich & Hobaiter, 2018; Hobaiter & Byrne, 2011a), repertoire similarity in our study was higher when individuals were closer in age. These results suggest that repertoire similarity might simply increase when individuals share similar contexts or activity budgets (e.g., because they have a similar age or sex), because they might be more likely to use the same gesture types that are appropriate in those contexts. These results are in line with literature showing that repertoire similarity in chimpanzees and bonobos was higher in individuals with similar ages than in mother-infant dyads (Schneider et al., 2012). Overall, these findings suggest that gestural repertoires are unlikely to be acquired through social learning processes, because gestural repertoires should be more similar when individuals have more opportunities for social learning, such as in mother-infant dyads, but not necessarily across age peers (Table I).
Hypotheses for the emergence of gestural communication are not necessarily mutually exclusive, because they may coexist or play a different role at different developmental stages or for different gesture types (Bard et al., 2014; Halina et al., 2013; Liebal et al., 2018). For instance, only the gesture types produced by young chimpanzees when interacting with higher-ranking partners were preceded by the spontaneous appearance of weaker forms of the signal (likely as an emotional response), which the authors interpreted as these gestures having a different origin (i.e., largely genetically based) compared with the others (which would instead be socially acquired, in line with the Ontogenetic Ritualization hypothesis; Bard et al., 2014). Therefore, longitudinal analyses will be necessary to detect finer-grained changes in individual repertoires and better disentangle whether social experience really plays a different role for different gesture types.
Finally, we found no consistent differences across study groups and species in terms of repertoire size and similarity. In the future, it will be important to include more groups and species to assess whether specific socioecological characteristics are linked to interspecies variation in repertoire size and similarity. Some authors, for instance, have suggested that the degree of flexibility in signal production is at least partly determined by the species' social system (Preuschoft & van Hooff, 1995). In species with higher levels of fission-fusion dynamics, subgroups frequently vary in size and composition, social relationships between group members may be more differentiated and uncertain, and communication repertoires might be more likely to include signals that favor the maintenance of long-term differentiated social relationships and the resolution of their uncertainties (Aureli et al., 2008). In contrast, primates living in smaller, more cohesive groups may show higher overlap in their individual gestural repertoires and thus more uniform repertoires on a group level compared with species with larger, more flexible group structures (Call & Tomasello, 2007; Pika et al., 2005). Furthermore, gestural communication also varies across species depending on dominance style, with more despotic species having more predictable outcomes of social interactions and thus comparably smaller and less flexible repertoires (Maestripieri, 1997, 1999). Future studies should ideally compare several groups and species to test these different hypotheses for interspecific differences in the complexity of gestural repertoires. Despite the limitations of our study, our results contribute to the debate about how gestural repertoires emerge through development.
Overall, the Phylogenetic Ritualization hypothesis is perhaps the one that best explains our findings: apes are likely endowed with complete gestural repertoires from birth, which are largely similar across individuals and groups, but these repertoires might be partially refined with age depending on the contingencies that individuals experience (Hobaiter & Byrne, 2011a, b). Future work will ideally use larger sample sizes, including individuals from wild settings, and finer-grained categories for different gesture types. Moreover, a longitudinal approach will be important to monitor changes in gestural repertoires and detect the emergence of single gesture types at the individual level. Finally, it will be interesting to further disentangle the relative contribution of social and ecological experiences to the development of complex gestural communication, and the role of socioecological drivers in the evolution of human communication.