Introduction

Vocal signals, which are usually produced by the vibration of vocal tracts in vertebrates, are essential for various biological functions including species recognition (de Oliveira et al., 2020; Fan et al., 2022a, 2022b, 2022c; Kwong-Brown et al., 2019), territorial defense (de Kort et al., 2009; Ma et al., 2020; Naguib & Wiley, 2001), anti-predation (Nielsen et al., 2019), reproduction (Fan et al., 2009) and inter-group spacing (Levrero et al., 2019; Ma et al., 2022). Vocal signals may have been shaped by selection pressures such as habitat structure (Arasco et al., 2022; Chen et al., 2022; Sementili-Cardoso et al., 2022), social system (Bouchet et al., 2013; Pougnault et al., 2022), predation (Coye et al., 2022; Jiang et al., 2022; Tuttle & Ryan, 1981), and sexual selection (Seddon, 2005). Many species with a close phylogenetic relationship show similar vocal structures due to their similar morphological and anatomical features, although this is not always the case (Thomassen & Povel, 2006). For example, closely related rodents have more similar laryngeal cartilage structures than those that are less closely related (Borgard et al., 2020), leading to similar acoustic features such as fundamental frequency, amplitude, and bandwidth (Miller & Engstrom, 2012). This phenomenon is also found in birds (McCracken & Sheldon, 1997; Seneviratne et al., 2012), anurans (Manzano & Sawaya, 2022), and primates (Hasiniaina et al., 2020).

Animal vocalizations show species differences at multiple levels, including notes, sequences, and coordination (Nardone et al., 2017). At the note level, clear differences occur in frequency (Friis et al., 2022), duration (Beckers & Ten Cate, 2001; Sementili-Cardoso et al., 2022), amplitude (Beckers & Ten Cate, 2001), and formant dispersion (Bergman et al., 2016). For instance, four source-related acoustic features, three filter-related acoustic features, and feature duration differ significantly among the three Spheniscus species (Favaro et al., 2016). Differences at the note level are driven by sexual selection (Favaro et al., 2016), climatic conditions and geographic isolation (), physiological differences (Friis et al., 2022), predation (Coye et al., 2022), and vegetation cover (Arasco et al., 2022). At the sequence level, animals utter hierarchical structures like sequences of notes, or phrases. This higher-level structure has been found in avians (Suzuki et al., 2018), anurans (de Carvalho et al., 2015), cetaceans (Pace et al., 2022), and primates (Girard-Buttoz, Bortolato, et al., 2022a; Girard-Buttoz, Zaccarella, et al., 2022b; Leroux et al., 2021; Leroux et al., 2023). Sequences also show species differences, especially in birds, in many features such as repertoires, syntax, and intervals between notes (Ivanitskii et al., 2017; Vokurková et al., 2013). For example, Luscinia cyane shows a flexible order, while L. luscinia uses linear syntax in which song types are executed in a strict sequence (Ivanitskii et al., 2017). At the coordination level, patterns of overlap (Mendez & Sandoval, 2021), the initiation of sex (Tobias et al., 1998), and sex specificity (Muller & Anzenberger, 2002) of song coordination (e.g., duetting, a coordinated acoustic display where mated pairs combine their vocalizations) also differ across species. 
For instance, based on the dominant frequency, duration, call rate, and sex difference, the subfamily Callicebinae shows four distinct duetting patterns (Adret et al., 2018). Unlike at the note level, inter-species differences at the sequence and song coordination levels are mainly driven by group size and structure (Valderrama et al., 2013) and social learning (Vokurková et al., 2013).

The divergence of vocal signals, particularly loud calls, has been studied extensively in primates. For example, temporal-related and frequency-related acoustic features of loud calls show striking differences, influenced by genetic drift in Microcebus spp. (Hasiniaina et al., 2020) and by sexual selection in Alouatta spp. (Bergman et al., 2016). Therefore, vocal signal divergence serves as a valuable tool for species delimitation in many taxa, including Microcebus spp. (Zimmermann et al., 2000), Nomascus spp. (Ruppell, 2009), and Leontopithecus spp. (Snowdon et al., 1986). Further supporting this idea, playback experiments show that Eulemur spp. (Rakotonirina et al., 2016), Tarsius spp. (Nietsch & Kopp, 1998), and Macaca spp. (Muroyama & Thierry, 1998) can distinguish between species based on vocal signals. Given that hierarchical structures and song coordination have evolved in multiple primate species, with diverse units serving different functions and conveying varying information (Berthet et al., 2019; Ma et al., 2024; Seiler et al., 2015), we need to investigate whether and to what extent differences exist at each level (Bouchet et al., 2010). However, previous studies have either compared species at a single level, such as the call note (Propithecus diadema and Indri indri; Valente et al., 2022), sequence (Presbytis spp.; Meyer et al., 2012), or duet (Pitheciidae; Adret et al., 2018), or combined all parameters into one discriminant analysis (Van Ngoc et al., 2011).

Gibbons (Hylobatidae) are arboreal and highly territorial small apes that are distributed throughout Southeast Asia (Yang et al., 2023; Bartlett et al., 2016). Although they live mainly in pairs of one adult male and one adult female (Brockelman et al., 1998), multiple gibbon species can also live in stable groups with one male and two breeding females, or one female and two adult males (Guan et al., 2018; Lappan et al., 2017; Savini et al., 2009). All gibbon species emit a variety of notes that are organized as sequences. Except for Java gibbons (Hylobates moloch) and Kloss’ gibbons (H. klossii), adult males and females coordinate their sequences to form duet or trio songs (Cowlishaw, 1996; Geissmann, 2000). At the note level, physiological factors may strongly influence some features. For instance, frequency-related features of notes such as pitch are significantly correlated with androgen levels in H. lar (Barelli et al., 2013). The sequence level is more flexible. For example, H. lar can produce and perceive speech-like phrases (Terleph et al., 2018a), while H. muelleri inserts phrases (Inoue et al., 2020) and chunk structures (Inoue et al., 2017). In terms of song coordination, gibbons also show high flexibility and complexity. Species including H. funereus (Lau et al., 2022), H. lar (Terleph et al., 2018b), and Symphalangus syndactylus (Geissmann & Orgeldinger, 2000) show high pair consistency in duets by adjusting their song to temporal and spectral aspects of another’s.

Gibbon vocalizations have been thought to be genetically determined because the rate of notes and spectral characteristics of female hybrid gibbon great call sequences are intermediate between those of their parental species (H. lar and H. pileatus; Brockelman & Schilling, 1984; Tenaza, 1985). Additionally, a significant positive correlation has been found between genetic similarity and gibbon song structure (Ruppell, 2009; Van Ngoc et al., 2011), indicating that gibbon species with closer geographical distance and phylogenetic relationships exhibit more similar vocal structures. However, sub-adult and adult H. agilis and H. moloch show socially mediated vocal flexibility and mother–daughter co-singing interactions may enhance vocal development (Koda et al., 2013; Yi et al., 2022), which suggests that a social learning process may exist in the song development of gibbons.

Crested gibbons (Nomascus) are well known for their highly sex-specific and hierarchical vocalizations (Table I; Geissmann, 1995). Among them, cao vit gibbons (N. nasutus), Hainan gibbons (N. hainanus), and western black gibbons (N. concolor) were previously considered a single species (N. concolor), referred to as the black gibbon, because adult males are all black and display only subtle morphological differences (Geissmann, 1995; Mootnick & Fan, 2011). However, adult females differ in the presence or absence of a white face ring, the color of the chin and abdomen, and the shape and size of the black crest (Mootnick & Fan, 2011). Furthermore, the natal color of these species is also different (Zhu et al., 2024). A phylogenetic study analyzed mitochondrial cytochrome b gene sequence data from six Nomascus species, and showed that N. hainanus and N. nasutus are the basal branches of crested gibbons, while N. concolor branched off first in another clade (Thinh et al., 2010). These morphological and phylogenetic findings suggest that they are all distinct species. These species often live in groups consisting of one adult male and two adult breeding females (Fan et al., 2015; Hu et al., 2018; Zhou et al., 2008). They all produce vocalizations at various levels, including note, sequence, and song coordination (Huang et al., 2020).

Table I Terms used to describe Nomascus gibbon songs

N. nasutus occurs along the border between Guangxi province, China, and Cao Bang, Vietnam. This area is characterized by a typical karst limestone landscape and is surrounded by degraded scrub and secondary forest (Fan et al., 2013). N. nasutus was thought to be extinct in both China and Vietnam by the 1960s, until a small population was rediscovered in Vietnam in 2002 (Geissmann et al., 2002) and in China in 2006 (Chan et al., 2008). A recent population survey estimated that 11 groups remain, with about 74 individuals (Wearn et al., 2024). N. hainanus is only found in Hainan Tropical Rainforest National Park (18°57’-19°11’ N, 109°03’-17’ E, 350-1560 m), China, which is covered in dense tropical rainforest (Zhang et al., 2010). The current N. hainanus population consists of six groups including about 40 individuals (Zhong et al., 2023). N. concolor is distributed in the mid-montane humid evergreen broadleaved forests and semi-humid evergreen broad-leaved forests in central and southeast Yunnan province, China, north Vietnam, and north Laos (Fan et al., 2022a, 2022b, 2022c; Yang et al., 2021). N. concolor has a larger population size than N. nasutus and N. hainanus. The current global population size is about 1300 (Fan, 2017; Fan et al., 2022a, 2022b, 2022c).

In these Nomascus species, the adult male and females coordinate their sex-specific vocalizations to produce duet or trio singing bouts in the early morning, which serve as territory or resource defense, and mate relationship maintenance (Fan et al., 2009; Ma et al., 2022). Studies have investigated vocal differences in four buff-cheeked Nomascus species: N. leucogenys, N. siki, N. annamensis, and N. gabriellae. Males of N. leucogenys gave loud staccato aa notes, which appeared rarely in N. siki, N. annamensis, and N. gabriellae. N. leucogenys could also be distinguished from the three other species by their female great calls, which had a faster frequency modulation and a longer duration (Van Ngoc et al., 2011). Although studies have described the sonogram structure of N. nasutus (Feng et al., 2013), N. hainanus (Deng et al., 2014), and N. concolor (Fan et al., 2010), no work has explored vocalization differences among these three species. To test whether species differences in vocal structure are present at different levels in gibbons and whether they are influenced by genetic factors, we explored (1) interspecific differences in vocal structure among the three Nomascus species, (2) which features contribute to interspecific differences, and (3) whether the similarity in vocalization at different levels is correlated with genetic similarity.

Methods

Study subjects

We recorded the songs of seven groups of N. nasutus in Bangliang Gibbon National Nature Reserve (22°49’-59’ N, 106°9’-30’ E, 500–1000 m), Guangxi, six groups of N. hainanus in Hainan Tropical Rainforest National Park, Hainan, and six groups of N. concolor from Dazhaizi (24°21’ N, 100°42’ E, 1700–2700 m) on the western slope of Mt. Wuliang National Nature Reserve, Yunnan (Fig. 1).

Fig. 1

Distributions of three species of Nomascus in China. We recorded vocalizations in Bangliang Gibbon National Nature Reserve, Guangxi (2008 to 2021), Dazhaizi, Mt. Wuliang National Nature Reserve, Yunnan (2006 to 2021), and Hainan Tropical Rainforest National Park, Hainan (2020 to 2021). We also show the annual temperature, annual precipitation, altitude span, and the number of groups recorded for each species at each site.

In these species, typical male sequences usually begin with a boom note (bo) or several aa notes (aa), followed by a pre-modulated note (pre), and end with modulated notes (mR) with 0–4 rolls (Table I). Adult females contribute great calls and coordinate their singing with the males (Ma et al., 2024). The great call of females generally consists of 1–2 introductory notes and multiple wa notes (Fig. 2). Following the female great call, the adult male responds with a male coda, collectively forming a successful great call sequence (Fig. 2). If females have failed to produce the wa notes, we define it as a failed great call (Fig. 2c).

Fig. 2

Spectrograms of the songs of three Nomascus species with features we measured. Plots on the left show sections of song bouts in N. nasutus (a), N. hainanus (b), and N. concolor (c). d A successful great call sequence of N. nasutus. e A male solo sequence of N. concolor. f A modulated note of N. concolor. MSS: typical male solo sequence; SGCS: successful great call sequence; FGCS: failed great call sequence; OD: overlap duration; CD: male coda sequence duration. Vocalizations in black represent the female great calls. We recorded vocalizations in Bangliang Gibbon National Nature Reserve, Guangxi (2008 to 2021), Dazhaizi, Mt. Wuliang National Nature Reserve, Yunnan (2006 to 2021), and Hainan Tropical Rainforest National Park, Hainan (2020 to 2021). See Table II for descriptions of these features.

Table II Acoustic features measured for Nomascus nasutus, N. concolor, and N. hainanus in China. MSS: typical male solo sequence; SC: song coordination

We used Sony PCM-D100 recorders with Sony C‐76 directional microphones, Marantz PMD 660 solid-state digital flash recorders, and ZOOM H6 handy recorders with Sennheiser ME 66 directional microphones at a sampling rate of 48 kHz and 24-bit resolution (Huang et al., 2020) to record all loud morning song bouts (Ma et al., 2024). Based on long-term behavioral observations and population monitoring, we knew the home range of each study group. In the center of each group’s home range, we recorded all song bouts from 6:50 am to 9:00 am at a distance of 20 m to 400 m from the singing individuals.

Data processing

For each species, we selected the 15 song bouts with the highest quality (a high signal-to-noise ratio) for analysis: six groups for N. concolor (seven song bouts for CG5M4, three song bouts for CG6M2, two song bouts for CG4M1, one song bout for CG4M2, one song bout for CGAM1, and one song bout for CGBM1; Table S1), seven groups for N. nasutus (four song bouts for NG1M3, four song bouts for NGLM1, three song bouts for NGMM1, one song bout for NG2M1, one song bout for NG3M1, one song bout for NG4M1, and one song bout for NG5M1; Table S1), and six groups for N. hainanus (seven song bouts for HGBM1, three song bouts for HGCM1, two song bouts for HGDM1, one song bout for HGAM1, one song bout for HGEM1, and one song bout for HGFM1; Table S1). We used Annotate (to TextGrid) analysis in Praat (× 64) 6.0.17 to show the spectrograms of each song bout and measured temporal features based on a fast Fourier transform window (1024 points, 50% overlapped Hanning window of 0.005 s width, 0–6000 Hz frequency range, 80 dB dynamic range, 0.002 s time step and 20 Hz frequency step; Huang et al., 2020). The first author made all annotations. We used the package librosa 0.9.2 in Python 3.10 to measure frequency features (McFee et al., 2015). We used the function librosa.load to load an audio file as a floating-point time series, and then we used the functions librosa.stft and librosa.amplitude_to_db to compute a short-time Fourier transform and convert its amplitude spectrogram to a dB-scaled spectrogram. We used the function librosa.display.specshow to generate an interactive spectrogram, with all parameters set to default values. Finally, we manually extracted frequency features from the spectrogram.
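The conversion from a waveform to a dB-scaled spectrogram can be illustrated with a minimal NumPy sketch. This is a stand-in for the librosa.stft and librosa.amplitude_to_db steps, not the librosa implementation: the function name db_spectrogram is hypothetical, the frame length (2048) and hop (512) are chosen to mirror librosa's documented defaults, frames are not centre-padded, dB values are referenced to the loudest bin (an assumption), and the input is a synthetic 1 kHz tone rather than a gibbon recording.

```python
import numpy as np

def db_spectrogram(y, sr, n_fft=2048, hop_length=512):
    """Return (freqs_hz, times_s, S_db) for a mono signal y sampled at sr Hz."""
    window = np.hanning(n_fft)                      # Hann window per frame
    n_frames = 1 + (len(y) - n_fft) // hop_length   # only full frames, no padding
    frames = np.stack(
        [y[i * hop_length: i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=0))  # amplitude spectrogram
    # Amplitude to dB relative to the loudest bin; floor avoids log10(0).
    S_db = 20.0 * np.log10(np.maximum(magnitude, 1e-10) / magnitude.max())
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    times = (np.arange(n_frames) * hop_length + n_fft / 2) / sr
    return freqs, times, S_db

# Synthetic test signal (assumption): one second of a 1 kHz tone at 48 kHz.
sr = 48_000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000.0 * t)

freqs, times, S_db = db_spectrogram(tone, sr)
peak_hz = freqs[S_db.mean(axis=1).argmax()]  # frequency bin with the most energy
```

Frequency features such as the start, end, maximum, and minimum f0 of a note would then be read off a display of S_db against times and freqs, which is the manual step described above.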

We measured 18 features at three levels: six features for each of four types of notes, eight features of the male solo sequence, and four features of song coordination (Fig. 2, Table II). We used a feature called “shape” to evaluate the rise and fall trend and stability of f0 of each note:

$$\text{shape}=\frac{\text{end } f_{0} - \text{start } f_{0}}{\text{max } f_{0} - \text{min } f_{0}}$$

Thus, the closer the shape is to 1, the more stable the note's f0 is. We also described the note males were singing when the female(s) began to join the duet or trio and the difference of modulated notes between male solo sequences and coda sequences.
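As a worked illustration of the shape feature, the following minimal sketch evaluates the formula for three made-up notes. The function name note_shape and all f0 values are hypothetical and are not measurements from this study.

```python
def note_shape(start_f0, end_f0, max_f0, min_f0):
    """Shape = (end f0 - start f0) / (max f0 - min f0), all in Hz."""
    if max_f0 == min_f0:
        # A perfectly flat note has no f0 range, so the ratio is undefined.
        raise ValueError("shape is undefined when max f0 equals min f0")
    return (end_f0 - start_f0) / (max_f0 - min_f0)

# Hypothetical notes (Hz), for illustration only.
rising = note_shape(start_f0=500.0, end_f0=1000.0, max_f0=1000.0, min_f0=500.0)
falling = note_shape(start_f0=1000.0, end_f0=500.0, max_f0=1000.0, min_f0=500.0)
rolled = note_shape(start_f0=500.0, end_f0=900.0, max_f0=1000.0, min_f0=500.0)
```

A note that rises from its minimum to its maximum gives a shape of 1, a note that falls from its maximum to its minimum gives -1, and a note whose f0 turns back before the end (as in the third example) gives a value of smaller magnitude.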

Statistical analysis

We used Python (version 3.10) to conduct statistical analysis. We used Kruskal–Wallis tests to test for differences among species for each feature, then used Wilcoxon tests with Bonferroni correction for multiple testing to test for differences between pairs of species. With three pairwise comparisons, the corrected significance threshold was 0.05/3 ≈ 0.017; if p < 0.017, we rejected the null hypothesis and considered that feature to differ significantly between the pair of species. To account for pseudoreplication, we used a random forest model with permutations to test for interspecies differences and feature importance (Breiman, 2001) at the note, male solo sequence, and song coordination levels. For each level, we had m individuals, among which the lowest number of observations was n. We extracted \(n/2\) observations from each of the m individuals as the training set and took all remaining observations as the testing set. We repeated this process 100 times, training and testing a random forest classifier each time, to calculate the mean testing accuracy M and the mean importance of each feature Ni, where i indexes the features. Next, we permuted the species labels of the individuals and split the data in the same way (\(m \times n/2\) observations for training, the rest for testing) to train and test a random forest model. We repeated this 1000 times, resulting in 1000 testing accuracies pMj and 1000 values of importance for each feature pNij, where j indexes the repeats (Mundry & Sommer, 2007).
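The resampling scheme described above can be sketched with scikit-learn as follows. This is a sketch under stated assumptions, not the authors' code: the function names, the number of trees, the rounding of n/2 down to an integer, and the synthetic two-feature data set are all hypothetical, and the source does not say which random forest implementation or settings were used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def split_by_individual(individual_ids, rng):
    """Boolean training mask with n // 2 observations drawn from every individual,
    where n is the smallest number of observations of any individual."""
    ids, counts = np.unique(individual_ids, return_counts=True)
    n_train = counts.min() // 2
    train = np.zeros(len(individual_ids), dtype=bool)
    for ind in ids:
        rows = np.flatnonzero(individual_ids == ind)
        train[rng.choice(rows, size=n_train, replace=False)] = True
    return train

def mean_accuracy_and_importance(X, species, individual_ids,
                                 n_repeats=100, n_estimators=100, seed=0):
    """Mean testing accuracy (M) and mean feature importance (N_i) over repeats."""
    rng = np.random.default_rng(seed)
    accuracies, importances = [], []
    for _ in range(n_repeats):
        train = split_by_individual(individual_ids, rng)
        clf = RandomForestClassifier(n_estimators=n_estimators,
                                     random_state=int(rng.integers(2**31 - 1)))
        clf.fit(X[train], species[train])
        accuracies.append(clf.score(X[~train], species[~train]))
        importances.append(clf.feature_importances_)
    return float(np.mean(accuracies)), np.mean(importances, axis=0)

def permute_species_by_individual(species, individual_ids, rng):
    """Shuffle species labels across individuals, so that all observations
    of one individual keep sharing a single (permuted) label."""
    ids = np.unique(individual_ids)
    label_of = np.array([species[individual_ids == ind][0] for ind in ids])
    lookup = dict(zip(ids, rng.permutation(label_of)))
    return np.array([lookup[ind] for ind in individual_ids])

# Synthetic data (assumption): 3 species x 2 individuals x 8 observations.
rng = np.random.default_rng(42)
species = np.repeat(["A", "B", "C"], 16)
individual_ids = np.repeat(["A1", "A2", "B1", "B2", "C1", "C2"], 8)
centers = {"A": 0.0, "B": 5.0, "C": 10.0}
X = np.column_stack([
    np.array([centers[s] for s in species]) + rng.normal(0, 0.5, size=48),
    rng.normal(0, 1.0, size=48),  # feature unrelated to species
])

M, N = mean_accuracy_and_importance(X, species, individual_ids,
                                    n_repeats=10, n_estimators=50)
permuted_species = permute_species_by_individual(species, individual_ids, rng)
```

Running mean_accuracy_and_importance once on each set of permuted labels would give the permuted accuracies and importances that the observed M and N_i are compared against.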

To test for species differences at each level, we calculated the number of pM with values greater than M and divided by 1000 to obtain a p-value. To determine which feature contributes significantly to species classification, for each feature, we calculated the number of pN values greater than N, divided by 1000, and obtained a p-value. If p < 0.05, we rejected the null hypothesis and considered that level to differ significantly between species or that feature to contribute significantly to species classification (Mundry & Sommer, 2007).
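The p-value calculation reduces to counting how often a permuted value exceeds the observed one. The sketch below illustrates this with four made-up permutations instead of 1000; the function names and all numbers are hypothetical.

```python
import numpy as np

def permutation_p_value(observed, permuted):
    """Share of permuted values that are strictly greater than the observed value."""
    permuted = np.asarray(permuted, dtype=float)
    return float(np.count_nonzero(permuted > observed)) / permuted.shape[0]

def feature_p_values(observed_importance, permuted_importance):
    """Per-feature p-values; permuted_importance has shape (repeats, features)."""
    observed_importance = np.asarray(observed_importance, dtype=float)
    permuted_importance = np.asarray(permuted_importance, dtype=float)
    return (permuted_importance > observed_importance).mean(axis=0)

# Made-up numbers for illustration only.
p_level = permutation_p_value(0.90, [0.30, 0.35, 0.95, 0.40])
p_features = feature_p_values(
    [0.7, 0.3],
    [[0.5, 0.5], [0.8, 0.2], [0.4, 0.6], [0.6, 0.4]],
)
```

With 1000 permutations, the smallest non-zero p-value this procedure can return is 0.001.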

We took the arithmetic mean and standard deviation of all features of each level as the features of hierarchical clustering (Dong et al., 2021) to compare whether the vocal structures are consistent with phylogenetic relationships at different levels, and we used Euclidean distance to calculate similarity between features of vocal structures.
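A minimal sketch of such a clustering with SciPy is given below. The linkage method is not stated in the text, so average linkage is an assumption, as is the z-scoring of each column before computing distances. The species labels and the numbers in the table are placeholders, not results from this study.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Placeholder data (assumption): one row per species, one column per summary
# statistic (for example, the mean or standard deviation of one feature).
labels = ["species_A", "species_B", "species_C"]
summaries = np.array([
    [1.0, 0.2, 3.1],
    [1.1, 0.3, 3.0],
    [2.5, 0.9, 1.2],
])

# Standardize each column so that features on different scales contribute equally.
z = (summaries - summaries.mean(axis=0)) / summaries.std(axis=0)

# Agglomerative clustering on Euclidean distances (average linkage assumed).
Z = linkage(z, method="average", metric="euclidean")

# The first row of Z holds the two species that are merged first (most similar).
first_pair = sorted(labels[int(i)] for i in Z[0, :2])
merge_distance = float(Z[0, 2])
```

Each row of Z records one merge, so the two species joined in the first row are the most similar pair, and the merge distance corresponds to the branch length in a dendrogram.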

Ethical note

The research activities reported in this article comply with corresponding national and institutional guidelines and were approved by the Bangliang Gibbon National Nature Reserve Administration Bureau, Mt. Wuliang National Nature Reserve Administration Bureau, and Hainan Tropical Rainforest National Park. All research activities reported in this article comply with Chinese legal requirements. The authors have no conflicts of interest to disclose.

Data availability

The code and datasets from the current study can be found in figshare: https://figshare.com/s/adb9a0bd068bb2491ebb (will be published at acceptance).

Results

Inter-specific differences in vocal structure

All note features differed significantly among the three species (Kruskal–Wallis tests: p < 0.001; Table S2; Fig. 3). However, some differences between two species were not significant after correction for multiple testing (Wilcoxon tests with Bonferroni correction: p > 0.017; Table S3; Fig. 3). We found significant interspecific differences at the note level (permuted random forest analysis: p < 0.001; Table III), except for the boom note (permuted random forest analysis: p = 0.294; Table III).

Fig. 3

Box plots of four types of notes, male solo sequences, and song coordination levels for Nomascus nasutus in Bangliang Gibbon National Nature Reserve, Guangxi (2008 to 2021), N. concolor in Dazhaizi, Mt. Wuliang National Nature Reserve, Yunnan (2006 to 2021), and N. hainanus in Hainan Tropical Rainforest National Park, Hainan (2020 to 2021). Box colors show different levels of analysis, the midline represents the median, boxes represent the 25th and 75th percentiles, and the whiskers represent the maximum and minimum values of the acoustic features for each species. All features show significant differences among the three species (Kruskal–Wallis tests: p < 0.05; Tables S2, S4, and S6).

Table III Inter-specific statistical differences at the level of note, male solo sequence, and song coordination in Nomascus nasutus in Bangliang Gibbon National Nature Reserve, Guangxi (2008 to 2021), N. concolor in Dazhaizi, Mt. Wuliang National Nature Reserve, Yunnan (2006 to 2021), and N. hainanus in Hainan Tropical Rainforest National Park, Hainan (2020 to 2021). The p-value and mean classification accuracy were obtained by permuted random forest analysis; see details in the Methods section

All male solo sequence features differed significantly among the three species (Kruskal–Wallis tests: p < 0.001; Table S4; Fig. 3). Differences in the feature duration between N. nasutus and N. hainanus and in the feature mintd between N. concolor and N. hainanus were not significant after correction for multiple testing (Wilcoxon tests with Bonferroni correction: p > 0.017; Table S5; Fig. 3). We found significant interspecific differences at the male solo sequence level (permuted random forest analysis: p < 0.001; Table III).

All song coordination features differed significantly among the three species (Kruskal–Wallis tests: p < 0.001; Table S6; Fig. 3). Differences in the features duration and overlap duration between N. nasutus and N. hainanus were not significant after correction for multiple testing (Wilcoxon tests with Bonferroni correction: p > 0.017; Table S7; Fig. 3). We found significant interspecific differences at the song coordination level (permuted random forest analysis: p < 0.001; Table III). N. nasutus and N. hainanus showed similar coordination patterns, while N. concolor differed (Fig. 3, Table S6). In N. concolor and N. nasutus, females emitted great calls when males emitted aa notes. In contrast, the timing of the female's great calls was more flexible in N. hainanus (Table S8). In each song bout, females produced a mean of 4.13 ± SD 3.18 great calls in N. concolor, 3.27 ± SD 2.08 great calls in N. hainanus, and 3.00 ± SD 1.41 great calls in N. nasutus. In N. nasutus, male coda sequences often consisted of more deeply modulated notes than those produced during the male solo sequence (Fig. 2a). Coda sequences usually consisted of at least three mR notes, each with at least two rolls. In N. hainanus, no mR notes included rolls in the male solo sequence, but the males sometimes deeply modulated and produced a roll in the first modulated note of the coda sequence (Fig. 2b).

Importance of each feature

For boom notes, no feature contributed significantly to distinguishing species (Fig. 4a). For aa notes, end f0 contributed significantly to distinguishing species (Fig. 4b). For pre notes, shape significantly contributed to distinguishing between species (Fig. 4c). For mR notes, shape and min f0 contributed significantly to distinguishing species (Fig. 4d). At the male solo sequence level, the number of boom notes and rolls of mR notes contributed significantly to distinguishing species (Fig. 4e). At the song coordination level, no feature contributed significantly to distinguishing species (Fig. 4f).

Fig. 4

The importance of vocalization features in classifications at four types of notes, male solo sequence, and song coordination levels after 100 iterations of a random forest model for three gibbon species in China. Bars show the importance of the feature in distinguishing species. Black vertical lines show standard errors. Numbers above bars are p-values derived from permuted random forest analysis; see Methods for details. The smaller the p-value, the greater the difference in feature importance before and after the permutation of species labels. We studied Nomascus nasutus in Bangliang Gibbon National Nature Reserve, Guangxi (2008 to 2021), N. concolor in Dazhaizi, Mt. Wuliang National Nature Reserve, Yunnan (2006 to 2021), and N. hainanus in Hainan Tropical Rainforest National Park, Hainan (2020 to 2021).

Hierarchical clustering of vocal structure

For boom and pre notes, the similarity in vocal structure did not align with phylogenetic relationships. Nomascus hainanus and N. concolor clustered together significantly (Fig. 5a, c), indicating that their boom and pre notes were more similar to each other than either was to those of N. nasutus. In contrast, N. hainanus and N. nasutus clustered together significantly for the aa note, mR note, male solo sequence, and song coordination level (Fig. 5b, d, e, f), which aligns with the phylogenetic relationships.

Fig. 5

Results of hierarchical cluster analysis showing the similarity in four types of notes, male solo sequence, and song coordination levels in three species of gibbon in China based on the standardized mean values of features. Colors represent clusters, and the length of the branch represents the Euclidean distance between species. We studied Nomascus nasutus in Bangliang Gibbon National Nature Reserve, Guangxi (2008 to 2021), N. concolor in Dazhaizi, Mt. Wuliang National Nature Reserve, Yunnan (2006 to 2021), and N. hainanus in Hainan Tropical Rainforest National Park, Hainan (2020 to 2021).

Discussion

We found interspecific differences at the aa note, pre note, mR note, male solo sequence, and song coordination level among the three Nomascus species. This indicates these species have evolved different note and sequence structures and coordination patterns. Together with morphological (Mootnick & Fan, 2011) and genetic differences (Van Ngoc et al., 2011), our results support regarding N. concolor, N. hainanus, and N. nasutus as distinct species. However, the boom note did not show significant species differences. Boom notes show a lower frequency and amplitude compared to other notes (Fig. 3, Table S2). In other species, species-specific information is also conveyed by calls with high frequency or high amplitude (Czocherová et al., 2022; Zuk et al., 2008). We speculate that the soft short boom may function as within-group communication and that its acoustic characteristics may be subject to fewer selection pressures than other notes.

At the note level, feature shape was most important in identifying species in pre and mR notes, while feature end f0 and min f0 were most important in aa and mR notes. All these features are frequency-related, suggesting that frequency-related features may play a more important role than temporal features at the note level. These findings reflect those in a study of Alouatta palliata and A. pigra, which found that frequency-related features such as formant dispersion and highest frequency are more likely to reflect interspecific differences than temporal traits such as the longest syllable duration (Bergman et al., 2016). In environments with denser vegetation coverage, species with lower f0 may have higher fitness (Luther & Gentry, 2013; Martens & Michelsen, 1981). Furthermore, in an environment with high humidity and high temperature, signals with high f0 decay faster (Haupert et al., 2023). We currently lack data on the habitat differences among the three Nomascus species.

For the feature shape, N. hainanus has the most stable mR notes with no rolls (mR0 only). For learned signals, frequency modulation patterns are more complex in larger groups than in smaller groups (Beecher, 1989), and the same rule may apply to larger populations. In small populations, some notes, especially those that are harder to imitate and learn, will disappear (Hudson & Creanza, 2022). The increasing number of individuals that need to be distinguished in large populations also encourages signalers to generate unique personal signatures (Pollard & Blumstein, 2011; Smith-Vidaurre et al., 2021). The population of N. hainanus declined to two groups with only two singing adult males in 2003 (Deng et al., 2014). The reduction of population size and cultural drift might be a potential explanation of the smaller number of note types in N. hainanus and the relatively low complexity of the frequency modulation pattern of mR notes (only mR0 remains). However, the role of social learning in gibbon vocalization is an open question, which deserves more research.

At the male solo sequence level, the number of boom notes and rolls of mR notes contributed the most to the interspecific differences. Compared to specific note features, syntactic features such as the number, type, and order of notes mainly affected the structure of sequences. This reflects the phenomenon observed in Presbytis species. The loud call sequences of P. thomasi, P. potenziani, P. comata, and P. melalophos contain different types of notes and phrases, and the position of the phrase and the number of notes also differ (Meyer et al., 2012). The diversity and complexity of sequences are also positively correlated with population size (Valderrama et al., 2013). Male solo sequences in N. hainanus contain the fewest boom and aa notes and the simplest types of notes; the decrease in the note types and number of notes in the sequence may also be related to the decline in population size. Nevertheless, the basic syntactic structure of the male solo sequence in these three Nomascus species is similar, all following the boom-aa-pre-mR pattern.

The three species showed an obvious difference in coordination pattern. The female great call and the male coda of N. concolor did not overlap. In contrast, female great calls were overlapped by male vocalizations in N. nasutus and N. hainanus. Although the overlap between vocal signals increases the strength of the vocal signal and reflects the cohesion between the overlapping individuals, it may also prevent other individuals from recognizing the identity information of the overlapping individuals (Briseño-Jaramillo et al., 2021). This suggests that female vocalizations in great call sequences in N. nasutus and N. hainanus signal group cohesion better, but at the cost of signaling individual identity and/or caller quality. N. concolor females may advertise individual identity or quality information via great calls that are temporally separated from those of males. In other gibbon species, female great calls convey and advertise information about individual identity (Clink et al., 2017) or quality (Terleph et al., 2016). The maximum f0 of great call climaxes is also lower in older females compared to younger females of Hylobates lar (Terleph et al., 2016). N. concolor lives in a relatively large population (Fan et al., 2022a, 2022b, 2022c; Li et al., 2023) while N. nasutus (Wearn et al., 2024) and N. hainanus (He et al., 2023) each have a tiny population. In small populations, sexual selection is temporarily relaxed due to an increase in mate selection costs or a reduction in territorial competition (Kaneshiro, 1980). Due to the large population, there are also more floating males in the population of N. concolor, which leads to more potential mating choices and extra-group copulations for females (Huang et al., 2022). This might encourage females of N. concolor to advertise their individual information, which might explain why female and male vocalizations do not overlap in this species. However, this hypothesis needs further investigation.

Vocalizations of gibbons have long been considered to be genetically determined (Brockelman & Schilling, 1984; Tenaza, 1985), with interspecific differences among gibbon vocalizations consistent with their phylogenetic relationships (Van Ngoc et al., 2011). Our results are largely consistent with this hypothesis because interspecific differences in aa notes, mR notes, male sequences, and coordination patterns all aligned with phylogeny. However, this pattern is violated in boom notes and pre notes, which indicates that variations in vocal signals do not result solely from phylogenetic factors but can also be influenced by other selective pressures such as group size (Fan et al., 2022a, 2022b, 2022c), vocal learning (Tyack, 2020), and habitat disparities (Morton, 1975). In the future, we need to study the effect of these selection pressures on the differentiation of vocal signals at different levels. We suggest that comparative studies of vocal structure differences between species should distinguish between different levels and compare them one by one. This gives a clearer and more complete picture of the interspecific differences in different aspects of the vocal signal.

Conclusion

We found significant species differences in gibbon vocalizations at the male sequence level, song coordination level, and the note level, except for boom notes. Interspecific differences in aa notes, modulated notes, male sequences, and coordination patterns all align with phylogeny, suggesting that these differences are largely genetically determined. We found obvious differences in male–female coordination among these species. These differences may be driven by factors such as habitat conditions, population size, or vocal learning, and need further research. We suggest that comparative studies of gibbon vocalizations should analyze different levels such as note, sequence, and song coordination, to provide a more comprehensive and accurate representation of their differentiation and evolution.