Acoustic signals play various roles in mate choice, resource defence, and species recognition in a broad range of taxa (Wilkins et al., 2013), including lemurs (Rakotonirina et al., 2016). Divergence in acoustic traits mediates discrimination within and between species and has been proposed to play a role in speciation and evolution (Wilkins et al., 2013; Zimmermann, 2016). This is particularly true for sympatric cryptic species, in which species-specific vocal signals and recognition systems are involved in driving reproductive isolation. For instance, recent research showed this mechanism in species of the genera Microcebus (Braune et al., 2008) and Phaner (Forbanka, 2020). The complexity of mammalian vocal communication has been studied to understand possible factors determining convergent evolutionary patterns (Charlton & Reby, 2016) and species-specific differences (Gamba et al., 2015). Three main evolutionary frameworks have been proposed for the diversification of communication systems and vocal flexibility (Schuster et al., 2012).

First, the Phylogenetic Hypothesis suggests that phylogeny determines the vocal repertoire of a species (Ord & Garcia-Porta, 2012), implying that closely related members of a taxonomic group will have very similar signals (Zimmermann, 2017). This hypothesis is supported by studies indicating concordance between vocal and genetic diversity across Nomascus species (Thinh et al., 2011). However, no evidence indicates a relationship between vocal behavior and phylogeny across lemurs (Hending et al., 2020; Zimmermann, 2017), including the Indriidae family (Ramanankirahina et al., 2016).

Second, the Social Complexity Hypothesis posits that the evolution of vocal communication and that of social life are related (Bouchet et al., 2013; Pollard & Blumstein, 2012), such that a more complex social system requires more subtle communicative abilities to mediate interactions among group members (Freeberg et al., 2012). Under this hypothesis, the diversity in the communicative signals of a species is related either to a stable and egalitarian social structure (Mitani, 1996) or to group size (Kappeler, 2019; McComb & Semple, 2005; Peckre et al., 2019). For instance, social structure and social organisation reflect the vocal repertoire complexity in Cercopithecus neglectus, Cercopithecus campbelli, and Cercocebus torquatus (Bouchet et al., 2013).

Third, and finally, the Sensory Drive Hypothesis (Endler, 1992) suggests that signals, sensory systems, and microhabitat choice coevolve, with signal evolution being driven by environmental conditions, including predation (Zimmermann, 2017). This hypothesis is supported by the acoustic windows occupied by Microcebus spp., Mirza spp., and Cheirogaleus spp. (Zimmermann, 2018), which use high frequency and ultrasonic components. The latter are rare among primates and appear to have evolved to cope with the social and ecological needs of a dispersed social network (Zimmermann, 2018). The acoustic frequency window is likely a balance between being conspicuous to conspecifics while remaining cryptic for predators (Zimmermann, 2018).

Although many lemurs live in smaller groups than other primates (Kappeler & Heymann, 1996) some lemur species live in large groups. Such groups may require sophisticated intelligence (social intelligence; Dunbar, 1996) and signals to modulate the relationships among group members (Matsuzawa, 2008; Oda, 2008). For instance, the gregarious Lemur catta has a repertoire of 22 call types (Macedonia, 1993), whereas other species, such as Eulemur rufifrons and Propithecus verreauxi, show referential-like calling (Fichtel & Kappeler, 2002). Call types and use also differ with sex in Eulemur coronatus (Gamba and Giacoma, 200), Mirza zaza (Seiler et al., 2019), and Lepilemur edwardsi (Rasoloharijaona et al., 2006). Hence, lemur vocal diversity may provide useful information on the selective pressures that may have played a role in the evolution of vocal communication (Oda, 2008).

Among lemurs, Indri is the only species that sings (Baker-Médard et al., 2013; De Gregorio et al., 2019; Giacoma et al., 2010; Torti et al., 2013, 2017). Recent studies showed that indri’s song possesses a rhythmic structure (De Gregorio et al., 2019; De Gregorio, Valente, et al., 2021a; Gamba et al., 2016), conforms to the linguistic laws of brevity (Valente et al., 2021), shows an ontogenetic development (De Gregorio, Carugati, et al., 2021b), and a sex-dimorphic phrase organization (Zanoli et al., 2020). This species shows a rich vocal repertoire, including distinct alarm calls for terrestrial and aerial predators (Maretti et al., 2010) and several call types mediating intra-group dynamics (Valente et al., 2019). In contrast, information on the vocal communication of Propithecus diadema is limited to qualitative accounts examining the role of vocal behavior in contact seeking (Petter & Charles-Dominique, 1979), and antipredatory behavior (Fichtel, 2014; Fichtel & Kappeler, 2002, 2011; Macedonia & Stanger, 1994; Oda & Masataka, 1996; Patel & Owren, 2012; Petter & Charles-Dominique, 1979; Wright, 1998). All Propithecus species have call types with comparable structures and functions (Petter & Charles-Dominique, 1979; Macedonia & Stanger, 1994; Oda & Masataka, 1996; Wright, 1998; Fichtel & Kappeler, 2002, 2011; Patel & Owren, 2012; Fichtel, 2014; Online Resource 2). An exception is the zzuss, a call type only occurring in the repertoire of P. diadema, P. candidus, P. perrieri, and P. edwardsi (Anania et al., 2018; Macedonia & Stanger, 1994; Patel & Owren, 2012; Wright, 1998). The four western Propithecus species (P. verreauxi, P. coquereli, P. coronatus, P. deckenii) and P. tattersalli do have a call type serving similar functions to the zzuss (terrestrial predator alarming and group coordination, Patel & Owren, 2012) but with a different acoustic structure (Fichtel, 2014; Macedonia & Stanger, 1994; Oda & Masataka, 1996; Petter & Charles-Dominique, 1979). 
Within the genus, the most investigated call types are the alarm calls of P. verreauxi and P. coquereli (Fichtel and Kappeler, 2011), and the zzuss of P. candidus (Patel & Owren, 2012). The latter represents the only quantitative description of a call type of eastern Propithecus species.

To understand the extent to which the vocal systems of two strepsirrhine species differ, we compared the calls of two sympatric and similar-sized lemur species, Indri and Propithecus diadema, belonging to the same taxonomic family (Indriidae), both inhabiting the same rainforest environment and having diurnal habits (Geissmann & Mutschler, 2006). These species are the largest extant lemurs, and their estimated pairwise divergence time ranges between 18 (Federman et al., 2016; Kistler et al., 2015; Masters et al., 2013) and 29-36 MYA (Antonelli et al., 2017; Fabre et al., 2009; Fritz et al., 2009; Roos et al., 2004). Propithecus diadema lives in multimale/multifemale groups of two to eight individuals (Irwin, 2008; Powzyk, 1997; Weir, 2014), whereas I. indri groups range from 2 to 5 individuals (Bonadonna et al., 2020; Glessner & Britt, 2005; Torti et al., 2017; Torti et al., 2018), usually comprising a monogamous reproductive pair and their offspring (Bonadonna et al., 2019). Thanks to these features, they are suitable subjects to investigate the effect of the phylogenetic, environmental, and social influence on their vocal behavior. Moreover, Indri and P. diadema emit calls in similar contexts. Both species vocalize in the presence of terrestrial disturbance (disturbance call in P. diadema, wheezing grunt and kiss-wheeze in I. indri; Macedonia & Stanger, 1994) or aerial predators (roaring vocalisations: Macedonia & Stanger, 1994; Powzyk, 1997). Calls also are used to coordinate group movements during foraging or displacing activities (Macedonia & Stanger, 1994; Petter & Charles-Dominique, 1979).

We compared the number of distinct call types and their spectro-temporal structure, in the light of the Phylogenetic Hypothesis, Sensory Drive Hypothesis, and Social Complexity Hypothesis. Indri and P. diadema belong to the same taxonomic family, so the Phylogenetic Hypothesis predicts that their repertoires should be more similar to one another than to those of more distantly related species. Because the two species inhabit the same rainforest environment, the Sensory Drive Hypothesis also predicts that their vocal repertoires will be similar to one another. Lastly, we tested two versions of the Social Complexity Hypothesis. First, if vocal repertoire size is positively related to group size (McComb & Semple, 2005), we predict that P. diadema, which lives in larger groups, will have a larger repertoire than I. indri, which lives in smaller groups. Conversely, if vocal diversification is driven by a stable and egalitarian social structure (Mitani, 1996), we predict a larger repertoire in pair-living, monogamous I. indri than in the more despotic P. diadema, with its multimale/multifemale groups.


Data Collection

We conducted the study in four forest sites: Analamazaotra Special Reserve (Madagascar National Parks, 18° 56' S - 48° 25' E), Andasibe-Mantadia National Park (Madagascar National Parks, 18° 28' S - 48° 28' E), Mitsinjo Forest Station (Association Mitsinjo, 18° 56' S - 48° 24' E), and Maromizaha Protected Area (Groupe d'Étude et de Recherche sur les Primates de Madagascar, 18° 56' 49" S – 48° 27' 33" E). We collected vocalisations of I. indri by sampling 18 habituated groups between 2005 and 2018. Group size ranged from two to six individuals (mean ± SD = 4.2 ± 1.2). We collected vocalisations of P. diadema by sampling three habituated groups in 2014 and 2016. Group size ranged from eight to ten individuals (mean ± SD = 8.8 ± 0.8). Further information on data collection (groups size and composition, sampling days, and overall observation time) can be found in Online Resource 1. For both species, we followed a focal group for one to five consecutive days, observing animals at a distance ranging from 0.5 to 20 m. We identified individuals using morphological criteria such as fur patterns and other natural marks. Both species are diurnal and their activity pattern is concentrated during the first half of the day (Petter & Charles-Dominique, 1979; Pollock, 1975). Indri vocal emissions are concentrated in the early morning (Geissmann & Mutschler, 2006). Propithecus diadema calls can be emitted anytime throughout the day but are more common early in the morning and at the beginning of the afternoon (Petter & Charles-Dominique, 1979). We, therefore, monitored the groups daily, from 06:00 h until their activities started to decrease (usually around 14:00 h), using focal animal sampling to collect data (Altmann, 1974). Occasionally, we also collected audio and video recordings of individuals' utterances using ad libitum sampling (Altmann, 1974). 
We recorded spontaneous vocalisations using a Sennheiser ME66 or a Sennheiser ME67 shotgun directional microphone (frequency response range of both microphones: 40-20,000 Hz ± 2.5 dB) connected to a solid-state digital audio recorder, a Sound Devices 702 (frequency response range: 10-40,000 Hz +0.1/−0.5 dB), or a Tascam DR-100 MKII (frequency response range: 20-20,000 Hz +1/−3 dB). We set the recorders at a sampling rate of 44.1 kHz and an amplitude resolution of 16 or 24 bit. We recorded signals emitted from individuals at 15 to 20 m depending on signal intensity, weather conditions, and canopy thickness. We made recordings with the microphone facing the caller or in the direction of the whole group. We did not deliberately manipulate or modify the animals’ behavior and recorded only spontaneous vocal emissions.

Acoustical and Statistical Analyses

We visually inspected all recordings using Praat 6.0.28 (Boersma and Weenink, 2017). For P. diadema, we acquired 8,946 calls from 1,872 initial recordings, of which we chose 3,814 calls for acoustic analyses. We selected high-quality vocal emissions (higher intensity and lower background noise) and discarded noisy and overlapping calls (multiple individuals and different species) and vocalizations uttered by infants. We discarded calls where the signal-to-noise ratio was lower than 12 dB, that were acoustically distorted, or that overlapped with other sounds (Gamba et al., 2015).
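The 12 dB quality criterion can be illustrated with a short sketch. It assumes that the signal-to-noise ratio is taken as the ratio between the mean power of the call and the mean power of a stretch of background noise from the same recording, expressed in decibels; the text does not state how the ratio was measured, so this definition, the function names, and the toy samples are illustrative assumptions only.

```python
import math


def mean_power(samples):
    """Mean squared amplitude of a list of waveform samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(call_samples, noise_samples):
    """Signal-to-noise ratio in dB, defined here (assumption) as a power ratio."""
    return 10.0 * math.log10(mean_power(call_samples) / mean_power(noise_samples))


def keep_call(call_samples, noise_samples, threshold_db=12.0):
    """Retain a call only if its SNR is not lower than the threshold."""
    return snr_db(call_samples, noise_samples) >= threshold_db


# Hypothetical toy waveforms: a call four times the noise amplitude passes,
# a call twice the noise amplitude does not.
noise = [1.0, -1.0, 1.0, -1.0]
loud_call = [4.0, -4.0, 4.0, -4.0]
faint_call = [2.0, -2.0, 2.0, -2.0]
print(keep_call(loud_call, noise), keep_call(faint_call, noise))
```

Because decibels are logarithmic, a 12 dB threshold corresponds to a call carrying roughly 16 times the power (about four times the amplitude) of the background noise.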

Vocal emissions can include sequences of repeated temporally close calls. We considered two emissions as distinct calls when they were separated by at least 0.025 s. This threshold is recognized in humans and non-human animals, including primates, as a natural psychophysical boundary representing the minimum time interval needed by the auditory system to differentiate between two distinct acoustic signals (Kuhl & Padden, 1983; Lieberman, 1991). In the field, we noticed that different call types can be emitted sequentially (e.g., the mmm often is uttered after a roar chorus; A. Anania, personal observation). Within the recordings, we found that the most conspicuous association concerned zzuss and tsk. We, therefore, measured the mean duration of the silent interval between these two call types across 145 recorded sequences. We normalized each sound file using a scale to peak function in Praat (Comazzi et al., 2016) and assigned it to nine a priori classes based on audio-visual evaluation (Lemasson et al., 2014). Some call types are described in studies of rainforest Propithecus (Macedonia & Stanger, 1994; Patel & Owren, 2012; Petter & Charles-Dominique, 1979; Powzyk, 1997; Wright, 1998). We chose the names zzuss (n = 400), roar (n = 176), and grunt (n = 145) to ensure consistency with the literature (Macedonia & Stanger, 1994; Patel & Owren, 2012; Wright, 1998). We labelled new call types according to the sound quality (chatter-squeal, n = 317; soft grunt, n = 221), the hypothesized function (lost call, n = 193), or with onomatopoeic terms (hum, n = 1,927; mmm, n = 246; tsk, n = 189). For each call type, we measured duration, mean, minimum, and maximum fundamental frequency. We also considered the range of emission and phonatory mechanisms. We employed the methodology shown in Valente et al. (2019) and used a custom-made script in Praat to extract spectral coefficients for each call: we measured the total duration of a sound and divided it into ten equal portions. 
Then, considering a frequency range from 50 to 22,000 Hz, representing the frequency spectrum covered by the calls in our sample, we split each portion into frequency bands (or bins) of 500 Hz each (e.g., 50–500 Hz, 501–1,000 Hz), then extracted the energy value of each bin (through the function ‘Get band energy’ in Praat). The resultant dataset included the duration and 220 frequency parameters for each call. We used the Rtsne package (Krijthe, 2015) in R (R Core Team, 2021) to embed the dataset into a bi-dimensional plane through a t-distributed stochastic neighbour embedding (van der Maaten & Hinton, 2008) with a Barnes-Hut implementation, initializing the algorithm with perplexity = 40 and theta = 0.5. We then submitted the reduced dataset, containing two features, to a clustering procedure, using a k-means algorithm (MacQueen, 1967).
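The layout of the spectral features described above can be sketched as a time-by-frequency grid. The sketch only builds the grid cells; the energy in each cell would come from Praat's 'Get band energy', which is not reproduced here. All function names are hypothetical, and the band width is left as a parameter because the number of features is simply portions × bands: 22 bands per portion give 220 frequency parameters, whereas 500 Hz bands over 50 to 22,000 Hz give 44 bands per portion.

```python
import math


def band_edges(f_min=50.0, f_max=22000.0, width=500.0):
    """Frequency bands: the first runs from f_min up to the next multiple of
    width, the following ones are width Hz wide, up to f_max."""
    edges = []
    low = f_min
    high = width * (math.floor(f_min / width) + 1)
    while low < f_max:
        edges.append((low, min(high, f_max)))
        low = high
        high += width
    return edges


def time_portions(duration, n_portions=10):
    """Split a call of the given duration (s) into equal consecutive portions."""
    step = duration / n_portions
    return [(i * step, (i + 1) * step) for i in range(n_portions)]


def feature_grid(duration, n_portions=10, width=500.0):
    """One (t_start, t_end, f_low, f_high) cell per portion-band pair."""
    return [(t0, t1, f0, f1)
            for (t0, t1) in time_portions(duration, n_portions)
            for (f0, f1) in band_edges(width=width)]


# Hypothetical 0.8 s call: count the cells for two band widths.
print(len(feature_grid(0.8, width=500.0)), len(feature_grid(0.8, width=1000.0)))
```

Each call is thus described by a fixed-length vector (its duration plus one energy value per cell), regardless of how long the call is, which is what allows calls of different types and species to be placed in the same feature space.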

Lastly, we investigated whether the two species shared some call types and assessed the difference among the two vocal repertoires. For the comparison, we used a dataset of 3360 calls used to quantify I. indri’s repertoire (Valente et al., 2019), containing 10 call types: clacson, hum, grunt, kiss, long tonal call, roar, short tonal call, songbit, wheeze, and wheezing grunt. Valente and colleagues used the same acoustic approach (extraction of duration and spectral coefficients of the calls, Valente et al., 2019), which allowed us to combine the features of all calls of both species into a single dataset. We first reduced the combined data through a t-SNE based compression and then submitted the compressed dataset to a k-means clustering algorithm (MacQueen, 1967). We used t-SNE to visualize data.
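To make the clustering step concrete, the following is a minimal pure-Python sketch of k-means in its common batch (Lloyd-style) form, applied to two-dimensional points such as the coordinates returned by a t-SNE embedding. It is not the R code used in the study, and the function names, the random initialisation, and the toy coordinates are assumptions made for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, n_iter=100, seed=0):
    """Batch k-means: alternate assignment and centroid update until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels, centroids


# Hypothetical embedded calls forming two well-separated clouds.
embedded = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(embedded, k=2)
print(labels)
```

k-means needs the number of clusters in advance, which is why k has to be fixed from the number of point clouds visible in the t-SNE map before clustering.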

Ethical Note

We conducted observational research without manipulating animals, with permission of the Malagasy Ministry of Environment and Forests, Research permits: 2005 [N°197/MINENV.EF/SG/DGEF/DPB/SCBLF/RECH], 2006 [N°172/06/MINENV.EF/SG/DGEF/DPB/SCBLF], 2007 [N°0220/07/MINENV.EF/SG/DGEF/DPSAP/SSE], 2008 [N°258/08/MEFT/SG/DGEF/DSAP/SSE], 2009 [N°243/09/MEF/SG/DGF/DCB.SAP/SLRSE], 2010 [N°118/10/MEF/SG/DGF/DCB.SAP/SCBSE, N°293/10/MEF/SG/DGF/DCB.SAP/SCB], 2011 [N°274/11/MEF/SG/DGF/DCB.SAP/SCB], 2012 [N°245/12/MEF/SG/DGF/DCB.SAP/SCB], 2014 [N°066/14/MEF/SG/DGF/DCB.SAP/SCB], 2015 [N°180/15/MEEMF/SG/DGF/DAPT/SCBT], 2016 [N°98/16/MEEMF/SG/DGF/DAPT/SCB.Re, N°217/16/MEEMF/SG/DGF/DSAP/SCB.Re], 2017 [N°73/17/MEEF/SG/DGF/DSAP/SCB.RE], 2018 [N°91/18/MEEF/SG/DGF/DSAP/SCB.Re]. We declare that the data collection procedure conforms to the national legislation and international regulations concerning animal welfare. The authors declare that they have no conflict of interest.

Data Availability

The dataset is available from the corresponding authors on reasonable request.


t-SNE Mapping: Propithecus diadema calls

The algorithm identified eight clouds of points, where each point represents a call and each cloud might represent a cluster (van der Maaten & Hinton, 2008), so we imposed k = 8 for k-means clustering (Fig. 1c). The eight different clusters were mostly consistent with the putative identification of calls and with their acoustic structure (Table I). Clusters 3, 5, 6, 7, and 8 included one vocal type each: zzuss, chatter-squeal, soft grunt, lost call, and roar, respectively (Fig. 1a, c). Conversely, both Clusters 2 and 4 mainly included hum (94% and 84%) and mmm (6% and 16%, Fig. 1b). Grunt and tsk were grouped in Cluster 1 (Fig. 1b, c). Analysis of a subsample of 145 zzuss-tsk sequences showed that when these two calls are uttered sequentially, the mean duration of the pause between them is 0.62 ± SD 0.11 s.
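The percentages reported for each cluster are the share of every a priori call type among the calls assigned to that cluster. A minimal sketch of this tabulation follows; the function name and the toy labels are hypothetical and do not reproduce the study's data.

```python
from collections import Counter, defaultdict


def cluster_composition(cluster_ids, call_types):
    """Percentage of each a priori call type within each cluster."""
    counts = defaultdict(Counter)
    for cluster, call_type in zip(cluster_ids, call_types):
        counts[cluster][call_type] += 1
    composition = {}
    for cluster, counter in counts.items():
        total = sum(counter.values())
        composition[cluster] = {call_type: 100.0 * n / total
                                for call_type, n in counter.items()}
    return composition


# Hypothetical toy assignment: cluster 1 mixes two call types, cluster 2 is pure.
clusters = [1, 1, 1, 1, 2]
types = ["hum", "hum", "hum", "mmm", "zzuss"]
comp = cluster_composition(clusters, types)
print(comp)
```

A cluster dominated by a single call type supports the a priori label, whereas a cluster shared by two types points to acoustic overlap between them.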

Fig. 1

Representation of P. diadema calls (recorded in Maromizaha Protected Area in 2014 and 2016) on a bidimensional plane obtained by initializing a t-SNE algorithm with perplexity = 40 and theta = 0.5. a Visualization of t-SNE mapping combined with a priori identification of call types (cs = chatter-squeal, gr = grunt, hum = hum, lc = lost call, mmm = mmm, ro = roar, sg = soft grunt, tsk = tsk, zz = zzuss). We generated spectrograms (Hanning window, 512 samples, overlap = 64, zero-padding = 16) using the R package Seewave (Sueur et al., 2008). b The distribution of vocal types within the clusters. Colors follow those in panel (a). c Results of k-means clustering on the bi-dimensional vector produced using t-SNE. Numbers indicate clusters (i.e., 1 = Cluster 1).

Table I Definition of call types emitted by Propithecus diadema recorded in Maromizaha Protected Area in 2014 and 2016

Acoustic parameters (duration, mean, maximum, and minimum fundamental frequency are expressed as mean ± standard deviation). We evaluated the range of emission (short vs. long) based on the call amplitude and the possible occurrence of counter-calling within or between groups.

t-SNE Mapping: Calls of Propithecus diadema and Indri indri

The algorithm identified 16 clouds of points, so we chose k = 16 for k-means clustering. The 16 clusters were partially consistent with the putative identification of calls. Clusters 1, 2, 8, 12, and 16 each included a single call type: soft grunt, chatter-squeal, roar, grunt, and zzuss, respectively (all belonging to P. diadema’s vocal repertoire; Fig. 2a, c). Clusters 6, 9, and 11 included I. indri’s clacson, wheezing grunt, and songbit, respectively. Clusters 3, 4, 5, and 7 included mostly hum (95%, 74%, 92%, 96%) and a smaller percentage of mmm (5%, 26%, 8%, 4%, respectively; Fig. 2b), both emitted by P. diadema. Cluster 10 grouped I. indri’s grunt and hum (73% and 27%; Fig. 2b), while Cluster 14 grouped I. indri’s kiss and wheeze (66% and 34%; Fig. 2b). Cluster 13 grouped P. diadema’s lost call (64%) with I. indri’s roar, long tonal call, and grunt (21%, 10%, and 5%, respectively). Cluster 15 included mainly P. diadema’s tsk (81%; Fig. 2b).

Fig. 2

Representation of P. diadema and I. indri calls on a bidimensional plane obtained by initializing a t-SNE algorithm with perplexity = 40 and theta = 0.5. a Visualization of the t-SNE mapping combined with the a priori identification of call types (II = I. indri; cl = clacson, gr = grunt, hum = hum, lt = long tonal call, ki = kiss, ro = roar, sb = songbit, st = short tonal call, wg = wheezing grunt, wh = wheeze, PD = P. diadema; cs = chatter-squeal, gr = grunt, hum = hum, lc = lost call, mmm = mmm, ro = roar, sg = soft grunt, tsk = tsk, zz = zzuss). We recorded calls of I. indri in four forest sites (Analamazaotra Special Reserve, Andasibe-Mantadia National Park, Mitsinjo Forest Station, and Maromizaha Protected Area) from 2005 to 2018, and calls of P. diadema in Maromizaha Protected Area in 2014 and 2016. We generated spectrograms (Hanning window, 512 samples, overlap = 64, zero-padding = 16) using the R package Seewave (Sueur et al., 2008). b The distribution of the call types within the clusters. Colours follow those in panel (a). c Results of the k-means clustering performed on the bidimensional vector produced using the t-SNE. Numbers indicate the relative clusters (i.e., 1 = Cluster 1).


Our cluster analysis of the vocal repertoire of P. diadema highlighted the presence of eight clusters, mostly consistent with the a priori identification of the calls, with only a few call types grouping together. Based on acoustic and spectrographic analysis, we identified nine distinct call types. Five clusters showed homogenous grouping of as many call types: lost call, chatter-squeal, soft grunt, zzuss, and roar. Two of the remaining clusters showed a mixture of hum and mmm (94% and 6% in one case, 84% and 16% in the other), possibly indicating some gradation between the two (Wadewitz et al., 2015). The last cluster also grouped two call types: tsk and grunt. Given the results, we estimated the vocal repertoire of P. diadema to consist of nine call types, with some showing a graded structure (tsk and grunt, and mmm and hum, in particular). We used this estimate in our comparisons.

Comparison between the vocal repertoire of P. diadema and that of I. indri showed that loud calls of both species possess distinctive features, while some low-frequency calls were grouped together, meaning that these call types are characterized by similar spectro-temporal features. We identified eight homogeneous groups. Five (chatter squeal, both grunt and soft grunt, roar, and zzuss) were P. diadema’s most distinctive calls. Three (clacson, wheezing grunt, and songbit) were I. indri calls. This analysis suggested four clusters mainly consisting of P. diadema’s low-pitched calls, like hum and mmm (the latter in smaller percentages). It also confirmed the gradedness between these two call types found in the single-species analyses. Two other clusters (10 and 14) grouped mostly I. indri’s low- (grunt and hum; 61%) and medium-pitched calls (wheeze and kiss; 66%). This result is in line with previous analyses of lemur low-pitched calls, in which the grunt, click, grunted hoot, hoot, snort, and long grunt of Eulemur spp. (Gamba et al., 2012; Gamba & Giacoma, 2005, 2007; Nadhurou et al., 2015; Pflüger & Fichtel, 2012) showed little differentiation compared to alarm calls or high-pitched calls. Interestingly, two other clusters included P. diadema’s tsk (81%) and I. indri's short tonal call (19%) as well as P. diadema’s lost call (64%) and I. indri's roar and long tonal call (21%; 10%). These clusters grouped voiceless calls (e.g., tsk) and calls with a more broadband structure (e.g., roar, both long and short tonal call). This finding shows how feature extraction can be useful to characterize resonance frequencies of lemur calls, agreeing with earlier evidence (Gamba et al., 2015).

P. diadema’s roar and I. indri’s clacson were among the most distinctive call types. P. diadema’s roar is emitted in the presence of raptors across congeneric species (Fichtel & Kappeler, 2002; Macedonia & Stanger, 1994; Petter & Charles-Dominique, 1979; Wright, 1998), and I. indri’s clacson also mediates antipredatory behavior and is given in the presence of terrestrial predators (Macedonia & Stanger, 1994; Maretti et al., 2010). We also found both species’ loud calls to be unambiguous (for instance, I. indri’s songbit and P. diadema’s chatter-squeal and zzuss). Two studies have addressed the role of species-specific signalling in lemurs (Braune et al., 2008; Rakotonirina et al., 2016), with conflicting results. Support for species recognition driven by advertisement calls has been found in Microcebus spp. (Braune et al., 2008) while acoustic signalling seems not to be involved in species recognition across Eulemur species (Rakotonirina et al., 2016). A mechanism similar to that demonstrated in Microcebus spp. (Braune et al., 2008) could allow I. indri and P. diadema to distinguish among hetero- and conspecifics at distance, in an environment where the acoustic channel is more effective than the visual one (Waser & Brown, 1986). The stereotypy we found in the loud calls of our subject species is partly in line with the Sensory Drive (Endler, 1992) and the Acoustic Adaptation Hypotheses, both of which state that vocal signals are adapted to the environment in which they are emitted (Endler, 1992; Morton, 1975). The acoustic structure of vocal signals, and in particular that of those used for long-distance communication, is expected to be optimized to ensure sound propagation. This is especially true in closed habitats, where higher vegetation density represents a greater surface for reverberation and absorption than in open habitats (Waser & Brown, 1986). However, our results do not fully support the Sensory Drive Hypothesis, since only a small portion of P. diadema’s vocal repertoire (tsk and lost call) clustered with I. indri calls. A study of Microcebus murinus, M. ravelobensis, M. berthae, and M. lehilahytsara also did not support the Sensory Drive Hypothesis, suggesting that predatory pressures may be more relevant in shaping vocal communication than differences in habitat structure (Zimmermann, 2016).

Our findings only partially supported our predictions based on the Social Complexity Hypothesis (Bouchet et al., 2013; McComb & Semple, 2005). The hypothesis predicts that the species living in larger groups, namely P. diadema, would have a larger repertoire (McComb & Semple, 2005). We found no support for this prediction: P. diadema’s repertoire consisted of fewer call types than that of I. indri (10; Valente et al., 2019). Moreover, a repertoire including nine call types, with an average group size of five individuals (Irwin, 2008), conflicts with the group size–vocal repertoire size paradigm. At least two other primate species with comparable group size (Saguinus fuscicollis: 5.9 individuals, Leontopithecus rosalia: 5.8 individuals) have a vocal repertoire of 16 call types (McComb & Semple, 2005). However, the Social Complexity Hypothesis also predicts that species living in an egalitarian social structure, such as I. indri, require a more sophisticated communicative system in terms of the number of different call types in their repertoire (i.e., the repertoire size; Mitani, 1996). Our results, indicating a smaller repertoire in P. diadema, are in line with this second prediction and with studies on other lemur species. For example, the same deviation from the group size–vocal repertoire size paradigm has been shown in E. rubriventer (with an average group size of three individuals and a repertoire of 14 call types; Gamba et al., 2015) and I. indri (with a group size ranging from 4 to 6 individuals and a repertoire of 10 call types; Pollock, 1975; Valente et al., 2019).

In terms of vocal repertoire size, P. diadema is more similar to I. indri than to other more phylogenetically distant species, such as L. catta (22 call types; Macedonia, 1993) and the sympatric Varecia variegata (16 call types; Pereira et al., 1988; Gamba et al., 2003). Furthermore, the repertoire size in P. diadema is in line with the variation displayed within the Indriidae family (3 to 10; Zimmermann, 2017) and in particular with that of two other Propithecus species, with a repertoire of six (P. verreauxi; Zimmermann, 2017) and 10 call types (P. candidus; Patel & Owren, 2012). Nonetheless, in contrast with the Phylogenetic Hypothesis, besides their size, the vocal repertoires of I. indri and P. diadema differed from each other. This is not surprising, given that the last common ancestor of the two species lived at least 18 MYA (Federman et al., 2016; Kistler et al., 2015; Masters et al., 2013) and that closely related Indriidae species show acoustic differences (P. deckenii and P. coronatus: Fichtel, 2014). Moreover, across lemurs, there is no pattern of vocal similarity based on phylogenetic proximity (Bergey & Patel, 2008; Gamba et al., 2015; Hending et al., 2020; Zimmermann, 2017). This lack of correlation also applies to the Indriidae family (Ramanankirahina et al., 2016). Despite the phylogenetic relatedness, closely related species exhibiting the same social pattern but different activity mode (diurnal vs. nocturnal, respectively) also differ in the complexity of vocal signalling (I. indri and Avahi occidentalis, Ramanankirahina et al., 2016).

Interestingly, P. diadema had the same vocal repertoire size as Daubentonia madagascariensis (studied in captivity; Stanger & Macedonia, 1994), which is a solitary nocturnal species (Sterling & McCreless, 2006). According to some authors, D. madagascariensis descended from the most basal divergence from all other lemur taxa (Delpero et al., 2006), whereas recent evidence suggests that it descended from independent colonization of Madagascar (Gunnell et al., 2018). Thus, considering the phylogenetic history, common ancestry of the vocal behavior of these species is unlikely.

Some of the comparisons we make rely on studies employing analogous methods (i.e., Eulemur spp., Gamba et al., 2015). However, other vocal repertoire estimates rely on different approaches (McComb & Semple, 2005; Stanger & Macedonia, 1994). Thus, our comparisons should be taken with caution; different methodologies used to measure repertoires lead to very different results and the lack of common acoustic and statistical approaches undermines cross-taxa comparisons (Peckre et al., 2019).

The use of computationally accessible and powerful methods opens new perspectives in the study of acoustic signals (Sainburg et al., 2020). The t-SNE embedding allowed efficient analysis of the vocal repertoire of P. diadema, in line with findings on other animal species (mammals: Mus musculus, Megaptera novaeangliae, Pteronura brasiliensis, Macaca mulatta; birds: Taeniopygia guttata; Sainburg et al., 2020). The t-SNE also allowed us to compare the calls of P. diadema with those of another diurnal species in the Indriidae family, I. indri (Valente et al., 2019). In line with studies using unsupervised clustering in the quantitative analysis of animal vocalisations (Gamba et al., 2015; Riondato et al., 2017), we found that the extraction of linear frequency bins revealed a remarkable potential for grouping calls based on their spectrographic similarity, comparable to clusters obtained using dynamic time warping–generated dissimilarity indices.

The standardized technique we employed in this study allowed us to reduce the need for a priori human input and to overcome potential limitations due to human perceptual bias (Sainburg et al., 2020). We do not neglect the importance of previous work, but argue that standardized and reproducible techniques (for alternatives see Gamba et al., 2015, where the authors employed a combination of Dynamic Time Warping and clustering algorithms, or Sainburg et al., 2020, where the authors compared the efficiency of data reduction algorithms across multiple datasets) should be prioritized in the future.


Our study supports previous findings on lemurs: it is likely that Indriidae vocal diversity has been shaped by a combination of social and environmental characteristics, and phylogenetic history (Ramanankirahina et al., 2016). Further research could investigate synapomorphies and autapomorphies in the vocal repertoires of the Indriidae family. For instance, some call types, such as the roar and the lost call (the first emitted in the anti-aerial predator context, the other used to regulate group cohesion), are comparable in structure and function across Propithecus species (Online Resource 2). Conversely, the main terrestrial disturbance call differs structurally between two groups of Propithecus species, one consisting of the species producing the zzuss (P. diadema, P. candidus, P. perrieri, and P. edwardsi; Patel & Owren, 2012; Anania et al., 2018; Wright, 1998; Macedonia & Stanger, 1994) and the other including the species emitting the tchi-fak (P. verreauxi, P. coquereli, P. coronatus, P. deckenii, and P. tattersalli, representing the so-called western species, evolutionarily split from eastern species; Pastorini et al., 2001; Mayor et al., 2004; Rumpler et al., 2004; but see Herrera & Dávalos, 2016). The acoustic divergence between zzuss and tchi-fak does not completely follow the current spatial proximity of these species’ distributions, or the type of environment (dry forest, rainforest, transitional forest). Furthermore, acoustic differences in the loud calls of closely related species living in the same environment have been demonstrated (P. deckenii and P. coronatus; Fichtel, 2014). A comparison among Propithecus species could highlight which factors (genetic, anatomical, social, ecological, or biogeographical) have been important in the evolution of vocal signals and provide us with clues about why some acoustic structures have been conserved and others have changed in the divergence of species.