Introduction

Primates communicate using a limited vocal repertoire, which largely develops in species-specific ways (Seyfarth & Cheney, 1997). The acoustic structure of calls uttered by infants typically resembles that of the corresponding adult call types, suggesting that vocal structures develop under strong genetic control (Hammerschmidt & Fischer, 2008; Janik & Slater, 2000; Owren et al., 2011), with some room for socially acquired call variants (Ruch et al., 2018; Snowdon, 2009). The acquisition of novel call types is virtually absent in wild primates (Fischer & Hammerschmidt, 2020; Tyack, 2020; but see Lameira, 2017). Overall, the development of species-typical calls in primates results from a combination of genetic, social, and environmental influences, though the relative role of each is still debated (Fedurek & Slocombe, 2011). Data on utterances produced shortly after birth are critical for establishing the starting point of vocal ontogeny, prior to social and environmental influences. However, most primate births occur at night (Dunn, 2012) and are difficult to observe because parturition is unpredictable and mothers avoid other group members (Nishie & Nakamura, 2018; Otali & Gilchrist, 2006; Ramsay & Teichroeb, 2019), probably in response to infanticide risk (Palombit, 2012). As a result, primate perinatal behaviours remain poorly understood (Trevathan, 2015), despite their theoretical relevance for developmental research (Nagy, 2011).

Vocal development has usually been analysed at three different levels: (1) production learning, that is, how individuals modify specific acoustic features of calls after exposure to others’ calls; (2) usage learning, how individuals use existing calls in new contexts or combine them into new vocal sequences; and (3) comprehension learning, how individuals respond appropriately to the vocalisations of others (Janik & Slater, 2000; Seyfarth & Cheney, 1997; Vernes et al., 2021). In primates, production learning is regarded as mostly fixed, while usage and comprehension learning are considered more flexible (Seyfarth & Cheney, 1997; Snowdon, 2009). However, this model is largely based on studies of alarm calls, which are expected to be less flexible than calls with more social functions; the latter should instead be the focus of comparisons with human vocal development (Elowson et al., 1992; Snowdon et al., 1997).

The majority of data on primate vocal development stem from studies of monkey vocalisations (Seyfarth & Cheney, 1997; Tomasello & Zuberbühler, 2002). However, monkeys are arguably less directly relevant to studies of language evolution than great apes (Fischer & Hage, 2019; Fitch & Zuberbühler, 2013), who share a more recent last common ancestor with humans (Langergraber et al., 2012). Great apes often produce sequences of calls (e.g., chimpanzees, Pan troglodytes verus: Girard-Buttoz et al., 2022), including combinatorial structures (e.g., bonobos, Pan paniscus: Schamberg et al., 2016; gorillas, Gorilla beringei beringei and Gorilla gorilla gorilla: Hedwig et al., 2014), with some evidence for socially learned call variants (e.g., orangutans, Pongo pygmaeus wurmbii and Pongo abelii: Lameira et al., 2022). While humans and great apes use sets of sounds of comparable size (e.g., McComb & Semple, 2005; Moran et al., 2012), the ability to combine these sounds hierarchically into vocal sequences differs greatly between humans and other apes, and sets human language apart from the communication of other animals (Hauser et al., 2002; Townsend et al., 2018). Although vocal learning abilities in great apes are clearly more constrained than in humans, investigating the degree to which great ape vocal sequences are socially learned or hard-wired, and thus present from birth, can inform us about the evolution of more complex vocal structures.

There are only a handful of direct observations of perinatal behaviour in one of our two closest living relatives, the chimpanzee (Fujisawa et al., 2016; Goodall & Athumani, 1980; Kiwede, 2000; Nishie & Nakamura, 2018; Zamma & Shabani, 2012). As a consequence, very little is known about the vocal behaviour of newborn chimpanzees, although subsequent stages of vocal development are somewhat better documented (e.g., Dezecache et al., 2019; Laporte & Zuberbühler, 2011; Plooij, 1984; Taylor et al., 2021). Early qualitative descriptions indicate that the first vocalisations of wild chimpanzees are comparable to the corresponding adult call types, such as grunts, whimpers, cries, and screams (Plooij, 1984). Human-reared chimpanzees initially produce vocal output with some similarities to that of human infants in the first months of life (Kojima, 2008), although these vocalisations are often elicited by human caretakers or researchers (Bard, 1998; Kojima, 2008). Human infants are special, however, in producing highly variable and functionally flexible vocal sequences, referred to as babbling: a form of vocal exploration considered a milestone in language acquisition (Oller, 2000; Oller et al., 2021). Typically, babbling starts soon after birth, consists of a subset of the acoustic features characterising the adult repertoire, and requires neither a social context nor a communicative function (Oller, 2000; ter Haar et al., 2021). In addition to providing vocal practice, one probable function of this peculiar behaviour is to enhance social interactions and bonding with caregivers (Locke, 2006; Oller & Griebel, 2008). However, there is no evidence of babbling-like vocal behaviour in chimpanzees (Oller et al., 2019; ter Haar et al., 2021).

The vocal repertoire of wild chimpanzees consists of a relatively small number of acoustically distinct call types that can grade into each other (Crockford, 2019; Goodall, 1986; Marler & Tenaza, 1977). Different call types often appear in sequences (Crockford & Boesch, 2005; Girard-Buttoz et al., 2022; Leroux & Townsend, 2020), such as pant hoots and food grunts (Leroux et al., 2021) or screams and barks (Fedurek et al., 2015). Whether such call combinations function to convey different information is still unclear and a topic of ongoing research (Engesser & Townsend, 2019; Zuberbühler & Lemasson, 2014). Furthermore, no study to date has investigated how and when the capacity to produce vocal sequences appears during chimpanzee ontogeny.

One vocalisation commonly produced by chimpanzees, the pant hoot, comprises smaller vocal components (i.e., phases) produced in an orderly sequence of introduction, build-up, climax, and let-down (Marler & Hobbett, 1975; Marler & Tenaza, 1977). Pant hoot phases, in turn, consist of a varying number of voiced exhalations, which are the smallest units of this vocal sequence and are separated by short periods of silence or by panted inhalations (Fedurek et al., 2017). Pant hoots are produced flexibly across many contexts, suggesting various functions, including coordinating fission–fusion dynamics (Fedurek et al., 2014), signalling individual and group identity (Crockford et al., 2004; Mitani et al., 1996), signalling social bonds (Fedurek, Machanda, et al., 2013a; Mitani & Brandt, 1994), and signalling social status (Clark & Wrangham, 1994; Fedurek, Slocombe, et al., 2016a). Some phases can be omitted (Fedurek, Zuberbühler, & Dahl, 2016b; Notman & Rendall, 2005) or produced in isolation (Soldati et al., 2022). In addition, phases are sometimes regarded as equivalent to distinct call types or sub-types within the vocal repertoire (e.g., the climax as a panted scream: Crockford, 2019; Girard-Buttoz et al., 2022). Although pant hoots are amongst the most common and most studied vocalisations of wild chimpanzees (Marler & Tenaza, 1977), data come mainly from sub-adult and adult males (e.g., Crockford et al., 2004; Fedurek et al., 2014; Mitani & Brandt, 1994). Immature individuals only rarely utter pant hoots, starting from around 2 years of age (Hiraiwa-Hasegawa, 1986), and the rate of production increases with age (Marler & Tenaza, 1977; Pusey, 1990). Importantly, to our knowledge, newborns have never been reported to utter pant hoots, and no study has systematically investigated the development of this vocal sequence.

In this study, we report on the vocal behaviour of a newborn wild chimpanzee in the Sonso community of Budongo Forest, Uganda. We used two methods of call classification: (1) qualitative spectrographic and auditory categorisation of calls by the authors, supplemented by auditory categorisation by independent human experts, and (2) quantitative soft clustering analysis to identify distinct acoustic clusters, together with discriminant function analyses to investigate pant hoot production across age categories.

Methods

Study Site and Population

We studied the Sonso chimpanzee community in Budongo Forest, western Uganda. Chimpanzees from the Sonso community have been studied and followed daily by field assistants since 1990 (Reynolds, 2005). At the time of the study, the community was composed of 71 individuals, including nine adult males and 31 adult females (Table SI Supplementary Material). The main individuals involved in the study were members of the Kutu family (Table I).

Table I Key Chimpanzee Individuals of the Kutu Family Involved in the Birth of KU7 at Sonso, Budongo Forest, Uganda, in November 2019. Kinship Status Refers to the Relation Between KU and Her Offspring

Data Collection

On the 20th of November 2019 at 10:12 am, the first and second authors observed the birth of KU7. Three additional researchers attended part of the afterbirth period and assisted with data collection and identification of callers. We recorded all vocalisations produced by KU7, and opportunistically recorded vocalisations produced by other individuals in the party, using a Sennheiser MKH416 directional microphone (Sennheiser Electronic GmbH & Co. KG, Wedemark, Germany) with a Marantz PMD661 MkII solid-state recorder (Marantz, Kanagawa, Japan; sample rate 44.1 kHz, resolution 32 bits, ‘wav’ format). We defined party composition as all individuals present within a radius of approximately 35 m of the focal individual (Newton-Fisher, 1999). Because the calls were soft and we were approximately 15 m from the subject, we set the recorder’s gain to level 9 to maximise the signal-to-noise ratio. We maintained this distance because of the delicate nature of the event and to reduce any effects of our presence on the chimpanzees’ behaviour. We dictated observations into the microphone or noted them using CyberTracker (ver. 3.496) on a Samsung Xcover 4 portable device (Samsung Group, Seoul, South Korea). We recorded videos using a Panasonic VHC-770 HD (resolution: 1920 × 1080/50p). We recorded all relevant events and changes in the behaviour of all individuals in the party, and recorded party composition continuously.

Qualitative Acoustic Analysis

We extracted vocalisations by inspecting spectrograms generated with Praat software (ver. 6.0.42) and by listening to the recordings with Sennheiser HD650 headphones. We computed spectrograms with a Fourier transform using a Hanning window and 1024 time steps. Four authors independently categorised call types (Table II) based on auditory features and inspection of spectrograms, using published chimpanzee vocal repertoires composed of nine call types (Table III). If one of the four authors disagreed with the categorisation, we used the group majority to determine the call type. There were no instances in which more than one author disagreed with the categorisation.

Table II Terms Used to Describe the Vocalisations Produced by a Newborn Chimpanzee in the Sonso Community, Budongo Forest, Uganda, in November 2019.
Table III Call Types Produced by Chimpanzees Based on Marler and Tenaza (1977), Slocombe and Zuberbühler (2010), and Taylor et al. (2021). The Definition of Pant Hoot is Based on Marler and Hobbett (1975) and Notman and Rendall (2005). Example Spectrograms for Each Call Type Produced by Chimpanzees in the Sonso Community, Budongo Forest, Uganda, are Presented in Slocombe and Zuberbühler (2010)

To provide a more comprehensive and diverse assessment of the call types, we asked seven independent experts in chimpanzee vocal communication (Table SII Supplementary Material), blind to any information about the recordings, to categorise the calls recorded from KU7, estimate the age of the caller, and comment on the vocal structures. We provided an unlabelled audio file in which we collated all the calls produced by KU7 in chronological order, with sequences separated by 1 s of silence (Online Resource 1).

Quantitative Acoustic Analysis

We manually extracted six acoustic features from each call unit using Praat software (ver. 6.0.42): duration of each exhaled unit, fundamental frequency (F0) at the start, middle, and end of the unit, maximum and minimum F0, and F0 range. We selected these features based on the acoustic data extractable from the recordings and on measurements typically used to determine call types in chimpanzees (e.g., Marler & Tenaza, 1977; Mitani et al., 1999; Mitani & Brandt, 1994; Slocombe & Zuberbühler, 2010). We only considered exhaled vocal units to make our acoustic analyses comparable with previous studies of pant hoots (e.g., Clark & Wrangham, 1993; Desai et al., 2021; Fedurek et al., 2013a, b, 2017; Mitani et al., 1992, 1999; Riede et al., 2007; but see Crockford et al., 2004), but we also noted the number of inhaled (panted) units produced between exhaled units when these were visible on the spectrogram. Because of the quiet nature of the newborn vocalisations, environmental background noise, and the distance between the newborn and the microphone, we could not use automated procedures to extract acoustic features.
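The features listed above can all be derived from a unit’s start and end times and its F0 contour. The sketch below illustrates that relationship only: the authors measured the features manually in Praat, whereas here the input is assumed to be a hypothetical, time-ordered list of (time, F0) samples, the middle F0 is assumed to be the sample nearest the temporal midpoint, and `unit_features` is an illustrative name rather than part of any tool used in the study.

```python
from typing import Dict, List, Tuple


def unit_features(track: List[Tuple[float, float]]) -> Dict[str, float]:
    """Duration and F0 summary features for one exhaled unit.

    `track` is a hypothetical, time-ordered list of (time_s, f0_hz)
    samples covering the voiced part of the unit.
    """
    if len(track) < 2:
        raise ValueError("need at least two F0 samples")
    times = [t for t, _ in track]
    f0 = [f for _, f in track]
    mid_time = (times[0] + times[-1]) / 2
    # Assumption: "middle F0" is the sample closest to the temporal midpoint.
    mid_idx = min(range(len(times)), key=lambda i: abs(times[i] - mid_time))
    return {
        "duration_s": times[-1] - times[0],
        "start_f0": f0[0],
        "mid_f0": f0[mid_idx],
        "end_f0": f0[-1],
        "max_f0": max(f0),
        "min_f0": min(f0),
        "range_f0": max(f0) - min(f0),  # range is derived from max and min
    }


# Made-up illustrative values, not data from this study.
example = unit_features([(0.0, 420.0), (0.05, 480.0), (0.1, 510.0),
                         (0.15, 470.0), (0.2, 430.0)])
print(example)
```

Because the range is simply the difference between maximum and minimum F0, it is not independent of those two features, which is relevant when collinearity among acoustic parameters is assessed later.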

Clustering Analysis

The general approach to studying how experience mediates vocal development is to catalogue the different call types across developmental stages using acoustic measurements and classification algorithms (Bradbury & Vehrencamp, 2011; Kershenbaum et al., 2016). A common problem is that vocal repertoires are often graded, which makes objective classification particularly challenging (Fischer et al., 2017). Human vocal behaviour is also highly graded, yet receivers perceive transitions categorically, suggesting that human perceptual judgements can be used to disambiguate gradual transitions (Deecke & Janik, 2006; Janik, 1999). For animal vocal repertoires, however, data-driven categorisation approaches are preferable, mainly because the degree to which human perceptual biases match those of other species remains unclear, and because such approaches allow systematic comparisons across communities (Crockford, 2019; Fischer et al., 2017). Soft clustering methods based on fuzzy-set theory (Zadeh, 2008) are well suited to describing the graded vocal repertoires of primates (e.g., chacma baboons, Papio ursinus: Wadewitz et al., 2015), and this approach also appears promising for chimpanzees (e.g., immature chimpanzees: Taylor et al., 2021).

We used fuzzy c-means clustering to identify the best-fitting model for the number of clusters representing different call types in the newborn vocalisations. Fuzzy c-means clustering measures the degree to which sounds belong to categories based on their acoustic properties, without restricting each sound to a single category; it therefore captures more detail than hard clustering methods, including graded transitions between call types. We analysed the stability and reliability of model solutions to evaluate the extent to which the optimal description of the calls depended on a small number of acoustic parameters, and how robust optimal descriptions were to overlap between clusters. We z-transformed the acoustic features prior to analysis so that measurements on different scales (i.e., Hz and s) would not bias cluster solutions. Because fuzzy c-means clustering is based on the acoustic features of each individual call rather than on the total number of calls available (Wadewitz et al., 2015), it does not require a minimum number of data points per call type, so the small number of newborn vocalisations we recorded was not a limiting factor.
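To make the clustering step more concrete, the following is a minimal, self-contained sketch of z-transforming features and running fuzzy c-means. It implements Bezdek’s standard fuzzy c-means update rules, which is not the algorithm behind the “fanny” function used in this study (fanny minimises a different, dissimilarity-based objective), so its memberships would not match fanny output. The exponent `m` plays the role of the fuzziness parameter μ (fanny’s `memb.exp`); the function names, random initialisation, and convergence tolerance are assumptions for illustration.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def z_transform(data: Matrix) -> Matrix:
    """Centre each column and divide by its sample SD (n - 1), as R's scale() does."""
    n, p = len(data), len(data[0])
    out = [row[:] for row in data]
    for j in range(p):
        col = [row[j] for row in data]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        for i in range(n):
            out[i][j] = (data[i][j] - mean) / sd if sd > 0 else 0.0
    return out


def fuzzy_c_means(x: Matrix, k: int, m: float = 2.0, max_iter: int = 500,
                  tol: float = 1e-6, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Bezdek-style fuzzy c-means; returns (memberships n x k, centres k x p).

    m > 1 is the fuzziness exponent: values near 1 give near-crisp
    memberships, larger values allow more overlap between clusters.
    """
    if m <= 1.0:
        raise ValueError("fuzziness exponent m must be > 1")
    rng = random.Random(seed)
    n, p = len(x), len(x[0])
    u = []
    for _ in range(n):  # random initial memberships, normalised per row
        row = [rng.random() for _ in range(k)]
        s = sum(row)
        u.append([v / s for v in row])
    centres = [[0.0] * p for _ in range(k)]
    e = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Centre update: mean of all points weighted by membership ** m.
        for c in range(k):
            w = [u[i][c] ** m for i in range(n)]
            sw = sum(w)
            if sw > 0:
                centres[c] = [sum(w[i] * x[i][j] for i in range(n)) / sw
                              for j in range(p)]
        # Membership update: u_ic proportional to d_ic ** (-2 / (m - 1)),
        # scaled by the nearest-centre distance to avoid overflow.
        new_u = []
        for i in range(n):
            d = [math.dist(x[i], centre) for centre in centres]
            dmin = min(d)
            if dmin == 0.0:  # point coincides with a centre: crisp membership
                w = [1.0 if dc == 0.0 else 0.0 for dc in d]
            else:
                w = [(dmin / dc) ** e for dc in d]
            s = sum(w)
            new_u.append([v / s for v in w])
        shift = max(abs(a - b) for ru, rn in zip(u, new_u) for a, b in zip(ru, rn))
        u = new_u
        if shift < tol:
            break
    return u, centres
```

Each row of the returned memberships sums to 1, so a unit lying between two clusters receives intermediate coefficients rather than being forced into one category, which is the property that makes soft clustering suitable for graded calls.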

We adjusted two parameters to identify the best cluster solution for the newborn calls: the maximum number of clusters extracted (K), and the ‘fuzziness parameter’ (μ), which controls the degree of overlap between clusters (lower values allow less overlap). We ran fuzzy models using the “fanny” implementation in the “cluster” package (ver. 2.1.2, Maechler et al., 2021), varying K from a minimum of two (required to quantify gradation) to a maximum of seven, which matched the number of call types we identified qualitatively and was in line with Taylor et al. (2021). Following Taylor et al. (2021), we tested μ values of 1.1, 1.5, 2.0, 2.5, and 3.0, stopping at 3.0, at which point all membership coefficients were too close to 1/K; this corresponds to the limit of the algorithm’s ability to assign cluster membership to calls (Zadeh, 2008). All models considered converged within 500 iterations. We evaluated the fit and confidence of each solution based on the mean silhouette value of all data points combined, which represents how separable the acoustic clusters are. Silhouette values range from −1 to 1: positive values represent data points that are closer to their primary cluster than to other clusters and indicate some confidence in their cluster membership, while negative values represent data points that overlap between clusters and are potentially misclassified (Wadewitz et al., 2015). We assessed the reliability of the model by examining the range of μ values that could be obtained for any given K value, which provided an indicator of ‘gradedness’. Solutions that can be extracted with both low and high μ values are regarded as more robust to overlap between clusters (Fischer et al., 2017). Using the “clValid” package (ver. 0.7, Brock et al., 2008), we assessed the stability of the clusters by calculating four measures that compare the clustering based on all variables with the clusterings obtained after systematically removing one variable at a time. These measures indicate the extent to which cluster separation depends on a small number of acoustic parameters, and thus how ‘generalisable’ the cluster separations are. The four measures were: the average proportion of non-overlap (APN), i.e., the mean proportion of data points not placed in the same cluster by the two clusterings; the average distance (AD) between data points placed in the same cluster by both clusterings; the average distance between means (ADM), i.e., the mean distance between the cluster centres to which each data point is assigned in the two clusterings; and the figure of merit (FOM), the mean intra-cluster variance of the removed variable (Brock et al., 2008). Given that our aim was to categorise calls into clusters, we gave priority to mean silhouette values to identify the best model. We extracted a hard-clustering solution for the best-fitting model and assigned all the calls to their primary cluster. We then examined the distribution of qualitatively categorised call types in each cluster.
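The model-selection criterion above relies on the mean silhouette of the hard (primary-cluster) assignment. The minimal sketch below, assuming Euclidean distances on the z-transformed features, shows how primary memberships and the mean silhouette width can be computed. The function names are hypothetical, and R’s “cluster” package computes silhouettes itself, so this only illustrates the quantity compared across (K, μ) combinations.

```python
import math
from typing import List, Sequence


def primary_clusters(memberships: Sequence[Sequence[float]]) -> List[int]:
    """Hard assignment: index of the highest membership coefficient per unit."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


def mean_silhouette(x: Sequence[Sequence[float]], labels: List[int]) -> float:
    """Mean silhouette width using Euclidean distances.

    For unit i: a = mean distance to the other units in its own cluster,
    b = lowest mean distance to the units of any other cluster,
    s(i) = (b - a) / max(a, b); units in singleton clusters count as 0.
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    total = 0.0
    for i, xi in enumerate(x):
        own = [j for j, lab in enumerate(labels) if lab == labels[i] and j != i]
        if not own:
            continue  # singleton cluster: s(i) = 0
        a = sum(math.dist(xi, x[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(xi, x[j]) for j, lab in enumerate(labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(x)
```

A well-separated labelling of four points, two near the origin and two far to the right, scores close to 1, whereas a labelling that mixes the two groups scores below 0, mirroring the interpretation of positive and negative silhouette values given above.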

Comparison with Pant Hoots

We conducted additional acoustic analyses to compare the newborn’s vocal sequences containing calls that resembled pant hoot phases with pant hoots produced by infant, juvenile, sub-adult, and adult males of the Sonso community. We selected pant hoots produced during resting or feeding, because pant hoots vary with the behavioural context of production (Fedurek, Zuberbühler, & Dahl, 2016b; Notman & Rendall, 2005), and the newborn vocalised while resting or potentially before or after nursing. Although pant hoot sequences can be composed of repeated vocal units from a single phase (Soldati et al., 2022), we selected calls composed of two or more phases to be consistent with previous studies (e.g., Fedurek et al., 2014; Mitani et al., 1999; Notman & Rendall, 2005). To control for potential sex differences in the acoustic structure of pant hoots (e.g., Holden, 2017), we only selected pant hoots produced by males. We selected recordings based on their overall quality (lack of background noise or overlap with other callers) and good signal-to-noise ratio. Although these recordings were of higher quality than the recordings of the newborn, we extracted the acoustic measurements manually in the same way to avoid introducing a potential bias. From each pant hoot, we sampled four units: the first and the middle vocal unit of the introduction phase and of the climax phase. This allowed us to take into account the acoustic gradation that can occur within phases. Where a phase contained an even number of units, we chose the first of the two middle units (as in Desai et al., 2021). In total, we extracted features from 189 vocal units (42 pant hoots) produced by three infants, four juveniles, four sub-adults, and four adults, with a minimum of two pant hoots per individual (Table SVIII, Supplementary Material).
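The unit-sampling rule can be expressed as a small indexing function. This sketch assumes each phase is available as an ordered list of exhaled units, and the function names are illustrative. Note that under this rule the “first” and “middle” units coincide for a phase with one or two units, a case the text does not address.

```python
from typing import List, Sequence, TypeVar

Unit = TypeVar("Unit")


def middle_index(n_units: int) -> int:
    """0-based middle position; for an even count, the first of the two middle units."""
    if n_units < 1:
        raise ValueError("a phase must contain at least one unit")
    return (n_units - 1) // 2


def sample_pant_hoot_units(introduction: Sequence[Unit],
                           climax: Sequence[Unit]) -> List[Unit]:
    """First and middle exhaled units of the introduction, then of the climax."""
    picked: List[Unit] = []
    for phase in (introduction, climax):
        picked.append(phase[0])
        picked.append(phase[middle_index(len(phase))])
    return picked


print(sample_pant_hoot_units(["i1", "i2", "i3", "i4"], ["c1", "c2", "c3"]))
# ['i1', 'i2', 'c1', 'c2']
```

With four introduction units, the two middle units are the second and third, and the rule selects the second; with three climax units, the middle unit is unambiguous.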

To determine whether the acoustic structure of the newborn’s Phase 1 and Phase 2 calls differed from the introduction and climax phases produced by infant, juvenile, sub-adult, and adult individuals of the Sonso community, we used permuted discriminant function analyses (pDFA; Mundry & Sommer, 2007), following previous studies (e.g., Leroux et al., 2021; Soldati et al., 2022). To analyse the introduction phase, we used 19 vocal units from 11 calls produced by four adults, 21 units from 11 calls produced by four sub-adults, six units from four calls produced by three juveniles, and 19 units from ten calls produced by four infants. Together with the Phase 1 calls (n = 26), we obtained a total of 91 calls. To analyse the climax phase, we used 17 units from 11 calls produced by four adults, 20 units from 11 calls produced by four sub-adults, 18 units from nine calls produced by four juveniles, and ten units from ten calls produced by three infants. Together with the Phase 2 calls (n = 18), we obtained a total of 93 calls. Before analysis, we assessed multicollinearity to avoid including correlated acoustic parameters. Using the ‘performance’ R package (version 0.8.0, Lüdecke et al., 2021), we removed, one at a time, the parameter with the highest variance inflation factor (VIF) until we obtained a set of variables with low correlation. In the final set of four variables (Duration, Start F0, End F0, Range F0), the highest VIF was 2.85 for introduction calls and 1.63 for climax calls. We assessed the distribution of the data and, where variables were not normally distributed and this could be improved, applied a log or square-root transformation. We then used nested pDFA with 1000 permutations to test whether the acoustic structure of the newborn’s Phase 1 and Phase 2 calls differed significantly from the corresponding phases produced by the other age categories (Mundry & Sommer, 2007). Unlike a conventional DFA, a pDFA allows the inclusion of repeated data points per individual while controlling for unbalanced data sets. We included the caller’s ‘ID’ as a control factor.
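The iterative VIF screening can be sketched without a modelling package by using the identity that, for a set of numeric predictors, each VIF equals the corresponding diagonal element of the inverse of their correlation matrix (equivalently 1 / (1 − R²) from regressing that predictor on the others). This is an illustrative reimplementation, not the ‘performance’ package’s code, and the cut-off of 3 is an assumption, since the text reports only the final maximum VIFs (2.85 and 1.63).

```python
import math
from typing import Dict, List, Sequence


def correlation_matrix(cols: List[Sequence[float]]) -> List[List[float]]:
    """Pearson correlations between all pairs of columns."""
    def corr(a: Sequence[float], b: Sequence[float]) -> float:
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
        saa = sum((p - ma) ** 2 for p in a)
        sbb = sum((q - mb) ** 2 for q in b)
        return sab / math.sqrt(saa * sbb)
    return [[corr(a, b) for b in cols] for a in cols]


def invert(m: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[piv][c]) < 1e-12:
            raise ValueError("singular correlation matrix (perfect collinearity)")
        aug[c], aug[piv] = aug[piv], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def vifs(data: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """VIF of each variable = matching diagonal entry of the inverse correlation matrix."""
    names = list(data)
    inv = invert(correlation_matrix([data[name] for name in names]))
    return {name: inv[i][i] for i, name in enumerate(names)}


def drop_collinear(data: Dict[str, Sequence[float]],
                   max_vif: float = 3.0) -> List[str]:
    """Drop the highest-VIF variable, one at a time, until all VIFs <= max_vif."""
    kept = dict(data)
    while len(kept) > 1:
        current = vifs(kept)
        worst = max(current, key=current.get)
        if current[worst] <= max_vif:
            break
        del kept[worst]
    return list(kept)
```

Removing only the single worst variable per step matters because VIFs are recomputed after each removal: dropping one member of a collinear pair usually brings its partner’s VIF down, so both are not discarded unnecessarily.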

We conducted all statistical analyses in R (version 4.1.2, R Core Team, 2021).

Ethical Note

Data collection was entirely observational, adhering to the ASAB guidelines for the treatment of animals during behavioural studies (Association for the Study of Animal Behaviour, 2018). The study was approved by the Uganda Wildlife Authority (UWA/COD/96/5) and the Uganda National Council for Science and Technology (NS 637). The research ethics committees of the University of Neuchâtel (38/2019-B) and University of St Andrews (No 171) also approved this project. We evaluated the scope for bias in our study subjects using the STRANGE framework (Webster & Rutz, 2020) (see Supplementary Material). The authors declare that they have no conflict of interest.

Data Availability

Data on the newborn vocal and perinatal behaviours generated or analysed during this study are included in this article and its supplementary information files.

Results

A detailed report of the birth is available in the Supplementary Material. Video and audio recordings are available as Online Resources (1, 3–5).

Qualitative Call Classification

We recorded 70 call units from the newborn during 2 hours and 15 minutes of observation (0.5 per minute). These calls were divided into 12 separate vocal occurrences (also referred to as ‘utterances’; call rate 0.1 per minute), of which three were single calls and nine were call sequences (see Online Resource 2 for the acoustic spectrograms). Vocal sequences contained a mean of 7.4 vocal units (range 2–17). We identified barks (n = 2), grunts (n = 8), hoos (n = 6), squeaks (n = 2), whimpers (n = 8), and units that we labelled as part of a pant hoot (n = 44). We distinguished two variants, which we refer to as “pant hoot phase 1” (n = 26), hereafter Phase 1 for brevity, and “pant hoot phase 2” (n = 18), hereafter Phase 2 (Fig. 1). The four vocal sequences that included Phase 1 or Phase 2 units contained a mean of 11 such units (range 5–15) when other call types were excluded. Of all the newborn’s Phase 1 and 2 units, 36% (n = 15) included panted units between exhaled units (Fig. 1).
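The reported rates follow directly from the counts above. The short calculation below reproduces them, assuming that each of the three single calls consisted of one unit and that the 44 pant-hoot-like units are spread across the four sequences as stated.

```python
# Counts reported in the text.
observation_min = 2 * 60 + 15          # 2 h 15 min of observation
units, utterances = 70, 12
single_calls, sequences = 3, 9         # assumption: each single call is one unit
pant_hoot_like_units = 26 + 18         # Phase 1 + Phase 2 units
pant_hoot_sequences = 4

unit_rate = units / observation_min
utterance_rate = utterances / observation_min
mean_units_per_sequence = (units - single_calls) / sequences
mean_phase_units = pant_hoot_like_units / pant_hoot_sequences

print(f"{unit_rate:.2f} units/min, {utterance_rate:.2f} utterances/min")
# 0.52 units/min, 0.09 utterances/min
print(f"{mean_units_per_sequence:.1f} units per sequence, "
      f"{mean_phase_units:.1f} Phase 1/2 units per sequence")
# 7.4 units per sequence, 11.0 Phase 1/2 units per sequence
```

The utterance rate of about 0.09 per minute is the value reported, rounded, as 0.1 per minute.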

Fig. 1

Spectrographic representations of pant hoot calls produced by a newborn chimpanzee in the Sonso community, Budongo Forest, Uganda on 20 November 2019 with pant hoots produced by members of the mother’s family for comparison [sex and age (years) are shown]. For each call, the different phases or types are indicated underneath (Other = other call type). The red asterisk indicates three examples of panted units. Duration (s) on the x-axis. Note the presence of bird songs and cicada sounds above frequencies of approx. 2500 Hz in spectrograms a and b.

Five of seven independent experts agreed with our decision to classify KU7’s Phase 1 and Phase 2 calls as resembling adult pant hoots (Table SIV Supplementary Material). One expert classified the calls as either pant hoots or whimpers, and one expert classified the calls as whimpers. The experts did not reliably classify the other call types (barks, grunts, hoos, squeaks, and whimpers), citing challenging conditions (soft signal volume and background noise), but two experts reported the presence of quiet hoos and grunts among these other call types. Three experts classified the caller as a young individual, one as a juvenile, one as an infant, one as immature, and one as either a juvenile or a young adult.

Quantitative Analyses

We extracted six acoustic features from 70 call units and used them to model the best clustering of acoustically similar units using fuzzy analyses (Table IV). Overall, we obtained 20 unique models with two to seven clusters (K) and fuzziness parameters (μ) of 1.1, 1.5, 2.0, and 2.5 (Fig. 2). Two-, three-, four-, and five-cluster solutions could be calculated up to μ = 2.0, while six- and seven-cluster solutions could be calculated up to μ = 2.5. The model that best fit our data had three clusters (K = 3) and a fuzziness parameter of 1.1 (Fig. 2); it had the highest mean silhouette value (0.450), which indicates confidence in the overall solution. Although this model was not the most stable, only 6% of call units (n = 4) changed membership when we recalculated the model with one fewer variable (Table SV and Fig. S1 Supplementary Material). In the most stable model (K = 2, μ = 1.1), 2% of call units (n = 1.3 on average) changed membership, but this model had a mean silhouette value of 0.374, considerably lower than that of the best-fitting model. Six- and seven-cluster solutions could be calculated for a larger range of fuzziness values, suggesting they might be more reliable. However, these solutions were less consistent in mean silhouette value (range: 0.177 for six clusters, 0.174 for seven) than three-cluster solutions (range: 0.095) (Table SVI Supplementary Material). Furthermore, six- and seven-cluster solutions had lower mean silhouette values (0.399 and 0.343, respectively) than the best-fitting three-cluster model.
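The model count and the membership-change percentages can be checked from the figures above. This enumeration assumes, as stated, that K = 2 to 5 could be calculated for μ = 1.1, 1.5, and 2.0, and that K = 6 and 7 could additionally be calculated at μ = 2.5.

```python
# Fuzziness values at which solutions could be calculated, as reported.
mu_to_2_0 = [1.1, 1.5, 2.0]
mu_to_2_5 = mu_to_2_0 + [2.5]

# K = 2-5 up to mu = 2.0; K = 6 and 7 up to mu = 2.5.
grid = [(k, mu) for k in range(2, 6) for mu in mu_to_2_0]
grid += [(k, mu) for k in (6, 7) for mu in mu_to_2_5]
print(len(grid))  # 20

# Percentage of the 70 call units that changed membership.
n_units = 70
print(round(100 * 4 / n_units), round(100 * 1.3 / n_units))  # 6 2
```

Four K values times three μ values gives 12 models, and two K values times four μ values gives 8, for the 20 models reported.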

Fig. 2

Mean silhouette values obtained by varying the number of clusters (K = 2 to 7) and fuzziness values (μ = 1.1 to 2.5) using fuzzy c-means clustering. Mean silhouette values measure the confidence of the overall cluster solution of calls produced by a newborn chimpanzee in the Sonso community, Budongo Forest, Uganda on 20 November 2019; the higher the silhouette values, the more distinct the acoustic clusters are and the better the model fits the data.

Table IV Mean ± SD of Extracted Parameters for Call Types Produced by a Newborn Chimpanzee in the Sonso Community, Budongo Forest, Uganda on 20 November 2019. Table SIII (Supplementary Material) Contains Extracted Data from Each Call Unit Separately

Cluster Composition

We calculated the percentage of each call type (determined qualitatively) that belonged to each of the three clusters in the best fitting model identified by the quantitative analysis (Fig. S2 Supplementary Material). The first cluster was composed of grunts and whimpers (Table V). The second cluster consisted of barks, squeaks, Phase 2 calls, Phase 1 calls, and grunts (Table V). The third cluster consisted of hoos, Phase 1 calls, and Phase 2 calls (Table V).

Table V Percentage and Number (in Brackets) of Calls per Call Type in Each Cluster Produced by a Newborn Chimpanzee in the Sonso Community, Budongo Forest, Uganda on 20 November 2019. See Table SVII in the Supplementary Material for Within-Cluster Call Type Percentages

Call Combinations

KU7 produced single calls in three instances (grunts only) and nine different call sequences (range: 2–17 units). Of the nine sequences, four combined units from different call types (Fig. 3). Each of these four combinatorial sequences contained calls from two or three different clusters and two to four different call types.

Fig. 3

Four vocal sequences composed of different call types produced by a newborn chimpanzee in the Sonso community, Budongo Forest, Uganda on 20 November 2019. Each series of connected ‘blocks’ represents one of the four vocal sequences. Each ‘block’ indicates the call type and the number of repeated call units of the same call type. Colours represent clusters (Cluster 1 = blue; Cluster 2 = yellow; Cluster 3 = green).

Comparison with Pant Hoots

Pant hoots were composed of a mean of 15 vocal units in infants (range 1–18), 12.6 units in juveniles (range 1–12), 10.2 units in sub-adults (range 1–15), and 8.1 units in adults (range 1–15). A panted unit followed 43% of vocal units produced by infants (n = 21), 59% of units in juveniles (n = 20), 73% of units in sub-adults (n = 37), and 94% of units in adults (n = 44). Vocal usage rate varied with phase and age category (Table SIX). While introduction, build-up, and climax phases were observed across all age categories, the let-down phase was not observed in infants and juveniles (Table SIX). Furthermore, the let-down phase was rarely produced by sub-adults and adults, and the build-up phase was rarely produced by adults (Table SIX).

The discriminant function analysis could not be run on the data set including the introduction phases (produced by infant, juvenile, sub-adult, and adult males) and the newborn’s Phase 1 calls. This was likely because the within-group variance of some variables was below the tolerance accepted by the function, which can indicate that variables are collinear or constant within groups (Venables & Ripley, 2002). Because the variables we considered were not collinear, and since pant hoots have mainly been studied in sub-adult and adult males, we repeated the analysis including only individuals from these age categories. The results did not indicate that the newborn’s Phase 1 calls differ acoustically from the introduction phase of sub-adult and adult males (expected correctly cross-classified: 42%, p = 0.257). Similarly, the results did not indicate that the newborn’s Phase 2 calls differ acoustically from the climax phase of infant, juvenile, sub-adult, and adult males (expected correctly cross-classified: 26%, p = 0.164). Repeating this analysis including only the newborn, sub-adult, and adult individuals gave similar non-significant results (expected correctly cross-classified: 40%, p = 0.266).

Discussion

We qualitatively distinguished seven call types among the 70 units produced by a newborn chimpanzee immediately after birth, in line with what was previously reported for older chimpanzee infants (Plooij, 1984). Most units were produced as part of sequences. Interestingly, the newborn also produced vocal structures resembling pant hoots. Quantitative analyses revealed three acoustically distinct clusters of calls, and calls from different clusters were combined within the same sequence. All call types were also produced in isolation, except Phase 2 (pant hoot) calls and squeaks, which were only produced in combination with other calls. These data suggest that chimpanzees have the capacity to combine some call types into larger structures from birth.

We propose four hypotheses to explain what may have triggered the newborn’s vocal production. First, the newborn may have produced a series of long vocal sequences to attract the mother’s attention, for example because of discomfort or a desire to be nursed. However, we did not observe any of the calls more typical of such contexts (i.e., cries and whimpers; Dezecache et al., 2021), and the newborn was always in bodily contact with the mother during our observations, making this explanation unlikely.

Second, the newborn’s vocal behaviour may represent a rudimentary form of babbling (Oller, 2000), which serves both as vocal practice and to elicit care-giving (ter Haar et al., 2021). However, the newborn produced vocal sequences composed of repeated, rhythmically produced vocal units that contained only a subset of adult calls. Furthermore, the sequences lacked variable acoustic structures, did not elicit vocal or social responses from the mother, and were produced at lower call rates than those of human and marmoset infants (Elowson et al., 1998a, 1998b; Oller et al., 2021; Snowdon & Elowson, 2001) but at rates comparable to those of chimpanzee and bonobo infants (Kojima, 2008; Oller et al., 2019; Taylor, 2020), making babbling an unlikely explanation. In marmosets, one of the few primates in which babbling has been reported, parents engage in vocal feedback and exchanges (Takahashi et al., 2015), whereas chimpanzee mothers rarely direct vocalisations to their offspring (Schick et al., 2022). Although language-trained or human-raised chimpanzees have been reported to produce babbling-like vocalisations when interacting with researchers (Hayes & Hayes, 1951, pp. 106–108), this behaviour differs from human infant babbling in variety, quantity, and duration (Kellogg, 1968; Kojima, 2008). In contrast, the newborn’s vocalisations showed some similarities with the protophone-like sounds produced by young bonobo infants (Oller et al., 2019). These sounds are regarded as akin to the exploratory protophones that human infants produce from birth in low- to moderate-arousal contexts, without requiring social stimulation, and before the onset of babbling (Oller, 2000; Oller et al., 2016). However, the newborn chimpanzee produced sounds at much lower rates than human infants, without clear signs of playfulness, and without interacting vocally with the mother, all of which also characterise vocal behaviour in infant bonobos (Oller et al., 2019). One possibility is that the period of vocal exploration in great apes is very short and limited to the earliest developmental phase, although further observations are needed to test this possibility.

Third, the newborn’s vocal behaviour may have been an artefact of limited vocal control. However, within sequences, Phase 1 calls were always followed by Phase 2 calls, never the reverse, and sequences terminated with squeaks followed by grunts. This order mirrors that of pant hoots, in which introduction units precede climax units, and resembles how pant hoots tend to be followed by food grunts in call combinations (Leroux et al., 2021). These observations do not support the artefact hypothesis, although anatomical constraints on vocal production may have played a role. For instance, in adult pant hoots, the climax is never produced in isolation but is always preceded by an introduction or build-up phase (Soldati et al., 2022), perhaps because producing high-pitched, high-amplitude calls requires more time and effort (Riede et al., 2007).

Finally, intra-uterine auditory exposure to conspecifics’ calls may have affected the newborn’s vocal development. Although we cannot address this hypothesis with our data, such effects of early auditory exposure have been documented in marmosets (Narayanan et al., 2022), humans (Gervain, 2018; Varga et al., 2019), and songbirds (Colombelli-Négrel et al., 2021), all of which are regarded as vocal learners (Vernes et al., 2021). Further studies are needed to clarify the effects of pre- and post-natal auditory exposure on the development of great ape vocalisations.

The most puzzling aspect of the newborn’s vocal behaviour was the presence of vocal structures that acoustically and visually resembled chimpanzee pant hoots, with clear resemblance to adult as well as infant and juvenile pant hoots. Most experts rated these structures as pant hoot attempts. Although some experts rated them as whimpers, cluster analyses showed that they did not belong to the whimper cluster. Phase 1 calls closely resembled the introduction phase and Phase 2 calls closely resembled the climax phase, which is consistent with the non-significant results of the discriminant analyses. All the hoo calls produced by the newborn belonged to the Phase 1 cluster, in line with the idea that the introduction can be seen as a variant of hoos (Crockford, 2019). The production of panted units, a characteristic feature of pant hoots, increased progressively from the newborn through all age categories, suggesting that it develops during ontogeny. We also agree with the experts, who pointed out that the overall ‘rhythmicity’ of the newborn sequences is characteristic of pant hoots. Thus, the newborn produced vocal structures resembling pant hoots, suggesting that such complex structures are part of the innate vocal repertoire of chimpanzees. However, the newborn’s utterances differed from adult pant hoots in lacking phases that correspond to the build-up and let-down. The alternation of phases with a relatively low rate of unit production, such as the introduction and climax, and phases with fast-paced, panted unit production, such as the build-up and let-down, is a key feature of pant hoots.

The differences between the arguably rudimentary newborn pant hoot and adult pant hoots suggest that this call type undergoes ontogenetic change. Immature individuals might learn to produce certain phases in specific contexts or as part of structurally varying sequences (usage learning; Janik & Slater, 2000; Marshall et al., 1999), and to produce pant hoots that resemble those of group members or social partners (production learning; Ruch et al., 2018). The latter hypothesis is supported by the presence of community dialects (Crockford et al., 2004; Mitani et al., 1992) and by the greater call similarity between social partners (Mitani & Brandt, 1994; Mitani & Gros-Louis, 1998), although genetic or ecological factors might also explain community differences (Desai et al., 2021; Mitani et al., 1999). While some developmental changes result from maturation (Nishimura et al., 2003), systematic study of how vocal capacities are acquired, especially at early ontogenetic stages, is a key missing element in the current debate on vocal learning in primates, particularly in great apes (e.g., Fischer et al., 2015; Watson et al., 2015a, 2015b). Our observations are consistent with the idea that primate vocal repertoires are largely fixed and present from birth (Fischer & Hammerschmidt, 2020), although they also indicate that fine acoustic structure undergoes ontogenetic change.

The main limitations of our study are the short observation period and the small dataset, which limit the generalisability of our findings. Having complete access to the context of production typically facilitates call classification, although context can also be misleading for graded and flexible calls (Fischer & Price, 2017; Schamberg et al., 2018). For instance, pant hoots are produced across most contexts and in response to conspecific calls (Goodall, 1986), while food grunts are associated with both feeding and agonistic events (Ischer et al., 2020; Marler & Tenaza, 1977). Categorising calls by their acoustic features is less subject to such biases when the context of production is unclear or flexible, as for pant hoots and immature calls. When fuzzy clustering is applied to small datasets, calls can appear more discrete because such datasets are less likely to represent the entire repertoire, including intermediate variants. In addition, extracting a small number of features can increase the spread of values without necessarily improving separation (Wadewitz et al., 2015). Because we did not observe typical newborn utterances such as cries or screams (Kojima, 2008), their absence may have affected our quantitative analyses. Specifically, the best clustering solution was the most distinct but was also more strongly influenced by a smaller subset of features. However, a recent study found that the repertoire of infant chimpanzees was best described by a two-cluster model, with evidence of a potential third cluster (Taylor et al., 2021), which lends further support to the three-cluster model of the newborn's vocalisations. We do not claim that the repertoire of newborn chimpanzees contains only three call types. Determining its true size will require investigating how receivers respond to each vocalisation (Seyfarth & Cheney, 2017), including graded variants (e.g., Fischer, 1998; Fischer et al., 2001).
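The point that small datasets can make calls look more discrete can be illustrated with standard fuzzy c-means, in which each call receives a degree of membership in every cluster. The Python sketch below is a generic textbook version (fuzzifier `m = 2`), not necessarily the algorithm or settings used in this study. It summarises how discrete a partition is with Bezdek's partition coefficient, the mean over calls of the summed squared memberships, which ranges from 1/c (completely fuzzy) to 1 (crisp). The function names and the one-dimensional values are hypothetical: the same two groups of calls are clustered with and without intermediate, graded values.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=300, tol=1e-8, seed=0):
    """Textbook fuzzy c-means; returns cluster centres and memberships U (n x c)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted means
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against a point lying exactly on a centre
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)  # standard FCM membership update
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return centres, U


def partition_coefficient(U):
    """Bezdek's partition coefficient: 1/c = fully fuzzy, 1 = crisp."""
    return float(np.mean(np.sum(U ** 2, axis=1)))


# Hypothetical 1-D acoustic values: two groups of calls, with and without graded variants.
separated = np.array([0.0, 0.2, -0.1, 10.0, 10.1, 9.9])
graded = np.concatenate([separated, [3.0, 5.0, 7.0]])

for name, data in [("separated", separated), ("graded", graded)]:
    _, U = fuzzy_c_means(data, c=2)
    print(name, round(partition_coefficient(U), 3))
```

In this toy example the graded set yields a lower partition coefficient because the intermediate values receive split memberships; if such values are missing, as can happen by chance in a small sample, the same clusters look crisper.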

The observation of a birth in a wild chimpanzee community provided a rare opportunity to investigate the vocal behaviour of a newborn chimpanzee at the starting point of a long developmental trajectory. The newborn demonstrated the capacity to combine different call types into larger vocal sequences. Some of these combinatorial structures were composed of unique calls that shared several characteristics with pant hoots and were identified as pant hoots by most expert listeners. Consequently, our study suggests that acoustically complex structures akin to adult pant hoots are part of the chimpanzee vocal repertoire from birth, and that these sequences are subject to ontogenetic processes that shape their acoustic structure. While extensive work has been conducted on adult male calls, and combinatorial capacities have recently gained attention, further work is needed to elucidate the production, usage, and comprehension of complex vocal sequences in primates from an ontogenetic perspective. Although we remain cautious in interpreting observations from a single individual, we believe this case is a valuable contribution to the study of chimpanzee vocal development and will hopefully encourage further research on the ontogeny of great ape vocalisations.