Myriad studies have furthered our understanding of the ontogeny of human behaviour and early neurofunctions that underlie our later capacities and skills. Early human behaviours are complex, dynamic, and diverse. Despite commonalities in emerging neurofunctions along development, there are undeniable individual distinctions. One of the most fascinating questions in development is whether, which, and how individual variations lead to long-term favourable or adverse outcomes. Following a neurodevelopmentalist perspective of development, acknowledging early functions as precursors and prerequisites for later ones, we presume that early deviations or impairments precede suboptimal traits or adverse outcomes, even if the core symptomatology of certain disorders may appear later in development (as for example in the case of autism spectrum disorder, ASD; e.g. Estes et al., 2019). This assumption, also known as the deep constructivist notion (neuroconstructivism; e.g. Johnson, 2000; Johnson et al., 2021; Karmiloff-Smith, 1998; Mareschal, 2011; Westermann et al., 2011), is tightly linked to attempts at detecting and defining early functional markers of neurodiversity or atypicality, i.e. predictors of developmental trajectories (D’Souza & Karmiloff-Smith, 2017; Jones et al., 2014; Karmiloff-Smith, 1998, 2009; Marschik et al., 2014b, 2017; Micai et al., 2020). Concerns about whether behaviours reflect potential developmental atypicality or delay or mere diversity in typical development often result from recognising inter-individual discrepancies among peers caused by slowed or divergent functional acquisition within and across developmental domains, which might indicate stagnation or regression of intra-individual development. Notably, although the very early periods of speech-language development are not yet fully understood, atypicalities in the verbal domain are often one of the first perceived signs of neurodiversity during the first year of life.

Taking a closer look at the developing speech-language and communicative system, there is broad consensus regarding the essential role of prelinguistic vocalisations during early infancy for successful development of subsequent verbal functions (e.g. Karmiloff & Karmiloff-Smith, 2002; Locke, 1995; Oller, 2000; Vihman et al., 1985). Verbal development, meaning speech-language and communicative functions, follows a developmental trajectory of increasing complexity, accuracy and stability, thus building the complex human verbal capacity (e.g. Buder et al., 2013; Locke, 1995; Nathani et al., 2006; Oller, 1978, 2000; Papoušek, 1994; Stark, 1980). About four decades ago, stage-models were proposed describing developmental pathways from an infant’s first cry to becoming a competent communicator (cf. Karmiloff & Karmiloff-Smith, 2002; Koopmans-van Beinum & van der Stelt, 1986; Oller, 1978; Papoušek, 1994; Roug et al., 1989; Stark, 1980). While there are differences in exact definitions and labels for categorically distinct vocalisation types, reported age of onset and stages/phases, and the mastering of certain milestones, researchers have offered similar models which describe evolving verbal functions. In the initial developmental phase, most vocalisations are faint and brief quasi-vowels. This first phase is often referred to as the phonation stage or uninterrupted phonation stage (Fig. 1; Koopmans-van Beinum & van der Stelt, 1986; Oller, 2000). Thereafter, emerging at 1 to 2 months of age, vocalisations with articulatory movements of the tongue during phonation are uttered, a stage labelled the “cooing” or “gooing” phase (Oller, 1978, 2000). Approximately 2 months later, an expansion of vocal and articulatory capacities can be observed. Vocalisation types at this expansion or vocal play stage are vowel-/consonant-like sounds, squeals, and marginal syllables. These utterances are not yet produced with the articulatory accuracy and timing of adult speech (Fig. 1; Nathani et al., 2006; Oller, 2000; Stark, 1980). The final stage of prelinguistic development, commonly referred to as the canonical babbling stage, marks an infant’s start to produce speech-like syllables, usually starting between 5 and 10 months of age (Oller, 2000). Vocalisations are single or multiple consonant–vowel-combinations with rapid formant transitions between the consonantal and vocalic part. In some stage models, reduplicated and variegated babbling have been proposed as subsequent stages (Oller, 1978; Roug et al., 1989; Stark, 1980). In summary, specific vocalisation types occur in a cascading fashion and become increasingly speech-like towards the end of the first year of life, when the first (proto-)words are uttered. Besides this shift to language-specific phonetic forms, vocalisation types and developmental stages during the first year of life have been considered as universal (cf. Buder et al., 2013 who provide an acoustic phonetic catalogue of pre-speech vocalisations).

Fig. 1 The developing speech-language capacity

The classical approach to assessing whether the above-mentioned early speech-language milestones are met follows a perceptual segmentation-annotation-classification procedure of infant utterances. In such studies (which are observational), vocalisation-entities are commonly defined through the breath group criterion (i.e. vocalisation(s) uttered in the exhalation/expiration phase of one breathing cycle; Lynch et al., 1995a; Nathani & Oller, 2001) and segmented accordingly. Other approaches segmenting infant speech have differentiated vocalisations through a pause criterion (e.g. pauses longer than 300 ms subdivide vocalisation clusters; Oller et al., 2010). Either way, the segmentation step is usually followed by an annotation process, in which trained listeners assign vocalisations to the predefined vocalisation classes (e.g. Koopmans-van Beinum & van der Stelt, 1986; Lang et al., 2021; Lynch et al., 1995a; Nathani et al., 2006; Oller, 1978, 2000; Roug et al., 1989; Stark, 1980). Recently, a citizen science study externally validated the expert classification of babbling vocalisations and the onset of canonical babbling (Cychosz et al., 2021). Together with findings on auditory Gestalt perception of experts and naïve listeners differentiating early verbal functions of infants with neurodevelopmental disorders (NDDs), this points to the existence of an intrinsic human Gestalt of different vocal categories or typical vs. atypical pre-linguistic vocalisations (Marschik et al., 2012a). Human auditory Gestalt perception, or the adult capacity of intuitively recognising different vocal categories, becomes more robust when evaluating “higher order verbal functions” of infants. Explicitly, babbling vocalisations, being more salient in form, are easier for listeners to categorise than pre-babbling vocalisations uttered in the first 5 months of life (e.g. Marschik et al., 2012a; Pokorny et al., 2018).
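The pause criterion described above can be illustrated with a minimal sketch. It assumes that voiced stretches have already been detected and are available as onset/offset times in seconds; the function name, the data layout and the example values are hypothetical and are not taken from any of the cited studies. Only the 300 ms threshold comes from the text.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (onset_s, offset_s) of one voiced stretch


def segment_by_pause(voiced: List[Interval], max_pause_s: float = 0.3) -> List[Interval]:
    """Group voiced stretches into vocalisation units.

    Stretches separated by a pause of at most max_pause_s stay in the same
    unit; a longer pause starts a new unit (pause criterion).
    """
    units: List[Interval] = []
    for onset, offset in sorted(voiced):
        if units and onset - units[-1][1] <= max_pause_s:
            # Pause is too short to subdivide: extend the current unit.
            units[-1] = (units[-1][0], max(units[-1][1], offset))
        else:
            # First stretch, or pause longer than the threshold: new unit.
            units.append((onset, offset))
    return units


# Hypothetical voiced stretches (seconds) from one recording.
voiced_stretches = [(0.00, 0.40), (0.55, 0.90), (1.50, 1.80), (1.95, 2.10)]
units = segment_by_pause(voiced_stretches)
print(units)  # [(0.0, 0.9), (1.5, 2.1)]
```

A breath group segmentation would need respiratory or perceptual information in addition to pause durations, so it is not covered by this sketch.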

In the first 5 months of life, before the canonical babbling stage, the various stage-models concordantly include descriptions of a developmental pathway from simple phonation to an expansion phase (Fig. 1, e.g. Kent, 2022; Nathani et al., 2006; Oller, 2000). Oller and colleagues introduced a classification scheme of three types of infant vocalisations: cry, laughter and protophones; the latter are defined as precursors to speech and subdivided into vocants, squeals and growls (Jhang & Oller, 2017; Oller et al., 2013). Interestingly, evidence showed spontaneously produced protophones to outnumber cries and laughter from early on (Jhang & Oller, 2017; Oller et al., 2019). The importance of protophones lies, in contrast to cry and laughter, in their functional flexibility. They can be used in variable contexts and may fulfil different communicative functions (Jhang & Oller, 2017; Oller et al., 2013). Besides flexibility in functioning, the ontogeny of vocalisations has been discussed in terms of physiological constraints. Physiological adaptation of peripheral anatomical structures, such as the larynx descent or vocal-tract shape (e.g. Fitch, 2010; Lieberman et al., 2001) as well as neurophysiological changes governing the functional output, shape the development and the increasing complexity of vocalisations (see Fig. 1; e.g. Kent, 2021, 2022; Oller, 2000; Zhang & Ghazanfar, 2020).

In infants with various developmental disorders (DDs), an increasing number of studies has investigated the prelinguistic development aiming to detect early atypical findings and potential associations with later speech-language development (for reviews see for example Lang et al., 2019; Roche et al., 2018; Yankowitz et al., 2019). Canonical babbling, for example, was reported to be delayed or deviant in infants with hearing impairment (HI; Eilers & Oller, 1994; Koopmans-van Beinum et al., 2001; Moeller et al., 2007; Nathani Iyer & Oller, 2008; Shehata-Dieler et al., 2013; von Hapsburg & Davis, 2006), Down syndrome (DS; Lohmander et al., 2017; Lynch et al., 1995b), cerebral palsy (CP; Levin, 1999; Nyman & Lohmander, 2018), Williams-Beuren syndrome (WBS; Masataka, 2001), Cri-du-chat syndrome (CDS; Sohner & Mitchell, 1991), tuberous sclerosis complex (TSC; Gipson et al., 2021), autism spectrum disorder (ASD; Patten et al., 2014; Paul et al., 2011; Yankowitz et al., 2022), Rett syndrome (RTT; Einspieler et al., 2014; Marschik et al., 2012b, 2013), and fragile X syndrome (FXS; Belardi et al., 2017; Marschik et al., 2014a). Findings were however inconsistent and may depend on measures applied. For example, some infants with late detected developmental disorders (LDDDs such as ASD, RTT, FXS) exhibited a delayed onset of canonical babbling whereas others have reached this milestone at an adequate age, i.e. between 5 and 10 months (Bartl-Pokorny et al., 2022; Lang et al., 2019; Marschik et al., 2013; Yankowitz et al., 2019, 2022).

As findings regarding achievement of developmental milestones in infants with DDs were inconclusive, recent research increasingly aimed at gaining in-depth knowledge about early vocal patterns through the extraction and characterisation of acoustic features of emerging verbal functions. For example, in cry but also in spontaneous infant vocal patterns acoustic features like fundamental frequency (lowest frequency of a periodic waveform, usually denoted as F0) or duration of vocalisations have been documented (Borysiak et al., 2017; Buder et al., 2013; Hamrick et al., 2019; Kent & Murray, 1982; Wermke & Robb, 2010). More complex models on analysing acoustic properties of infant vocalisations include machine learning approaches applied on a set of parameters or features on signal level (Pokorny et al., 2020; Schuller & Batliner, 2013). There are established parameter sets for analysing voice features such as the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS; Eyben et al., 2015) and the Computational Paralinguistics ChallengEs parameter set (ComParE; Schuller et al., 2013).

The features that are included in such sets can be subdivided into three categories: parameters related to frequency aspects (e.g. pitch), parameters related to the energy or amplitude of the signal (e.g. harmonics-to-noise ratio; HNR) and spectral parameters (e.g. harmonic differences). Another common approach to produce a more specialised parameter set is the use of the unsupervised Bag-of-Audio-Words (BoAW) approach to obtain the best set of features according to a customised codebook quantisation of the low-level descriptors (LLDs). In addition, machine learning models, including neural networks in different varieties, have been applied to vocalisations to test classification tasks (e.g. adult vs. infant speech, canonical vs. non-canonical utterances; Ebrahimpour et al., 2020; Warlaumont et al., 2010). In our group, we have utilised a machine learning approach (i.e. support vector machines) that focused on automatic preverbal vocalisation-based differentiation between typically developing infants and infants later diagnosed with RTT, FXS or ASD (Pokorny et al., 2016a, 2017, 2022). Studies evaluating acoustic features of early vocalisations or applying machine learning models or neural networks will be referred to as “computational studies” hereafter.
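To make the idea of codebook quantisation more concrete, the following minimal sketch assigns each frame of LLDs to its nearest codeword and counts the assignments per vocalisation. It is an illustration under stated assumptions, not the procedure of any cited toolkit: the codebook is fixed by hand here, whereas in practice it would be learned from training data, and the two-dimensional, standardised LLD values are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_codeword(frame: Vector, codebook: List[Vector]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance."""
    distances = [sum((a - b) ** 2 for a, b in zip(frame, word)) for word in codebook]
    return distances.index(min(distances))


def bag_of_audio_words(frames: List[Vector], codebook: List[Vector]) -> List[int]:
    """Histogram of codeword assignments over all frames of one vocalisation."""
    histogram = [0] * len(codebook)
    for frame in frames:
        histogram[nearest_codeword(frame, codebook)] += 1
    return histogram


# Hypothetical codebook with three "audio words" in a standardised 2-D LLD space.
codebook = [(-1.0, -1.0), (0.0, 0.0), (1.0, 1.0)]
# Hypothetical standardised LLD frames (e.g. F0 and loudness) of one vocalisation.
lld_frames = [(-0.9, -1.2), (0.1, -0.1), (0.2, 0.3), (1.1, 0.8), (-0.8, -0.7)]

bag = bag_of_audio_words(lld_frames, codebook)
print(bag)  # [2, 2, 1]
```

The resulting histogram has a fixed length regardless of the duration of the vocalisation, which is what makes it usable as input to a classifier such as a support vector machine.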

Despite recent efforts to perceptually classify preverbal vocal patterns and characterise them acoustically, there is still a lack of synergised information in the field of prodromal or pre-diagnostic development in infants with neurodevelopmental or genetic disorders, especially concerning the pre-babbling phase. Therefore, the current article aimed to (i) outline characteristics of age-specific pre-linguistic vocalisations in the first 5 months of age (i.e. the pre-babbling phase), (ii) summarise computer-based approaches for the automated analysis of physiological and pathological pre-babbling vocalisations, and (iii) compare computer-based approaches on atypical early verbal functions and outline their potential to serve as neurofunctional markers of DDs.

Methods

To address the above-mentioned issues, we systematically searched the existing literature for (a) characteristics of and (b) state-of-the-art computational and observational methods on prelinguistic vocalisations in infants with DDs. We conducted two rounds of paper extraction and selection, the first one in September 2021 and a second one in February and March 2022 in the following online electronic databases: PubMed, Web of Science, Science Direct, Scopus, and PsycINFO using the search strings “infan* AND (prelinguistic OR preverbal OR cooing OR babbling OR vocal) AND (syndrome OR “genetic disorder” OR “developmental disorder”)” and “infan* AND vocal* AND (“computational analysis” OR “acoustic analysis” OR “audio analysis”)”.

Following this initial step, we performed an ancestral search for papers from the retrieved articles and searched Google Scholar for further publications. The retrieved articles were screened by two independent raters (CW and SL). Results were discussed with the co-authors, duplicates were removed, and articles were selected according to the following criteria: (1) peer-reviewed; (2) original studies or reviews and meta-analyses; (3) written in English; and (4) focusing on the pre-babbling age (0 to 5 months) in (4a) typically developing infants and (4b) infants at elevated likelihood for or diagnosed with neurodevelopmental disorders (NDDs), late detected developmental disorders (LDDDs), genetic syndromes, or developmental disorders (DDs). Articles of interest were those based on human coder-based assessments (observational studies) as well as articles on machine learning approaches (computational studies). We intended to focus on spontaneous infant vocalisations and excluded all studies analysing or reporting infant cry or distress vocalisations as well as vocalisations from parent–child interaction paradigms (PCI).

Results

Our literature selection process led to a total of 27 papers, 17 of which are on pre-babbling in infants diagnosed with neurodevelopmental disorders or genetic syndromes applying observational methods (Table 1). Six articles focused on DS, seven on ASD (one of them also including infants with TSC), three on RTT or the preserved speech variant of RTT (PSV), and one on PWS. Two of the 17 articles reported acoustic features in addition to observational characteristics. The remaining ten articles focused on acoustic features/computational models, three studies applying computational methods on pre-babbling behaviour in TD infants (Table 2) and seven papers discussed the babbling stage in infants later diagnosed with a DD (i.e. ASD, CDS, PSV-RTT, RTT, WS and one study reporting on ASD, FXS, and RTT; Table 2). It is important to note that a differentiation between spontaneous vocalisations vs. vocalisations in interactive settings could not be reliably done for all articles. Thus, against the initially set exclusion criterion, we decided to report all observational studies of this age-range and outlined information on data sampling whenever possible (Tables 1 and 2).

Table 1 Studies analysing pre-babbling behaviour in infants with neurodevelopmental disorders or genetic syndromes applying observational methods (ascending order)
Table 2 Studies applying computational methods on pre-babbling behaviour in typically developing infants and on babbling behaviour in infants with neurodevelopmental disorders or genetic syndromes (ascending order)

Whilst there is a number of studies reporting early physiological development according to the established stage models (Fig. 1), reports of atypical development in infants with neurodevelopmental disorders or genetic syndromes in the younger ages are rare (Table 1). Most of the 17 included studies report on expanded age-bands up to 24 months; very few explicitly investigate the characteristics of early verbal functions emerging in the first 5 months of life (Brisson et al., 2014; Maestro et al., 2002; Pansy et al., 2019; Zappella et al., 2015). Most studies investigate developing verbal functions applying the classical approach of perceptual segmentation-annotation-classification. There is less effort present in delineating acoustic features (such as duration of vocalisations, syllables or phrases, pitch, fundamental frequency (F0) or intonation contours; Brisson et al., 2014; Lynch et al., 1995a). Observational studies reveal inconclusive results on behavioural differences in pre-babbling vocalisations in infants with DDs and typical development. Compared to TD infants several diverse behaviours have been reported for DD: e.g. longer duration of rhythmic units in infants with DS (Lynch et al., 1995a); divergent intonation contours and less vocal response in interactive settings (Brisson et al., 2014); some participants with ASD failed to achieve the developmental milestone “cooing” (Maestro et al., 2002; Zappella et al., 2015); typical vocalisations interspersed with atypical forceful and/or inspiratory vocalisations in infants with RTT (Marschik et al., 2009); more details on age-specific vocalisations and characteristics of this period are outlined in Table 1.

More advanced methods such as digital measurement instruments and computational analyses open new possibilities for earlier identification of atypical development, as they surpass human capabilities of perception. Most approaches identified aim to describe and investigate trends in the typical development of vocalisations throughout the first 5 months of life. Very early studies focus on a categorical analysis of vocalisations, applying spectral analysis to complement the verbal Gestalt-perception (Buder et al., 2008; Lynch et al., 1995a; Oller et al., 2019; Warlaumont et al., 2010). The spectra analysed were obtained by applying a window function to successive short segments of the signal. Most commonly, a fast Fourier transform is then computed for each segment and the results are presented as a graphical visualisation (spectrogram), showing the intensity of frequencies at each point in time (Heideman et al., 1985). With the resulting graphical representation, one can visually determine fundamental and formant frequencies (F0 and Fn, respectively) and the general “shape” of a vocalisation (Bauer & Kent, 1987; Kent & Murray, 1982; Oller et al., 2019). The method of spectrography has been applied in studies over the last 3 decades, finding specific intonation patterns in pre-babbling vocalisations and a developmental trajectory of the F0 and Fn (Kent & Murray, 1982). Oller and colleagues used spectrograms to visualise examples of vocants, squeals, growls, and cries at specific ages, providing a visual description of the noise found in the signal as well as other unique features (e.g. F0 contour) of the analysed classes of utterances (Oller et al., 2019). Another feature, which can be identified through inspection of the spectrogram or the waveform of a vocalisation, is the duration of a single utterance. The duration is used in several studies to gain an understanding of how utterance durations change with age (Apicella et al., 2013; Brisson et al., 2014; Lynch et al., 1995a; Smith & Oller, 1981).
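The windowing and transform steps behind a spectrogram can be sketched as follows. This is a minimal illustration using only the Python standard library: it computes a plain discrete Fourier transform per frame, which yields the same values as a fast Fourier transform but more slowly. The sampling rate, frame length, hop size and the synthetic 400 Hz test tone with one harmonic are assumptions chosen for the example, not values from the reviewed studies.

```python
import cmath
import math
from typing import List


def hann_window(n: int) -> List[float]:
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame: List[float]) -> List[float]:
    """Magnitudes of DFT bins 0..N/2 of one Hann-windowed frame."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hann_window(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(windowed))
        spectrum.append(abs(coeff))
    return spectrum


def spectrogram(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """One magnitude spectrum per frame; rows are time, columns are frequency."""
    return [magnitude_spectrum(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]


SAMPLE_RATE = 8000  # Hz, assumed for the example
FRAME_LEN = 200     # samples, i.e. 25 ms and 40 Hz per frequency bin
HOP = 100           # samples between frame starts

# Synthetic 0.1 s tone: 400 Hz fundamental plus a weaker harmonic at 800 Hz.
signal = [math.sin(2 * math.pi * 400 * t / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 800 * t / SAMPLE_RATE)
          for t in range(800)]

frames = spectrogram(signal, FRAME_LEN, HOP)
# Frequency of the strongest bin in each frame.
peak_hz = [max(range(len(f)), key=f.__getitem__) * SAMPLE_RATE / FRAME_LEN
           for f in frames]
print(peak_hz)
```

For this clean synthetic tone the strongest bin coincides with the fundamental. In real infant vocalisations the strongest spectral peak is not necessarily F0, so dedicated pitch-tracking methods would be needed.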

More in-depth analyses of audio signals require multidimensional parameter sets to provide feature-based representations of the underlying audio segment to a classifier, which can then build an optimal predictor for the classification scheme provided. There are pre-defined parameter sets that are commonly utilised in linguistic and acoustic analyses. Such parameter sets are for example the Computational Paralinguistics ChallengEs parameter set (ComParE; Schuller et al., 2013) or the eGeMAPS (Eyben et al., 2015). These parameter sets consist of low-level descriptors (LLDs). LLDs are parameters that are very closely related to the signal itself (e.g. fundamental frequency F0, loudness). To gain further insights about the general occurrence and statistical behaviour of those LLDs, functionals (e.g. mean, kurtosis, variance) are computed on top of them (Schuller & Batliner, 2013).
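The relation between LLDs and functionals can be sketched in a few lines. The example below takes a frame-wise contour of a single LLD and summarises it with the three functionals named above. The contour values are hypothetical, and the population (not sample) definitions of variance and kurtosis are an assumption of this sketch; established parameter sets define their own, larger selection of functionals.

```python
from typing import Dict, List


def functionals(contour: List[float]) -> Dict[str, float]:
    """Summarise one frame-wise LLD contour with mean, variance and kurtosis."""
    n = len(contour)
    if n == 0:
        raise ValueError("contour must contain at least one frame")
    mean = sum(contour) / n
    variance = sum((v - mean) ** 2 for v in contour) / n  # population variance
    fourth_moment = sum((v - mean) ** 4 for v in contour) / n
    # Kurtosis is undefined for a constant contour.
    kurtosis = fourth_moment / variance ** 2 if variance > 0 else float("nan")
    return {"mean": mean, "variance": variance, "kurtosis": kurtosis}


# Hypothetical F0 contour (Hz), one value per analysis frame of a vocalisation.
f0_contour = [380.0, 400.0, 420.0, 400.0, 400.0]
stats = functionals(f0_contour)
print(stats)  # {'mean': 400.0, 'variance': 160.0, 'kurtosis': 2.5}
```

Applying such functionals to every LLD turns a vocalisation of arbitrary length into a fixed-length feature vector that a classifier can process.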

Yet, in the field of pre-babbling vocalisations, most studies rely on basic features such as duration or fundamental frequency to gain a more in-depth understanding of infant vocalisations (Apicella et al., 2013; Brisson et al., 2014; Lynch et al., 1995a; Smith & Oller, 1981). Visual spectrogram analysis has been used to evaluate different vocalisation shapes and help estimate signal-to-noise ratios in certain vocalisation types (Oller et al., 2019). These approaches, whilst not utilising advanced computational methods, highlight the importance of particular features for identification of certain vocalisation types and analysis of developmental trajectories. Lynch and colleagues, who focused on a comparison between TD children and children with DS, present the only study that employs a feature-based approach in the analysis of pre-babbling vocalisations in infants with DDs (Lynch et al., 1995a). In this study, the duration of utterances was compared between DS and TD children across respective timelines. For the first 5 months of life, no significant difference was found between TD infants and infants with DS. Nevertheless, the duration of utterances increases until 8 months of age and then decreases until 12 months of age, continuously diverging between TD and DS groups (Lynch et al., 1995a). Although the methodology is not sensitive enough for an accurate differentiation between the two studied groups, it provides a starting point in the identification of possible features that can be used for future analysis of pre-babbling vocalisations (Lynch et al., 1995a). This early phase of verbal development is not yet very well researched in terms of the effectiveness of the aforementioned parameter sets (i.e. ComParE and eGeMAPS). So far, there is a lack of studies applying advanced computational approaches as well as comparative studies that enable rendering a verdict on their applicability (see Table 2).
Deep learning approaches have been applied to different settings (e.g. interactive settings, home recordings; Pokorny et al., 2020) of pre-segmented infant audio signals to solve superficial classification tasks (e.g. infant vs. adult, canonical vs. non-canonical). However, none of these studies focused on infants with DDs in the first few months of life (Ebrahimpour et al., 2020; Warlaumont et al., 2010).

Several studies on machine learning approaches applied to vocalisations in the first year of life (pre-babbling and babbling) were identified. In the pre-babbling phase, only three studies utilised approaches beyond the manual analysis of LLDs in the assessment of vocalisations in TD infants (Table 2). To the best of our knowledge, there are no studies available in infants at risk or with a later diagnosis of DDs. These approaches investigate the effectiveness of different neural network architectures (i.e. convolutional neural network, self-organising map and perceptron hybrid network), input features (i.e. spectrograms, waveform, parametric representation), and classification schemes (i.e. infant-directed speech vs. adult-directed speech, infant vs. adult, vocalisation vs. non-vocalisation, canonical vs. non-canonical; vocant vs. squeal vs. growl; Ebrahimpour et al., 2020; Li et al., 2021; Warlaumont et al., 2010). In contrast, in the babbling phase, a number of studies analyse verbal capacities utilising computational approaches (e.g. Pokorny et al., 2018, 2020, 2022). In general, manual analysis of LLDs such as fundamental frequency (F0) is not very common for babbling vocalisations. Spectrographic analysis is very often used only for representational purposes, e.g. to represent different syllable types (e.g. Poeppel & Assaneo, 2020). For analysis and detection of atypical development by utilising computational methods, the number of approaches described is limited (Table 2).

Discussion

Some 40 years ago, the field of early infant vocalisation study was revolutionised with new ways to assess, measure and interpret early development (Koopmans-van Beinum & van der Stelt, 1986; Oller, 1978; Papoušek, 1994; Roug et al., 1989; Stark, 1980). Since then, we have learned a lot about infant prelinguistic development and vocalisation categories. Most studies, however, focused on babbling and the emergence of first words (second half of the first year of life) whilst the pre-babbling phase (first months of life), especially in infants at elevated likelihood for or diagnosed with neurodevelopmental disorders and genetic syndromes, was less researched.

The very early phase of verbal development is mostly described through the achievement of certain milestones (e.g. phonation, cooing, expansion) or via perceptual assignment of infant vocalisations to certain types (e.g. vocant, canonical syllable). Another, albeit still rarely used, approach is the description of infant vocalisations through acoustic features (e.g. duration, mean pitch, F0). Studies have only recently focused on the investigation of quantitative changes of different vocalisation types in the first 5 months of life (Jhang & Oller, 2017; Oller et al., 2013, 2019). However, these studies have not assessed infants with developmental disorders or genetic syndromes so far. Threshold definitions, such as the canonical babbling ratio (CBR) applied in the second half of the first year of life, have, to the best of our knowledge, not yet been developed or used for types of pre-babbling vocalisations. For the later stages of development, a number of different approaches to define the onset of certain functions (e.g. canonical babbling) have been proposed, providing similar critical time periods in which milestones are achieved (Lang et al., 2021; Molemans et al., 2012; Oller, 2000). Oller and colleagues (Oller et al., 1998, 1999) reported that delayed onset of canonical babbling is a precursor to later adverse linguistic functioning. Whether precursors of atypical development may already be detected in earlier vocalisations has not yet been investigated. Further research observing typical verbal development is still needed as a basis to understand deviant patterns and trajectories.
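As an illustration of what a threshold definition looks like in practice, the following sketch computes a canonical babbling ratio from annotated syllable counts. It assumes the syllable-based definition (canonical syllables divided by all syllables) and uses a criterion value of 0.15 as the default; both are assumptions of this sketch, since definitions and criterion values differ between studies, and the counts in the example are hypothetical.

```python
def canonical_babbling_ratio(n_canonical: int, n_total: int) -> float:
    """Proportion of canonical syllables among all annotated syllables."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_canonical <= n_total:
        raise ValueError("n_canonical must lie between 0 and n_total")
    return n_canonical / n_total


def reaches_criterion(cbr: float, criterion: float = 0.15) -> bool:
    """True if the ratio meets the (assumed) criterion for the babbling stage."""
    return cbr >= criterion


# Hypothetical annotation result for one recording session.
cbr = canonical_babbling_ratio(n_canonical=12, n_total=60)
in_canonical_stage = reaches_criterion(cbr)
print(cbr, in_canonical_stage)  # 0.2 True
```

An analogous measure for the pre-babbling phase would require agreed vocalisation categories and an empirically derived criterion, neither of which is available so far.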

Besides pioneering the field of perceptively evaluating infant vocalisations, Oller and colleagues were also at the forefront to propose semi-automated recording and analytical tools for the assessment of infant vocalisations (e.g. LENA system; Oller et al., 2010). Challenges of recording preverbal data as well as advantages of automated tools for the acquisition and analyses of acoustic features have been increasingly discussed (Pokorny et al., 2020). The aim of this article is not to discuss pros and cons of automated data acquisition approaches but to focus on whether such undertakings have been utilised in the study of infant vocalisations in the first half year of life, in typical cohorts, in individuals at elevated likelihood for DDs, or groups with DDs or pre-/perinatally diagnosed disorders.

When looking beyond behavioural observations and general perceptual evaluations of early infant vocalisations, there is a lack of computational methods that study, substantiate, and support the findings of observational studies. We found that despite the existence of thoroughly tested computational approaches for babbling-vocalisations, there are no attempts to use these methods in the evaluation of pre-babbling vocalisations. These perceptually less salient vocalisations, as compared to canonical babbling, have preferably been studied through simple LLDs such as F0 and duration. Only a few studies have used more advanced computational approaches to prove the applicability and value of such approaches in the field of pre-babbling vocalisations (Ebrahimpour et al., 2020; Li et al., 2021; Warlaumont et al., 2010). Besides missing analytical approaches, there is also a lack of standardisation of coding-schemes and datasets, which impedes the comparability of performance between applied computational models in the field of speech-language analysis in the first 5 months of life. Additionally, the sample sizes investigated in observational and computational studies are usually small (i.e. 1–119; see Table 2). Generalisation capabilities of machine learning approaches applied to small datasets are questionable. Computational or feature-based approaches are underrepresented in studying pre-babbling vocalisations, especially in infants with NDDs (Brisson et al., 2014; Lynch et al., 1995a). To fingerprint early neurofunctional development and its deviations (Marschik et al., 2017), we need in-depth understanding of physiological functioning as well as disorder-specific characteristics. Early verbal development is one domain of interest providing clues to the integrity of the developing nervous system.
Recent developments in analytical tools appear well suited for analysing pre-linguistic vocalisations at pre-babbling age to enhance our insights into emerging early verbal functions. Pioneering work is required to verify computational tools in identifying disorder-specific features in early vocalisations, which may inform future clinical diagnoses and be used for monitoring therapeutic success.