Core Messages

For this introduction, the literature was searched for subjects in which our extended studies of normal voice development, combined with pediatric and hormonal development, can be used for diagnosis and treatment and compared with other developmental factors.

  • The possibilities and limitations of high-speed videos (HSVs), Voice Range Profiles (VRPs), and electroglottograms (EGGs) for fundamental frequency (F0) and register analysis as well as comparison to pediatric and hormonal stage development are presented in literature references of practical use.

  • High-speed videos in childhood are discussed.

  • Voice Range Profiles are called the audiograms of the voice; age-related dynamic ranges in decibels are compared with their total frequency range as presented in the literature.

  • Electroglottography is an online quantitative measurement of vocal fold closure over time, based on a high-frequency current of low intensity passed through the larynx; it is used especially to define the point where the vocal cords close. As the closing point is well defined, electroglottography is a good measure of the abrupt fundamental frequency changes at the laryngeal level in children and during puberty.

  • Register changes in pubertal boys, measured with electroglottography and acoustical measurements, are presented together with testosterone measurements.

  • Comparisons of voice measurements with pediatric and hormonal stages in the literature are presented.

2.1 High-Speed Videos (HSVs)

The history of HSV is long as illustrated in the book by Woo [1]. The need for devices with more frames per second to visualize the true movement of the vocal folds led to the use of HSV setups for laryngeal evaluation in this study. Videostroboscopy (VS) is useful for classification and standardized scoring, but for a functional evaluation of the vocal folds during phonation, HSV affords the examiner a more representative view of the true vocal fold movements during the development of the voice. When using a standardized protocol for classification, Olthoff et al. found that the rating “not assessable” was mentioned significantly more often with stroboscopy than with HSV [2].

Woo et al. discussed the number of pixels needed for HSV analysis [3]. Mendelsohn et al. compared HSV with videostroboscopy (VS) for the classification of diagnoses and treatment aspects and found both methods to be valuable [4]. Tsutsumi et al. and Oliveira et al. discussed standardization values for HSV in adults [5, 6]. However, functional assessment is better with HSV because of the asynchronicities in VS, which are a major problem [7]. The equipment for HSV has become less expensive [8]. Quantitative analyses of HSV are being developed further, some of them based on HSV kymography, including software for fundamental frequency measures on HSV [9,10,11].

Baravia et al. found that the open phases looked longer on HSV than on kymography [12]. Overall, Inwald et al. found rather large variations in many HSV-based parameters in normal persons [13]. Further development, eventually based on 3D closures of each vocal fold, makes it possible to measure the closure at various points along the vocal folds, which is of great interest during puberty [14]. Deep learning can facilitate the measurement of the glottis to calculate the distance between the vocal folds at a specific point [15].

Stroboscopy has been an invaluable tool for the classification of vocal fold diagnoses. The frame rate of stroboscopy setups varies, but the majority record at 25 frames per second. During spontaneous speech under mean phonation, the vocal folds vibrate between 196 and 224 times per second (Hz) in women and between 107 and 132 times per second (Hz) in men, according to Oates et al. [16]. In children, the number of vibrations is much higher.
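The mismatch between recording rate and vibration rate can be made concrete with a small calculation. The following is a minimal sketch, not part of the original study: the function name is hypothetical, and the HSV frame rate of 4000 frames per second is an assumed illustrative value. It shows that at 25 frames per second far less than one frame is recorded per vibration cycle, so consecutive stroboscopic images necessarily stem from different cycles.

```python
def frames_per_cycle(frame_rate_fps: float, f0_hz: float) -> float:
    """Number of recorded frames per vocal fold vibration cycle."""
    return frame_rate_fps / f0_hz


STROBOSCOPY_FPS = 25.0  # recording rate of most stroboscopy setups (see text)
HSV_FPS = 4000.0        # assumed HSV frame rate, for illustration only

# Mean speaking F0 ranges in Hz as cited in the text (Oates et al.)
speaking_f0_hz = {"women": (196.0, 224.0), "men": (107.0, 132.0)}

for group, f0_limits in speaking_f0_hz.items():
    for f0 in f0_limits:
        strobe = frames_per_cycle(STROBOSCOPY_FPS, f0)
        hsv = frames_per_cycle(HSV_FPS, f0)
        print(f"{group}, {f0:.0f} Hz: {strobe:.3f} (stroboscopy) "
              f"vs {hsv:.1f} (HSV) frames per cycle")
```

With values below one frame per cycle, stroboscopy can only assemble an apparent cycle from many different cycles, whereas the assumed HSV rate samples each individual cycle many times.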

In the transitory period from childhood to adulthood, the voice experiences physical changes which are not adequately documented. When evaluating the movement of the vocal folds during voice breaks, stroboscopy setups do not visualize the change in frequency that the adolescent experiences as shown with electroglottography. Mansour et al. discussed the accuracy of voice disorders in children [17]. There is a discussion in the literature on the duration of childhood, and a supplementary period of adolescence could be added. Martins et al. discussed dysphonia in childhood from 4 to 18 years of age [18].

Clarós et al. presented selection criteria of children for choirs with HSV [19]. HSV is used for differentiation between normal and pathological voices, and HSV has been compared with VS in children [20, 21]. It is noted that Demirci et al. found that children prefer stiff to flexible scopes [22].

Mecke et al. defined closed quotients of the vocal folds in children on HSV [23]. Patel et al. made quantitative measurements of vocal fold movements in children and compared them with adults; they found phonation to be more unstable in children and concluded that specific normative overviews for children should be made. No detailed description of the vocal fold appearance was given in their papers [24,25,26,27,28,29]. For future comparison with, e.g., optical coherence tomography (OCT), HSV is more exact [30]. It therefore became apparent that HSV was needed to evaluate the voice breaks in puberty [20].

HSV is illustrative for visualizing the vocal fold function; Fig. 2.1 is an image from a high-speed video of a postpubertal boy. The recording was done at 4000 frames per second with 256 × 256 pixels for a full view of the vocal fold oscillation. Figure 2.2 shows 26 consecutive images from the recording covering nearly two full vocal fold oscillations (on a sustained tone /a/ with a stiff scope) at 202 oscillations per second (Hz). The movement is also visualized in the kymography in Fig. 2.3. High-speed kymography is a cross section of the vocal folds at a determined place, in this case, the middle of the vocal folds, which shows the oscillations over a period [5].

Fig. 2.1
A magnified view of the vocal fold of a post-pubertal boy illustrates a hollow passage inside the throat with a pair of lip-like flaps obstructing in between.

An image from a high-speed video of a postpubertal boy

Fig. 2.2
A collage of 26 magnified views of the vocal fold of a post-pubertal boy portraying two complete oscillations. There is a thin gap between the two folds in the first five views in the upper row whereas the gap is more prominent from the seventh to the twelfth view in the lower row.

Consecutive images from HSV

Fig. 2.3
A kymography scan of the cross-section of the central part of the vocal folds. The scan depicts a left-facing triangular surface with a hole in the middle throughout the horizontal axis.

Cross section (kymography) at the middle of the vocal folds

HSV is presented and elaborated further in the results.

2.2 Voice Range Profile (VRP)

Voice profile measurement complements the customary measurement of the tonal range of children with simultaneous registration of the dynamic range. A standardization proposal covering this method of investigation from the Union of European Phoniatricians has been available since 1981 [31, 32]. The template was developed as part of this proposal, and it was used in the current investigation. It can be seen in Fig. 2.4. The tones on the abscissa are given in European and universal scientific notation as well as in hertz. The ordinate gives the sound level in dB(A).

Fig. 2.4
A voice range profile measurement graph plots decibels versus frequency. The tones of a piano with keys C 2, C 3, C 4, C 5, and C 6 are below the horizontal axis.

Template for Voice Range Profile measurement according to the 1981 UEP standardization proposal

Early attempts at plotting Voice Range Profiles included measuring the dynamic range of given defined semitones with a Brüel & Kjær sound intensity meter. The protocol for measurement included placing the microphone 30 cm from the mouth of the test subject and providing the test subject with a sound from a piano of the desired tone. The test subject was requested to present the given tone as softly as possible, and then as loudly as possible. The respective sound intensities of the tones were determined with the sound intensity meter for 2 s, and the documentation forms were eventually entered manually in an Excel sheet. The background noise was mostly up to 40–50 dB(A). However, this type of Voice Range Profile measurement requires some skill, both from the test subject (concerning the repetition of the given tones) and from the investigator (concerning the time interval at which the sound intensity measurement takes place in the process of the tone being reproduced). It is also time-consuming because of the manual documentation of the results of the investigation.

For these reasons, Pedersen et al. developed a computer-assisted Voice Range Profile measurement called phonetograph 8301 [33]. The equipment measures the given minimum and maximum intensities of a tone as the average over a chosen period of time (0.5–5 s), for each semitone, and stores the measured mean values of the tones. The apparatus has been compared with the Voice Range Profile measurement apparatus developed by Wendler and Seidner, and the measurements agreed to within 96% [34, 35]. Exact and defined measurements were now possible. Standardized calculations of the Voice Range Profile areas in semitones times decibels became possible; the engineer preferred to make them on a diatonic scale of seven tones per octave. Voice Range Profiles and ranges could be averaged in the programs of the phonetograph (software programs pg100 and pg200). Total tone ranges were calculated with the chromatic scale of 12 semitones per octave, and standard deviations could be calculated.
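The area and range calculations described above can be illustrated with a short sketch. It is not the algorithm of the phonetograph programs, which is not specified here: the function names, the trapezoidal approximation, and the example values are assumptions made purely for illustration. The profile is stored as minimum and maximum sound levels in dB(A) per semitone on the chromatic scale of 12 semitones per octave.

```python
def vrp_area_semitone_db(profile):
    """Approximate VRP area in semitones x dB.

    profile maps a semitone index (chromatic scale) to a tuple
    (softest level, loudest level) in dB(A). The area is integrated with
    the trapezoidal rule over neighbouring measured semitones
    (an assumed, simple approximation).
    """
    tones = sorted(profile)
    area = 0.0
    for left, right in zip(tones, tones[1:]):
        left_range = profile[left][1] - profile[left][0]
        right_range = profile[right][1] - profile[right][0]
        area += 0.5 * (left_range + right_range) * (right - left)
    return area


def total_tone_range_semitones(profile):
    """Distance in semitones between the lowest and highest measured tone."""
    return max(profile) - min(profile)


# Invented example: three adjacent semitones, index 0 is the lowest tone.
example_profile = {0: (50.0, 80.0), 1: (48.0, 84.0), 2: (50.0, 82.0)}

print(vrp_area_semitone_db(example_profile), "semitones x dB")
print(total_tone_range_semitones(example_profile), "semitones")
```

Expressing the area in semitones times decibels keeps the measure independent of where on the frequency axis the profile lies, which is one reason a semitone scale is used instead of hertz.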

2.2.1 Voice Range Profiles Used in Adults

The development from the use of conventional to computer-assisted measurements can be followed in the literature. After several publications based on conventional data logging, a survey of data collection methods has been made by Cutchin et al. [36]. They made this survey in order to evaluate whether VRP should be a standard method; they suggest that the next step is a standardization of the VRP protocol. A shortened protocol pilot study has been made after the overview, as presented in Fig. 2.5 [37].

Fig. 2.5
A chart lists the options and possible effects on V R P for sampling intervals, vowel choice, mouth opening, vocal registers, mode of production, repeated vocal productions, warm-ups, vocal quality, room acoustics, sound level meter settings, manual versus computerized elicitation, mouth to microphone distance, and coaching and feedback.

Factors that affect Voice Range Profile. The numbers refer to references in the original paper. “Reprinted from Journal of Voice, Rychel AK, van Mersbergen M, The Voice Range Profile – A Shortened Protocol Pilot Study, Copyright (2021), with permission from Elsevier”

There are multiple types of equipment on the market for Voice Range Profiles. lingWAVES from WEVOSYS GmbH is EU certified. XION GmbH has software under DiVAS for VRP. These two types of equipment are discussed by Caffier et al. in their attempt to simplify the analysis with their voice extent measure (VEM) project, which has been shown to be less susceptible to registration programs and gender [38]. The VEM presents a diagnostic tool to quantify the dynamic and frequency range of VRPs. The VRP area is multiplied by the quotient of the theoretical perimeter of a circle [39]. Using the theoretical perimeter in VEM is an attempt to derive resulting numbers that are easier for people to understand since the VRP area calculations are large numbers in hertz; this, however, limits the underlying information about the area.

A reliable setup with two microphones in a non-sound-treated room was analyzed by Printz et al.; they also commented on inter-examiner reliability [40]. In a study of assessment of voice, speech, and communication, changes were analyzed with equipment produced by Neovius Data och Signalsystem AB using Phog software [41]. Voice Range Profiles by Vocalgrama from CTS Informática were used to evaluate the effect of the resonance tube technique [42]. Another type of equipment, described by Sielska-Badurek et al., was used for the therapy of clients with muscle tension dysphonia, based on Voice Range Profiles from Computerized Speech Lab [43]. Barret et al. investigated the effect of elicitation methods with Voice Range Profiles and concluded that discrete half steps could elicit maximal vocal performance better than glissando in terms of minimum frequency, maximum frequency, and minimum intensity [44].

A comparison between the clinician-assisted and fully automated procedures was made by Titze et al., who concluded that problems of self-inflicted voice abuse in automated procedures and surveillance in a clinician-assisted procedure need to be addressed further [45]. They illustrate the long-lasting discussion of automated equipment. The problem of standardization will include the standardization of the Voice Range Profile apparatus itself, not only its EU certification [36]. Older descriptions of equipment were given by Klingholz and Martin, Seidner et al., Hacki, Pabon, Kay Elemetrics Corp, and Schutte [46,47,48,49,50,51].

The widespread use of Voice Range Profiles in phoniatric research is reflected in the literature. Relationships between tone and total intensity (loudness) have been discussed by Vilkman et al. and Sundberg [52,53,54]. The most comprehensive overview, referred to by Cutchin et al., was made in order to adapt Voice Range Profiles as a routine in the United States [36]. A shortened protocol was suggested by Rychel et al. [37]. The factors that affect the Voice Range Profile measurements are discussed, as presented in Fig. 2.5.

Cardoso et al. and Meerschman et al. showed that Voice Range Profiles can be used for documentation of clinical voice training [55, 56]. Voice Range Profiles are valid in evaluating voice therapy in a randomized clinical setting [57]. The discussion in that study concerns individual voice therapy versus therapy in groups and controls without therapy.

The effect of emotional attachment, emotions as such, and trauma on the voice, as measured with Voice Range Profiles, has been illustrated by Monti et al. [58]. Correlations between Voice Range Profiles and central auditory processing have been found by Ramos et al. [59]. In pathological cases, there are intensity variations, which have been discussed by Gramming et al. [60, 61]. Hirano refers to the problem which arises during the investigation of nonmusical persons (copying the desired tone exactly, holding the tone) [62]. A technical solution has been found for this problem, involving making the measurement in half-octave steps over a shorter time interval or simply measuring the tone given by the patient.

2.2.2 Voice Range Profiles Used in Children and Adolescents

In children, Pieper et al. used Voice Range Profiles and found that pedagogical training during 1 year in the third and fourth school years increased the highest frequency by 100.23 Hz and lowered the lowest tone by 18.36 Hz in both girls and boys [63]. Ma et al. showed that coaching of 6–11-year-olds facilitated a greater maximum phonation frequency range using Voice Range Profiles, and Patinka et al. underlined tests of rhythm and Voice Range Profiles during child development due to the physiological and hormonal changes in young voices in ensembles [64, 65]. Zhang used a 3D model in which he found that the development of voice could be explained by differences in length and thickness: the longer the vocal folds, the lower the F0, the higher the flow rate, the larger the vocal fold amplitude, and the higher the sound pressure level (SPL) [66]. In contrast, the thickness effect dominated and contributed to the larger closed quotient of vocal fold vibration, larger normalized maximum flow declination rate, and lower H1–H2 (the difference between harmonics 1 and 2) in adult males as compared to adult females and children [66]. Berger et al. and Dienerowitz et al. established normative data of fundamental frequency (F0) and tone range in German children and adolescents using the Voice Range Profile [67,68,69,70]. They found that the singing tone ranges were around 2 octaves, and they presented the annual development of the fundamental frequency (F0) under various circumstances, as discussed later.

Acoustical measurements have been made in children with the genetic abnormality Smith-Magenis syndrome, with the authors focusing on the children’s neurodevelopmental deficits as the background of their phonatory profiles. They used repeated recordings of the sustained vowel /a/, extraction of formants 1 and 2, and cepstral peak prominence to shed light on the underlying neuromotor aspects in these children. Their findings could provide evidence that phonation in speech is susceptible to neuromotor disturbances regardless of their origin [71]. Voice Range Profiles could be used to visualize vocal development during childhood compared with pediatric and hormonal development in the pathology of genetic voice disorders [72]. Knowledge of normal voice development is of value for comparison. A review of voice characteristics in Down’s syndrome compared with typically developing children was carried out by Krishnamurthy and Ramani [73]. Acoustically, there was no significant difference; they found a lack of standardized criteria for the Down’s syndrome population. The literature contains many examples of comparisons of pathology with normal development, covering not only pediatric and hormonal aspects but also Voice Range Profiles.

2.3 Fundamental Frequency (F0) Measured with Electroglottography, and Register Analysis

2.3.1 Background

Electroglottography was introduced by, among others, Smith and Fabre as a procedure for investigating the voice [74, 75]. A high-frequency current of low intensity flows through the larynx between two skin electrodes at the level of the vocal cords. The amplitude modulation of the current due to the changes in resistance during phonation represents the movement of the vocal cords over time. We can follow the use of this method for research purposes over several decades by Loebell, Frokjaer-Jensen and Thorvaldsen, Fourcin et al., Lecluse, Guidet and Chevrie-Muller, Kitzing, Smith, Hirose et al., Rothenberg, and Hertegård and Gauffin [76,77,78,79,80,81,82,83,84,85,86].

Dejonckere gives a review of publications concerning electroglottography and its uses in Phoniatrics 1, the book on the fundamentals of phoniatrics [87]. Stroboscopy is discussed by Eysholdt in the same book, together with other investigative procedures such as Voice Range Profile measurement and electroglottography, as basic methods for the classification of diseases of the voice [88]. The method of videostroboscopy is well suited for visualizing the vibrations of the vocal folds for the classification of disorders, but not for displaying the phenomena of laryngeal function, which are generally very difficult to understand [89,90,91]. As the function of the vocal folds is represented by electroglottography, it was worthwhile to combine the two methods to obtain a more complete description of the vocal folds from the parameters.

Electroglottography complements stroboscopy. The problem of interpretation of the electroglottography curves (the amplitude and the precise relation of the individual portions of the curve to the phases of the vibration of the vocal folds) can be solved in a satisfactory manner. The first results of the combination of stroboscopy and electroglottography were already available when a lively discussion on the interpretation of the glottography curves took place at the International Conference of Logopedics and Phoniatrics in 1974 [92]. Schönhärl had carried out a systematic registration of the stroboscopic data from patients with voice disturbances, but a statistical analysis of the results of treatment was not possible [93].

We employed the first simultaneous application of stroboscopy and electroglottography, with an electroglottographic apparatus from the Danish company FJ Electronics in Copenhagen, to investigate music students (trained voices) and hospital workers (untrained voices) (Figs. 2.6, 2.7, 2.8, and 2.9) [94, 95]. A difference between the two groups could be found in the closing phase of the tone, where the trained voices of the music students showed a larger angular velocity and a shorter duration. In other respects, the synchronized images of stroboscopy and electroglottography for the two groups were comparable.

Fig. 2.6
A table with 5 columns and 24 rows. The column headers are as follows. Quotient phase, untitled, group 1, group 2, and group 3. The untitled second column lists the average %, S D, 95% single observation, and 95% of mean. The group 1 to group 3 columns list the corresponding values.

Averages and standard deviations from the estimation of electroglottograms; group 1 is hospital staff (untrained normal voices), group 2 is music students (trained voices), and group 3 is four music students with eight repeated measurements. The quotients a/e, a/b, and f/e are significantly different for the music students, compared to the test persons with untrained voices (see Fig. 2.8)

Fig. 2.7
An illustration of the design used by Anastopolo and Karnell comprises a laryngeal mirror with photocell, stroboscopic light, E I - glotograph, and oscilloscope. The trough and crest in the curve of the oscilloscope are labeled closed and open, respectively.

In order to secure the duty cycle, a photocell was coupled to the stroboscope connecting it to the electroglottograph

Fig. 2.8
A curve with a trough is labeled 2 on the trough and 3 and 4 on the top left and top right linear peaks. Below the trough are various-sized bars labeled a, b, c, d, e, and f. The bars a and e are the shortest and longest bars, respectively.

(I) Maximum opening of the glottis, (II) maximum closing of the glottis (stroboscopically determined and transferred from the electroglottography curve). (III and IV) represent the change in resistance during the transition between these two states. a-b closing phase; c-d opening phase; e entire duty cycle; f the area between the two points on the duty cycle where the vocal folds switch between being open and closed during phonation (cf. Fig. 2.6)

Fig. 2.9
A pair of six fluctuating non-sinusoidal electroglottography curves arranged in two rows. The first curves in both rows are one complete cycle long. The second and third pairs are two and three complete cycles long, respectively.

Examples of variants of the electroglottography curve. The maximum opening and closing phases were stroboscopically determined and marked on the electroglottography curve

The electroglottography curve for vocally trained boys corresponded to that of the music students in the lower register. Electroglottography is also suitable for the measurement of changes of register. These changes vary depending on the intensity and thus on whether the measurement is carried out from the low to the high register or from high to low register.

Anastopolo and Karnell have used the design in Fig. 2.7 as the basis for developing an apparatus that makes it possible to combine videostroboscopy and electroglottography [96, 97]. In this way, it is possible to compare various individual investigations and to compare average data, to interpret the results precisely. In addition, clinical use of the method has become possible. This method appears optimal for the representation of the movements of the edges of the vocal folds as described by Smith [74]. Herzel et al. discussed the nonlinear aspects of the movement of the vocal folds [98]. This is further analyzed in high-speed video and chaos software, but only in adults. The analysis of differences between the voices of family members has up to now shown no differences which are not frequency dependent, and this has also been demonstrated by muscular studies [99, 100].

In addition to its use for representing the individual vibrations of the vocal folds, electroglottography is also suitable for the precise registration of the fundamental frequency of the speaking voice [101]. We developed a computer program by means of which this parameter could be calculated from 2000 electroglottographic cycles. The measurements took place with a text from the International Phonetic Association, which had been phonetically correctly translated into Danish (“The North Wind and the Sun”) [102]. It was read in a conversational style. The mean value was given in Hz. The tonal range of the speaking voice could be found as the range in semitones. The signals were divided up into semitone windows from 60 to 684 Hz [103,104,105].
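The kind of calculation described above can be sketched as follows. This is not the original program, whose details are not given here: the function names, the use of an arithmetic mean over the per-cycle frequencies, and the example cycle durations are assumptions for illustration. Only the limits of 60 and 684 Hz and the semitone windows are taken from the text.

```python
import math

LOW_HZ, HIGH_HZ = 60.0, 684.0  # analysis limits stated in the text


def semitones_between(f_low_hz, f_high_hz):
    """Musical distance between two frequencies (12 semitones per octave)."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


def semitone_window(f0_hz, low_hz=LOW_HZ):
    """Index of the semitone-wide window containing f0_hz (0 starts at low_hz)."""
    return math.floor(semitones_between(low_hz, f0_hz))


def summarize_cycles(cycle_periods_s):
    """Mean F0 in Hz and tonal range in semitones from EGG cycle durations."""
    f0_values = [1.0 / period for period in cycle_periods_s]
    f0_values = [f for f in f0_values if LOW_HZ <= f <= HIGH_HZ]
    mean_hz = sum(f0_values) / len(f0_values)
    range_semitones = semitones_between(min(f0_values), max(f0_values))
    return mean_hz, range_semitones


# Invented cycle durations in seconds (a real analysis would use about 2000).
example_periods_s = [0.0100, 0.0095, 0.0090, 0.0085, 0.0080]
mean_hz, range_st = summarize_cycles(example_periods_s)
print(f"mean F0 {mean_hz:.1f} Hz, tonal range {range_st:.1f} semitones")
print(f"analysis span: {semitones_between(LOW_HZ, HIGH_HZ):.1f} semitones")
```

Working in semitones rather than hertz makes tonal ranges comparable between children and adults, because equal musical intervals correspond to equal frequency ratios and not to equal frequency differences.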

The developed electroglottographic software was presented by Kitzing in his thesis and was used in this book for the analysis of the fundamental frequency of the speaking voice [81]. The company Teltec developed a computerized variant of this apparatus. Roubeau et al. introduced electroglottography for the analysis of the fundamental frequency of the speaking voice for registers [106]. The variation in the fundamental frequency by simultaneous analysis of the histogram configuration was analyzed by Fourcin and Abberton in phonetics [78].

Reviews of methods for the measurement of the fundamental frequency show the use of manual estimation methods of electroglottography in scientific studies [107, 108]. Precise frequency analysis (in combination with jitter and shimmer), by computer-assisted evaluation, was performed by Askenfelt in 1980 [109]. The method and duration of the measurements were discussed by Karnell [110]. With computer-assisted speech perception, precise measurements can be made in the future. The possibility of determining the relationship between the fundamental frequency and function in the brain arises [111,112,113,114,115,116]. It will be possible to achieve a better understanding of the central control of voice.

A film with videostroboscopy of Danish boys, during puberty, performed with the Timcke stroboscopy apparatus from Medizinische Hochschule in Hannover, was presented at the Voice Symposium held in Manhattan School of Music, New York [117]. The setup could not capture the changes in the vibrations of the vocal folds during register change. For qualitative documentation of registers, Voice Range Profiles and electroglottography are suitable [118]. Both the last-named methods can be employed for the quantitative recording of changes in the register [119, 120].

Although the objective of this work was not primarily a tonal analysis of trained pubertal voices, the documentation of formant analysis in childhood nevertheless appears interesting [121]. Formant production during puberty is subject to several influences, such as the conditions for the investigation, physical and hormonal development, and vocal technique.

For boys, the changes in the register during puberty, like the fundamental frequency of the speaking voice and the lowest tone of the tonal range, depend on the testosterone level. For girls, no quantitative analysis of this phenomenon has been available. The relationships between hormonal changes and the development of the voice during puberty for girls have been investigated by our research group.

The literature related to the human fundamental frequency in speech is huge as referred to in the overview in the book Phoniatrics 1 [122]. We have presented the fundamental frequency measurements used for development in children and adolescents to be compared with pediatric and hormonal development. A related supplemental literature study has been added in the second edition of this book, along with comments on fundamental frequency in children.

2.3.2 Fundamental Frequency Studies

Fundamental frequency can be measured in many ways. One careful method is the relative fundamental frequency, which takes into account that fundamental frequency during speech includes voicing offsets and onsets in sonorant-voiceless consonant-sonorant constructs [123]. In the study referred to, the voice was recorded with SONAR Artist (Cakewalk, Chicago, Illinois), and data analysis was made with MATLAB (version R2015b, MathWorks, Natick, Massachusetts). A soundproof room was used.

Other methods include that of Poulain et al., who used the DiVAS software (XION medical, Berlin, Germany) to measure fundamental frequency in children during speech, with softest speaking voice, conversational voice, classroom voice, and shouting voice [124]. They also examined young women and described the female voice pitch [68, 69]. The conversational speaking voice is the main interest in our study, as a stable factor usable in comparison with other biological factors. Nygren et al. used their speech range profiles (Soundswell and Phog, Neovius Data och Signalsystem AB, Lidingö, Sweden) to document the treatment of trans men, with reading of a standard text for 40 s [125].

Using the DiVAS software (XION medical, Berlin, Germany), Berger et al. managed to establish a normative curve of the fundamental frequency of the conversational voice in German children from ages 6 to 18 years [67]. They included measures of Tanner stages in three groups (prepubertal, pubertal, and postpubertal). The pediatric stage results are comparable to our results in the groups.
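A relative fundamental frequency is typically expressed in semitones relative to a reference cycle. The sketch below illustrates only this conversion; the function name, the choice of the first cycle as reference, and the example values are assumptions and do not reproduce the protocol of the cited study.

```python
import math


def relative_f0_semitones(cycle_f0_hz, reference_hz):
    """Express per-cycle F0 values in semitones relative to a reference F0."""
    return [12.0 * math.log2(f0 / reference_hz) for f0 in cycle_f0_hz]


# Invented F0 values (Hz) of consecutive cycles approaching a voicing offset;
# the first cycle is taken as the reference (an assumption for this sketch).
offset_cycles_hz = [200.0, 199.0, 197.0, 194.0, 190.0]
relative = relative_f0_semitones(offset_cycles_hz, offset_cycles_hz[0])
print([round(value, 2) for value in relative])
```

Normalizing to a reference cycle removes the speaker's habitual pitch from the measure, so that the short-term F0 change around the voiceless consonant can be compared between speakers.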

2.3.3 Studies on F0 with Electroglottography

Videostroboscopy and electroglottography were combined during the therapeutic intervention in voice disorders by Singh et al. who found that complete glottal closure was seen in 93.3% after intervention as compared to 40% of cases during initial examination (p < 0.01) [126]. For electroglottography, they found that a soundproof room is not necessary, because only the first harmonic on a laryngeal level is measured.

A study was made to compare parameters of voice fundamental frequency between children and adults during connected speech and /a/. Objective assessment of noninvasive methods of evaluating vibratory kinematics in children was found to be extremely limited; as a difference between children and adults, the authors found an absence of a “knee” on the decontacting slope of the electroglottogram (EGG) in children [127]. Herbst and Dunn comment that the EGG signal is an ideal candidate for assessment of the (time-varying) F0 because it is influenced by neither vocal tract acoustics nor background noise [128]. They compared 13 algorithms for estimating F0 based on 147 synthesized EGG signals with varying degrees of signal quality deterioration. With few exceptions, simulated “hum” and frequency and amplitude drifts of the baseline did not influence the F0 results.
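To illustrate what an F0 estimation algorithm for a periodic signal can look like, a minimal autocorrelation sketch follows. It is not claimed to be one of the 13 algorithms compared by Herbst and Dunn; the function name, the synthetic test signal, the sampling rate, and the search limits (borrowed from the 60 to 684 Hz window used earlier in this chapter) are assumptions for illustration.

```python
import math


def estimate_f0_autocorrelation(signal, sample_rate_hz,
                                f_min_hz=60.0, f_max_hz=684.0):
    """Estimate F0 as the lag with the largest autocorrelation value."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [sample - mean for sample in signal]  # remove constant offset
    min_lag = max(1, int(sample_rate_hz / f_max_hz))
    max_lag = min(n - 1, int(sample_rate_hz / f_min_hz))
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate_hz / best_lag


# Synthetic, noise-free stand-in for an EGG signal: 200 Hz plus one harmonic.
SAMPLE_RATE_HZ = 8000
TRUE_F0_HZ = 200.0
synthetic_signal = [
    math.sin(2 * math.pi * TRUE_F0_HZ * i / SAMPLE_RATE_HZ)
    + 0.5 * math.sin(2 * math.pi * 2 * TRUE_F0_HZ * i / SAMPLE_RATE_HZ)
    for i in range(800)
]
print(estimate_f0_autocorrelation(synthetic_signal, SAMPLE_RATE_HZ))
```

The resolution of this simple estimator is limited to whole-sample lags, and real EGG signals with register changes or irregular cycles would need a more robust method.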

Cavalli and Hartley recommend the clinical application of EGG for children, among others, to measure mean fundamental frequency and speaking voice range (Speech Studio, Laryngograph) [129]. Mecke et al. discussed closed quotients in children; the closed quotient data taken from EGG were higher than from inverse filtering, and differences were found compared to HSV [23].

An observational study compared fundamental voice frequencies between acoustical measurement (Piezotronics model 378B20), EGG (Kay Pentax CSL program model 6103), and accelerometer (Dytran instruments model 3225F1). There is a need for new studies with larger samples to get greater accuracy of vocal evaluation. With EGG, it was shown that the effect of training should be measured not only on phonation of mean sustained tones but also on the tonal range of the few fundamental frequencies used during speech. EGG waveform shapes appeared to remain essentially constant with F0 over 1 octave [130, 131].
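A closed quotient can be derived from a single EGG cycle with a criterion-level method, sketched below. The function name, the criterion of 35% of the peak-to-peak amplitude, the signal polarity (higher values meaning more vocal fold contact), and the example samples are assumptions for illustration; they do not reproduce the method of the cited studies.

```python
def closed_quotient(cycle_samples, criterion=0.35):
    """Fraction of one EGG cycle spent above a criterion level.

    cycle_samples must cover exactly one vibration cycle, with higher
    values meaning more vocal fold contact (assumed polarity). The
    criterion is a fraction of the peak-to-peak amplitude.
    """
    lowest, highest = min(cycle_samples), max(cycle_samples)
    if highest == lowest:
        raise ValueError("cycle has no amplitude variation")
    threshold = lowest + criterion * (highest - lowest)
    closed = sum(1 for sample in cycle_samples if sample > threshold)
    return closed / len(cycle_samples)


# Invented samples of one cycle: a flat open phase followed by a contact pulse.
example_cycle = [0, 0, 0, 0, 1, 2, 3, 2, 1, 0]
print(closed_quotient(example_cycle))
print(closed_quotient(example_cycle, criterion=0.25))
```

The result depends on the chosen criterion level, which is one reason why closed quotients from EGG, inverse filtering, and HSV are not directly interchangeable.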

2.3.4 Studies on Registers

There are two main vocal registers, the chest and the head register. The chest register is the lowest range, and the head register is the highest range, each with a distinctly different vibratory pattern of the vocal folds. The register shift is visible on electroglottography and HSV. On EGG, the exact point at which the registers shift can be identified. The register shift differs between high and low intensity. In our measurements, we have used the register shifts as evaluated by the children themselves. The mixed voice is a combination of both the chest and head voices and is used by singers to transition seamlessly between the two, but the transition is still visible on EGG. There is also a whistle register, which is used by, e.g., the Thomanerchor.

Mudd and Smith in their review ask for further standardization measures in children [132]. Diagnostic methods are expanding for benign vocal fold lesions in children, though they have not become widely used in practice. Fuchs, in the book Phoniatrics 1, comments that his overview clearly shows the lack of knowledge about normative values, particularly concerning small children [133]. Clarós et al. discuss the association between the benefits of singing in children’s choirs and the development of pediatric voice disorders [19].

2.4 Voice and Pediatric Stages and Hormonal Analysis

2.4.1 Findings on Voice and Pediatric Stages

The development of girls and boys is described in Brook’s Clinical Endocrinology, seventh ed. [134]. According to self-recall, over 30% of boys have completed voice break by 14 years of age. This result has been used for assessing the timing of boys’ puberty, parallel to self-recall of menarche in girls.

Berger et al. give an overview of conversational voice F0 in boys and girls aged 6 to 18, which is presented in Fig. 2.10 [67].

Fig. 2.10
2 multi-line graphs for conversational voice level 2 plot fundamental frequency versus age. The graph for females on the left plots downward-sloping lines for P 97, 90, 50, 10, and 3 with P 97 having the highest set of values. The graph for male on the right plots falling curves for P 97, 90, 50, 10, and 3 with P 97 having the highest set of values.

Overview of F0, the conversational voice in girls and boys aged 6–18. “Reprinted from Journal of Voice, Volume 33, Berger T, Peschel T, Vogel M, Pietzner D, Poulain T, Jurkutat A, Meuret S, Engel C, Kiess W, Fuchs M, Copyright (2019), with permission from Elsevier”

Dienerowitz et al. give an overview of the total singing tone range in Hz during childhood and adolescence in Fig. 2.11 [70]. These measurements are compared with Tanner stages. They also present the development of the total tone range with age in German children, as shown in Fig. 2.12.

Fig. 2.11
Two multi-line graphs for females and males plot f 0 max versus age with falling trends. Females have concave down decreasing curves whereas males have falling curves with fluctuations. P 97 has the highest set of values in both graphs.

Percentile curves of singing ranges in Hz during childhood. “Reprinted from Folia Phoniatrica et Logopaedica, Dienerowitz T, Peschel T, Vogel M, Poulain T, Engel C, Kiess W, Fuchs M, Berger T, Establishing Normative Data on Singing Voice Parameters of Children and Adolescents with Average Singing Activity Using the Voice Range Profile, Copyright (2021), with permission from Karger”

Fig. 2.12
Two multi-line graphs for females and males plot delta f versus age. Females follow linear trends. Males follow rising trends for P 97 and P 90 and falling trends for P 50, P 10, and P 3.

The natural development of tone range in semitones. The frequency range is around 24 semitones and stays stable over age in males and females in this study. “Reprinted from Folia Phoniatrica et Logopaedica, Dienerowitz T, Peschel T, Vogel M, Poulain T, Engel C, Kiess W, Fuchs M, Berger T, Establishing Normative Data on Singing Voice Parameters of Children and Adolescents with Average Singing Activity Using the Voice Range Profile, Copyright (2021), with permission from Karger”
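The relation between a frequency range in Hz (as in Fig. 2.11) and a tone range in semitones (as in Fig. 2.12) is a logarithmic conversion: each octave doubles the frequency and contains 12 semitones, so a range of about 24 semitones corresponds to two octaves, i.e., a fourfold ratio between the highest and lowest F0. The sketch below shows this standard conversion; the function name `range_in_semitones` and the example frequencies are hypothetical and are not taken from the cited data.

```python
import math


def range_in_semitones(f_min_hz, f_max_hz):
    """Convert a frequency range in Hz to a tone range in semitones."""
    if f_min_hz <= 0 or f_max_hz <= 0:
        raise ValueError("frequencies must be positive")
    # 12 semitones per octave; one octave is a doubling of frequency.
    return 12.0 * math.log2(f_max_hz / f_min_hz)


if __name__ == "__main__":
    # Illustrative values only: a fourfold frequency ratio spans two octaves.
    print(range_in_semitones(220.0, 880.0))
```

Because the scale is logarithmic, the same semitone range can correspond to very different ranges in Hz before and after the voice change, which is why both representations are shown in the literature.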

The timing of puberty varies between the sexes. In females, the normal onset of puberty ranges from 8 to 13 years, averaging 9–10. Thelarche, the appearance of breast buds under the areola in Tanner stage 2, marks the beginning of puberty. Pubarche, the onset of pubic and axillary hair, follows 1.5 years later. Menarche, the onset of menstruation, follows thelarche by 2.5 years (range 0.5–3 years). In males, the onset of puberty ranges from 9 to 14 years; gonadarche is the first visible sexual characteristic, when testicular volume reaches 4 mL or more or the long axis reaches 2.5 cm or more, in Tanner stage 2. Spermarche, the counterpart of menarche in females, is the development of sperm in males and occurs during Tanner stage 4 [135].

Knowledge is limited regarding the physiological changes of the voice mechanism during puberty, which involve significant breathiness in girls and an increase in pitch breaks [136]. In our results, the vocal folds always appeared matte on HSV during puberty.

Prediction of puberty is of interest with the use of the system MDVP (Kay Elemetrics by Pentax) at ages of 8.17/8.83 years in girls and boys, respectively; the authors conclude that voice analysis may be used by pediatric endocrinologists and otorhinolaryngologists along with other secondary sex characteristics to predict too early puberty in girls [137]. Kent et al. suggest a pediatric reference base and refer to the MDVP parameters that are most sensitive in children from 4 to 19 years [138]. Chernobelsky found in a longitudinal study that F0 in speech during reading of a standard text, analyzed with the computer program PRAAT, could be used for determining the onset of vocal mutation in singing boys [139].

Hur underscores the importance of genetic influences on pubertal timing in a twin study [140]. Nercelles et al. searched papers published between 1990 and 2019 in PubMed and LILACS; they only found eight pubertal studies on the acoustical modifications and vocal instability in that period [141].

Murray et al. found that children with less sensitive auditory pitch discrimination may be less adept at updating their stored motor programs [142]. Vocal pitch variability and latency of vocal response with event-related potential (ERP) differ as a function of age. P1 amplitude decreased with age, and N1 and P2 amplitude increased in a study of 4–30-year-olds [143]. Bonte et al. compared the cortical responses to /a/, /i/, and /u/ in children aged 8–9 years with those in adolescents aged 14–15 years and found progressive refinement of the neural mechanisms [144].

Radzig et al. underline the options for voice disturbances during normal puberty [145]. This is also in accordance with our HSV findings. A longitudinal study was presented of children below 10 years of age followed during puberty showing lesions on the vocal folds changing or disappearing [146]. Howard et al. present a longitudinal study of three pubertal girls and found positive results of musical stimulation [147]. Seventeen girls aged 9.9–16.11 years were evaluated for their singing ability in puberty, and a sufficient relation to demands was found [148].

Willis et al. made a 12-month longitudinal study in 18 pubescent boys comparing phonation gaps with speaking fundamental frequency (SF0) and weight. They found a certain relation including loss of ability to use the mid- and falsetto vocal range [149]. Bugdol et al. suggest recognition of girls’ menarcheal stages using voice signals [150].

Ma et al. examined 4–18-year-old children for voice onset stops finding some physiological differences [151]. Yu et al. used /pa/ and /pataka/ to study voice onset time in 4.1–18.4-year-olds, finding that younger children produce longer voice onset time with a higher level of variability. Higher voice onset time values and increased variability were found in boys from 8 to 11 years [152].

Hamdan et al. found a significant association between maxillary arch dimensions and the third formant along with the fundamental frequency [153]. Markova et al. showed sex differences in the morphology of voice-related structures during adolescence, with males displaying strong associations between age (and puberty) and both vocal fold and vocal tract length; this was not the case in female adolescents [154]. Story et al. tried to develop a sex-specific vocal tract of up to 12 years to document prepubertal acoustical differences [155]. In 114 children of 4–17 years, a consistent pubertal effect was observed in the levator muscle and velum [156]. Findings of the presence of the prepubertal sex differences in the oral region of the vocal tract may clarify in part the anatomical basis of documented prepubertal acoustical differences using magnetic resonance imaging and computed tomography [157].

Guzman et al. found that, in 15 boys and 15 girls aged 7–10 years, cepstrum and formant 3 on /a/ and shimmer and formant 3 on /i/, among others, differentiated male and female voices [158]. Cartei et al. found differences in shifts in formant frequencies in 6–9-year-old girls and boys. Cartei et al. also found that low-frequency components, low pitch (F0), and low formant spacing signal high salivary testosterone and height in adult male voices that are associated with high masculinity attributions by unfamiliar listeners in both men and women [159]. Willis and Kenny made a longitudinal study of 20 girls’ weight and voice range and found a contraction of vocal range between 47.5 and 52.4 kg [160].

The important function of the maculae flava was analyzed by Sato and Hirano [161]. Some interesting studies of sex dependency in the laryngeal musculature have appeared [162,163,164,165]. Sato et al. clarified the histology of maculae flava during the growth and development of the human vocal fold mucosa [166]. They concluded that the maculae flava, including vocal fold stellate cells, were involved in synthesizing extracellular matrix in the growth and development of vocal fold mucosa. Boudoux et al. discussed methods of, among others, optical coherence tomography for examining child vocal folds [167]. Benboujja et al. investigated the structural organization of the vocal fold microanatomy across gender and age groups using optical coherence tomography and presented a stratified structure from newborns to young adults [168].

Meurer et al. suggest standards of acoustic phono-articulatory facts for adolescents [169]. Fuchs underlines that vocal development during the vulnerable phase of voice change should be cared for, especially for vocally intensive professions [170]. Hollien has made an overview of fundamental frequency and tone range during speech in pubescent voices and suggests a baseline for future research. He gave a survey of the age-related development of speech fundamental frequency in males and females from 0 to 90 years of age [171].

2.4.2 Findings on Voice and Hormonal Stages

The age of onset of prepuberty in the adrenarche varies from 8.9 (Chile) to 10.3 years (Italy) compared with Tanner stages 1–2. In Denmark, where the author resides, it averages 9.9 years. This is the background for the choice of 3rd to 12th school classes in our book [172].

A survey has been made on the hormonal development of girls back to the genetic beginning (Fig. 2.13) by Sultan et al. [172]. This is of great interest in genetic female voice pathology. In boys, surveys have been made of genetics, pubertal development, and hormonal analysis [134, 145, 173, 174].

Fig. 2.13
A flowchart for the genes involved in female puberty regulation is as follows. Nasal placode, hypothalamus, pituitary, ovary, and E 2, puberty, and A M H. It also flows through energy balance B M I and E D Cs before the hypothalamus.

An overview of the genes involved in female puberty regulation with the hypothalamus in the center. The development starts from the nasal placode in the fetus with the development and integration of GnRH neurons (gonadotropin-releasing hormone-expressing neurons). “Reprinted from Best Practice & Research Clinical Obstetrics & Gynaecology, Volume 48, Sultan C, Gaspari L, Maimoun L, Kalfa N, Paris F, Disorders of Puberty, Copyright (2018), with permission from Elsevier”

Knowledge about hormonal predisposition and its consequence for health maintenance, disease development, and individual treatment is a great challenge in the area of voice research [72].

Measurements of sex steroids can be made on saliva as markers of puberty in boys during late childhood and adolescence, which is progress toward identifying voice breaks and especially toward predicting deviations in development [175]. A total of 110 prepubertal children, 58 girls and 52 boys, aged 3–10 years were recorded and evaluated for perceptual masculinity by 315 adults, 182 women and 133 men, on the basis of their voices alone. Boys had higher salivary testosterone and were rated more masculine [159]. Salivary testosterone levels are higher in males than females in adolescence and in late childhood; in an examination of 9–12-year-olds versus 13–15-year-olds, the study was made in relation to the prosody of the children [176].

Zamponi et al. commented on the pubertal development of voice as related to androgens and estrogens in a large study of sex hormones and human voice physiology from childhood to senescence and described the significant sex-related modifications of the voice organ [177]. Schneider et al. showed that F0 remained high in transgender girls and central white matter did not increase with treatment [178]. In a review of puberty suppression from Tanner stage 2 in transgender children and adolescents, Mahfouda et al. commented that vocal mutation develops in response to testosterone and that the change is not reversible with pharmacological interventions [179]. Nygren et al. analyzed F0 and Voice Range Profiles in trans men aged 18 years and older during testosterone therapy; they recommend that the substantial group of trans men with voice problems be voice assessed systematically during treatment [125].

Esquivel-Zuniga et al. found that hyperandrogenic disorders caused voice deepening in pubertal girls [180]. Stoffers et al. found that testosterone treatment led to a drop in voice in 85% of pubertal boys with gender dysphoria [181].

Yau et al. present an 11-year-old girl with hoarseness and mild laryngeal prominence, where the reason was a complete androgen insensitivity syndrome [182]. Busch et al. found average serum testosterone levels of 8.34 nmol/L, ranging from 0.1 to 26.8 nmol/L before self-evaluated voice breaks were detected [183]. At the time of voice break, testis size was 11.8 mL and genital stage was 3 (2–5). Busch et al. showed a correlation between pubertal development including self-evaluated voice breaks and BMI [184]. The participation rate of their population-based Danish cohort was only 25%.

In the search for references, many related papers start with 18-year-olds. For example, Arruda et al. comment on small variations in the voices of adults during the menstrual cycle [185]. Shoup-Knox et al. measured voice characteristics during the natural cycling of women and found that shimmer was significantly lower in high fertility recordings [186]. Prabhu et al. studied the roles of sex hormones produced during the menstrual cycle on brainstem encoding and speech stimulus [187]. Fouquet et al. found that individual differences in male voice pitch emerge before puberty, already at the age of 7, and may be linked to prepubertal androgen exposure [188]. Markova et al. found that the length of the vocal folds and the fundamental frequency are stronger predictors of “maleness” than vocal tract length and formant position [154]. Hodges-Simeon et al. discussed the relationship between testosterone levels and fundamental frequency and phenotype [189].

Kirgezen et al. studied the androgen and estrogen receptors of the vocal folds and macula flava in cadavers; they found that they exist within several subunits of the vocal folds, mostly in the macula flava and vocal ligament [190]. Brunings et al. found estrogen receptors and progesterone receptors to a varying degree in vocal fold biopsies that included edema [191]. Grisa et al. found that the impact on F0 of early postnatal androgen exposure showed female tissue to be less sensitive to androgen exposure between birth and adrenarche than during other periods [192].

There are behavioral and neurobiological indicators of a more vulnerable communication system in boys [193]. Fuchs has an overview of psychological complaints of children and adolescents in Phoniatrics 1 [133].

In our introduction, the references have been searched with a view to professional subjects where our extended studies of the normal development of voice in combination with pediatric and hormonal development can be used as a reference factor, including diagnosis and treatment in pathology and comparable to other biological developmental factors.