Introduction

Vocal fold mass lesions are a main cause of dysphonia [1] and as such many histopathological findings such as polyps, nodes, cysts or oedemas frequently need medical therapy [1]. In some cases, traditional treatment such as pharmacotherapeutical approaches or voice therapy might be considered helpful. For others, however, phonomicrosurgery is often recommended [1].

Vocal fold mass lesions might induce changes to vocal fold stiffness and mass, which alter the oscillatory eigenmode and spatiotemporal regularity [2]. The consequent entrainment of both vocal fold oscillation patterns, which is influenced mainly by vertical vocal fold deflection [3], might be impaired, resulting in a disturbed structure of glottal air pulse generation. Furthermore, asymmetries might arise which influence the strength of the intraglottal vortices and, in turn, vocal efficiency [4]. In addition, some vocal fold mass lesions might block the closure of the membranous part of the vocal folds, resulting in persistent gaps and high glottal area waveform derived open quotients, which cause increased transglottic air flow, even during the most closed phase. On the one hand, this increases noise and, on the other hand, it decreases the intensity of the voice source overtones due to the less abrupt interruption of the airflow [5,6,7]. Although vocal fold mass lesions might frequently cause dysphonia [8], not all mass lesions are necessarily associated with voice disorders. Some entities, such as swellings on the free edge of the vocal fold – frequently categorised as nodes – might develop as a consequence of vocal overuse, but do not necessarily result in dysphonic voice [9]. Such swellings do not necessarily influence either vocal fold oscillation patterns or voice source production, and are sometimes denoted as “functional” [9]. They have been observed in many professional singers without any impairment of vocal function [10, 11]. Thus, as long as there is no suspicion that these swellings are malignant, any indication for surgery should be based on functional aspects rather than on the visible mass lesion itself.

The impairment of vocal function stemming from mass lesions is sometimes not easy to detect because the voice – apart from any evaluation of rough or breathy vocal quality – can be evaluated using a number of different dimensions of vocal capacity [12, 13]. Besides vocal loading capacity, the dimensions of fundamental frequency (ƒo) range and dynamic range have been considered important and are established elements of the voice range profile [14]. Concerning the ƒo range, voice production should not be considered as a homogeneous entity. At some points in the ƒo range, biomechanical properties change abruptly, leading to changes in vocal quality [15, 16]. Such circumstances can contribute to the definition of vocal registers [17]. Registration events usually occur, according to different biodynamics, in critical regions. Therefore, vocal fold mass lesions frequently impair voice production in these regions to a larger extent than in the usual speaking voice ƒo range, i.e. the modal or chest register [18].

Because of the changes in vocal fold stiffness and mass, it can be speculated that oscillation patterns would change, not only with regard to the ƒo range, but also under different loudness conditions. In this context, it has been shown that the phonation threshold pressure increased in patients with vocal fold mass lesions and decreased after phonomicrosurgery [19, 20]. However, greater loudness could itself have an effect on vocal fold oscillation patterns. For healthy voices, increasing loudness is associated with greater maximum flow declination rate [7], which depends on the maximum glottal area declination rate and skewing of the glottal area function [21]. It could be assumed that longer duration of collision results in better entrainment of the oscillating systems leading to stabilization of the voice source. However, such stabilization does not appear only in healthy voices. It has been shown by Brockmann-Bauser et al. that jitter values decreased with increasing loudness in patients with vocal fold mass lesions [22]. The influence of different loudness conditions on vocal fold oscillation patterns in patients with vocal fold mass lesions has, however, not yet been clarified.

This study aims to analyze the effect of gradual changes in vocal loudness on vocal fold oscillation patterns. Consistent with the quoted studies, it was hypothesized that (1) open quotient would decrease and (2) perturbation values of the glottal area waveform would decrease with increasing sound pressure level. Furthermore, due to the blockage resulting from vocal fold mass lesions it was hypothesized that (3) the agreement of the glottal area waveform derived open quotient with the electroglottographical open quotient would not be as high as in physiologically normal voices.

Material and methods

After approval from the local ethical committee (Medical Ethics Committee of the University of Munich, 18/769), eight adult patients were included in the study. In order to achieve the greatest contrast between the two vocal folds, patients with predominantly unilateral vocal fold mass lesions were included. Only mass lesions expected to extend to the epithelium and superficial lamina propria were included. After multidimensional voice evaluation by an experienced phoniatrician, non-surgical therapy (i.e. voice therapy and/or pharmacotherapy) was considered not helpful for any of these patients and, consequently, phonomicrosurgery was recommended. This criterion was chosen because, on the one hand, it indicates that the mass lesion was accompanied by dysphonia and, on the other hand, it could offer data on whether a non-surgical therapy could – in contrast to the expectation given by the decision for surgery – be meaningful. Table 1 shows age, gender, pathology, Voice Handicap Index (VHI) in the German translation [24] and the Dysphonia Severity Index (DSI) [23]. Fig. 1 displays laryngoscopic images for each subject.

Table 1 Gender, Age, Pathology, Lateralization, Dysphonia Severity Index (DSI) [23], Voice Handicap Index (VHI [24]) and dynamic range (from Voice Range Profile, Lingwaves, Wevosys, Forchheim, Germany)
Fig. 1 Laryngoscopic images of all subjects

The subjects were asked to perform, on the vowel /i/, with a ƒo of approximately 250 Hz for the female and 125 Hz for the male voices, an increase of vocal loudness from softest to loudest. During phonation the subjects were simultaneously recorded with transnasal high speed videoendoscopy (HSV), electroglottography and audio recording.

In a similar manner to previous investigations [25, 26], high-speed videoendoscopy (HSV) (Fastcam SA-X2; Photron, Tokyo, Japan) was performed transnasally using a flexible endoscope (ENF GP; Fa. Olympus, Hamburg, Germany) with a frame rate of 20,000 frames per second and a spatial resolution of 386 × 320 pixels. Simultaneously with the HSV recording, the audio signal was recorded using an IMK SC 4061 microphone (DPA microphones, Alleroed, Denmark) or a Sennheiser ME 62 microphone (Sennheiser, Wedemark, Germany), and electroglottographic (EGG) signals (EG2-PCX2; Glottal Enterprises, Syracuse, NY) were captured. No anesthetic medication was applied for the transnasal endoscopic approach. The audio recording was calibrated with a sound level meter (Voltcraft, Hong Kong, China) using the Sopran software (Svante Granqvist, Karolinska, Stockholm, Sweden). The HSV videos were post-processed by means of rotation, fast Fourier transform filtering in order to remove the comb structure of the endoscope, and cropping, as previously described [25]. Calculations of the glottal area waveform (GAW) and phonovibrograms from the HSV films were performed as previously described [27, 28].

For comparison, the signals were divided into consecutive 100 ms time windows. Mean values for glottal area derived open quotient (OQGAW), electroglottographical open quotient (OQEGG), sound pressure level (SPL), Closing Quotient (closing phase/period, CiQ), Speed Quotient (opening phase/closing phase, SQ), and fundamental frequency (ƒo) were calculated for each window using Multi Signal Analyzer (Schäfer/Schlegel, FAU Erlangen-Nürnberg, Germany), as shown in Table 2.
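The windowing step can be illustrated with a minimal Python sketch. It is not the Multi Signal Analyzer implementation, whose internals are not described here; it assumes that each measure is available as one value per glottal cycle together with the cycle onset time in milliseconds, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def window_means(times_ms, values, window_ms=100):
    """Average per-cycle values within consecutive fixed-length windows.

    Assumption: times_ms holds the onset time of each cycle in milliseconds.
    Window k covers the interval [k * window_ms, (k + 1) * window_ms).
    Returns a dict {window_index: mean_value}, ordered by window index.
    """
    buckets = defaultdict(list)
    for t, v in zip(times_ms, values):
        buckets[int(t // window_ms)].append(v)
    return {k: mean(vs) for k, vs in sorted(buckets.items())}


def closing_quotient(closing_phase, period):
    """CiQ = closing phase / period (both in the same time unit)."""
    return closing_phase / period


def speed_quotient(opening_phase, closing_phase):
    """SQ = opening phase / closing phase (both in the same time unit)."""
    return opening_phase / closing_phase
```

For example, `window_means([0, 40, 99, 100, 250], [0.5, 0.7, 0.6, 0.4, 0.8])` groups the first three cycles into window 0 and the remaining two into windows 1 and 2.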

Table 2 Measures and origin

In order to detect OQGAW, a tolerance threshold of 5% was set, i.e. the glottis was denoted as open when the GAW signal rose more than 5% above the baseline. The electroglottographic open quotient was calculated according to the Howard criterion [29]. With regard to frequency perturbation, jitter for all three voice signals (GAW, EGG, and audio) and the Harmonic-to-Noise-Ratio (HNR) from the audio signal were measured.
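The thresholding of the GAW can be sketched as follows. This is a hedged illustration only: it assumes that the 5% tolerance refers to 5% of the cycle's area amplitude above the baseline and that the baseline is the minimum area within the cycle unless given explicitly. Neither assumption is confirmed by the description above, and the function name is hypothetical.

```python
def open_quotient_gaw(gaw_cycle, tolerance=0.05, baseline=None):
    """Fraction of one oscillation cycle during which the glottis counts as open.

    Assumptions (not taken from the study): the baseline defaults to the
    minimum glottal area in the cycle, and the threshold lies `tolerance`
    times the cycle amplitude above that baseline.
    """
    if not gaw_cycle:
        raise ValueError("gaw_cycle must contain at least one sample")
    if baseline is None:
        baseline = min(gaw_cycle)
    amplitude = max(gaw_cycle) - baseline
    if amplitude <= 0:
        return 0.0  # flat signal: no detectable opening
    threshold = baseline + tolerance * amplitude
    open_samples = sum(1 for area in gaw_cycle if area > threshold)
    return open_samples / len(gaw_cycle)
```

With a persistent glottal gap the minimum area of a cycle is not zero, so the choice of baseline directly affects the resulting open quotient; this is why the baseline is exposed as a parameter in the sketch.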

In order to compare values at a lower and a greater SPL, the same SPL difference was used for all subjects. It was identified in the following way: the smallest SPL increase during the experiment was found in subject 2, with an increase of 6 dB. Therefore, for all subjects the 100 ms window with the greatest SPL (SPLmax) and the 100 ms window with the greatest SPL minus approximately 6 dB (SPLmax-6) were compared.
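A minimal sketch of this window selection is given below. It assumes that one mean SPL value per 100 ms window is available and that "approximately 6 dB" means the window whose SPL lies closest to SPLmax minus 6 dB; how ties or near-ties were actually resolved is not stated, and the function name and the `excluded` parameter are hypothetical.

```python
def select_comparison_windows(spl_db, delta_db=6.0, excluded=()):
    """Return (index of SPLmax window, index of the SPLmax - delta_db window).

    spl_db   : mean SPL per 100 ms window, in dB
    excluded : window indices to ignore (e.g. windows with artefacts)
    Assumption: the reference window is the one closest to SPLmax - delta_db.
    """
    candidates = [i for i in range(len(spl_db)) if i not in excluded]
    if len(candidates) < 2:
        raise ValueError("at least two usable windows are required")
    i_max = max(candidates, key=lambda i: spl_db[i])
    target = spl_db[i_max] - delta_db
    others = [i for i in candidates if i != i_max]
    i_ref = min(others, key=lambda i: abs(spl_db[i] - target))
    return i_max, i_ref
```

For instance, `select_comparison_windows([62, 65, 68.5, 71, 74, 73])` picks window 4 (74 dB) as SPLmax and window 2 (68.5 dB) as the closest match to 68 dB.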

In many subjects, aperiodicity of vocal fold oscillation was found in a window between the minimum (SPLmin) and maximum SPL (SPLmax). Therefore, the electroglottographical (EGG) sample entropy [30, 31] was used to detect the greatest changes in the EGG signals. In this respect, the window exhibiting the greatest sample entropy was denoted window 0. The 100 ms windows −2, −1, 0, +1 and +2 relative to window 0 were analysed.
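The idea of locating window 0 can be sketched with a generic textbook sample entropy. This is explicitly not the EGG-specific procedure of [30, 31], whose details are not reproduced here; the embedding dimension `m`, the absolute tolerance `r` and both function names are assumptions chosen for illustration.

```python
import math


def sample_entropy(x, m=2, r=0.2):
    """Generic sample entropy: -ln(A / B).

    B counts pairs of length-m templates whose Chebyshev distance is <= r,
    A counts the same for length m + 1; self-matches are excluded.
    m and r are illustrative defaults, not values taken from the study.
    """
    n = len(x)

    def count_matches(length):
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(x[i + k] - x[j + k]) for k in range(length)) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # undefined for this series; reported as infinity
    return -math.log(a / b)


def windows_around_peak(entropy_per_window, span=2):
    """Map relative offsets (-span..+span) to absolute window indices,
    centred on the window with the greatest sample entropy (offset 0)."""
    n = len(entropy_per_window)
    i0 = max(range(n), key=lambda i: entropy_per_window[i])
    return {off: i0 + off for off in range(-span, span + 1) if 0 <= i0 + off < n}
```

A perfectly alternating series such as `[0, 1] * 10` yields a sample entropy of zero, whereas a single disturbance at the end of an otherwise regular series raises the value.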

The Pearson correlation test was used, but due to the small sample size comparative statistics were not considered meaningful.
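For illustration, the Pearson coefficient and a least-squares trend line for paired values such as OQGAW and OQEGG can be computed as in the following sketch. The function names are hypothetical, and the example does not reproduce the study data.

```python
import math


def _centered_sums(xs, ys):
    """Return (sxx, syy, sxy, mean_x, mean_y) for paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 values")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy, mx, my


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    sxx, syy, sxy, _, _ = _centered_sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def trend_line(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    sxx, _, sxy, mx, my = _centered_sums(xs, ys)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    slope = sxy / sxx
    return slope, my - slope * mx
```

With only eight subjects, such a coefficient is descriptive; it does not replace the comparative statistics that were considered not meaningful for this sample size.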

Results

All subjects were able to perform the task with the different loudness conditions. However, the increase of SPL differed among the subjects. The difference between SPLmin and SPLmax varied from 6 dB (subject 2) to 22 dB (subject 8). Figure 2 shows the trace of SPL, ƒo, OQGAW, OQEGG and the sample entropy for all subjects over the time of the experiment recording. In subject 8 for the 100 ms window 6 there was a drop of OQGAW to zero which was caused by a near total ventricular fold adduction. This window was excluded from later examinations of the SPLmax and SPLmax-6 and the analysis of windows with regard to the greatest sample entropy.

Fig. 2 Sound Pressure Level (SPL), fundamental frequency (ƒo), Sample Entropy, Glottal Area (GAW) and electroglottographical (EGG) derived open quotient for each 100 ms time window. The numbers on the x axis refer to each 100 ms window over the course of the experiment

For the 100 ms window exhibiting SPLmax, GAW related measures (OQGAW, SQ, CiQ) showed no large difference to SPLmax-6 (Fig. 3); in contrast, OQEGG was greater for SPLmax. JitterGAW showed greater values for SPLmax, whereas JitterAudio and JitterEGG showed no large difference to SPLmax-6. The HNR was higher for SPLmax in comparison to SPLmax-6. Figure 4 shows phonovibrograms for a 25 ms time interval at the mid-point of the 100 ms windows for SPLmax and SPLmax-6, respectively.

Fig. 3 Box plots for the window in which the maximum SPL (SPLmax, right columns) and the window in which the maximum minus 6 dB (SPLmax-6) was measured, with respect to Glottal Area Waveform (GAW) and electroglottographical (EGG) open quotient, speed quotient, closing quotient, GAW, EGG and audio derived jitter and Harmonic to Noise Ratio (HNR)

Fig. 4 Phonovibrograms (PVGs) and electroglottographical (EGG) signals of all subjects for a 25 ms window for SPLmax-6 (left) and SPLmax (right)

The expected ƒo, i.e. 125 Hz for male and 250 Hz for female voices, was not achieved by many of the subjects. Some subjects showed greater deviations from the required ƒo (Fig. 2): subjects 4, 6 and 8 increased ƒo during the experiment, whereas subject 7 decreased ƒo. During the experiment, the greatest vocal instability was found between SPLmax and SPLmin for all but one subject. In the windows where the greatest sample entropy occurred, irregularities of the EGG signal and an increase in OQEGG were also found (Fig. 5). However, in the same windows, there were no large changes in the GAW; in addition, neither OQGAW nor the Closing Quotient showed large changes in window 0, in which the greatest EGG based sample entropy occurred.

Fig. 5 Open Quotients for GAW and EGG, Closing Quotient, Sample Entropy and Jitter for GAW, EGG and audio for the 100 ms windows −2 to +2 with respect to the window in which the greatest sample entropy was measured (0 window)

There was no correlation between OQGAW and OQEGG (trend-line equation: y = −0.0393x + 0.5643, r = 0.084; Fig. 6).

Fig. 6 Open Quotients for GAW versus EGG

Discussion

This study analyzed the effect of gradual loudness changes on vocal fold oscillation patterns. For most subjects, the greatest irregularity was not found at the lowest SPL, but in between the minimum and maximum SPL. Consequently, the data presented here were not able to support the assumption that the voice is generally stabilized with increasing SPL. Finally, there were indeed strong differences between GAW derived and EGG derived measures.

Vocal performance depends heavily on both frequency and dynamic range [1, 14]. These vocal dimensions are not only important for non-dysphonic voices but also for subjects with vocal impairments arising from vocal fold mass lesions. It has previously been shown that ƒo might affect vocal performance in professional singers with vocal fold mass lesions [18]. In contrast to the previous study, no professional singers were examined in the present study, and this could be considered the main reason why the required ƒo was frequently not achieved. However, the increase in loudness was found to be accompanied by an increase in SPL for all of the subjects. It should be noted, however, that the subjects failed to reach the same dynamic range as they did during the clinical testing of the voice range profile. There are several potential reasons for this. One is that the experiment was limited to a recording time of 9 s, producing 32 GB of HSV data, whereas during the voice range profile it was possible to make many repetitions. Another is that the transnasal laryngoscope might have influenced voice production through increased tension.

The present study hypothesized that regularity of vocal fold oscillations would increase with increasing loudness. In this respect, Brockmann-Bauser et al. [22] observed lower perturbation values derived from audio signals for higher SPL in patients with vocal fold mass lesions as well as in subjects without dysphonia. The data presented here, however, failed to support these findings: the jitterAudio and jitterEGG were almost unchanged between SPLmax and SPLmax-6. Furthermore, for SPLmax, jitterGAW was increased. There are many possible influencing factors which could contribute to the differences between the findings presented here and the observations made by Brockmann-Bauser et al. [22]. One is that, as noted previously, the dynamic range was lower during the experiment than in the clinical voice evaluation. Furthermore, the data presented refer to a dynamic range of 6 dB, which was the lowest observed difference between the minimum and maximum SPL (subject 2). On the one hand, this provides comparability among the subjects. On the other hand, a difference of 6 dB could be considered too small to reveal greater differences for patients who exhibited a larger dynamic range. In addition, Brockmann-Bauser et al. [22] analyzed audio signals in female voices only. In the present study a greater number of additional signals were simultaneously analyzed, which prevented a study using a larger number of subjects. Lastly, in the present study two subjects (subjects 6 and 8) had a greater rise of ƒo during the experiment. Using sinusoidal tones, it has been shown before that a rise of ƒo could be associated with changes of jitter measurements [32]. At least for subject 6 this could in part explain greater jitter values for greater SPL. However, for subject 8 this tendency was present only for the jitterAudio but not for the jitterEGG and jitterGAW.

The greatest irregularities were found in between minimum and maximum SPL. With regard to changes in ƒo, previous investigations [18] observed regions, i.e. the passaggio regions, where subjects with vocal fold mass lesions showed greater irregularity of vocal fold oscillations. In the present study, however, there were no clear criteria or regions where irregularity appeared more likely for changes in loudness and the physical value SPL.

HSV derived vocal fold oscillation patterns did not differ greatly between SPLmax and SPLmax-6 with respect to OQGAW, SQ and CiQ. Furthermore, as is seen in the phonovibrograms, there was no lateralization effect, i.e. the pathologic vocal fold did not behave differently to the healthy one. It is interesting that, in contrast, OQEGG showed greater values for SPLmax. It should be noted that OQGAW and OQEGG are not equivalent. OQGAW is derived from a superior laryngoscopic two-dimensional view, whereas OQEGG represents the changes in impedance due to the three-dimensional vocal fold contact. It has been shown that, in physiologic voices, the concordance of EGG and GAW signals is greater for the ‘de-contacting’ than for the ‘contacting’ phase [32]. Furthermore, for OQGAW lower than 0.7, the agreement of OQGAW and OQEGG is high, but for values above 0.7 this agreement is rather low [26]. The data presented here show that, for patients with vocal fold mass lesions, the disagreement between both OQs is much stronger. It could therefore be speculated that impedance changes show an earlier contact of the vocal folds due to the contact of the mass lesion, although the laryngoscopic closure still reveals open parts alongside the mass lesion. Consequently, OQEGG has to be interpreted with caution in patients with vocal fold mass lesions. Furthermore, the EGG based sample entropy was used as a criterion to describe the greatest instability in the vocal fold oscillation patterns. This measure was first introduced by Selamtzis and Ternström for analysis of physiologic voices [30]. It has been shown in non-pathologic voices that registration events can be detected using this measure [31, 33]. However, the data presented showed that the GAW derived irregularities behave differently to the EGG derived data in the time domain. Therefore, doubts are justified as to whether the EGG based sample entropy can be used for voice evaluation in patients with vocal fold mass lesions.

There are several key limitations of this study. The first limitation stems from the variety of different mass lesion entities which are present. In this study patients with polyps, cysts, nodes and edema were included. Since the histopathology of the Reinke space differs between these entities, the effect on stiffness and vocal fold closure could be expected to vary. However, it should be noted that for most subjects the greatest sample entropy was not found at the limits of the dynamic range. Also in this respect, only patients with an indication for phonomicrosurgery were included. It remains unclear whether results would be comparable in patients with vocal fold mass lesions, but with a lesser impact on vocal function and, therefore, with no indication for surgery. Also in this context, the study included only patients with predominantly unilateral vocal fold mass lesions. It cannot be excluded that bilateral mass lesions would exhibit different results. As previously noted, the patients were not vocally trained and, therefore, they were not able to achieve the ƒo required in each case. Rising ƒo is frequently associated with greater SPL [7, 17]. Therefore, for subjects exhibiting greater ƒo changes throughout the experiment, part of the differences observed could be related not only to SPL but also to differences in ƒo. Different loudness conditions frequently show different vocal tract shapes [34]; as such, vocal tract/voice source interactions [35,36,37] could have influenced the observed vocal fold irregularities in different ways. Also in this respect, SPLmax and SPLmax-6 were used to compare differences for the various measures. The reason for not using the minimum SPL was that it was frequently found at the voice onset, which could have a greater impact on the GAW related measures. Furthermore, the signal to noise ratio is lower for lower SPL. However, it cannot be ignored that softer loudness might exhibit a different sensitivity to the measures used.

A further important limitation is that the increase in loudness was not standardized, i.e. the increase in loudness did not have to be performed over a specific time interval. It could be assumed that coordination and stabilization of the voice might be easier over a longer duration, and would therefore exhibit smaller irregularity. How much the different durations in such experiments influence any irregularity should be analyzed in future investigations. Furthermore, due to the extended recording and analysis setup, only eight subjects could be included in this study, which prevented any statistical analysis. It is hoped that greater numbers of subjects can be included in future investigations in order to statistically verify any observed tendencies.

Conclusions

The amount of vocal fold irregularity changes with varying loudness. Therefore, an evaluation of voice under different loudness conditions should be recommended in patients with vocal fold mass lesions. With respect to perturbation values, this study failed to verify lower jitter values for greater SPL. The measures derived from electroglottographic signals and from the glottal area waveform, and therefore the respective open quotients, differed to a larger extent in patients with vocal fold mass lesions than in physiologic voices.