Introduction

Head and neck cancers are among the most common malignancies of the body, and laryngeal cancer is the second most prevalent malignancy involving the vocal tract [1, 2]. Epidemiological studies have shown that this type of cancer, also known as laryngeal squamous cell carcinoma, accounts for approximately 30% of all head and neck malignancies and 1% of all cancers, with 150,000 new cases annually [3]. Recent studies have unveiled different aspects of the risk factors involved in the development and aggravation of this worldwide public health issue, including lifestyle, occupational, genetic, inflammatory, and infectious factors [4,5,6]. On laryngoscopic examination, the condition appears macroscopically as red or white irregular local thickening resulting from keratosis aggregation. The Staging System designed by the American Joint Committee on Cancer (AJCC) classifies the disease into five main stages: the first three (0, I, and II) constitute early laryngeal carcinoma, and the last two (III and IV) constitute advanced laryngeal cancer [7]. Generally, after the first clinical manifestations emerge, the primary goals of medical intervention are to prevent progression, eliminate cancerous tissue, and maintain survival. For this purpose, among the different medical approaches, transoral laser microsurgery (TLM) and radiation therapy (RT) are two options with considerable success rates in controlling this condition [8]. In radiation therapy, the genome of tumor cells is targeted by irradiation with intense rays at different dosages. As a result, swelling, severe dryness, scarring, and numerous ulcers in the mouth and gums occur as primary side effects, and muscle atrophy, persistent inflammation of the laryngeal tissue, and extensive changes in the genome occur as secondary effects [9].
In contrast, TLM is associated with a short course of treatment, no bleeding, destruction of the terminal nerves of cancer cells, delicate removal of the malignancy, less damage to the healthy tissues around the tumor, and no swelling, tissue dryness, or sticky secretions. However, complications such as synechiae, fibrotic scarring, temporary loss of taste, and, in some cases, infection and tissue burning have been reported with this procedure [10, 11]. Along with the inevitable complications of medical treatments for laryngeal cancer, clinical voice findings also confirm that patients experience dysphonia of varying severity after treatment [12, 13].

Auditory-perceptual assessment based on experts' opinions is the gold standard, a credible method, and an essential component of voice evaluation [14]. The Dysphonia Severity Index (DSI), on the other hand, is an objective, multiparameter measure of voice quality. Owing to its replicability, non-invasiveness, objectivity, and ability to track progression across different stages of treatment, it has been the best available combined method for acoustic-aerodynamic assessment of voice [15, 16]. Studies have also shown that DSI strongly correlates with the overall severity of dysphonia [16, 17]. The Voice Handicap Index (VHI), a patient self-report assessment, provides greater detail about voice-related quality of life and gives a better understanding of the emotional, functional, and physical changes of voice after medical treatment in the population with early laryngeal cancer. In addition, the Persian language, with its unique vowel-consonant and continuous-speech features, differs from other languages.

Assessing the quality of voice after each treatment modality would significantly assist in providing appropriate and well-timed voice rehabilitation programs and would advance researchers' and specialists' understanding of patients' voice status. Therefore, given the complex nature of the voice, this study aims to provide a multidimensional assessment of vocal function in patients with early laryngeal carcinoma that reflects voice quality through an integration of objective and subjective measurements.

Materials and Methods

Participants

In this cross-sectional comparative study, 120 patients (116 men and 4 women; age range, 39–65 yrs; mean age, 57.59 yrs) with early glottic cancer (tumor classification: TisN0M0, T1N0M0, or T2N0M0) were divided into two groups according to the type of oncologic treatment: patients who had undergone TLM (n = 60) and patients who had received RT (n = 60). Participants were selected by purposive sampling during routine visits to the ENT clinic of the Amir A'alam hospital complex.

The inclusion criteria were as follows: tumor size classification of Tis, T1, or T2 based on the AJCC criteria [7]; no history of metastasis or relapse of the lesion; no drug abuse or alcohol consumption in the last 3 months; and being 6–24 months past completion of the oncologic treatment course. Further, since this research was performed during the COVID-19 pandemic, and given the inevitable impact of the upper respiratory system on voice quality, patients who had coronavirus infection, the common cold, or another upper respiratory infection at the time of voice recording were excluded. All patients underwent multidisciplinary evaluation, and staging reflected the consensus of direct laryngoscopy, the written histological reports of the lesion biopsy, head and neck CT scans, and diagnosis by an ENT specialist based on the AJCC staging system [7]. Participants' demographic data, including age, gender, smoking habits, and primary tumor stage, are provided in Table 1.

Table 1 Patient characteristics

Voice Recording

In the present study, voice recordings in digital wave-file format were made in an acoustically treated room with a voice recorder (Zoom Corporation, model H1n; sampling frequency 96 kHz) positioned at an angle of 45° and 10 cm from the mouth [18]. To ensure accurate signal recording for clinical analysis, background noise levels were below 38 dB, as measured with a sound level meter. The values of the acoustic parameters were calculated using Praat software, version 6.1.56 (freely available at https://www.fon.hum.uva.nl/praat/), installed on a laptop (Fujitsu Lifebook AH531, Fujitsu Inc., Tokyo, Japan).

Dysphonia Severity Index

The Dysphonia Severity Index (DSI) proposed by Wuyts et al. [16] is a weighted combination of several voice parameters:

  • Highest fundamental frequency (f0-high), in Hz: Each subject was instructed to phonate the vowel /a/, starting at a comfortable pitch and loudness and then going up to the highest and down to the lowest pitch.

  • Lowest intensity (I-low), in dB: Each subject was instructed to sustain phonation of the vowel /a/ at habitual pitch and loudness and then reduce loudness gradually to the lowest possible intensity.

  • Maximum phonation time (MPT), in s: After inhaling deeply, each subject was instructed to sustain the vowel /a/ for as long as possible at habitual pitch and loudness. The task was performed three times, and the longest MPT measured was used for further analysis.

  • Jitter (Jitt%), in percent: Each subject was instructed to sustain phonation of the vowel /a/ for 5 s at habitual pitch and loudness. The steady-state midvowel segment of each sample was then used for analysis.

This index is an overall measure of voice quality and is calculated using the following equation [16]:

$$ \text{DSI} = \left( 0.13 \times \text{MPT} \right) + \left( 0.0053 \times f_{0}\text{-high} \right) - \left( 0.26 \times I\text{-low} \right) - \left( 1.18 \times \text{Jitter} \right) + 12.4 $$
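
As a minimal sketch, the DSI equation above can be evaluated directly from the four measured parameters. The function name, argument names, and the input values below are hypothetical and chosen only for illustration; they are not data from this study.

```python
def dysphonia_severity_index(mpt_s, f0_high_hz, i_low_db, jitter_pct):
    """Weighted combination of the four parameters in the DSI equation [16].

    mpt_s      -- maximum phonation time in seconds (longest of the trials)
    f0_high_hz -- highest fundamental frequency in Hz
    i_low_db   -- lowest intensity in dB
    jitter_pct -- jitter in percent
    """
    return (0.13 * mpt_s
            + 0.0053 * f0_high_hz
            - 0.26 * i_low_db
            - 1.18 * jitter_pct
            + 12.4)


# Hypothetical illustrative inputs, not measurements from this study.
mpt_trials = [17.5, 20.0, 18.2]          # three MPT attempts; the longest is used
example_dsi = dysphonia_severity_index(
    mpt_s=max(mpt_trials),
    f0_high_hz=500.0,
    i_low_db=50.0,
    jitter_pct=1.0,
)
print(round(example_dsi, 2))  # 3.47
```

Longer phonation time and a higher f0-high raise the index, whereas a higher lowest intensity and more jitter lower it, so a lower DSI corresponds to poorer voice quality.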

Perceptual Evaluation

In the present research, the Persian version of the CAPE-V test [2] was used for the auditory-perceptual judgment of samples. The participants performed the vocal tasks included in this test (sustained vowels /a/ and /i/ at habitual pitch and loudness, reading sentences, and 20 s of continuous speech on a free topic). The voice signals were stored in random order on a CD, and each was rated separately through headphones (AKG, model K52) by two experienced speech-language pathologists. The judges were asked to rate the “overall severity” of the voice disorder on a scale from 0 = without problems to 100 = severe, according to the CAPE-V instructions. Listeners could rest or replay voice samples as often as needed to be confident in their judgment. Moreover, all raters were blinded to participants' scores and to their demographic and clinical information. Finally, the average rating for each sample was used in the statistical analysis.

Voice Handicap Index

All participants completed the validated Persian version of the Voice Handicap Index-30 [19]. In this Likert-type patient-report survey, the overall score and the scores of the three subscales (physical, emotional, and functional) were calculated and compared between groups. A higher score represents a greater level of perceived voice handicap.
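
The scoring of such a questionnaire can be sketched as follows. This is only an illustration: it assumes the standard VHI-30 layout of 30 items rated 0 to 4, with ten items per subscale. Because the item order of the Persian version is not described here, the hypothetical function below takes the responses already grouped by subscale.

```python
def score_vhi30(functional, physical, emotional):
    """Sum ten 0-4 item responses per subscale; return subscale and total scores."""
    subscales = {
        "functional": functional,
        "physical": physical,
        "emotional": emotional,
    }
    scores = {}
    for name, items in subscales.items():
        # Assumed layout: exactly ten items per subscale, each rated 0 to 4.
        if len(items) != 10 or any(not 0 <= x <= 4 for x in items):
            raise ValueError(f"{name}: expected ten responses in the range 0-4")
        scores[name] = sum(items)
    # Total is the sum of the three subscale scores (0 to 120 under this layout).
    scores["total"] = sum(scores.values())
    return scores


# Hypothetical responses for one participant, not data from this study.
vhi_example = score_vhi30(
    functional=[1] * 10,
    physical=[2] * 10,
    emotional=[0] * 10,
)
print(vhi_example)
```

Under these assumptions each subscale ranges from 0 to 40 and the total from 0 to 120.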

Ethical Approval

This study obtained approval from the ethics committee of the University of Social Welfare and Rehabilitation Sciences (No. IR.USWR.REC.1400.002) in May 2021. All participants voluntarily provided written informed consent after receiving explanations about the aim and process of the research.

Statistical Analysis

Statistical analyses were performed using SPSS 20 for Windows (SPSS Inc, Chicago, IL, USA). Means and standard deviations were used to report the descriptive findings. The Chi-squared test and the independent-samples t-test were used to compare the two groups. Pearson's correlation coefficient was used to investigate the relationship between DSI, the overall severity of voice disorder, VHI, and its subscales. P values less than 0.05 were considered statistically significant.

Results

In the TLM-treated group, the mean age (range) was 56.70 (46–65) years, and the sex ratio (male:female) was 96.6:3.3%. According to the AJCC classification, the primary tumor stage (Tis/T1/T2) in this group was 11/20/29. In the irradiated group, the mean age (range) was 58.48 (50–65) years, and the sex ratio (male:female) was 96.6:3.3%. The primary tumor stage (Tis/T1/T2) was 5/4/51. There was no statistically significant difference between the TLM and RT groups in age (P = 0.064), gender (P > 0.999), or smoking habits (P = 0.101). However, the groups differed significantly in primary tumor stage (P < 0.001). Other details are given in Table 1.
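
As a worked illustration of the tumor-stage comparison, the sketch below applies a Pearson Chi-squared test to the Tis/T1/T2 counts reported above. It assumes an uncorrected test on the 2 × 3 table, which may differ in detail from the SPSS procedure actually used; the function name is hypothetical. With exactly 2 degrees of freedom, the upper-tail probability of the Chi-squared distribution has the closed form exp(−x/2), so no statistics library is needed.

```python
import math

# Primary tumor stage counts (Tis, T1, T2) as reported in the text above.
observed = {
    "TLM": [11, 20, 29],
    "RT": [5, 4, 51],
}


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a table given as rows of counts."""
    rows = [list(r) for r in table]
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(rows):
        for c, obs in enumerate(row):
            # Expected count under independence of group and stage.
            expected = row_totals[r] * col_totals[c] / grand_total
            stat += (obs - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof


chi2, dof = pearson_chi_square(observed.values())
# Closed form valid only for 2 degrees of freedom: P(X > x) = exp(-x / 2).
p_value = math.exp(-chi2 / 2) if dof == 2 else None
print(f"chi2 = {chi2:.2f}, dof = {dof}")
print(f"p < 0.001: {p_value < 0.001}")
```

Under these assumptions the statistic is about 18.97 with 2 degrees of freedom, which agrees with the reported P < 0.001 for primary tumor stage.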

Table 2 presents the DSI results for the groups treated with TLM and RT. There was a significant difference in the DSI score between the two groups (P < 0.001).

Table 2 Results of DSI in TLM and RT group

Table 3 compares the results of the auditory-perceptual analysis between the TLM and RT groups. No significant differences were found in the auditory-perceptual measure of voice quality (P = 0.196).

Table 3 Results of auditory-perceptual voice analysis based on CAPE-V in TLM and RT group

In Table 4, the results of the correlation between DSI and the overall severity of voice disorder in each group are presented.

Table 4 Results of the correlation between DSI and the overall severity of voice disorder

The results indicated that in both the TLM (r = −0.295, P = 0.042) and RT (r = −0.613, P < 0.001) groups, there was a significant inverse correlation between DSI and the overall severity of voice disorder. Table 5 shows the results obtained using the VHI questionnaire in both groups.

Table 5 Results of voice handicap index in TLM and RT group

There was no significant difference between the two groups in the total (P = 0.227), physical (P = 0.813), or functional (P = 0.969) scores. However, there was a significant difference in the emotional score (P < 0.05). Table 6 presents the correlations between DSI, the overall severity of voice disorder, and VHI in each group.

Table 6 Results of the correlation between DSI, the overall severity of voice disorder, and VHI

In the TLM group, DSI had a significant inverse correlation with the total VHI score (r = −0.313, P = 0.030) and with the scores of the physical (r = −0.485, P < 0.001) and functional (r = −0.361, P = 0.012) subscales. In this group, the emotional subscale had no significant correlation with DSI (r = −0.092, P = 0.535), and none of the VHI subscales correlated with the overall severity of the voice disorder (P > 0.05). The same pattern was observed in the RT group: DSI had a significant inverse correlation with the total score (r = −0.382, P = 0.007) and with the scores of the physical (r = −0.417, P = 0.003) and functional (r = −0.381, P = 0.007) subscales of VHI, whereas the emotional subscale had no significant correlation with DSI (r = −0.091, P = 0.539). Again, none of the VHI subscales correlated with the overall severity of the voice disorder (P > 0.05).

Discussion

Although voice, speech, and swallowing are important in different aspects of life, the necessity of rehabilitation programs after medical treatment in patients with early-stage laryngeal cancer is still under debate. A multidimensional study of vocal function after transoral laser microsurgery and radiation therapy using acoustic, aerodynamic, auditory-perceptual, and stroboscopic approaches, with particular attention to aspects such as quality of life, helps to resolve these contradictions. It is also necessary to choose the treatment option appropriate for the patient's condition and to increase awareness of vocal function after oncological treatment. The choice of an appropriate treatment method requires consideration of several factors such as voice outcome, duration of treatment, oncological results, and cost. Different studies have noted similar oncologic outcomes of TLM and RT in terms of rate of local control and laryngeal preservation [20], and better results of TLM for overall and disease-specific survival [21]. In this regard, the present study investigates the multidimensional aspects of vocal function in patients with malignancy at the most susceptible part of the larynx, i.e., the glottic level.

In this study, objective analysis of the samples with the DSI indicated significant differences in favor of patients treated with radiotherapy, whereas subjective evaluation of voice quality based on CAPE-V, which includes sustained vowels, sentence reading, and continuous speech samples, showed no significant differences between the two groups. Consistent with these findings, the results of some studies point to better voice quality in patients following radiotherapy [22,23,24], while others showed no significant difference, with the two groups experiencing similar voice quality [25,26,27]. These findings suggest the necessity of analyzing samples involving both vowel prolongation and continuous speech tasks to create a more accurate and realistic image of voice quality, because the symptoms of dysphonia are more evident in continuous speech and the possibility of error is lower than when analyzing vowel samples [28]. Furthermore, continuous speech contains some natural instabilities, including a shorter duration of vowels and context effects plus language loading, which play an important role in establishing auditory-perceptual assessment as the gold standard for voice evaluations [14]. Therefore, if the goal is to achieve an ecologically valid image that reflects the real voice of patients in their daily lives, it is imperative to use both vowel prolongation and continuous speech tasks in voice evaluation.

Moreover, a crucial point in the acoustic-aerodynamic analysis of TLM-treated patients was their lower mean DSI values, which indicate greater severity of the voice disorder [16, 29]. This finding confirms that irradiated patients experienced better voice quality after completing their oncologic treatment courses. A possible explanation for this difference is the disruption of the mechanical coordination and vibration of the vocal folds caused by the removal of part of the delicate structure of the true vocal cord (TVC) in laser surgery and its restorative replacement with fibrotic scarring.

VHI, as a subjective evaluation based on the patient's self-assessment, provides valuable information about the impact of vocal function and its changes on voice-related quality of life. In the present study, there was a significant difference in the emotional subscale of VHI: the radiotherapy group scored lower on this subscale than the TLM group. This finding indicates that the emotional aspect of voice-related quality of life was more affected in the RT group. Therefore, it is important to emphasize improving the psychological well-being of patients and providing comprehensive treatment for this group [30]. This finding is consistent with the studies of Peeters et al. [31] and Batalla et al. [32]. However, the meta-analyses of Greulich et al. [33] and Cohen et al. [34] showed similar voice handicap levels for patients with T1 glottic carcinoma who had undergone RT or TLM.

The findings also include a significant inverse correlation between DSI scores and the overall severity of voice disorder, as well as significant inverse correlations between DSI and the VHI total and subscale scores, except for the emotional score. These findings are consistent with the results of previous studies [35, 36]. A possible explanation is that the multiparametric approach is the best method to evaluate voice quality even in samples with high severity of hoarseness, due to its high correlation with harsh and aperiodic components of the voice signal [16, 37,38,39]. Moreover, studies show that the accuracy of auditory-perceptual judgments is questionable [28, 40]. Therefore, it is recommended that, in addition to perceptual evaluation, the patient's experiences and quality of life be analyzed and supplemented with more objective and multiparametric measures such as DSI. Future studies should also consider the vocal function of patients with advanced stages of laryngeal cancer or patients treated with combination modalities such as radiotherapy and chemotherapy, additional evaluations such as videostroboscopy, and the design of specific voice rehabilitation programs.

Conclusion

The present study compared the voice outcomes of patients treated with transoral laser microsurgery or radiation therapy using a multidimensional approach. The findings revealed significant differences in DSI values in favor of RT, showing that TLM-treated patients with early glottic carcinoma have a more severe voice disorder than irradiated patients. In addition, the VHI questionnaire, as a subjective patient self-assessment, showed a greater impact on patients who had undergone radiotherapy. Therefore, in planning comprehensive treatment of early laryngeal cancer, individual needs such as psychological programs and voice therapy should be considered after the completion of oncological treatment.