Introduction

Voice impairment after treatment for early glottic cancer has been reported in several studies, with prevalence ranging from 14 to 92% of patients [1, 6, 9, 16, 24, 25]. Furthermore, several studies on the influence of voice problems on quality of life revealed that 27 to 58% of patients experienced difficulties in communication leading to a disrupted social life [3, 8, 12, 17, 19–22, 25]. To enable quick screening for voice problems, a short 5-item voice-screening questionnaire was developed and validated, which proved to be feasible in clinical practice [22]. A more detailed multidimensional voice analysis protocol is, however, recommended for monitoring voice intervention and for research purposes [24], including a structured questionnaire such as the Voice handicap index (VHI). The VHI is a validated 30-item questionnaire measuring the psychosocial handicapping effects of voice disorders [7] and has been used in several studies on patients after treatment for early glottic cancer, with mean VHI scores ranging from 12 to 34 points [3, 8, 12, 19]. Most of these studies include patients with and without deviant voice quality, and mean VHI data are therefore not informative about the extent of the problems that patients with voice impairment after oncological treatment encounter in daily life. In a study on 23 patients with voice problems after oncological treatment, Van Gogh et al. [23] reported a mean VHI score of 35. However, it is difficult to interpret how cancer patients cope with voice problems compared with patients with voice problems due to benign laryngeal lesions and compared with the normal population, because some psychometric characteristics of the VHI are underexposed: data from the normal population are limited, no clear clinical cut-off score is available, and information on clinically relevant difference scores is scarce.

The purpose of this study is to compare voice problems of patients after treatment for early glottic cancer with voice problems as reported by patients with benign voice disorders and by subjects from the normal population. The study also provides psychometric information on the VHI regarding internal consistency, reliability, normative data, a clinical cut-off score, and clinically relevant difference scores for use in individual patients and in group study designs.

Materials and methods

Patients

The patient sample consisted of 232 subjects: 35 patients with voice problems after treatment for early glottic cancer and 197 patients with voice problems due to benign voice disorders.

Patients after treatment for early glottic cancer (carcinoma in situ, T1 and T2 tumours) were selected based on a validated voice-screening questionnaire; having a voice problem was defined as a score of 5 or higher (on a 10-point scale) on one of the 5 voice items [22]. Of these 35 patients, 33 were males, 2 females; the median age was 62 years (range: 41–81); mean post-oncological treatment time was 32 months (range: 6–135). Treatment included radiotherapy (n = 24) or endoscopic laser surgery (n = 11); mean VHI scores regarding treatment modality were comparable (37 vs. 36 points).

Patients with voice problems due to benign voice disorders were randomly selected from the patient population at our voice clinic. This cohort of 197 patients included 44 patients with vocal fold paresis, 84 with structural lesions (polyps, nodules, scarring, granuloma), 10 patients with Reinke’s oedema, 55 patients with laryngitis, and 5 patients with laryngeal trauma. Of these 197 patients, 82 were males and 115 females; median age was 46 years (range 18–90).

Controls

The group of 123 randomly selected controls from the normal population (employees from the hospital and (acquaintances of) relatives and neighbours of the researchers) consisted of 54 males and 58 females (gender was not indicated by 11 subjects); median age was 55 years (range 23–87).

Voice handicap index

The VHI is a validated questionnaire measuring the psychosocial handicapping effects of voice disorders and was translated into and validated in Dutch. The VHI consists of 30 statements on voice-related aspects of daily life (with 5 response levels, scored 0 to 4). Summing the scores on the 30 statements gives a total VHI score, ranging from 0 to 120. A higher score corresponds to a worse voice-related functional status. Furthermore, the VHI includes an overall question on the quality of the voice with four response levels: 0 (good), 1 (reasonable), 2 (moderate), and 3 (poor). All VHI questionnaires were collected at baseline (i.e. before logopedic, surgical or medical voice treatment). To assess test–retest reliability, a subset of 30 patients (11 cancer, 13 structural lesion, 2 Reinke’s oedema, 2 laryngitis, and 2 paresis) filled out the VHI twice, with a mean interval of 3.5 months (range 1–6 months) without any voice intervention.
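The scoring rule described above can be illustrated with a minimal sketch. The function name `vhi_total` and the example respondent are hypothetical and serve only to illustrate the summation; they are not part of the VHI itself.

```python
def vhi_total(item_scores):
    """Sum 30 item scores (each 0 to 4) into a total VHI score (0 to 120)."""
    if len(item_scores) != 30:
        raise ValueError("the VHI consists of 30 statements")
    if any(score not in (0, 1, 2, 3, 4) for score in item_scores):
        raise ValueError("each statement is scored 0 to 4")
    return sum(item_scores)


# Hypothetical respondent: 20 statements scored 1, 10 statements scored 0.
example = [1] * 20 + [0] * 10
print(vhi_total(example))
```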

Statistical analyses

Because of the skewed distribution of the VHI scores of the control group (the patient group showed normal distribution), independent Mann–Whitney tests (U test) and Kruskal–Wallis analysis-of-variance-by-ranks tests (H test) were used with a two-sided probability level of ≤0.05 to compare subject groups and to assess the association of VHI scores with age, gender, and self-reported voice quality.

The relation between VHI scores and the presence of voice impairment was evaluated with Receiver Operating Characteristic (ROC) analyses, using the area under the curve (AUC) as a summary measure of the overall discriminative ability of the VHI. In addition to ROC analyses, the sensitivity and specificity were calculated at various cut-off scores.
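As an illustration of these quantities, the sketch below computes sensitivity and specificity at a given cut-off and the AUC as the probability that a randomly chosen patient scores higher than a randomly chosen control. It assumes that a score at or above the cut-off counts as voice-impaired. The function names and the scores are hypothetical, not study data, and this is not necessarily how the analyses in this study were run.

```python
def sensitivity_specificity(patient_scores, control_scores, cutoff):
    """Treat a score >= cutoff as 'voice-impaired' (an assumption of this sketch)."""
    true_pos = sum(score >= cutoff for score in patient_scores)
    true_neg = sum(score < cutoff for score in control_scores)
    return true_pos / len(patient_scores), true_neg / len(control_scores)


def auc(patient_scores, control_scores):
    """AUC as the fraction of patient/control pairs in which the patient
    scores higher; ties count as half a pair."""
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


# Hypothetical VHI totals, for illustration only.
patients = [12, 18, 25, 33, 40, 52]
controls = [0, 2, 5, 7, 11, 16]

for cutoff in (10, 15, 20):
    sens, spec = sensitivity_specificity(patients, controls, cutoff)
    print(cutoff, round(sens, 2), round(spec, 2))
print(round(auc(patients, controls), 3))
```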

Internal consistency of the VHI was assessed by Cronbach’s alpha. Test–retest stability was determined by Spearman’s correlation coefficient between the first and the second (repeated) ratings. The clinically relevant difference score to be used in individual patients was defined as the maximum deterioration or improvement between test and retest scores. The clinically relevant difference score to be used in group study designs was based on an effect size (ES) of 0.80, where ES is defined as the experimental group mean minus the control group mean, divided by the standard deviation of the control group.
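Two of these measures can be sketched in a few lines. The code below uses the standard formula for Cronbach's alpha (based on item variances and the variance of the total score) and the effect size as defined in the text. The function names and the small data sets are hypothetical, and sample variances (n − 1 denominator) are an assumption of this sketch.

```python
from statistics import mean, stdev, variance


def cronbach_alpha(responses):
    """responses: one list of k item scores per respondent."""
    k = len(responses[0])
    item_variances = [variance([r[i] for r in responses]) for i in range(k)]
    total_variance = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def effect_size(experimental, control):
    """(experimental mean - control mean) / standard deviation of the control group."""
    return (mean(experimental) - mean(control)) / stdev(control)


# Hypothetical 3-item questionnaire answered by 4 respondents.
responses = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(round(cronbach_alpha(responses), 2))

# Hypothetical group scores.
print(round(effect_size([30, 40, 50], [10, 20, 30]), 2))
```

With perfectly consistent items, as in the toy data, alpha reaches its maximum; real questionnaire data give lower values.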

Results

Reliability

Internal consistency of the VHI proved to be good with Cronbach’s alpha ranging from 0.87 (123 subjects from the normal population), 0.90 (35 glottic cancer patients), to 0.92 (196 voice-impaired patients), and 0.96 for the total group. Test–retest scores of the 30 patients who filled in the VHI twice over a mean period of 3.5 months (range 1–6 months) attested high test–retest stability with Spearman’s rho of 0.95 (P < 0.01).

Voice-impaired patients and the normal population

Within the normal population, 16% of subjects judged their own voices as not good (score > 0 on the overall question on the quality of the voice) versus 93% of the patients with benign voice disorders and 94% of the cancer patients.

Voice handicap index scores of glottic cancer patients were similar to those of patients with voice problems due to benign lesions (P = 0.64), but clearly deviant from the normal population (P < 0.01) as were the scores of the total group of patients with benign voice disorders (P < 0.01). An overview is given in Fig. 1. Because of this similarity between voice patient groups, further analyses were carried out on the total group of voice-impaired patients (n = 232).

Fig. 1

Boxplots presenting Voice handicap index scores for various subject groups: normal population, patients with vocal fold paresis, laryngeal trauma, structural vocal fold lesions, Reinke’s oedema, laryngitis, and patients with voice problems after treatment for early glottic cancer

Sensitivity and specificity of the VHI in detecting voice-impaired patients at a range of cut-off points are shown in Table 1. The AUC was 0.98 (95% CI: 0.97–0.99), indicating good overall discriminative ability of the VHI. Table 1 shows that sensitivity and specificity are good with a cut-off point between 13 and 17. A cut-off point of 15 (or higher) on the VHI scale is proposed to identify patients with voice problems in daily life, because it combines a good degree of sensitivity with a sound degree of specificity (16% of the normal population judged their own voices as not good).

Table 1 Overview of various Voice handicap index cut-off points regarding sensitivity and specificity
Table 2 Overview of various Voice handicap index difference scores regarding effect size (with standard deviation of 19.40 as found in the total group of voice-impaired patients)

Age, gender, and voice quality

No association between VHI scores and gender was found for the normal population (P = 0.86) or for the voice-impaired patients (P = 0.59).

Regarding age, no clear associations were present either in the normal population (r = 0.03, P = 0.97) or in the voice-impaired patients (r = 0.01, P = 0.99).

Self-ratings of voice quality appeared to be clearly related to VHI scores with Spearman’s rho ranging from 0.32 for the normal population to 0.48 for the voice-impaired patients (P < 0.001).

Difference scores for individuals

The difference score between the first and second rating did not appear to depend on the height of the VHI score (Spearman’s r = −0.005, P = 0.98; Fig. 2). Individual difference scores between the first and second ratings remained within ten points, ranging from −9 to +10 points. Therefore, a 10-point shift can be defined as a clinically relevant difference score to be used for individual patients.
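The reasoning behind the individual difference score can be sketched as follows: the largest absolute test–retest shift observed without intervention bounds the change attributable to measurement variability, so a larger or equal shift after treatment is taken as relevant. The function names, the example scores, and the choice to count a shift of exactly 10 points as relevant are assumptions of this sketch.

```python
def retest_differences(first, second):
    """Second rating minus first rating, per patient."""
    return [b - a for a, b in zip(first, second)]


def max_abs_shift(first, second):
    """Largest absolute test-retest shift in the sample."""
    return max(abs(d) for d in retest_differences(first, second))


def is_relevant_change(before, after, threshold=10):
    """Assumption: a shift of at least `threshold` points counts as relevant."""
    return abs(after - before) >= threshold


# Hypothetical test-retest VHI totals for three patients.
first = [20, 35, 50]
second = [25, 26, 60]
print(retest_differences(first, second))
print(max_abs_shift(first, second))
print(is_relevant_change(40, 28), is_relevant_change(40, 33))
```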

Fig. 2

Scatter plot showing the (absent) relation between the first VHI score and the difference score between the first and second repeated VHI score as reported by 30 patients (Spearman’s r = −0.005)

Difference scores for study designs

To define a relevant difference score for study designs with groups, the effect size (ES) was used. An ES above 0.80 represents a large statistical and clinical difference. In this study, the standard deviations of the groups of voice-impaired patients at baseline were 10.60 (trauma), 15.98 (early glottic cancer), 18.43 (paresis), 19.78 (structural lesions), and 20.50 (oedema and laryngitis); the standard deviation of the total group of voice-impaired patients (n = 232) was 19.40. Table 2 presents an overview of effect sizes for various difference scores, with a standard deviation of 19.40 taken as representative of the total group of voice-impaired patients. The results show that a difference score of 15 points or more is clinically relevant when comparing groups of patients.
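The relation between a difference score and its effect size can be sketched as a simple division by the standard deviation of 19.40 reported for the total group. The function name and the list of difference scores are hypothetical; the values printed here are not a reproduction of Table 2.

```python
SD_TOTAL_GROUP = 19.40  # standard deviation of the total voice-impaired group


def effect_size_for_difference(difference, sd=SD_TOTAL_GROUP):
    """Effect size corresponding to a mean difference score, given a standard deviation."""
    return difference / sd


# Hypothetical selection of difference scores.
for difference in (5, 10, 15, 20):
    print(difference, round(effect_size_for_difference(difference), 2))
```

With this standard deviation, a 15-point difference works out to an effect size of roughly 0.77, close to the 0.80 reference value.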

Discussion

The results of this study demonstrated a significant difference in VHI scores between patients with either benign voice pathology or voice pathology following treatment for glottic malignancy and the normal population, which is in concordance with various previous studies. These studies, all on benign organic and/or functional voice disorders, reported mean VHI scores varying from 11 to 47, which were found to differ significantly from those of controls with normal voices [4, 5, 11–13]. Nawka et al. [11] were the first to report a significant difference between 9 patients with voice problems due to a malignant tumour (mean VHI score 34 points) and 16 normal control subjects (mean VHI score 7 points); moreover, they also did not find a difference in VHI score between various diagnosis groups (benign organic or functional voice disorders (n = 159), neurogenic voice disorders (n = 32) or malignant voice disorders (n = 9)). From our results and the results reported by Nawka et al., it is clear that voice problems in the daily life of cancer patients are similar to those of patients with benign voice impairment. This result may seem remarkable, because it might be expected that patients cured of a malignancy experience the resulting voice impairment less negatively than patients treated for a benign disorder.

The secondary aim of this study was to assess some underexposed psychometric characteristics of the VHI. Internal consistency proved to be good, as was test–retest stability. Regarding identification of voice-impaired patients, several authors used controls (subjects from the normal population without voice problems) in their randomised controlled studies on VHI change and reported mean “normal” values varying from 2.3 to 10.5 points, but none of them made a reliable effort to define a cut-off point [4, 5, 11, 12, 13]. The present study revealed a cut-off point of 15 to identify patients with voice problems in daily life.

Regarding clinically relevant difference scores, we found a difference score of 10 points to be useful for individuals in clinical practice and 15 points to be useful in group study designs. Jacobson et al. reported a shift of 18 points as a valuable difference score to measure the efficacy of specific voice treatment techniques, but gave no clear description of the analysis [7]. Another, non-statistical approach to defining a clinically relevant difference score for group design studies is to line up published studies on the efficacy of voice treatment and assess which difference scores proved significant or non-significant. Four studies on the efficacy of voice therapy in patients with several benign voice pathologies or voice pathologies following treatment for glottic malignancy showed significant improvement of the mean VHI, with a range of 12 to 18 points [10, 14, 15, 23]. In contrast, Speyer [18] reported a non-significant median improvement of 6 points after voice therapy in patients with a diversity of chronic benign voice disorders. Other studies on the effect of several medical treatment modalities for different benign voice disorders show a mean VHI improvement ranging from 13 to 46 points [2, 13, 26]. Taken together, these studies on the efficacy of voice intervention in various voice patient groups indicate that a statistically significant difference score is at least 12 points. A meta-analysis could provide further information, but it seems too early to perform such a study because of the limited number of studies on the efficacy of voice treatment at this moment. In the meantime, we propose a difference score of 15 points, signifying a statistically and clinically large effect size.

The proposed cut-off point and the clinical difference scores in this study are not meant to be conclusive, mainly because of the Dutch origin of the data, which may have influenced the results. Currently, a European VHI Study Group is working on a comparison of various translations of the VHI to assess their equivalence. The first preliminary results reveal that there are only minor differences between the included versions, but further data exploration is ongoing.

Conclusion

Patients with voice problems after treatment for early glottic cancer encounter the same amount of problems in daily life as other voice-impaired patients and therefore require the same attention and care for this sequela of their initial cancer treatment. Furthermore, the VHI proved to be an adequate tool for baseline assessment and for measuring the effectiveness of voice treatment.