Introduction

Haemorrhoidal disease is a common condition that can present with symptoms including bleeding, pain and anal leakage. A range of interventions is available, including rubber band ligation (RBL), haemorrhoid artery ligation (HAL), stapled haemorrhoidopexy and excisional haemorrhoidectomy, some of which have been assessed in randomised trials [1]. Although there have been attempts to amalgamate these trials and produce guidance on optimal treatment pathways, the resulting guidance is open to interpretation, and guidelines sometimes differ substantially [2,3,4]. One of the major challenges in comparing these studies is the lack of standardised outcomes. Research in haemorrhoidal disease is further complicated by the poly-symptomatic nature of the disease. Clinician-reported outcomes in this setting show low levels of inter-rater agreement, making them unreliable [5].

In benign conditions, quality of life is an important outcome measure but is frequently not reported. Where it is reported, generic quality of life tools are often used, and these may not capture the specific effects of haemorrhoidal disease [6]. Other studies use outcomes that address only single aspects of haemorrhoidal disease, such as prolapse or incontinence, and so fail to capture changes in other haemorrhoid-related outcomes [7]. A validated haemorrhoid-specific outcome tool, which takes into account both the poly-symptomatic nature of the disease and the effect of these symptoms on quality of life, is needed to compare interventions and guide optimal therapy. One attempt to produce such a tool using a multi-symptom approach is the haemorrhoid severity score (HSS) introduced by Nystrom [6]. Although it reflects the appropriate symptomatology, this scoring system has not gained wide acceptance, probably owing to a lack of robust validation. Others have developed and validated this system further (the Sodergren score), but that validation was based on a very small sample of patients [8].

Responsiveness is an essential property of any health-related quality of life (HRQoL) measure and refers to the ability of an instrument to detect change over time when a true change in the patient's health status has occurred between before and after an intervention or treatment [9]. It also provides evidence of an instrument's validity, as it should confirm that scores change as anticipated in line with corresponding changes in health [10]. The responsiveness of an instrument is ideally evaluated using a therapy of known effectiveness, such as one evaluated in a clinical trial.

The aim of this study was to establish the responsiveness of the HSS in patients treated for haemorrhoids and to determine the suitability of the instrument as an outcome measure in this context.

Materials and methods

As the responsiveness of an instrument is ideally evaluated using a therapy of known effectiveness [10, 11], this validation was planned as a secondary analysis of data from the HuBBLe trial (ISRCTN41394716) [11]. All patients in the trial had provided informed consent for their data to be used for analysis.

The HuBBLe trial was a multi-centre, parallel-group randomised controlled trial with a 1:1 allocation ratio comparing HAL with RBL in adults aged 18 years or over with symptomatic second- or third-degree haemorrhoids. The primary outcome was the proportion of patients with recurrent haemorrhoids at 12 months post-procedure, derived from the patient's self-reported assessment in combination with general practitioner and hospital records. Recurrence was assessed using a simple question 12 months after randomisation [5]: 'At the moment, do you feel your symptoms from your haemorrhoids are: (1) Cured or improved compared with before starting treatment; or, (2) Unchanged or worse compared with before starting treatment?' Secondary endpoints assessed at baseline, 6 weeks and 12 months included the HSS, the Vaizey incontinence score [12], a visual analogue scale (VAS) pain score and health state utility based on the EuroQoL-5D. The study randomised 185 patients to HAL and 187 to RBL, and showed 1-year recurrence rates of 30% and 49%, respectively. Six-week questionnaire data including the HSS were captured for 137 RBL and 144 HAL patients, and 1-year questionnaire data for 125 and 131, respectively.

Statistical analysis

The HSS comprises five items, each scored from 0 to 3 (0 indicating the best and 3 the worst health status). A total score is obtained by summing the item scores; lower scores indicate better haemorrhoidal health.
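The scoring rule above can be expressed as a short Python sketch. It assumes only what is stated here: five items, each an integer from 0 to 3, summed to a total (which implies a maximum of 15). The function name `hss_total` is hypothetical, and because the item wording is not listed in this section, the items are treated as an anonymous list.

```python
def hss_total(items):
    """Sum the five HSS item scores (each 0 = best to 3 = worst)."""
    if len(items) != 5:
        raise ValueError("the HSS has exactly five items")
    for score in items:
        # Reject anything outside the 0-3 range described for each item.
        if not isinstance(score, int) or not 0 <= score <= 3:
            raise ValueError(f"item score out of range: {score!r}")
    # Total ranges from 0 (no symptoms) to 15; lower is better.
    return sum(items)


print(hss_total([2, 1, 3, 0, 1]))  # 7
```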

The Vaizey incontinence score is a seven-item questionnaire shown to outperform other scores in detecting faecal incontinence [12]. Three items ask about the frequency of incontinence on a scale from 0 (never) to 4 (daily), followed by a single item, scored on the same scale, about the extent to which symptoms alter lifestyle. The final three items concern the severity of incontinence and use a dichotomous No/Yes response (No = 0; Yes = 2 for items five and six, and 4 for item seven). The Vaizey score is calculated by summing responses across the seven items, giving a range from 0 (perfect continence) to 24 (totally incontinent); lower scores indicate less faecal incontinence.
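A similar sketch for the Vaizey score follows, assuming the item structure and No/Yes weights described above (four items scored 0 to 4, then items five to seven worth 2, 2 and 4 when answered Yes). `vaizey_total` and `YES_WEIGHTS` are illustrative names, not part of any published software.

```python
# Weights for items five, six and seven when answered "Yes" (hypothetical constant).
YES_WEIGHTS = (2, 2, 4)


def vaizey_total(scale_items, yes_no_items):
    """Vaizey score: four 0-4 items plus three weighted No/Yes items (0-24)."""
    if len(scale_items) != 4 or len(yes_no_items) != 3:
        raise ValueError("expected four 0-4 items and three No/Yes items")
    if any(not 0 <= s <= 4 for s in scale_items):
        raise ValueError("frequency/lifestyle items must be scored 0-4")
    # "No" contributes 0; "Yes" contributes that item's weight.
    weighted = sum(w for w, yes in zip(YES_WEIGHTS, yes_no_items) if yes)
    return sum(scale_items) + weighted


print(vaizey_total([1, 0, 2, 3], [True, False, True]))  # 6 + 2 + 4 = 12
```

With every item at its maximum the function returns 24, matching the stated "totally incontinent" end of the scale.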

Numerous methods are available to determine the responsiveness of an instrument. As there is no gold standard approach, it has been recommended that multiple methods be employed [9, 10]. Four statistical analyses were used to evaluate the responsiveness of the HSS: (i) effect size, (ii) standardised response mean, (iii) significance of change and (iv) the responsiveness statistic. All statistical analyses were performed using SPSS version 23 (IBM Corp., Armonk, NY, USA).

Effect size

The effect size (ES) is an estimate of the magnitude of change. It is calculated as the difference between the pre- and post-treatment means divided by the standard deviation of the pre-treatment scores [13]. This translates changes in health status into a standard unit of measurement to aid interpretation. Conventional thresholds for ES are 0.20 (small), 0.50 (moderate) and 0.80 or above (large) [14]. A small effect size implies that treatment has little influence on patients' health status as measured by that specific questionnaire or domain.
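As an illustration, the ES calculation can be written as below. This is a sketch, not the SPSS procedure used in the study: it assumes the sample standard deviation (n − 1 denominator, as returned by Python's `statistics.stdev`) and computes baseline minus follow-up, so that an improvement, which lowers both the HSS and the Vaizey score, gives a positive value. The helper `magnitude` applies the 0.20/0.50/0.80 thresholds; its label for values below 0.20 is an assumption of this sketch.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """ES = (mean baseline - mean follow-up) / SD of baseline scores.

    Baseline minus follow-up makes a fall in score (an improvement on
    both the HSS and the Vaizey) come out positive.
    """
    return (mean(baseline) - mean(follow_up)) / stdev(baseline)


def magnitude(es):
    """Label |ES| using the 0.20 / 0.50 / 0.80 thresholds."""
    es = abs(es)
    if es >= 0.80:
        return "large"
    if es >= 0.50:
        return "moderate"
    if es >= 0.20:
        return "small"
    return "below small"  # label used only in this sketch


print(effect_size([4, 6, 8], [2, 4, 6]))  # 1.0
```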

Standardised response mean

The standardised response mean (SRM) is similar to ES. However, to calculate the SRM the mean change in scores (i.e. between baseline and follow-up) is divided by the standard deviation of change in score [10].
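A minimal sketch of the SRM, assuming the sample SD (`statistics.stdev`) and change computed as baseline minus follow-up so that improvement is positive; `standardised_response_mean` is a hypothetical name. Unlike the ES, the denominator reflects the variability of individual change, so each patient needs both a baseline and a follow-up score.

```python
from statistics import mean, stdev


def standardised_response_mean(baseline, follow_up):
    """SRM = mean change / SD of change, using paired scores per patient."""
    if len(baseline) != len(follow_up):
        raise ValueError("SRM needs paired baseline and follow-up scores")
    # Change is baseline minus follow-up, so improvement is positive.
    changes = [b - f for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


print(standardised_response_mean([4, 6, 8], [2, 5, 5]))  # 2.0
```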

Significance of change

Mean changes in scores were calculated separately for patients according to their answer to the anchor question used for the primary outcome of the trial (whether they felt '(1) Cured or improved compared with before starting treatment; or, (2) Unchanged or worse compared with before starting treatment'). These anchor data were collected at 6 weeks and 1 year post-treatment, and the significance of change within each group was assessed with a paired t test.
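The grouping by anchor answer and the paired t statistic can be sketched as follows. The record layout `(baseline, follow_up, anchor)` and the function `change_by_anchor` are assumptions for illustration. The sketch computes t = mean change / (SD of change / √n), with n − 1 degrees of freedom, but not the p value, which requires the t distribution (for example, SciPy's `scipy.stats.ttest_rel(baseline, follow_up)`). The six records are invented toy values, not trial data.

```python
import math
from statistics import mean, stdev

ANCHORS = ("cured or improved", "unchanged or worse")


def change_by_anchor(records):
    """Summarise paired change per anchor answer.

    records: iterable of (baseline, follow_up, anchor) tuples.
    Returns {anchor: (n, mean change, paired t statistic)}.
    """
    summary = {}
    for anchor in ANCHORS:
        changes = [b - f for b, f, a in records if a == anchor]
        n = len(changes)
        if n < 2:
            # Too few patients to estimate an SD, so t is undefined.
            summary[anchor] = (n, mean(changes) if changes else math.nan, math.nan)
            continue
        sd = stdev(changes)
        # t is also undefined when every patient changed by the same amount.
        t = mean(changes) / (sd / math.sqrt(n)) if sd > 0 else math.nan
        summary[anchor] = (n, mean(changes), t)
    return summary


records = [
    (6, 3, "cured or improved"),
    (7, 3, "cured or improved"),
    (5, 3, "cured or improved"),
    (7, 7, "unchanged or worse"),
    (6, 7, "unchanged or worse"),
    (8, 6, "unchanged or worse"),
]
print(change_by_anchor(records))
```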

Responsiveness statistic

The responsiveness statistic compares patients who report improvement following the intervention, based on their answer to the anchor question above, with those who report no improvement. It is calculated by dividing the mean change in score for patients reporting improvement by the standard deviation (SD) of the change in score for those reporting no improvement [13]. A responsiveness statistic of 1 or more indicates that an instrument is highly responsive to change, and a value between 0.20 and 1 indicates an acceptable level of responsiveness [14].
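A sketch of the responsiveness statistic, taking the denominator as the SD of change scores in patients reporting no improvement and applying the thresholds above. The function names and the label for values below 0.20 are assumptions of this sketch, and the inputs are toy change scores (baseline minus follow-up), not trial data.

```python
from statistics import mean, stdev


def responsiveness_statistic(improved_changes, not_improved_changes):
    """Mean change in improved patients / SD of change in non-improved patients.

    Changes are baseline minus follow-up, so improvement is positive.
    """
    return mean(improved_changes) / stdev(not_improved_changes)


def interpret(rs):
    """Apply the thresholds given in the text (>= 1 high; 0.20 to 1 acceptable)."""
    if rs >= 1:
        return "highly responsive"
    if rs >= 0.20:
        return "acceptable"
    return "below the acceptable range"  # wording used only in this sketch


rs = responsiveness_statistic([3, 4, 2], [0, -1, 2])
print(round(rs, 2), interpret(rs))  # 1.96 highly responsive
```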

Results

Demographics from HuBBLe

The RBL group included 176 participants, 172 of whom underwent the intervention. Of these 176, 99 (56%) were male, and the group had a mean age of 49.0 years (SD 12.9). Grade II haemorrhoids were present in 115 (65%) participants and grade III in 60 (34%); grade was missing for 1 participant. The recurrence rate at 1 year was 49% (87 cases).

The HAL group included 161 participants, 158 of whom underwent the intervention. Of these 161, 85 (53%) were male, and the group had a mean age of 48.5 years (SD 13.5). Grade II haemorrhoids were present in 92 (57%) participants and grade III in 68 (42%); grade was missing for 1 participant. The recurrence rate at 1 year was 30% (48 cases).
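Recomputing the percentages from the reported counts shows which denominators were used: they match the full group sizes (176 and 161) rather than the numbers who underwent the intervention (172 and 158). For example, 87/172 would give about 51% rather than the reported 49%. The check below is a small illustrative script using only the counts reported above; `ARMS` and `percentages` are hypothetical names.

```python
# Counts reported above for each arm: (group size, {characteristic: count}).
ARMS = {
    "RBL": (176, {"male": 99, "grade II": 115, "grade III": 60, "recurrence": 87}),
    "HAL": (161, {"male": 85, "grade II": 92, "grade III": 68, "recurrence": 48}),
}


def percentages(n, counts):
    """Express each count as a whole-number percentage of the group size n."""
    return {name: round(100 * k / n) for name, k in counts.items()}


for arm, (n, counts) in ARMS.items():
    print(arm, percentages(n, counts))
# RBL {'male': 56, 'grade II': 65, 'grade III': 34, 'recurrence': 49}
# HAL {'male': 53, 'grade II': 57, 'grade III': 42, 'recurrence': 30}
```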

Haemorrhoid outcomes

Most patients felt 'cured or improved' following the interventions (81% at 6 weeks; 72% at 1 year), although some reported feeling 'unchanged or worse' 6 weeks and 1 year after treatment. At 6 weeks, 92% of patients in the HAL group felt cured or improved, compared with 71% of those in the RBL group. There was little difference between the intervention groups at 1 year: 71% of the RBL group again reported feeling cured or improved, compared with 73% of the HAL group (a decrease from the 92% reported by HAL patients at 6 weeks). Responses to the recurrence question are summarised in Table 1.

Table 1 Recurrence in RBL and HAL treatment groups at 6 weeks and 1 year post-interventions

Effect size and SRM—from baseline to 6 weeks (Table 2)

Table 2 Mean scores, effect sizes, and significance of change (paired t test) between baseline and 6 weeks on the Vaizey and HSS for the two treatment groups and overall

Using the Vaizey score, mean scores in the HAL group decreased from baseline to 6 weeks (from 5.23 to 4.29), indicating an improvement. However, this change was not significant (p = 0.075), and the effect size and SRM indicated a small amount of change (0.20 and 0.16, respectively). Using the HSS, a significant change was identified between baseline and 6 weeks (from 6.48 to 3.02, p < 0.001), and the effect size and SRM demonstrated a large magnitude of change (1.12 and 1.01, respectively).

In the RBL group, mean Vaizey scores improved significantly between baseline and 6 weeks (from 5.70 to 3.79, p < 0.001), with a small-to-moderate effect size and SRM (0.36 and 0.39, respectively). Using the HSS, a significant change was also observed (from 6.35 to 4.05, p < 0.001), but the magnitude of change was greater, as shown by the effect size (0.75) and SRM (0.72).

Effect size and SRM—from baseline to 1 year (Table 3)

Table 3 Mean scores, effect sizes, and significance of change (paired t test) between baseline and 1 year on the Vaizey and haemorrhoid severity scores for the two treatment groups and overall

Using the Vaizey score, mean scores in the HAL group decreased significantly from baseline to 1 year (from 5.63 to 4.61, p = 0.04), indicating an improvement, but the effect size and SRM indicated only a small amount of change (0.21 and 0.21, respectively). Using the HSS, a significant change from baseline to 1 year was observed (from 6.18 to 3.63, p < 0.001), and the effect size and SRM demonstrated a moderate-to-large magnitude of change (0.85 and 0.76, respectively).

In the RBL group, mean Vaizey scores improved significantly between baseline and 1 year (from 4.84 to 3.60, p = 0.013), with a small effect size and SRM (0.28 and 0.26, respectively). Using the HSS, a significant change was also observed (from 6.03 to 3.62, p < 0.001), but the magnitude of change was greater, as shown by the effect size (0.76) and SRM (0.64).

There was no significant change in scores between the pre-randomisation and baseline assessments (Table 4), suggesting that scores were stable in the absence of treatment.

Table 4 Mean scores, effect sizes, and significance of change (paired t test) between pre-randomisation and baseline on the Vaizey and haemorrhoid severity scores for the two treatment groups combined

Significance of change: cured after treatment

Effect sizes and the significance of change between baseline and 6 weeks were derived for patients who had undergone HAL or RBL and who reported themselves to be 'cured or improved' or 'unchanged or worse' at 6 weeks, based on their answers to the recurrence question (Table 5). For HAL patients who rated themselves as 'cured or improved', the change was significant only on the HSS (p < 0.001). The effect size and SRM showed a greater magnitude of change on the HSS (1.18 and 1.11, respectively) than on the Vaizey, which detected only a small amount of change (0.24 and 0.21, respectively).

Table 5 Mean scores, effect sizes, and significance of change (paired t test) between baseline and 6 weeks on the Vaizey and haemorrhoid severity scores for the two treatment groups for patients who reported themselves to be “cured or improved” and for patients who reported themselves to be “unchanged or worse” at 6 weeks

A similar trend was observed for 'cured or improved' patients who underwent RBL. Although the mean scores improved in line with the self-reported answers, the effect size and SRM showed only a small-to-moderate amount of change on the Vaizey (0.40 and 0.43, respectively), whereas the HSS detected a large magnitude of change (0.97 and 0.95, respectively).

In patients who reported themselves to be 'unchanged or worse', no significant changes were observed in either the HAL or RBL group, and the effect sizes and SRMs indicated a small magnitude of change. In the HAL group, mean Vaizey scores suggested that these patients had got worse (from 4.20 to 6.40), whereas mean HSS scores were unchanged (7.50 at both time points).

Significance of change: cured after treatment (from baseline to 6 weeks—cured at 1 year)

Effect sizes and the significance of change between baseline and 6 weeks were calculated as above for patients in either intervention group who reported that their condition was 'cured or improved' or 'unchanged or worse' at 1 year (Table 6). For HAL patients who rated themselves as 'cured or improved', the change was again significant only on the HSS (p < 0.001). The effect size and SRM showed a greater magnitude of change on the HSS (1.03 and 0.99, respectively) than on the Vaizey, which detected only a small amount of change (0.16 and 0.13, respectively).

Table 6 Mean scores, effect sizes, and significance of change (paired t test) between baseline and 6 weeks on the Vaizey and HSS for the two treatment groups for patients who reported themselves to be “cured or improved” and for patients who reported themselves to be “unchanged or worse” at 1 year

This finding was replicated for RBL patients reporting 'cured or improved'. Although the mean scores improved in line with the self-reported answers, the effect size and SRM showed only a small amount of change on the Vaizey (0.28 and 0.28, respectively), whereas the HSS detected a large magnitude of change (0.94 and 1.01, respectively).

Because patients rated their change in health status at 1 year, whereas these change scores covered baseline to 6 weeks, the two time frames do not coincide. Patients in both intervention groups who reported an 'unchanged or worse' condition at 1 year nevertheless showed a decrease (i.e. an improvement) in Vaizey and HSS scores at 6 weeks. In the HAL group, this change was significant only on the HSS (p < 0.001), and the effect size and SRM suggested a large magnitude of change (1.13 and 0.93, respectively).

For patients who reported themselves to be ‘unchanged or worse’ in the RBL intervention group, effect sizes and SRM were very similar, and small to moderate, for the Vaizey (0.31 and 0.40, respectively) and HSS (0.31 and 0.35, respectively). The change detected by the Vaizey was significant (p = 0.05).

Significance of change: cured after treatment (from baseline to 1 year—cured at 1 year) (Table 7)

Table 7 Mean scores, effect sizes, and significance of change (paired t test) between baseline and 1 year on the Vaizey and HSS for the two treatment groups for patients who reported themselves to be “cured or improved” and for patients who reported themselves to be “unchanged or worse” at 1 year

Effect sizes and the significance of change between baseline and 1 year were also calculated for patients in either intervention group who reported that their condition was 'cured or improved' or 'unchanged or worse' at 1 year (Table 7). For HAL patients who rated themselves as 'cured or improved', the change was significant on both the Vaizey (p < 0.01) and the HSS (p < 0.001). The effect size and SRM showed a greater magnitude of change on the HSS (1.14 and 1.13, respectively) than on the Vaizey, which detected only a small-to-moderate amount of change (0.32 and 0.32, respectively).

A similar trend was observed for 'cured or improved' patients who underwent RBL. Although the mean scores improved in line with the self-reported answers, the effect size and SRM showed only a small-to-moderate amount of change on the Vaizey (0.33 and 0.31, respectively), whereas the HSS detected a large magnitude of change (1.01 and 0.98, respectively).

For patients who reported themselves to be 'unchanged or worse' at 1 year, no significant changes between baseline and 1 year were observed in either the HAL or RBL group, and the effect sizes and SRMs indicated a very small magnitude of change.

These findings suggest that patients in both intervention groups who rated themselves as 'unchanged or worse' at 1 year may have experienced a short-lived improvement after the intervention (from baseline to 6 weeks). The HSS, but not the Vaizey, showed a significant, large change in the HAL group, whereas both the HSS and the Vaizey detected small improvements in RBL patients. This interpretation is supported by the absence of significant change between baseline and 1 year in these patients (Table 7).

Responsiveness statistic

The responsiveness statistic was calculated for the Vaizey and HSS in both the HAL and RBL groups (Table 8). Values for the Vaizey ranged from 0.23 to 0.38, within the acceptable range. Values for the HSS were much higher, ranging from 1.02 to 1.45, indicating that this measure was highly responsive to change.

Table 8 Responsiveness statistic for the treatment groups at 6 weeks and 1 year

Discussion

This study was a planned secondary analysis of trial data undertaken to determine the responsiveness of the HSS in the context of RBL and HAL as treatments for haemorrhoids. The results indicate that the HSS was more responsive to change in patients' health status than the Vaizey score for both procedures, as measured by effect sizes, SRMs, significance of change and the responsiveness statistic. The instrument therefore appears suitable for use as an outcome measure in this context.

This is perhaps not surprising. The Vaizey score was developed to assess the severity of faecal incontinence rather than haemorrhoids specifically [12], so it is reassuring that the responsiveness analyses confirmed that the HSS was more sensitive than the Vaizey in detecting changes in patients' health status following treatment for haemorrhoids. This adds support to the validity of the HSS, although further tests of its validity are needed. The validity of the HSS is further supported by the fact that its findings match the clinical experience with HAL in the HuBBLe trial [11].

There is a wealth of literature on the treatment of haemorrhoids. Whilst many publications are case series or reviews, there are over 400 randomised controlled trials and over 40 meta-analyses. One would therefore expect the optimal therapy for haemorrhoids to be well defined, yet this is not the case. A number of shortcomings in this 'high-quality literature' make it difficult to determine the optimal treatment [15]. Significant variations in recent guideline recommendations are testament to how differently the evidence can be interpreted [2,3,4,5]. One major issue in interpreting trial data is the lack of standardised outcome measures. A systematic review by van Tol identified over 59 different outcome measures with varied definitions; in many cases, outcomes were not defined at all [15].

There has been a previous attempt at validating a haemorrhoid severity score [7, 8]. However, this analysis was based on a very small cohort. External validity was demonstrated by showing that patients with higher scores were more likely to undergo surgery, but there was no demonstration of change over time or after treatment. In contrast, we have demonstrated in two large cohorts from a carefully designed randomised trial that the HSS is highly responsive to intervention, making it a more robustly validated tool for the assessment of haemorrhoidal treatment that will facilitate comparative studies and allow more meaningful synthesis of research data.

More recently, a new patient-reported outcome measure has been developed to capture the burden of haemorrhoidal disease and anal fissures on quality of life (HEMO-FISS-QoL) [7]. Whilst some of its psychometric properties have been established (e.g. acceptability, construct validity and reliability), its responsiveness has yet to be determined. The measure is also not specific to the haemorrhoid population, as it is designed to report outcomes following treatment of anal fissures as well. This may affect its performance compared with the HSS.

Strengths and limitations

This analysis has some limitations. We captured only quantitative data; additional qualitative information on participants' interpretation of the questions and on ease of completion would have been informative. However, the form, consisting of five simple questions, is by no means onerous. Another limitation is that the HSS does not include a global satisfaction domain. An improved HSS does not necessarily mean that the patient is satisfied with the outcome or that the intervention can be classified as a 'success'. For example, a patient may complain of both bleeding and prolapse, and the intervention may cure the bleeding but not the prolapse. Is the patient 'cured'? Nystrom recommended classifying cure as an HSS of 0 or 1 [6]. This may be inadequate to capture patients who still have symptoms but are sufficiently improved to be content with the outcome. We therefore recommend including an additional global satisfaction score in any haemorrhoidal disease research, such as that suggested by Shanmugam et al. in a Cochrane review [16].

Implications for future research

To combine comparative trials in a scientifically valid way, a core outcome set must be developed. Van Tol and colleagues are seeking to develop such a set via a Delphi exercise [17]. It is very likely that the patient-reported outcome domain of this core outcome set will include the components contained within the HSS. Given the poly-symptomatic nature of haemorrhoidal disease, a way of combining these patient-reported outcomes in a validated and responsive format is required. Our analysis shows that the HSS meets the responsiveness requirement, supporting its use in this role.

Conclusions

The HSS is a highly responsive tool for the detection of changes in haemorrhoid symptoms, and should be recommended for use as a patient-reported outcome measure in all future clinical trials investigating haemorrhoidal disease.