Introduction

The devastating financial impact of the coronavirus pandemic will herald significant change in how healthcare is delivered in the future. Economies of scale will undoubtedly be sought, with an increasing focus on a hub-and-spoke model of care. Quality of care will remain at the vanguard, with patient outcomes a key driver for change.

Currently, the National Health Service (NHS) in England is focused on consolidating the delivery of surgical care by creating a framework of excellence in delivery, research, training, and education. Concentrating specified operations within high-volume centres and among high-volume surgeons would be an attractive means of achieving this aim [1, 2].

There is abundant literature supporting the premise that larger surgical volumes lead to more favourable clinical outcomes, which benefits patients and the healthcare economy [3]. However, reorganisation is, by its nature, disruptive and not without cost, and it may negatively affect patient access to services, training, and the sustainability of departments and smaller hospitals [4].

The Getting It Right First Time (GIRFT) programme, funded by the Department of Health and Social Care in England, has a remit to reduce unwarranted variation in healthcare provision. The GIRFT programme has studied the impact of low-volume surgery on patients and healthcare resources. It has sought to provide evidence on whether patient outcomes are linked to surgical volume, to better inform national decision-makers.

For thyroid surgery in England, Nouraei et al. [5], analysing Hospital Episode Statistics (HES) data over 6 years, found that larger surgeon annual volumes of thyroidectomy were associated with fewer medical and surgical complications, shorter length of stay, and lower rates of vocal cord palsy. An annual volume of more than 30 procedures was consistently protective. In 2019, Aspinall et al. [6] published data on 25,038 thyroidectomies from the United Kingdom Registry of Endocrine and Thyroid Surgery (UKRETS), which is organised by the British Association of Endocrine and Thyroid Surgeons (BAETS). They identified a link between surgeon volume and both permanent hypoparathyroidism and recurrent laryngeal nerve palsy, and proposed a minimum annual volume of 50 thyroidectomies per surgeon. This differed from the recommendations of most previous volume-outcome studies from the USA. Adkisson et al. [7] identified 30 as a cut-off for annual surgeon volume with regard to completeness of resection and the appropriateness of the operation undertaken. Adam et al. [8] found that fewer than 26 procedures per annum was associated with an increased complication rate. In contrast, a recent study showed that although larger surgeon volumes were associated with lower rates of vocal cord palsy and hypoparathyroidism, the absolute effect of volume on outcomes was modest, especially in hospitals with low complication rates [9]. Although the evidence for a volume-outcome relationship in thyroidectomy appears strong, other confounding surgeon- and hospital-specific factors may also play an important role [10]. Nevertheless, the American Thyroid Association and the American Association of Endocrine Surgeons both recommend that thyroidectomy be performed by high-volume surgeons where possible, defined as a minimum of 20 thyroidectomies annually [11, 12].

The aim of this study was to use the HES database to gain insight into the nature and extent of the volume-outcome relationship for thyroidectomy in England. To increase the homogeneity of our dataset, we used total thyroidectomy as an exemplar procedure.

Methods

Study design

This was a retrospective analysis of HES data. HES data are collected by NHS Digital for all NHS-funded patients treated in English hospitals. In England, healthcare coverage is universal, free at the point of delivery, and funded from general taxation. As such, all but a small minority of thyroid surgery is conducted within the NHS, with very few patients treated by private providers.

Investigating volume-outcome relationships requires a dataset that is relatively homogeneous with regard to procedure type, so total thyroidectomy was chosen as an exemplar procedure. Total thyroidectomy is performed in sufficiently large volumes nationally to allow meaningful analysis, and its outcomes are, in general, not affected by previous thyroid surgery. Furthermore, complication rates after total thyroidectomy are higher than those after partial thyroidectomy [13].

Ethics

Consent from individuals involved in this study was not required. The analysis and presentation of data follow current NHS Digital guidance on the use of HES data for research purposes, and data are anonymised to the level required by the ISB1523 Anonymisation Standard for Publishing Health and Social Care Data [14, 15].

Case identification in HES

Setting: hospital admissions in England.

Timing: financial years 2012–2013 to 2017–2018.

Admission method: elective or day-case.

Age: ≥ 16 years.

Diagnosis: OPCS Classification of Interventions and Procedures version 4 (OPCS-4) codes listed in Table 1 were used to identify thyroidectomy procedures.

Table 1 Diagnostic and procedural codes used to identify thyroidectomy and subsequent complications

Exclusions from modelling of volume-outcome relationships in total thyroidectomy: patients undergoing laryngectomy and patients with a diagnosis of head and neck cancer were excluded from the analysis. The OPCS-4 codes used to identify laryngectomy and the International Statistical Classification of Diseases and Related Health Problems 10th revision (ICD-10) codes used to identify a diagnosis of head and neck cancer are detailed in Table 1.

For the purpose of statistical modelling, where a patient had multiple admissions, only the chronologically first admission was retained. This ensured that all admissions were independent of one another at a patient level. See the “Data management and statistical analyses” section for more details.
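The retention of each patient's first admission can be sketched as follows. This is a minimal illustration, not the study's extraction code: it assumes each admission is a (patient_id, admission_date, label) tuple, a hypothetical layout rather than the HES record structure.

```python
from datetime import date

def first_admission_per_patient(admissions):
    """Keep only the chronologically first admission for each patient.

    `admissions` is a list of (patient_id, admission_date, label) tuples;
    this layout is hypothetical and is not the HES record structure.
    """
    first = {}
    # Sort by patient, then by date, so each patient's earliest admission is seen first.
    for record in sorted(admissions, key=lambda r: (r[0], r[1])):
        first.setdefault(record[0], record)  # later admissions are ignored
    return sorted(first.values(), key=lambda r: r[0])

admissions = [
    ("P2", date(2014, 5, 1), "total thyroidectomy"),
    ("P1", date(2016, 3, 9), "lobectomy"),
    ("P1", date(2013, 7, 2), "lobectomy"),
]
print(first_admission_per_patient(admissions))
```

Keeping one admission per patient means each row in the model represents a different patient, which is what makes the observations independent at patient level.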

Data extraction

Exposure: volume of surgery per trust and per surgeon per annum.

Primary outcome: emergency hospital readmission within 30 days or length of hospital stay > 2 days, or both. This primary outcome was chosen as it was felt to be the best reflection of a complication of surgery, with a need for either an extended hospital stay or an emergency readmission soon after surgery. The choice of 2 days was based on the experience and judgement of the authors but was supported by the data (see the “Results” section).
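The composite primary outcome can be expressed as a simple flag. In this sketch the function and argument names are hypothetical, and measuring the 30-day readmission window from discharge is an assumption, since the window's start point is not stated here.

```python
def primary_outcome(length_of_stay_days, days_to_emergency_readmission=None):
    """Composite primary outcome: length of stay > 2 days, or an emergency
    readmission within 30 days, or both.

    `days_to_emergency_readmission` is None when no emergency readmission
    occurred. Counting the 30 days from discharge is an assumption of this sketch.
    """
    extended_stay = length_of_stay_days > 2
    early_readmission = (
        days_to_emergency_readmission is not None
        and days_to_emergency_readmission <= 30
    )
    return extended_stay or early_readmission

print(primary_outcome(1))      # short stay, no readmission
print(primary_outcome(3))      # extended stay
print(primary_outcome(1, 12))  # early emergency readmission
print(primary_outcome(1, 45))  # readmission outside the 30-day window
```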

Secondary outcomes: vocal cord palsy within 1 year, post-procedural hypoparathyroidism within 1 year, stridor diagnosis within 1 year, tracheostomy (either permanent or temporary) within 1 year, and length of stay > 4 days. In addition, length of stay > 2 days and emergency hospital readmission within 30 days were considered as independent outcomes. The codes detailed in Table 1 were used to identify subsequent hospital admissions in which the specified diagnosis was recorded or the specified procedure was undertaken. These codes are based on those used by Nouraei et al. [5]. Where a patient had multiple admissions with the same code, only the first admission in which the code appeared was retained, and this admission was used to calculate the time to the complication.
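A sketch of how the first qualifying admission and the time to complication might be derived. The ICD-10 codes J38.0 (paralysis of vocal cords and larynx) and E89.2 (postprocedural hypoparathyroidism) are used purely as illustrations; the study's actual code lists are those in Table 1. The data layout is hypothetical, and treating 1 year as 365 days is an assumption.

```python
from datetime import date

def days_to_first_complication(index_date, later_admissions, codes, window_days=365):
    """Days from the index admission to the first later admission carrying any
    of `codes` within `window_days`; None if no such admission exists.

    `later_admissions` is a list of (admission_date, [codes]) pairs
    (hypothetical layout). A 365-day year is an assumption of this sketch.
    """
    for admission_date, admission_codes in sorted(later_admissions, key=lambda a: a[0]):
        elapsed = (admission_date - index_date).days
        if 0 < elapsed <= window_days and codes.intersection(admission_codes):
            return elapsed  # only the first (earliest) qualifying admission counts
    return None

index = date(2015, 1, 1)
later = [
    (date(2015, 3, 1), ["E89.2"]),
    (date(2015, 2, 1), ["J38.0"]),
    (date(2015, 6, 1), ["J38.0"]),
]
print(days_to_first_complication(index, later, {"J38.0"}))  # vocal cord palsy
print(days_to_first_complication(index, later, {"E89.2"}))  # hypoparathyroidism
```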

Covariates: age, sex, Hospital Frailty Risk Score (HFRS) band [16], financial year of admission, day-case surgery (not used for outcomes involving length of stay), diagnosis of thyroid cancer, and surgeon speciality (ear, nose and throat, general surgery, other).

Other data extracted: all-cause mortality for 1-year follow-up from the UK Office for National Statistics (ONS).

Data management and statistical analyses

Data were extracted onto a secure, encrypted Structured Query Language (SQL) server controlled by NHS England and NHS Improvement. Analysis within this secure environment used standard statistical software: Microsoft Excel (Microsoft Corp, Redmond, WA, USA), SAS (SAS Institute Inc., Cary, NC, USA), and Alteryx (Alteryx Inc., Irvine, CA, USA). Standard descriptive statistics (e.g. mean, median, frequency) were used as appropriate to the level of the data. Age was broadly normally distributed.

Calculation of annual volumes

Annual hospital and surgeon volumes were calculated by dividing the total number of thyroidectomy procedures across the 6-year study period by the number of years in which the trust or surgeon contributed data. For exploratory descriptive analysis and identification of a possible volume threshold, volume data were categorised into groups of 1–4, 5–9, 10–14, etc. per annum up to 45–49, then 50–59, 60–69, 70–79, etc. per annum for surgeon volume, and 1–19, 20–39, 40–59, etc. per annum for hospital volume. Larger volume categories were combined to ensure that at least five hospital trusts or surgeons were represented in each volume category.
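The volume calculation and banding can be sketched as below. This is an illustration under stated assumptions: the bands are taken to be contiguous (1–4, 5–9, 10–14, …), fractional mean volumes are truncated to an integer before banding, and the function names and the worked numbers are hypothetical. The merging of sparse upper categories is not shown.

```python
import math

def mean_annual_volume(total_procedures, years_contributing):
    """Total procedures over the study period divided by the number of years
    in which the trust or surgeon contributed data."""
    return total_procedures / years_contributing

def surgeon_band(volume):
    """Five-wide bands (1-4, 5-9, ..., 45-49), then ten-wide bands (50-59, ...).
    Truncating fractional means to an integer is an assumption of this sketch."""
    v = math.floor(volume)
    if v < 50:
        lower = (v // 5) * 5
        return f"{max(lower, 1)}-{lower + 4}"
    lower = (v // 10) * 10
    return f"{lower}-{lower + 9}"

def hospital_band(volume):
    """Twenty-wide bands: 1-19, 20-39, 40-59, ..."""
    v = math.floor(volume)
    lower = (v // 20) * 20
    return f"{max(lower, 1)}-{lower + 19}"

# Hypothetical example: 76 thyroidectomy procedures over 4 contributing years.
vol = mean_annual_volume(76, 4)
print(vol, surgeon_band(vol), hospital_band(vol))
```

Dividing by years of contribution, rather than by the full 6 years, avoids under-counting the annual volume of surgeons or trusts that joined or left part-way through the study period.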

Modelling of factors associated with outcomes for total thyroidectomy

Initially, the relationship between volume and the primary outcome was modelled using restricted cubic splines to investigate whether the relationship between volume and the log odds of the outcome was linear. Separate hospital volume and surgeon volume models with five equally spaced knots were developed. In neither model were any of the three non-linear cubic terms significant, and the relationships were therefore judged to be approximately linear in both cases. Conventional multilevel, multivariable logistic regression modelling was subsequently used for the analyses presented.
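A restricted cubic spline with five knots yields one linear term and three non-linear cubic terms. The sketch below builds that basis in NumPy using Harrell's truncated-power form; it is an illustration, not the SAS code used in the study. Placing the knots equally spaced over the observed range is an assumption, since the exact knot locations are not given, and the volumes are hypothetical. The test of the cubic terms would come from entering these columns into the logistic model, which is not shown.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis in Harrell's truncated-power form.

    Returns columns [x, s_1, ..., s_{k-2}]: with k = 5 knots this is the linear
    term plus three non-linear cubic terms. Each s_j is zero below the first
    knot and linear beyond the last knot.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    k = len(t)

    def pos_cube(u):
        return np.maximum(u, 0.0) ** 3

    cols = [x]
    for j in range(k - 2):
        s = (pos_cube(x - t[j])
             - pos_cube(x - t[k - 2]) * (t[k - 1] - t[j]) / (t[k - 1] - t[k - 2])
             + pos_cube(x - t[k - 1]) * (t[k - 2] - t[j]) / (t[k - 1] - t[k - 2]))
        cols.append(s / (t[k - 1] - t[0]) ** 2)  # rescaling only; does not change the fit
    return np.column_stack(cols)

# Hypothetical surgeon volumes; knots equally spaced over the observed range (assumption).
volumes = np.array([2.0, 8.0, 15.0, 27.0, 40.0, 63.0, 90.0])
knots = np.linspace(volumes.min(), volumes.max(), 5)
print(rcs_basis(volumes, knots).shape)
```

If none of the three non-linear coefficients differs significantly from zero, the spline reduces to the linear term, which is the justification given in the text for modelling volume linearly on the logit scale.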

Multilevel modelling accounted for the hierarchical nature of the data and was implemented using the GLIMMIX procedure in SAS. Two-level random-intercept models (patients nested within trusts) were constructed using a logit link function. Volume and age were treated as continuous variables, and the assumption of linearity in the logit was checked using the Box-Tidwell test. Adjustment was made for the covariates listed above, with only significant covariates retained in the final model. Results are presented as odds ratios (ORs) with confidence intervals (CIs).
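The Box-Tidwell check adds an x·ln(x) term alongside a continuous predictor x; a significant coefficient on that term suggests the relationship with the logit is not linear. The sketch below applies it in a single-level logistic regression fitted by Newton-Raphson in NumPy. It is not the study's method: the study fitted multilevel models in SAS GLIMMIX, the helper names are hypothetical, and the data are simulated.

```python
import math
import numpy as np

def fit_logit(X, y, max_iter=100, tol=1e-10):
    """Single-level logistic regression by Newton-Raphson (IRLS).
    Returns coefficient estimates and their standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    hessian = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(hessian)))

def box_tidwell_p(x, y):
    """Box-Tidwell check: fit x and x*ln(x) together and return the two-sided
    Wald p-value for the x*ln(x) coefficient (x must be positive)."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x, x * np.log(x)])
    beta, se = fit_logit(X, np.asarray(y, dtype=float))
    z = beta[2] / se[2]
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))

# Simulated data: one outcome linear in the logit, one strongly curved.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, 5000)
linear_y = rng.random(5000) < 1.0 / (1.0 + np.exp(-(-1.0 + 0.3 * x)))
curved_y = rng.random(5000) < 1.0 / (1.0 + np.exp(-(2.0 - 0.5 * (x - 5.5) ** 2)))
print(box_tidwell_p(x, linear_y), box_tidwell_p(x, curved_y))
```

Volume and age are both strictly positive in this cohort, so the logarithm in the x·ln(x) term is always defined.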

Calculation of population attributable risk

For the primary outcome measure, the population attributable risk (PAR) was calculated as the difference between the probability of the outcome in the entire population and the adjusted probability (from the multilevel models) of the outcome in patients treated by surgeons with volumes at or above a given threshold. For example, if the probability of an outcome were 0.3 (30%) in the entire sample and 0.2 (20%) for surgeons with a volume ≥ 60 per annum, the PAR for surgeon volume < 60 per annum would be 0.1 (10%).

The population impact number (PIN) was calculated as the reciprocal of the PAR. The PIN can be interpreted as the average number of patients among whom one adverse event can be attributed to surgery at a below-threshold volume [17]. In the example above, the PIN for a threshold of 60 per annum would be 10 (1/0.1), indicating that, on average, one adverse event in every ten patients would be attributable to surgery by a surgeon with a below-threshold volume. For further details on the calculation of the PAR and PIN, see Supplementary Material.
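The worked example can be reproduced with two small functions. This sketch takes the probabilities as given inputs; in the study, the above-threshold probability came from the adjusted multilevel models, which are not reproduced here, and the function names are hypothetical.

```python
def population_attributable_risk(p_overall, p_at_or_above_threshold):
    """PAR: probability of the outcome in the whole sample minus the adjusted
    probability among patients of surgeons at or above the volume threshold."""
    return p_overall - p_at_or_above_threshold

def population_impact_number(par):
    """PIN: reciprocal of the PAR (undefined when the PAR is not positive)."""
    if par <= 0:
        raise ValueError("PIN is only defined for a positive PAR")
    return 1.0 / par

# Worked example from the text: 30% overall, 20% for surgeons with volume >= 60 per annum.
par = population_attributable_risk(0.30, 0.20)
pin = population_impact_number(par)
print(round(par, 3), round(pin, 1))  # prints: 0.1 10.0
```

Rounding is applied only for display, because 0.30 − 0.20 is not exactly 0.1 in binary floating point.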

Role of the funding source

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.

Results

Data for all thyroid procedures

The study flow diagram (Fig. 1) shows how data were extracted from HES. Data were available for 68,368 thyroidectomies across 144 hospital trusts over the 6-year study period: 22,823 total thyroidectomies (33.4%), 41,413 lobectomies (60.6%), and 4132 other procedures (6.0%). Details of the demographic characteristics, clinical presentation, and outcomes for these procedures are shown in Table 2. Patients undergoing total thyroidectomy were slightly younger and had longer stays, higher 30-day readmission rates, and higher rates of hypoparathyroidism at 1 year than those undergoing other procedures. Those undergoing lobectomy were more likely to be operated on by an ENT surgeon than those undergoing total thyroidectomy, and 3189 (7.7%) lobectomies were second lobectomy procedures in the same patient. Only 0.9% of total thyroidectomies were performed as day cases, compared with 6.9% of lobectomies. To assess the validity of length of stay > 2 days as a proxy for post-operative complications, we examined the association between hypoparathyroidism/hypocalcaemia (one of the most common immediate post-operative complications) recorded during the index admission and length of stay > 2 days. Of 1512 patients with hypoparathyroidism/hypocalcaemia recorded during the index admission, 1124 (74.3%) had a stay > 2 days.

Fig. 1 Study flow diagram

Table 2 Summary of patients’ demographic, clinical characteristics, and outcomes for each procedure

Changes in practice and outcomes over time for total thyroidectomy

For total thyroidectomy, details of changes in practice and outcomes over the study period are summarised in Table 3. The number of total thyroidectomies conducted per year was relatively stable with a modest fall in the number of hospital trusts performing thyroidectomy over the study period. Although the number of surgeons performing total thyroidectomy was stable, there was a clear trend towards a larger proportion of procedures being performed by ENT surgeons and fewer by general surgeons. Analysis of outcomes revealed that only length of stay showed a trend over time with a significant decline in stays longer than 2 and 4 days. There was no obvious change with regard to day-case rate over time; this was very low throughout the 6-year period.

Table 3 Summary of patients’ clinical characteristics and outcomes for total thyroidectomy over time

Volume-outcome relationships for total thyroidectomy

There were strong correlations between mean annual volume for total thyroidectomy and mean annual volume for all other thyroid procedures, at both hospital (Spearman’s r = 0.803, p < 0.001) and surgeon (Spearman’s r = 0.669, p < 0.001) level.

Figure 2 shows the variation in the primary outcome with hospital and surgeon volume categories. On visual inspection, there is a broadly linear relationship between lower surgeon volume and poorer outcome, although there is no clear relationship between hospital volume and outcome. Supplementary Material Tables S1 and S2 summarise outcome data for each trust and surgeon volume category for all outcomes investigated.

Fig. 2 Variation in the proportion of patients with complications, defined as length of stay > 2 days or readmission within 30 days, by surgeon and trust volume categories

Table 4 shows the findings from the multilevel logistic regression models investigating the association between volume and outcomes, after adjustment for significant covariates. A larger surgeon volume was significantly associated with lower rates of the primary outcome and of all secondary outcomes investigated. A larger hospital trust volume was significantly associated with lower rates of 30-day emergency readmission and of hypoparathyroidism at 1 year.

Table 4 Adjusted odds ratios for volume measures as predictors of outcome

PAR and PIN values for the primary outcome measure are presented in Table 5. The PAR increased in a broadly linear fashion as the surgeon volume threshold increased, with no point of inflexion at any particular threshold.

Table 5 Population attributable risk and population impact number based on adjusted probabilities of the primary outcome at various surgeon annual volume thresholds

Comparison of annual volumes reported in HES and UKRETS for total thyroidectomy

From HES, of the 735 surgeons performing total thyroidectomy, 256 (34.8%) had an annual volume of all thyroidectomy procedures ≥ 20, 103 (14.0%) had annual volumes ≥ 40, and 43 (5.9%) had annual volumes ≥ 60. Within the UKRETS dataset, data were available for 41,397 thyroid procedures conducted during the same 6-year period. Of these, 13,187 (31.9%) were total thyroidectomy. Of the 243 surgeons conducting total thyroidectomy, 147 (60.5%) had an annual volume of all thyroidectomy procedures ≥ 20, 74 (30.5%) had annual volumes ≥ 40, and 33 (13.6%) had annual volumes ≥ 60.

Discussion

This study supports the findings of previously published literature that there is a surgeon volume-outcome relationship for total thyroidectomy performed in England [5, 6]. A relationship between hospital volume and outcomes was harder to detect and was only observed for hypoparathyroidism and 30-day emergency readmission.

The trend for improved outcomes with a larger surgeon volume was approximately linear for the primary outcome measure chosen. It was not possible to provide guidance on a minimum annual volume threshold. As such, we suggest an approach which seeks to minimise low-volume thyroid surgery as an initial priority. In this regard, we support the BAETS recommendation that in order to maintain expertise, surgeons performing thyroidectomy should perform at least 20 such procedures per year [18]. This volume is at the lower end of previous suggestions for minimum volume and may be conservative [5,6,7,8,9]. Regular audit and review of patient outcome data in relation to surgeon volumes, and the impact of such changes on local service provision, service viability, costs, and patient satisfaction, should help national decision-makers to make informed choices with regard to the ongoing suitability of this threshold.

The consistent finding that improved patient outcomes are linked to larger surgical volumes has been reported in many previous studies [5, 8]. Nouraei et al. [5], using HES data, found poorer outcomes at very low volumes, followed by a more gradual trend towards better outcomes with progressively larger volumes. A recent analysis of 16,954 patients from the US Healthcare Cost and Utilization Project Nationwide Inpatient Sample (HCUP-NIS) dataset identified 26 total thyroidectomies per annum as a possible threshold based on the occurrence of any complication [8]. A study using UKRETS data by Aspinall et al. [6] noted a similar trend for permanent hypoparathyroidism and, as in this study, a slight increase in complications at annual volumes above 100.

Recent American Association of Endocrine Surgeons guidance made a strong recommendation that: “When possible, thyroidectomy should be performed by a high-volume thyroid surgeon”, although it did not specify how high volume should be defined [11]. Earlier guidance from the American Thyroid Association [12] cited the work of Kandil et al. [19] (using HCUP-NIS), who defined high-volume surgery as > 100 thyroidectomies annually. The guideline authors recognised that the relatively small number of surgeons performing this volume (18.5% of surgeons in that sample) and the concentration of these surgeons in large urban centres made this an unrealistic target [12]. Results from the present study, although not definitive, provide some evidence of a plateau in the improvement in outcomes when surgical volumes exceed 90 per annum.

A recent position statement by the European Society of Endocrine Surgeons also concluded that a surgeon volume-outcome effect exists for thyroid surgery [20]. It identified > 50 procedures per annum per surgeon as an appropriate threshold for high volume and < 25 procedures per annum per surgeon as an appropriate threshold for low volume. However, the authors noted the lack of high-quality evidence in the research literature, the dominance of studies from the USA, and the potential for findings from observational studies to be biased by unrepresentative patient cohorts, local healthcare funding and organisational models, and selective referral.

A volume-outcome relationship can arise when larger volumes lead to more skilled and more efficient practice. In settings where healthcare provision is based on health insurance schemes, selective referral can make a volume-outcome relationship more likely to be observed [21, 22]: centres achieving better outcomes may receive more referrals and therefore larger volumes [23]. Patients in England have limited choice of healthcare provider [24]. As such, selective referral is unlikely to drive any observed volume-outcome relationship in our study. Because our study includes all patients treated by the NHS in England, our cohorts are representative of the situation in England, although findings should be extrapolated to other settings with caution.

The link between volume and secondary outcomes provides some insight into specific complications arising from thyroidectomy. At 1-year follow-up, hypoparathyroidism, vocal cord palsy, stridor, and the requirement for tracheostomy were significantly associated with lower surgeon annual volume. That we identified a relationship for all outcomes investigated is striking compared with previous research [5,6,7, 9, 19]. All these complications are life-changing, and our findings emphasise the importance of the volume-outcome relationship at a patient level. Hypoparathyroidism, vocal cord palsy, and tracheostomy occurred in 11.3%, 2.4%, and 1.4% of patients, respectively, at 1-year follow-up. Vocal cord palsy is likely to be underreported in HES, because it may be either not recognised or not recorded in the immediate post-operative period. Stridor, however, will be both diagnosed and documented at an early stage because of the patient’s breathing difficulty.

Permanent hypoparathyroidism, vocal cord palsy, stridor, and tracheostomy were more common following total thyroidectomy than following lobectomy, which supports the observation by Hauch et al. [13]. Our decision to focus on outcomes for total thyroidectomy provided a relatively homogeneous dataset whilst also focussing on the procedure with the greatest potential patient impact.

A consistent relationship between hospital volume and outcomes was not identified. Although the impact of institutional volume on outcomes of thyroid surgery has been less well studied, this finding is consistent with previous research [19]. It suggests that the judgement and skill of the operating surgeon, rather than the organisation of the wider surgical team, contribute most to patient outcomes for total thyroidectomy in England. There has been some debate as to which factors are responsible for the hospital volume-outcome effect, such as organisational structure, staff training, and equipment availability, and as to the direction of the relationship, given that selective referral can itself lead to larger volumes [22, 25]. For surgeon volume, the reasons for better outcomes among higher-volume surgeons are, perhaps, more clearly related to experience, with evidence that practice makes perfect [25].

In comparison with data from the UKRETS audit, average annual recorded volumes for total thyroidectomy were much smaller in HES. However, the number of total thyroidectomies recorded in UKRETS was less than 60% of that recorded in HES, and the number of surgeons recording total thyroidectomies in UKRETS was only one-third of the number identified as conducting total thyroidectomies in HES. Thus, it is reasonable to conclude that UKRETS contains a disproportionately large number of high-volume surgeons relative to HES and that average outcomes reported in UKRETS will be better than those described here. Reporting of thyroid surgery to UKRETS is now required of surgeons performing thyroid procedures in England. Although our data suggest that many surgeons are not complying with this requirement, its enforcement could be a key driver in reducing low-volume thyroid surgery in England.

Strengths

In addition to the strengths already discussed, our study uses recent data from all thyroid surgery in England. The vast majority of thyroid surgery in England is funded by the NHS, and the HES database is one of the most complete healthcare datasets globally. To maximise the homogeneity of this dataset, total thyroidectomy was used as an exemplar procedure. Total thyroidectomy has a higher complication rate than lobectomy and is therefore better suited to the study of volume-outcome relationships [13].

Limitations

Using HES, we were unable to adjust for aspects of clinical presentation, such as the stage of cancer at the time of the procedure, and presentation may vary by volume. However, the direction of any resulting bias, and whether it would be systematic, is difficult to assess. It could be argued that higher-volume surgeons may see more complicated cases but, conversely, that patients with multiple morbidities may be more likely to be operated on by surgeons from other specialities who perform low volumes of thyroidectomy.

By removing laryngectomies and patients with head and neck cancer from our dataset, we have tried to minimise these effects. We did not adjust for the presence of pre-operative vocal cord palsy, stridor, or hypocalcaemia; however, these are likely to be rare. HES also contains only limited information on the procedure undertaken, such as the use of intraoperative neuromonitoring. Coding errors within HES are known to exist [26]; in this study, they are likely to result in an underestimation of complication rates. Non-clinical coders, who enter data into HES, do not identify hypocalcaemia unless it is specifically written in the medical notes. Furthermore, stridor, vocal cord palsy, hypocalcaemia, and hypoparathyroidism would only be recorded if they resulted in a hospital admission. Thus, the recorded rates are likely to substantially underestimate the true occurrence of these complications as experienced by patients [27], although they will capture the most severe forms of these conditions. For this reason, we chose a surrogate measure of complications as our primary outcome measure, which we considered would more accurately reflect the occurrence of a complication following total thyroidectomy.

Although both length of stay and early emergency readmission may be a function of post-operative management processes, poor post-operative care planning, leading to early readmission or extended stay, is also a valid marker for poor outcomes for both the patient and the health service. Updated coding guidance for HES and improved compliance with the UKRETS clinical audit will help improve data quality.

Conclusions

Our study supports the view that consolidating total thyroidectomy among higher-volume surgeons would improve patient outcomes in England. A clear volume threshold was not identified, but reducing low-volume surgery should be a priority. There was limited evidence of an association between hospital volume and outcomes, consistent with previous research.