Introduction

The hospital volume-outcome relationship in surgery has been extensively studied over the last decade. A significant relationship has been evidenced for various surgical procedures [1,2,3,4]; in all cases, a higher operating volume was associated with better patient outcomes. Given the consistency of this relationship from one setting to another, some researchers have recommended the creation of minimum volume thresholds in order to limit the number of centres with low levels of activity [5,6,7]. This recommendation is also in line with the guidelines issued by the Expert Panel on Weight Loss Surgery [8]. These research findings prompted the French health authorities to consider the establishment of thresholds for oncological surgery in 2007 [9].

Even though the volume-outcome relationship appears to be relevant for a variety of surgical procedures and has prompted greater centralization [10], Morch et al.’s recent systematic review highlighted marked methodological differences between the studies in this field and suggested that further research should focus on the features used to assess the volume-outcome relationship [4]. These methodological disparities have been confirmed in a few publications; the significance of the volume-outcome relationship may depend on the way the outcome was explored, the covariates included in the model, the qualitative or quantitative categorization of the volume, and/or the type of statistical test applied [11, 12]. Hence, a study’s methodology can have a direct impact on its conclusion [13, 14].

Most studies of the volume-outcome relationship have assessed mortality as the primary indicator. Although this is commonly assumed to be an essential outcome, mortality alone might not be sufficient for setting thresholds on surgical activity or for closing down low-volume centres - decisions that can have dramatic impacts on inequalities in health status and access to care [15]. In contrast, the potential lack of a significant relationship with volume does not mean that mortality is not of relevance for policy makers; it is acknowledged that this variable is positively associated with the length of hospital stay [16], recovery time [17], cost of the stay [18], related morbidity [19, 20] and (for cancer surgery) disease-free survival [21, 22]. Lastly, the identification of a positive volume-outcome relationship may not be enough to set thresholds. This doubt limits the reliability of this information as a basis for decision-making and the potential modification of organizational structures.

The above observations prompted us to consider that the volume-outcome relationship should be investigated more broadly. The objective of the present scoping review was to describe features that can be used to assess the volume-outcome relationship: the type of data analyzed, the study population, the study outcomes, the covariates and confounders considered, the hospital volume, and the interpretation of the results. Hence, this review of the volume-outcome relationship is intended to help researchers to choose outcomes and covariates of interest or even to identify new variables for investigation. Ultimately, this overview might help policy makers to understand the abundance of the scientific literature and the breadth of this issue [23, 24].

Method

The present review’s methodology (including the search and selection strategies and the analysis steps) is described elsewhere [25]. The review was conducted in six stages, as proposed by Arksey and O’Malley [26] and as subsequently modified by Levac et al. [27]. The present report complied with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (extension for scoping reviews) [28]. The main research question was as follows: how is the hospital volume-outcome relationship assessed in the field of surgery?

Suitable publications were identified according to the methodology developed by the Joanna Briggs Institute [29] (Table 1).

Table 1 Inclusion and exclusion criteria for studies of the volume-outcome relationship in surgery

The PUBMED and Scopus databases were searched with the query shown in Table 2.

Table 2 Keywords and query used to search PUBMED and SCOPUS

The publications were screened, selected and reviewed independently by two authors: a resident in public health (ML) and a medical informatics specialist (AL), both of whom had helped to draft the study protocol.

The literature was screened first by title and then by abstract, according to the inclusion and exclusion criteria (Table 3). Publications were included if they met all the inclusion criteria and none of the exclusion criteria. In each stage of the review, this method was tested on 10 publications. The two reviewers then met and checked that they agreed on the inclusion and exclusion decisions. All publications selected by either of the reviewers went through to the next step. The two reviewers’ selections were not compared at the end of the title or abstract screening steps.
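The screening rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the reviewers' actual tool: the `Assessment` structure and the function names are hypothetical, and the checklist items themselves (Table 3) are represented only as lists of yes/no answers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Assessment:
    """One reviewer's checklist answers for one publication (hypothetical structure)."""
    inclusion_met: List[bool]   # one answer per inclusion criterion
    exclusion_met: List[bool]   # one answer per exclusion criterion


def reviewer_selects(a: Assessment) -> bool:
    # A publication is selected only if every inclusion criterion is met
    # and no exclusion criterion applies.
    return all(a.inclusion_met) and not any(a.exclusion_met)


def passes_screening(first: Assessment, second: Assessment) -> bool:
    # At the title and abstract steps, selection by either reviewer
    # is enough for the publication to go through to the next step.
    return reviewer_selects(first) or reviewer_selects(second)
```

Under this rule, the title and abstract steps are deliberately permissive: a publication is dropped only when both reviewers reject it.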

Table 3 Inclusion and exclusion checklist

Lastly, the full texts of the selected publications were assessed for inclusion (Fig. 1). In the event of disagreement between the two reviewers, the final decision on inclusion was referred to a third reviewer (LL, who had also helped to design the study).

Fig. 1 Flow chart

The reference lists of all selected publications were screened for additional studies meeting our inclusion and exclusion criteria.

The study data were extracted independently by the two reviewers, using a specific form (Supplementary Table 1). In the event of disagreement, the decision was referred to the third reviewer. After the data extraction form had been tested on the first 10 studies by both reviewers, it was validated as described in the study protocol [25]. No difficulties were encountered by either of the reviewers.

For each study, the following key data were extracted: first author, year of publication, country, study design, study objectives, the type of surgery, the database used to include patients, the inclusion and exclusion criteria, outcomes, confounders, statistical analyses, qualification of the volume variable, and conclusions. Using an inductive approach, the reviewers sorted the extracted data into the meta-categories listed in Table 4.

Table 4 The information extracted using the specific form, and references for each criterion

Results

Description of the publications included

We identified a total of 1010 publications in the Scopus database, and 1370 in the PUBMED database. After the removal of duplicates, 1621 publications remained. Next, 965 publications were excluded on the basis of the title, 81 were excluded on the basis of the abstract, and 172 were excluded on the basis of the full text (Fig. 1). No additional publications were included after screening the reference lists of those found in Scopus or PUBMED.
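The selection counts reported above can be checked with simple arithmetic. The short Python sketch below uses only the figures given in the text; the number of duplicates is not reported and is derived here as an assumption (records identified minus records remaining after de-duplication).

```python
# Records identified in each database (from the text)
scopus, pubmed = 1010, 1370
after_dedup = 1621

# Publications excluded at each screening step (from the text)
excluded = {"title": 965, "abstract": 81, "full text": 172}

identified = scopus + pubmed                      # total records identified
duplicates = identified - after_dedup             # derived, not reported in the text
included = after_dedup - sum(excluded.values())   # publications retained

print(f"identified={identified}, duplicates={duplicates}, included={included}")
```

The result is consistent with the number of publications included in the review.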

Four hundred and three publications from 188 different journals were included in the review. The studies were performed in 20 different countries, with more than half performed in the United States (54.9%; n = 221). The countries in which more than 3% of the studies were performed are represented in Fig. 2. Between 1 and 3% of the included studies were performed in each of Australia, Sweden, Finland, France, Korea, and Spain, and fewer than 1% in each of Italy, Belgium, Norway, South Korea, Brazil, and Switzerland.

Fig. 2 Proportions of studies performed in each country

There were very few multinational studies (2.3%; n = 9).

The number of studies increased over time: a total of 24 papers were published during the period 2009-2011, whereas over 50 per year were published in 2018, 2019, and 2020.

Data sources

One hundred ninety-six different databases were used to study the volume-outcome relationship. More than half of these (54.75%; n = 221) were administrative databases, as defined by Levac et al. [31]. Nearly a third of them were patient or disease registers (29.25%; n = 118), followed by claims databases (11.00%; n = 44), health surveys (2.25%; n = 9), and clinical trials data (1.75%; n = 7). Less than 1% of the included studies were based on data extracted from electronic health records.

The surgical disciplines and procedures investigated

Among the 403 studies reviewed, the most represented surgical discipline was visceral and digestive tract surgery (37.75%; n = 152), followed by thoracic and cardiovascular surgery (11.25%; n = 45), urology (10.0%; n = 40), orthopaedic surgery (9.50%; n = 38), vascular surgery (8.0%; n = 32) and paediatric surgery (5.5%; n = 22). Other specialties were explored in less than 5% of the studies.

Ninety distinct types of surgical activity were explored. Almost half of the studies concerned oncological indications (47.5%; n = 191). The most frequently studied procedure was pancreatic surgery (5.24%; n = 21), followed by gastrectomy (3.74%; n = 15), esophagectomy (3.74%; n = 15), aortic and mitral valve surgery (3.74%; n = 15), rectal surgery (3.74%; n = 15), hip surgery (3.74%; n = 15), lung surgery (3.74%; n = 15), abdominal aortic aneurysm surgery (3.49%; n = 14), and cystectomy (3.49%; n = 14). Other types of surgical activity were found in less than 3% of the studies.

In order to identify the patient populations undergoing surgery, 73.5% (n = 296) of the publications used a version of the ICD. The majority of the publications that did not use the ICD (65.1%) were based on patient or disease registers.

Outcomes and hospital volume

Hospital volume was expressed as a categorical variable only in 80.2% of the publications, as a continuous variable only in 4.3%, and as a continuous variable and a categorical variable in 15.5%.

Among the studies of volume as a categorical variable, nearly half (49.2%) used quantiles. The other studies used literature definitions (19.8%), statistically defined cut-offs (5.8%), or other methods (18.5%). 6.6% of the studies assessed the volume as a categorical value in two or more ways.
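As an illustration of the most common approach (quantiles), the sketch below assigns hospitals to volume quartiles with Python's standard library. The hospital labels and annual volumes are invented for the example, and the cut-point method (the default "exclusive" method of `statistics.quantiles`) is an assumption; the reviewed studies may have computed their quantiles differently.

```python
from statistics import quantiles

# Hypothetical annual procedure counts per hospital (illustrative only)
volumes = {"A": 12, "B": 35, "C": 60, "D": 110,
           "E": 240, "F": 18, "G": 75, "H": 150}


def quartile_groups(vols):
    """Assign each hospital to a volume quartile (1 = lowest, 4 = highest)."""
    cuts = quantiles(list(vols.values()), n=4)  # three cut points
    # A hospital's group is 1 plus the number of cut points it exceeds.
    return {h: 1 + sum(v > c for c in cuts) for h, v in vols.items()}


groups = quartile_groups(volumes)
for hospital in sorted(groups, key=volumes.get):
    print(hospital, volumes[hospital], groups[hospital])
```

Because the cut points depend entirely on the sample, a hospital classed as high-volume in one dataset could fall into a lower group in another; this is one reason why results based on quantiles are hard to compare between studies.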

Most of the included studies had more than one outcome. Mortality was the most frequently explored outcome (79.9%; n = 321), followed by length of stay (32.0%; n = 129), hospital readmission (16.6%; n = 67), and cost (16.1%; n = 65). An outcome other than the four just mentioned was explored in 61.0% (n = 246) of the studies. The most frequent of these were complication rates (32.1%; n = 79), followed by failure-to-rescue (death after a major complication) (7.0%; n = 17), specific oncological issues (5.5%; n = 17), morbidity (3.5%; n = 9), and discharge status (2.7%; n = 7).

All 61 outcome variables are listed in Table 5. They were grouped into nine families: length of stay, mortality, readmission, oncological issues, cost, characteristics of the hospital stay, quality indicators, surgical complications, and medical complications.

Table 5 Outcomes explored by the reviewed studies

We have not reported the proportions for each outcome because many studies used several of these (e.g. 30-day mortality and in-hospital mortality).

Covariates included in the model, and assessment of the initial severity

The 72 types of covariates used at least once for adjusting statistical models (listed in Table 6) were grouped into the following eight families, using an inductive approach: the patient’s characteristics, the hospital’s characteristics, clinical conditions, severity assessment, details of the disease, details of the surgery, details of the hospital stay, and post-operative events. Twenty-five publications did not take any confounders into account when analyzing the hospital volume-outcome relationship. Five of these 25 studies (20%) did not find a significant hospital volume-outcome relationship. In contrast, only 12.6% of the studies that took account of potential confounders did not find this relationship.

Table 6 Covariates used to adjust statistical models in at least one study

Statistically significant, positive volume-outcome relationships

A statistically significant relationship between hospital volume and outcome was found in 86.6% (n = 349) of the reviewed studies. Regardless of the volume modality, the type of outcome and the covariate(s) included in the model, 86.2% of the studies that assessed mortality found a significant relationship. Depending on the way that the volume was assessed, either a greater hospital volume was significantly associated with a lower mortality rate or a group of hospitals with a higher volume had a lower mortality rate than a group with a lower volume. Furthermore, volume was significantly related to the length of stay (in 89.1% of the studies that assessed it), cost (89.1%), and hospital readmission (79.1%). A hospital volume-outcome relationship was also found in 87.3% of the studies that explored at least one outcome other than those just listed.

This relationship was found in only 66.7% of the studies performed in Korea, 70.0% of those performed in Australia, 73.7% of those performed in the Netherlands, and 75% of those performed in Canada. For all other countries, the proportion of studies having found a statistically significant volume-outcome relationship was above 85%.

The proportion of studies having found a statistically significant, positive volume-outcome relationship was similar for cancer indications (88%) and other indications (85%). The proportion was lower for paediatrics (68.2%) and plastic surgery (75.0%) but greater than 80% for other specialties (Fig. 3). A volume-outcome relationship was not evidenced for five types of surgery: benign prostate hyperplasia, cholangiocarcinoma, intra-arterial stroke treatment, intracranial aneurysms, and necrotizing enterocolitis. Four types of surgery (appendicectomy, colorectal resection, infantile hypertrophic pyloric stenosis, and pancreas transplantation) featured a volume-outcome relationship in less than 50% of the studies, and six types (liver transplantation, hysterectomy for cancer, congenital diaphragmatic hernia, nephrectomy, total joint arthroplasty, and abdominal aortic aneurysm) featured a volume-outcome relationship in between 50 and 75% of the studies (Supplementary Figure 1).

Fig. 3 The percentage of studies having found a significant volume-outcome relationship, as a function of the discipline of surgery assessed

Discussion

The objective of the present scoping review of the literature was to assess the ways in which the volume-outcome relationship was studied. Our main findings highlighted the diversity of the types of surgery, the types of outcome explored, and the method for exploring the volume-outcome relationship. The 403 studies included in the review variously assessed 90 types of surgery, 61 types of outcome, and 72 potential confounders.

Most of the studies (87.5%) of the volume-outcome relationship had been performed in Western countries (as defined by Huntington [43]). More than half of all the included studies were based on administrative databases (54.8%), even though the latter do not always describe all the patients treated in a given centre. In fact, some administrative databases only describe patients with social security coverage or other types of health insurance, and some (particularly in the USA) even describe only patients covered by a particular private healthcare provider. A proportion of the patient population specifically concerned by the volume-outcome issue might therefore have been excluded from these studies. In countries with low success rates, it would be interesting to look at why quality varied. The high proportion of Western countries may limit the degree to which the studies’ data can be extrapolated.

Nearly 50% of the studies assessed cancer surgery (47.5%), and more than a third assessed visceral or digestive tract surgery (37.8%). This distribution might not reflect actual levels of activity. By way of an example, only 8.1% of hospital stays for surgery in France in 2019 were for an oncological indication (vs. 47.5% in the present review). The corresponding values are 27.2% for orthopaedic surgery (9.5% in the review), 17.9% for ophthalmology (less than 5% here), 13.1% for digestive tract or visceral surgery (37.8% here), 9.1% for urology (10% here), and 5.5% for cardiovascular surgery (11.3% here); hence, the proportions found here do not match the activity data [44, 45]. A mismatch is also seen when our review’s results are compared with the activity reported by Stanford HealthCare in the United States in 2009: only 12.7% of operations concerned the digestive tract (37.8% here), with 2.3% for the urinary tract (10% here) and 15.2% for the cardiovascular system (11.3% here) [46]. Studies that found a positive volume-outcome relationship for rare, complex, specific types of surgery must be interpreted with caution, since they may not reflect surgical activity in general.

Although 86.6% of the reviewed studies found a statistically significant volume-outcome relationship, the results differed from one type of surgery to another. For example, a significant relationship was found in 68.2% of the studies of paediatric surgery and not at all for five specific types of surgery (benign prostate hyperplasia, cholangiocarcinoma, intra-arterial stroke treatment, intracranial aneurysms, and necrotizing enterocolitis).

Our review highlighted a high degree of diversity among the outcomes measured and the covariates included in statistical analyses. Even though almost 80% of the studies investigated mortality as one of their outcomes, the way it was assessed modified the end results. For example, some studies looked at 5-year mortality among a population of elderly patients in which life expectancy can be a major source of bias, whereas others looked at 1-day mortality. Cost (explored in 16.1% of the studies) always has a particular context and depends on the country in which it is studied. Indeed, the share of a given cost paid by the patient may differ markedly in the USA vs. France. Moreover, patient outcomes may be interlinked because nursing facilities in some countries (but not in others) have incentives to hospitalize residents [47].

This heterogeneity can be viewed as both a strength and a limitation, and a few studies have shown that the results depend on the variable or analytical method used [13, 48]. In 2015, Yu et al. showed that treating volume as a continuous variable, categorizing it into quartiles, or grouping it by k-means clustering yielded different relationships with the outcome [14]. In 2018, Bernard et al. reported that four different regression models gave significantly different results for the same datasets [49].

Covariates also have a major impact on the assessment of the volume-outcome relationship. A recent study of cholangiocarcinoma resection showed that the relationship was no longer significant after adjustment for the distance travelled [50].

Volume may not be the only issue to be considered. For example, Mukhtar et al. compared high-activity years with low/medium-activity years in a San Francisco hospital over a 15-year period; neither the complication rate nor the mortality rate depended on the surgical volume [51]. These results are suggestive of a learning curve effect. Indeed, centres that increased their volume year-on-year sometimes had better outcomes than centres with absolute volumes that were higher but decreased year-on-year [52, 53].

The study populations in high-volume centres and low-volume centres are probably not the same, and this difference should thus be taken into account in the analytical model. Indeed, Liu et al.’s 2017 study of cancer surgery showed that patient attendance at low-volume centres was associated with a shorter travelling distance, residence in a rural area, and the absence of neoadjuvant therapy but not with the severity of their disease [54]. In 2017, Gani et al. showed that ethnic minorities, elderly patients, and patients with many comorbidities may have more difficulty accessing high-volume centres, which increases inequalities in access to care [15].

Even though the great majority of studies (in almost all surgical fields and all countries) found a volume-outcome relationship, those that explored centralization showed that having only high-volume centres had adverse effects and might not improve patient outcomes. Stitzenberg et al. reported that a marked increase in travelling distance observed after the centralization of pancreatic surgery posed a significant obstacle to accessing quality care [55] and increased inequalities in care access for specific populations, mainly in rural states [56]. Dimick et al. even suggested that given the size of the USA and the numbers of some types of surgery, nationwide local access to a high-volume facility is impossible [57].

The great variety of outcomes and covariates used to assess the hospital volume-outcome relationship, the high predominance of studies in Western countries, and the over-representation of oncological, visceral and digestive tract surgery may limit the generalizability of the studies’ results. Given the many different ways in which this relationship has been explored, policymakers should be very careful when using the conclusions of specific studies to modify healthcare facility maps.

This review suffered from several limitations. Firstly, the study’s design as a scoping review prevented us from evaluating the methodological quality of each study included. Secondly, our predefined categories may not have been precise enough to analyse each type of study. Indeed, the database categories, the types of surgery and the statistical methods could have been more precise. However, with a view to overcoming this limitation, the extraction grid was first tested on 10 studies. Thirdly, our literature search was limited to two electronic databases (PUBMED and Scopus) and the search terms selected may not be exhaustive. Hence, relevant publications indexed in other databases, or containing none of the search keywords, may have been missed [58]. Fourthly, our review was limited to the scientific literature and thus did not cover the pricing data used by policy makers to take decisions about healthcare facility mapping. Lastly, we reviewed the hospital volume-outcome relationship for surgery in general. Hence, our results may be relevant from the hospital perspective rather than that of individual surgeons.

The present review is the first to provide an exhaustive overview of how the volume-outcome relationship has been explored and how relevant criteria can be selected as a function of a study’s objective. The results showed that even though most of the studies found a significant volume-outcome relationship, each feature of the analysis provides different information. Consequently, before using such results to adapt healthcare facility mapping, policy makers should perform a specific study of the surgery and territory of interest. In order to help them with such an analysis, this review tries to provide a set of tools for investigating the volume-outcome relationship that can be adapted depending on the desired goal.