Introduction

Total knee arthroplasty (TKA) can improve pain and function in patients with end-stage knee osteoarthritis [99] and is increasingly performed worldwide [48, 87]. Unfavourable outcomes of TKA include revision surgery, deep infection, readmissions, and mortality, though rates of mortality are low [12, 24, 87].

A hospital volume–outcome relationship exists for various surgical procedures, meaning that higher hospital volume is associated with improved health outcomes [59, 84]. Some countries have therefore centralised selected surgical procedures to high-volume hospitals [70, 86]. A volume–outcome relationship may also exist for TKA [36, 84, 106]. Previous systematic reviews [26, 62, 107] are likely out of date and have methodological limitations. The only published meta-analysis compared TKA outcomes solely between the highest and lowest hospital volume categories [107].

The aim of this systematic review was to quantify the relationship between hospital volume and patient-relevant outcomes of TKA including complications using a dose–response meta-analysis. The hypothesis was that, as with other surgical procedures, a higher hospital volume would be associated with better patient-relevant outcomes of TKA.

Methods

The reporting of this systematic review adheres to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 Statement [80]. The protocol was registered prospectively in the Prospective Register of Systematic Reviews (PROSPERO; registration number CRD42019131209) [89] and published upfront [90].

Systematic literature search

The search strategies were developed with the support of an experienced librarian according to the Peer Review of Electronic Search Strategies (PRESS) guideline [63]. The electronic search was conducted without any limits in four databases (MEDLINE, Embase, CENTRAL, CINAHL; Supplementary Material 1) from inception to February 2020 and in trial registers (ClinicalTrials.gov, German Clinical Study Register, International Clinical Trials Registry Platform). Further sources of literature included conference proceedings, reference lists of included studies, forward citation searching (Web of Science) and contact with experts (Supplementary Table 1). No language restriction was applied. Articles published in languages other than English, German, or Italian were sent for professional translation.

Study selection

Studies with any design that (1) involved patients undergoing primary and/or revision TKA, (2) reported data for at least two different hospital volumes, and (3) analysed at least one patient-relevant outcome were included (see Supplementary Table 2 for a full list of eligibility criteria). After the duplicates were removed, two reviewers independently screened the titles and abstracts of all retrieved sources in EndNote (Clarivate Analytics, version X9.1) and assessed the full text of all potentially eligible articles. Any discrepancies were resolved by consensus or, when necessary, by consultation with a third reviewer.

Data extraction

Data were extracted independently by two reviewers using standardised data extraction sheets. Any discrepancies were resolved by consensus. The data items included study, patient, hospital and surgeon characteristics; time and country of data collection; data source; hospital volume definitions; TKA details; patient-relevant outcomes; and statistical analysis details (effect size types, confidence intervals, and confounding factors). The primary outcome was the early revision rate ≤ 12 months after TKA. The secondary outcomes were any other patient-relevant outcomes that were classified according to clinical experience as ‘main outcomes’ [41] or ‘other outcomes’. All extracted outcomes are summarised and defined in Supplementary Table 3. Study results (adjusted and/or unadjusted) were extracted separately for each hospital volume category and outcome. If data were missing or incompletely reported, study authors were contacted via email [37].

Risk of bias and publication bias

The risk of bias in the included studies was independently assessed at the outcome level by two reviewers using the Risk Of Bias In Non-randomised Studies of Interventions (ROBINS-I) tool [108]. For any outcomes with at least ten studies, assessment of publication bias was planned by visual inspection of the funnel plots for asymmetry and by applying Egger’s [31] and Begg’s tests [10].
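As a rough illustration of the planned asymmetry check, Egger's test can be sketched as a regression of each study's standardised effect on its precision, where an intercept far from zero suggests funnel-plot asymmetry. The function name, the input values, and the plain least-squares fit below are assumptions made for illustration and are not taken from the review. Begg's rank test is not shown.

```python
from math import sqrt


def egger_intercept(log_effects, std_errors):
    """Return (intercept, standard error of intercept) of Egger's regression.

    Each study's standard normal deviate (log effect / SE) is regressed on
    its precision (1 / SE) by ordinary least squares.
    """
    n = len(log_effects)
    if n < 3 or n != len(std_errors):
        raise ValueError("need at least three studies with matching SEs")
    precision = [1.0 / se for se in std_errors]
    deviate = [y / se for y, se in zip(log_effects, std_errors)]
    mean_x = sum(precision) / n
    mean_z = sum(deviate) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    if sxx == 0:
        raise ValueError("all studies have the same precision")
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(precision, deviate))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    residual_ss = sum(
        (z - intercept - slope * x) ** 2 for x, z in zip(precision, deviate)
    )
    residual_var = residual_ss / (n - 2)
    se_intercept = sqrt(residual_var * (1.0 / n + mean_x ** 2 / sxx))
    return intercept, se_intercept


# Hypothetical log odds ratios and standard errors for ten studies
# (invented values, chosen only to show the call).
log_ors = [-0.10, -0.05, -0.20, 0.02, -0.12, -0.30, -0.08, -0.15, 0.05, -0.25]
ses = [0.04, 0.06, 0.10, 0.05, 0.07, 0.15, 0.03, 0.09, 0.12, 0.14]
intercept, se_intercept = egger_intercept(log_ors, ses)
# Compare t_statistic with a t distribution on len(log_ors) - 2 degrees of freedom.
t_statistic = intercept / se_intercept if se_intercept > 0 else float("nan")
```

The ten-study example mirrors the review's threshold for attempting the test; with fewer studies the regression has little power.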

Statistical analysis

Hospital volume was defined as the mean annual number of patients undergoing TKA. Hospital volume categories were standardised using their midpoints. For individual study outcomes, odds ratios (ORs) with 95% confidence intervals (95% CIs) were converted such that the lowest volume category was the reference.
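The two standardisation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the category bounds and odds ratios are invented, the helper names are hypothetical, and the rule used for an open-ended top category (lower bound plus half the width of the preceding category) is an assumption of this sketch, not a rule stated in the review. Only point estimates are re-referenced, because converting confidence intervals to a new reference category requires covariance information that is not modelled here.

```python
from math import inf


def category_midpoint(lower, upper, previous_width=None):
    """Midpoint of a hospital volume category in TKAs/year.

    An open-ended top category (upper == inf) has no midpoint; as an
    assumption of this sketch it is placed at the lower bound plus half
    the width of the preceding category.
    """
    if upper == inf:
        if previous_width is None:
            raise ValueError("open-ended category needs the previous width")
        return lower + previous_width / 2
    return (lower + upper) / 2


def rereference_to_lowest(odds_ratios):
    """Re-express ORs so that the first (lowest-volume) category is the reference."""
    base = odds_ratios[0]
    return [value / base for value in odds_ratios]


# Hypothetical study that reported ORs with the highest category as reference.
categories = [(1, 50), (51, 100), (101, 200), (201, inf)]
reported_ors = [1.30, 1.15, 1.05, 1.00]

midpoints = []
previous_width = None
for lower, upper in categories:
    midpoints.append(category_midpoint(lower, upper, previous_width))
    if upper != inf:
        previous_width = upper - lower

converted_ors = rereference_to_lowest(reported_ors)
```

After conversion the lowest category has an OR of 1 and every other category is expressed relative to it, which is the form needed to relate ORs to the category midpoints.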

Individual study results were plotted to visually inspect linearity (e.g. better outcomes with increasing volume) for each outcome. A random-effects linear dose–response meta-analysis according to Greenland and Longnecker [38] was used to pool ORs for outcomes reported in at least three studies with sufficient data (Supplementary Material 2). For each outcome, measurements ≤ 3 months after TKA were aggregated in one analysis and those > 3 months in another. Revisions were aggregated in three analyses: ≤ 12 months, 1–5 years, and 6–10 years after TKA. Wherever the overlap among two or more study samples exceeded 20%, only one study was selected for meta-analysis based on data completeness, sample size, and the suitability of the volume categories as criteria (Supplementary Tables 4, 5, 6). The main dose–response meta-analysis was computed using the ‘best-adjusted’ effect estimates. These were the ORs adjusted for at least one confounding variable, including age, gender, and comorbidities, but not for post- or within-intervention variables such as surgeon volume. Heterogeneity between studies was assessed using the Q test and I2-statistic [46]. Four sensitivity analyses (Supplementary Material 3) were conducted; the first analysis compared extreme volume categories (highest vs. lowest), and the second, third and fourth analyses (post hoc) studied the influence of confounding variables. An additional post hoc dose–response meta-analysis was conducted using ‘best available’ (adjusted and unadjusted) effect estimates. All meta-analyses were performed with R 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria) using the metafor and dosresmeta packages [25, 116]. Outcomes that were not suitable for meta-analysis (Supplementary Material 2) were synthesised narratively using the Synthesis Without Meta-analysis (SWiM) guideline (Supplementary Material 4) [20].
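To make the pooling and heterogeneity statistics concrete, the sketch below combines study-specific linear trends (log OR per 50 TKAs/year) in a random-effects model and reports Cochran's Q and I². It is a simplified illustration, not the authors' implementation: the review used the metafor and dosresmeta packages in R, whereas this sketch takes the within-study slopes as given (the Greenland and Longnecker step that accounts for correlated category estimates is not reproduced), uses the DerSimonian-Laird estimator as an assumption because the estimator is not stated in this section, and runs on invented input values.

```python
from math import exp, sqrt


def pool_random_effects(slopes, std_errors):
    """Pool study-specific log-OR slopes with a DerSimonian-Laird model.

    Returns a dict with the pooled slope, its standard error, Cochran's Q,
    the between-study variance tau2, and I2 in percent.
    """
    k = len(slopes)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching SEs")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    fixed = sum(w * b for w, b in zip(weights, slopes)) / total_w
    q = sum(w * (b - fixed) ** 2 for w, b in zip(weights, slopes))
    df = k - 1
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    total_re = sum(re_weights)
    pooled = sum(w * b for w, b in zip(re_weights, slopes)) / total_re
    return {
        "pooled": pooled,
        "se": sqrt(1.0 / total_re),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical slopes (log OR per 50 TKAs/year) and standard errors.
study_slopes = [-0.12, -0.05, -0.09, -0.15]
study_ses = [0.03, 0.04, 0.02, 0.06]

result = pool_random_effects(study_slopes, study_ses)
pooled_or = exp(result["pooled"])
ci_low = exp(result["pooled"] - 1.96 * result["se"])
ci_high = exp(result["pooled"] + 1.96 * result["se"])
```

Exponentiating the pooled slope gives an OR per 50 additional TKAs/year, the scale on which the pooled results are reported.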

Grading the evidence

Confidence in the cumulative evidence was evaluated using the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) approach [19, 41, 91, 95, 113] and applying Murad’s approach [72] for SWiM outcomes. Two reviewers independently graded outcomes using GRADEpro GDT software [64] and reached consensus during discussion.

Patient involvement

Potential TKA patients were asked for their opinions on the hospital volume–outcome relationship for TKA and their hospital preferences using qualitative methodology (focus groups and interviews). The methods and results are reported elsewhere [55].

Results

Study identification and selection

A total of 13,048 records were identified from electronic databases and trial registers, and 2,266 were identified from reference lists of included articles, forward citation search, websites, and author contact. Of 347 full-text reports, 269 were excluded (Supplementary Table 7). This review included 68 cohort studies reported in 78 articles [1–9, 13, 16–18, 21–24, 27–30, 32–35, 39, 40, 42–45, 47, 49–54, 57, 58, 60, 61, 65, 67, 68, 71, 73–79, 81–83, 85, 88, 92–94, 97, 98, 100–104, 110–112, 114, 115, 118–122] with data representing the years from 1985 to 2018 (Fig. 1).

Fig. 1 PRISMA flow diagram showing selection of articles for review

Study and patient characteristics

The majority of studies used data from North America, while 22 used data from Europe, 9 from Asia and 1 from Australia. The data were obtained from administrative databases in 47 studies, clinical registries in 18 studies, and questionnaires in three studies. The average number of patients across all studies was 222,038 (data from 65 studies), with a median of 65% females (IQR 62–69%, data from 56 studies). The patients had a weighted mean age of 71 years (data from 40 studies). Each study included a median of 486 hospitals (IQR: 43–569, data from 51 studies). In 55 studies, the population was limited to primary TKA patients, 12 included primary and revision TKA patients, and one study did not specify the type of TKA. The study and patient characteristics of studies reporting primary and main secondary outcomes are shown in Table 1, and the characteristics of all 68 included studies are shown in Supplementary Table 8.

Table 1 Study characteristics with primary and main secondary outcomes

Study results

Individual study results are reported for all adjusted or unadjusted outcomes by hospital volume category in Supplementary Tables 4 and 9, respectively, and are summarised for the primary outcome (early revision rates) in Table 2.

Table 2 Study results and risk of bias for early revision

Risk of bias

The risk of bias was moderate for 30 study outcomes, serious for 168, and critical for 3 (Supplementary Table 10). Bias was suspected mostly due to potential confounding, since most effect estimates were not appropriately adjusted for age, gender, and comorbidity.

Primary outcome: early revision rate

A higher hospital volume may be associated with a lower early revision rate (7 studies [5, 50, 54, 61, 65, 82, 83], narrative synthesis Table 2, low certainty evidence). Five studies with a high risk of bias, which accounted for 261,243 of 301,378 (87%) patients in total for this outcome [50, 54, 61, 65, 83], reported lower revision rates for higher volumes. In contrast, the only study with a moderate risk of bias [5] found that a higher hospital volume (> 125 TKAs/year) was associated with a higher early revision rate.

Main secondary outcomes

The results of the linear dose–response meta-analysis of best-adjusted effect estimates are presented in Table 3 (main secondary outcomes), Supplementary Table 11 (other secondary outcomes) and Supplementary Table 12 (post hoc linear dose–response meta-analysis using ‘best available’ effect estimates).

Table 3 Results of linear dose–response meta-analysis of best-adjusted effect estimates (main secondary outcomes)

Revision

There was no evidence for a linear dose–response relationship between hospital volume and revision rate within 1–5 years (OR = 0.96 per 50 TKAs/year increase, 95% CI [0.86–1.07]; 5 studies [5, 50, 51, 54, 73], I2 = 98%, very low certainty, Table 3). This finding was robust to sensitivity analyses (Supplementary Tables 13, 14, 16).

The relationship between hospital volume and revision rate within 6–10 years was inconsistent (narrative synthesis, 5 studies [5, 9, 30, 81, 97], very low certainty).

Mortality

A higher hospital volume is likely associated with a lower mortality rate ≤ 3 months (OR = 0.91 per additional 50 TKAs/year, 95% CI [0.87–0.95]; 9 studies [45, 51, 52, 54, 60, 76, 83, 98, 104], I2 = 51%, moderate certainty, Table 3, Fig. 2a). The direction of this relationship was robust to sensitivity analyses (Supplementary Tables 13–16), although the pooled OR was no longer significant when the analysis included only data that were also adjusted for surgeon volume (Supplementary Table 15).
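Because the model is linear on the log-odds scale, an OR reported per 50 TKAs/year can be re-expressed for another volume increment by raising it to the ratio of the increments. The sketch below applies this to the pooled mortality estimate quoted above; the function name is hypothetical, and the conversion assumes that the linear trend holds over the whole increment, so it should not be extrapolated beyond the volume range observed in the primary studies.

```python
def rescale_odds_ratio(odds_ratio, unit, new_unit):
    """Re-express an OR given per `unit` increase as an OR per `new_unit` increase.

    Valid only under a linear trend in the log odds.
    """
    return odds_ratio ** (new_unit / unit)


# Pooled mortality estimate: OR and 95% CI per additional 50 TKAs/year.
mortality_per_50 = (0.91, 0.87, 0.95)

# The same estimate per additional 100 TKAs/year.
mortality_per_100 = tuple(
    rescale_odds_ratio(value, 50, 100) for value in mortality_per_50
)
```

For an increment of 100 TKAs/year the point estimate becomes 0.91 squared, about 0.83, which shows how the per-50 effect compounds across larger volume differences.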

Fig. 2 Linear dose–response meta-analysis for mortality (a) and readmission (b)

Deep infection

There was no evidence for a linear dose–response association between hospital volume and the rate of deep infection within 1–4 years (OR = 1.03 per 50 additional TKAs/year, 95% CI [0.97–1.09], 3 studies [4, 8, 74], I2 = 0%, very low certainty, Table 3). However, the sensitivity analysis comparing highest vs. lowest volume categories showed that higher hospital volume may be associated with a higher rate of deep infection (OR = 1.60; 95% CI [0.91–2.82], I2 = 54%, Supplementary Table 13).

Adverse events

Due to the heterogeneous clinical definitions of adverse events in the primary studies (Supplementary Table 3), this outcome was not pooled. The relationship between hospital volume and adverse event rates ≤ 3 months was inconsistent across studies in a narrative synthesis (Supplementary Tables 4, 9), and the certainty was very low based on 7 studies [52, 54, 60, 77, 94, 98, 118].

Readmission

A higher hospital volume was likely associated with a slightly lower readmission rate ≤ 3 months (OR = 0.98; 95% CI [0.97–0.99], 3 studies [7, 81, 122], I2 = 44%, moderate certainty, Table 3, Fig. 2b). The direction of this relationship was robust to sensitivity analyses (Supplementary Tables 13, 14), although the relationship was no longer statistically significant when only unadjusted effect estimates were included (Supplementary Table 16).

Other secondary outcomes

Limited evidence (Supplementary Table 6) showed that higher hospital volume may be associated with lower rates of the following outcomes:

  1. Composite adverse events including mortality ≤ 3 months [22, 40, 57, 98, 104],
  2. Any infection ≤ 3 months [45, 98, 104, 118] and > 3 months [22, 54, 104],
  3. Length of hospital stay [1, 32, 33, 45, 47, 51, 54, 60, 68, 76, 81, 83, 85, 110, 111, 118, 121],
  4. Pneumonia ≤ 3 months [52],
  5. Superficial infection ≤ 3 months [7, 49, 78] and > 3 months [3, 71, 101],
  6. ‘Surgical complications’ as a composite outcome ≤ 3 months [18, 40, 47, 83, 94],
  7. Thromboembolic events ≤ 3 months [45, 52, 98, 104] and > 3 months [104], and
  8. Thrombophlebitis ≤ 3 months [104] and > 3 months [104].

Hospital volume may be associated with function ≤ 3 months in a U-shaped relationship [42, 49]. Specifically, postoperative mobility at discharge appeared to be highest at hospital volumes of approximately 300–400 TKAs/year, and hospitals with lower or higher TKA volumes had worse outcomes [49].

There was no evidence for a relationship between hospital volume and the rates of the following outcomes:

  1. Deep infection ≤ 3 months [52, 58],
  2. Mortality > 3 months [22, 40, 57, 98, 104],
  3. Myocardial infarction ≤ 3 months [17, 52, 98],
  4. Quality of life > 3 months [115],
  5. Readmission > 3 months [51], and
  6. Wound haematoma or secondary haemorrhage ≤ 3 months [78].

Although patient satisfaction was reported in two studies [32, 92], we did not synthesise the results due to critical risk of bias.

Certainty of evidence

Table 4 shows the GRADE assessment and summary of findings for the primary and main secondary outcomes. The individual GRADE domains and the certainty of evidence for the other secondary outcomes are shown in Supplementary Tables 5 and 6, respectively. The certainty of evidence was moderate for 4 outcomes, low for 7 outcomes, very low for 15 outcomes and not assessed for 1 outcome.

Table 4 Summary of findings and certainty of evidence (GRADE)

Discussion

The current systematic review reports the results of a dose–response meta-analysis of 68 cohort studies that assessed the relationship between hospital TKA volume and patient-relevant outcomes. As hypothesised, higher hospital TKA volume may be associated with a lower rate of early revisions and is likely associated with small reductions in mortality and readmission ≤ 3 months after TKA. Earlier systematic reviews by Critchley [26] and Stengel [107] also found small reductions in mortality with increased hospital TKA volume, whereas Marlow [62] found no evidence for this association.

The certainty of evidence of the synthesised results was reduced by the relatively high risk of bias resulting from the observational design of the primary studies, which is inherent to the topic. Furthermore, the selection of endpoints for this systematic review was limited to morbidity and mortality, which are more widely recorded than outcomes related to function and quality of life. As a result, the association of hospital volume with improvements in function, quality of life, and pain reduction (the primary goals of TKA) could not be assessed. Mortality may not be the most relevant endpoint to study from a patient perspective, and overall event rates are very low. Nevertheless, the results may be clinically relevant at the population level.

Higher hospital volume does not directly result in improved patient outcomes but, rather, acts as a proxy measure for quality [66, 70]. Three general explanatory factors for the hospital volume–outcome relationship have been identified for various medical procedures: level of specialisation, hospital-level factors including nursing staff and facilities, and compliance with evidence-based processes [66]. In addition, there is a tendency for a surgeon volume–outcome relationship in TKA surgery [69]. Based on the results of this systematic review, surgeon volume could constitute one aspect of the hospital volume–outcome relationship, since the meta-analysis no longer showed a significant association with mortality when only data adjusted for surgeon volume were included (Supplementary Table 15). In several types of cancer surgeries and cardiovascular procedures, surgeon volume accounts for a large proportion of the effect of hospital volume [15]. Therefore, the authors interpret hospital volume as a proxy for quality, of which surgeon volume is one element. Additional confounders exist, e.g. patient characteristics [26] and changing suppliers of implant systems [105].

Understanding the volume–outcome relationship is important in light of discussions regarding the centralisation of surgical procedures to specialised hospitals [14, 62]. These results suggest that centralising TKA surgery may improve patient outcomes. A drawback of centralisation is that it may increase patients’ travel burden and reduce access for disadvantaged patients [14, 56, 66, 96].

Future studies should adhere to reporting guidelines [11, 117] so that their data can be used more effectively for further research. To evaluate whether the volume–outcome relationship for TKA is non-linear, a future primary study could use multinational registry data. Measurement of patient-reported outcomes in the context of the hospital volume–outcome relationship is desirable.

This systematic review has several limitations. First, although a large number of studies were included, the results for most outcomes are based on a relatively small number of studies. This was because primary studies did not report the same outcomes, and because time points or data required for the dose–response meta-analysis were missing. Second, the small number of volume categories in the primary studies may have hidden non-linear relationships, which could therefore have gone undetected by a dose–response meta-analysis. Third, the applicability of the results to other healthcare systems is limited because a large proportion of data were collected in North America. Fourth, there was considerable between-study heterogeneity for most outcomes, probably due to inconsistent methodology in primary studies, variation among healthcare systems and regulatory approaches, and different periods of data collection. Sources of heterogeneity could not be explored by subgroup analysis because there were fewer than three studies per subgroup for each outcome. However, when the highest and lowest volume categories were compared, heterogeneity decreased, and pooled effect estimates showed stronger associations between hospital volume and outcomes. Fifth, it was not possible to assess publication bias because fewer than ten studies per outcome were included in the dose–response meta-analyses [109]. Because of these limitations, conclusions should be drawn from the direction and approximate magnitude of the hospital volume–outcome associations rather than from the exact numerical values of the pooled effect sizes.

Conclusion

Policy makers need solid evidence when regulating surgical procedures. The results for TKA show that there is moderate to low certainty evidence for an inverse hospital volume–outcome relationship for the outcomes of mortality, readmissions and early revisions. These small reductions in unfavourable outcomes may be clinically relevant at the population level. This finding supports the centralisation of TKA surgery to high-volume hospitals.