Introduction

Differences between hospitals in cumulative revision rates (CRR) after primary knee arthroplasty (KA) are often taken to indicate differences in the quality of surgery, both among and within countries [22]. However, since pain relief and restoration of knee function are the primary goals of KA surgery, CRRs may not be the most important measure of treatment quality for the vast majority of patients, i.e., those who are never revised and whose spectrum of postoperative results is not reflected in the statistics [8]. In 2015, Danish KA surgeons recognized the need for a comprehensive comparison of the primary KA surgery performed in the three (of five) administrative regions with the greatest differences in 1-, 2-, 5- and 10-year CRRs. The goal was to determine whether CRR variations reflected overall differences in the quality of primary KA surgery, defined as patients’ subjective improvement after surgery [26]. This question shifted the focus from registry-based quality to quality based on patient assessment: patient-reported outcome measures (PROMs) and range of motion.

Each region was represented by its largest KA hospital in a prospective observational cohort study, SPARK (“Variation in patient Satisfaction, Patient-reported outcome measures, radiographic signs of Arthritis, and Revision rates in Knee arthroplasty patients in three Danish regions”). The baseline publication [16] from the SPARK study compared the three hospitals’ patient selection for primary surgery by analyzing preoperative data from 1452 patients who underwent primary KA in routine clinical settings. Patient demographics, anxiety and depression symptoms, KA incidence, implant selection and radiographic classification of knee osteoarthritis varied somewhat between hospitals, but preoperative PROMs did not. Most of the baseline differences between hospitals ran contrary to well-known revision risk factors. For example, one low-revision hospital (Aarhus) used more unicompartmental implants (49%) than the others [Aalborg 14% (low CRR) and Copenhagen 22% (high CRR)] (Table 1). Overall, the study identified no differences in baseline characteristics that could adequately explain the persistent hospital differences in CRRs [16]. The present follow-up study compares the postoperative outcomes of primary KA in the SPARK cohort and examines whether patients with a given level of symptoms or radiographic knee osteoarthritis (OA) can expect similar outcomes at the three hospitals.

Table 1 Baseline characteristics of participants

Apart from the differences in CRR between regions (e.g., 2-year CRRs of 1.0%, 2.2% and 5% in 2015) and undocumented claims of cultural differences between east (Capital Region, high revision) and west (low revision), no previous data had suggested quality differences across the country. Therefore, all data were analyzed under the null hypothesis of no difference in outcomes between hospitals.

Materials and methods

The study was ethically approved by The National Committee of Health Research Ethics (Protocol no. 16038343, 2 September 2016). All patients provided written consent to participate in the study. The study was reported in accordance with the STROBE guidelines (STrengthening the Reporting of OBservational studies in Epidemiology). Register data were retrieved from the Danish Knee Arthroplasty Register and the Danish National Patient Register [24].

Patient inclusion

The prospective observational cohort study, SPARK, was conducted in three high-volume hospitals whose 2-year CRRs during the preceding 3 years were representative of their regions: Aarhus University Hospital (1.9%) in the Central Denmark Region (2.5%), Aalborg University Hospital Farsø (1.6%) in the North Denmark Region (1.5%) and Copenhagen University Hospital Herlev-Gentofte (5.6%) in the Capital Region (4.7%) [16]. From 1 Sep 2016 to 31 Dec 2017, surgeons and employed medical students invited patients scheduled for KA (total (TKA), medial or lateral unicompartmental (MUKA/LUKA) or patellofemoral (PFA) knee arthroplasty) to participate in the study. Exclusion criteria were knee tumors, severe developmental lower limb deformities, haemophilia, dementia and severe language barriers, and this follow-up study included only the 1452 patients who had provided PROM data before surgery [16]. Questionnaires were sent by email with unique links before surgery and at 6 weeks and 3, 6 and 12 months postoperatively. Patients who could respond only by traditional mail were allowed inclusion only during the final 6 months of the inclusion period. Some patients participated with both knees on separate occasions, and many more had bilateral knee problems, so the emails specifically referred to the “right” or “left knee” and the current follow-up time. If necessary, two reminders were sent, and at 1 year, a printed questionnaire with a postage-paid envelope was sent to non-responders. Surgeons routinely supplied surgical information, and any missing or erroneous information was meticulously corrected using medical records.

Participants who underwent revision during the first year were not the main focus of the study. However, their PROMs were collected and reported until the day of revision. For example, a patient who was revised after 8 months could contribute to PROM analyses at 6 months, but not at 1 year. All subsequent revisions were attributed to the primary KA hospital regardless of which hospital (public or private) performed the revision. Minor surgery, such as wound debridement or manipulation under anesthesia, did not end participation.
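The rule that revised patients contribute PROMs only until the day of revision can be expressed as a small filter over the follow-up schedule. This is an illustrative sketch only: the function name `contributing_time_points`, the approximation of 6 weeks as 1.5 months and the strict “before the revision date” cut-off are assumptions, not the study's data-handling code.

```python
# Hypothetical follow-up schedule in months; 6 weeks is approximated as 1.5 months
FOLLOW_UPS = {"6 weeks": 1.5, "3 months": 3, "6 months": 6, "12 months": 12}


def contributing_time_points(revision_month=None):
    """Return the follow-up labels whose PROMs enter the analyses.

    Non-revised patients (revision_month=None) contribute at every time point.
    A revised patient contributes only at time points that fall strictly
    before the revision (assumed cut-off rule).
    """
    if revision_month is None:
        return list(FOLLOW_UPS)
    return [label for label, month in FOLLOW_UPS.items() if month < revision_month]


# Example from the text: revised after 8 months
print(contributing_time_points(8))  # ['6 weeks', '3 months', '6 months']
```

With `revision_month=8`, the patient contributes at 6 weeks, 3 months and 6 months but not at 12 months, matching the example above.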

Radiographic classification of knee osteoarthritis

Blinded posteroanterior weight-bearing radiographs were used to evaluate the radiographic severity of knee OA [23]. To allow fair comparisons, LUKA and PFA patients and radiographs with predominantly lateral joint space narrowing were excluded. Two radiologists graded knee OA according to the Kellgren–Lawrence (K–L) classification (0–4, 4 most severe) and the Ahlbäck score (0–5, 5 most severe) [16, 1, 12]. In addition, 13 experienced KA surgeons performed thousands of “head-to-head” comparisons of the radiographs based on heuristics, i.e., rules of thumb and clinical experience, without the use of traditional classification systems, resulting in a complete OA severity ranking of all radiographs (details in preceding publications [16, 18]).
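The method used to turn the surgeons' pairwise judgements into a complete severity order is described in the preceding publications [16, 18] and is not reproduced here. Purely to illustrate how pairwise comparisons can yield a full ranking, the sketch below uses a standard Elo-type rating update; the choice of Elo, the K factor of 32, the starting rating of 1500 and the function names are assumptions for this example and should not be read as the study's method.

```python
from collections import defaultdict


def expected_score(r_a, r_b):
    """Elo expected probability that radiograph A is judged more severe than B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def rank_by_pairwise(comparisons, k=32.0, start=1500.0):
    """Rank radiographs from pairwise judgements with an Elo-type update.

    comparisons: iterable of (winner, loser) ID pairs, where the winner is the
    radiograph judged to show more severe OA. Returns (IDs ordered from most
    to least severe, final ratings).
    """
    ratings = defaultdict(lambda: start)
    for winner, loser in comparisons:
        e_win = expected_score(ratings[winner], ratings[loser])
        delta = k * (1.0 - e_win)  # small update if the result was expected
        ratings[winner] += delta
        ratings[loser] -= delta
    order = sorted(ratings, key=ratings.get, reverse=True)
    return order, dict(ratings)


order, ratings = rank_by_pairwise([("R1", "R2"), ("R1", "R3"), ("R2", "R3")])
print(order)  # ['R1', 'R2', 'R3']
```

Each comparison moves the radiograph judged more severe up and the other down, by less when the outcome was already expected; sorting the final ratings gives a rank order from no. 1 (most severe) downwards.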

Patient-reported outcome measures (PROMs)

The primary outcome, the Oxford Knee Score (OKS, 0–48, 48 best), was reported as absolute score, as change from baseline and as the proportion of patients achieving the Minimal Important Change (MIC) of 8 points, which indicates an important improvement for the average patient at 1 year [2, 6, 10, 11, 17, 25]. The Copenhagen Knee ROM Scale (CKRS) assessed patient-reported passive range of motion (ROM), i.e., flexion (0–6, 6 max) and extension (0–5, 5 max), and was used to estimate the proportions of patients with flexion or extension deficits [14]. Every PROM set began with the generic EQ-5D-5L and EQ-VAS and a “global knee anchor” question, “How is your knee?” (VAS, 0–100, 100 best), and patients reported how often they used any type of analgesic for knee pain [30].
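To show how the MIC criterion turns paired OKS values into a proportion, the following minimal sketch assumes that “achieving the MIC” means an improvement of at least 8 points (change ≥ 8); the function name and the example scores are hypothetical.

```python
MIC_OKS = 8  # minimal important change in OKS points at 1 year


def mic_attainment(baseline, one_year, mic=MIC_OKS):
    """Return (change scores, proportion with an improvement of at least `mic`)."""
    if len(baseline) != len(one_year) or not baseline:
        raise ValueError("need paired, non-empty lists of OKS values")
    changes = [post - pre for pre, post in zip(baseline, one_year)]
    attained = sum(1 for change in changes if change >= mic)
    return changes, attained / len(changes)


# Hypothetical patients (OKS 0-48, higher is better)
changes, share = mic_attainment([20, 25, 30, 35], [40, 32, 36, 44])
print(changes, share)  # [20, 7, 6, 9] 0.5
```

Whether a change of exactly 8 counts as attainment is a definitional choice; here it does.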

Additional questions and PROMs were added at varying time points. At baseline, patients reported height, weight, smoking, alcohol consumption and level of urbanization [13, 29]. The Forgotten Joint Score (FJS) [3] was added from 3 months postoperatively, and the UCLA Activity Scale (UCLA) [15] was used at baseline and from 3 months postoperatively. At 6 months, patients reported whether they had received physiotherapy during rehabilitation after hospital discharge. From 3 months on, patient satisfaction was measured by asking, “How satisfied are you with the overall experience of the operation and its result?” (five-point Likert scale with a neutral midpoint). As satisfaction could be influenced by experiences related to hospital service, kindness of caretakers, etc. [5], “willingness to repeat surgery” was also recorded at 1 year: “Suppose you could turn back time, would you still choose to have a knee replacement now that you know the outcome?” (five-point Likert scale with a neutral midpoint).

Implants, perioperative care, and follow-up routines

The SPARK study did not interfere with local hospital routines concerning, e.g., analgesic use, aftercare or selection of KA implants. Each hospital used a unique selection of cemented, uncemented and hybrid implants that had been on the market for at least 10 years and had shown good survival in registries [24]. The predominant systems were NexGen (Zimmer Biomet), PFC Sigma (DePuy Synthes), Triathlon (Stryker), Oxford Mobile Bearing and ZUK (Zimmer Biomet) and Avon (Stryker).

In all three hospitals, tranexamic acid, glucocorticoids, local anesthetics and prophylactic antibiotics (dicloxacillin in Copenhagen, cefuroxime in Aarhus and Aalborg) were administered intraoperatively. Paracetamol, non-steroidal anti-inflammatory drugs (NSAIDs) and opioids were the oral analgesics of choice for up to 4 weeks postoperatively. In 2017, the average length of stay for TKA patients in Aarhus, Aalborg and Copenhagen was 2.4, 1.4 and 2.2 nights, respectively, and for MUKA patients it was 0.6, 1.3 and 0.7 nights [27].

The routine preoperative multidisciplinary patient seminar included preparatory training with physiotherapists (crutch walking, stair climbing, etc.). Postoperatively, Aalborg and Copenhagen patients were trained by a physiotherapist, and Copenhagen patients were routinely offered free supervised physiotherapy after discharge. Aarhus and Aalborg patients were screened 2–6 weeks after discharge to identify those in need of physiotherapy; only patients whose progress was unsatisfactory after 6–8 weeks or who had abnormal findings on 1-year radiographs were referred to the surgeon for a follow-up appointment. In contrast, all Copenhagen patients saw their surgeon and had radiographs taken after 3 months.

Statistics

Based on pragmatic considerations and feasibility, the study aimed for a sample size of 1080 patients (a 75% inclusion rate and an 80% response rate among 1800 patients, i.e., 1800 × 0.75 × 0.80) [16]. All significance tests included all three hospitals unless otherwise specified. In regression analyses, Aarhus was selected as the reference hospital because it lay between the other two hospitals in terms of geography, urbanization level and revision rates. All observations were treated as independent [21]. P values were two-sided with an alpha level of 0.05. Standard deviations were displayed as “(± SD)”. Tabular data were analyzed with the chi-square test (with Monte Carlo-simulated p values when expected cell counts were < 5), and Clopper–Pearson confidence intervals (95% CI) were provided where relevant. Non-parametric (rank-based) methods (Kruskal–Wallis or Wilcoxon/Mann–Whitney U test) were used for ordinal measures (UCLA, global knee anchor, patient satisfaction, willingness to repeat surgery, use of analgesics and radiographic classifications), while parametric methods (one-way analysis of variance (ANOVA) or t test) were applied to OKS, FJS, EQ-5D and CKRS [17]. Multiple linear regression analyses (dependent variable OKS) were conducted with both the Ahlbäck and K–L classifications, but since the overall results did not change with the classification method, only Ahlbäck-based confidence intervals (CIs) were reported. Since 1-year change scores were unavailable for revised patients, the analysis of change in OKS was repeated using imputed results (clearly specified). Analyses were carried out in R (RStudio) in March 2019 [20]. Data collection and Case Report Forms (CRF) were handled by Procordo Software ApS, Copenhagen.
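A chi-square test with a simulated p value for sparse tables can be sketched without external libraries by permuting outcomes while keeping both row and column totals fixed, which is similar in spirit to the simulated p value offered by R's `chisq.test(..., simulate.p.value = TRUE)`. This is a minimal sketch, not the study's analysis code; the number of simulations, the seed, the example table and the function names are assumptions.

```python
import random


def chi_square_stat(table):
    """Pearson chi-square statistic for an r x c table given as a list of rows."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def monte_carlo_chi_square(table, n_sim=2000, seed=1):
    """Chi-square statistic and simulated p value with both margins held fixed."""
    rng = random.Random(seed)
    observed = chi_square_stat(table)
    # Expand the table into one (row, column) pair per patient
    rows, cols = [], []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            rows.extend([i] * count)
            cols.extend([j] * count)
    n_rows, n_cols = len(table), len(table[0])
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(cols)  # permuting column labels keeps row and column totals
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(rows, cols):
            sim[i][j] += 1
        # small tolerance so that ties with the observed value count as extreme
        if chi_square_stat(sim) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)


# Hypothetical 2 x 3 table: rows = attained / did not attain, columns = hospitals
stat, p = monte_carlo_chi_square([[40, 25, 70], [10, 3, 12]])
```

Conditioning on both margins means that only the association between rows and columns is tested, which is why the approach remains usable when some expected counts are small.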

Results

Patient inclusion

Baseline data were available for 1452 patients (68.0 ± 9 years, 45% male, 89% response rate) (Table 1). The 41 patients who participated by mail were on average 8 years older and more likely to be female (71%) than those who participated online. According to the post-hoc inclusion analysis, 56% of patients (62% in Aarhus and Copenhagen, 38% in Aalborg) provided baseline data for the SPARK study in 2017 [16]. Participants were younger than non-participants (67.7 ± 9 vs. 68.8 ± 11 years, p = 0.02) and more likely to be male (42 vs. 38%, p = 0.016). Implant types were equally distributed among participants and non-participants within each hospital (p ≥ 0.2) [16].

A total of 1414 patients (97%) responded postoperatively at least once, and 1307 (90%) responded at 1 year (Table 2). Response rates were comparable among hospitals (p = 0.4). In the first year, three patients withdrew from the study, seven died, and nine were lost to follow-up due to errors such as incorrect laterality or a changed email address. Revision surgery was performed in 28 patients (1.9%) during the first postoperative year: 2 (0.6%) in Aarhus, 4 (2.0%) in Aalborg and 22 (2.4%) in Copenhagen (p = 0.1). The last available postoperative OKS, time of revision and indication are listed for each revised patient in Table 3. Deep infection caused 13 revisions (1 (0.3%) in Aarhus, 1 (0.5%) in Aalborg and 11 (1.2%) in Copenhagen, p = 0.4).
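Clopper–Pearson (exact) intervals for proportions such as these first-year revision rates can be computed by inverting the binomial distribution. The sketch below uses only the Python standard library and bisection; it is an illustration under these assumptions, not the study's code (which was run in R), and the function names are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = log(p), log1p(-p)
    log_nf = lgamma(n + 1)
    return sum(
        exp(log_nf - lgamma(i + 1) - lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        for i in range(k + 1)
    )


def _bisect_decreasing(f, lo=0.0, hi=1.0, iters=100):
    """Root of a function that decreases on [lo, hi], found by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for the proportion x / n."""
    # Lower limit solves P(X >= x) = alpha / 2; upper limit solves P(X <= x) = alpha / 2
    lower = 0.0 if x == 0 else _bisect_decreasing(
        lambda p: alpha / 2 - (1.0 - binom_cdf(x - 1, n, p)))
    upper = 1.0 if x == n else _bisect_decreasing(
        lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper


# Overall first-year revisions in the cohort: 28 of 1452 patients
low, high = clopper_pearson(28, 1452)
```

For example, `clopper_pearson(28, 1452)` gives an exact 95% interval around the overall first-year revision proportion (28/1452 ≈ 1.9%). Working in log space keeps the binomial coefficients from overflowing for cohort-sized n.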

Table 2 Postoperative response rates
Table 3 Characteristics of patients who were revised during the first postoperative year

Radiographic classification of knee osteoarthritis

Blinded K–L and Ahlbäck classifications of OA severity were made for 1051 radiographs (86% of the 1228 eligible after exclusions), and the radiographs were ranked from no. 1 to 1051 (no. 1 most severe) based on the surgeons’ 17,767 direct comparisons [16, 18].

Patient-reported outcome measures (PROMs)

OKS at 1 year did not differ significantly among the three hospitals (39 ± 7, p = 0.1) (Fig. 1a, Table 4), neither when adjusted for age and sex nor when further adjusted for baseline OKS and EQ-VAS and for variables that differed among hospitals preoperatively, i.e., BMI, anxiety and depression symptoms, and radiographic classification (Ahlbäck or K–L). The 1-year OKS change was lower in Aarhus than in the other two hospitals (Aalborg + 1.6, CI 0.07–3; Copenhagen + 1.3, CI 0.2–2; both relative to Aarhus) (Fig. 1b, Table 4). After adjustment for age, sex and baseline OKS, this difference remained significant only for Copenhagen (Aalborg + 1.0, CI −0.3–2; Copenhagen + 1.1, CI 0.2–2), and after additional adjustment for BMI, EQ-VAS, anxiety and depression, and radiographic classification, there were no significant differences between hospitals (Aalborg CI −0.4–3, Copenhagen CI −0.1–2, p > 0.2).
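To make the reference-hospital coding concrete, the sketch below builds a design matrix with an intercept, treatment-coded hospital dummies (Aarhus as reference, as in the study) and covariates, and fits it by ordinary least squares, so that each hospital coefficient is an adjusted OKS difference from Aarhus. The study used R; this dependency-free Python version, its function names and the synthetic data are assumptions for illustration, and it omits the confidence intervals reported above.

```python
HOSPITALS = ("Aarhus", "Aalborg", "Copenhagen")


def design_row(hospital, covariates, reference="Aarhus", levels=HOSPITALS):
    """Intercept, one 0/1 dummy per non-reference hospital, then covariates."""
    dummies = [1.0 if hospital == h else 0.0 for h in levels if h != reference]
    return [1.0] + dummies + list(covariates)


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    a = [row + [v] for row, v in zip(xtx, xty)]  # augmented matrix
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


# Synthetic data: (hospital, baseline OKS); outcome built from known coefficients
patients = [("Aarhus", 20), ("Aarhus", 30), ("Aalborg", 18),
            ("Aalborg", 26), ("Copenhagen", 22), ("Copenhagen", 35)]
X = [design_row(h, [b]) for h, b in patients]
y = [20 + 1.5 * (h == "Aalborg") + 1.0 * (h == "Copenhagen") + 0.4 * b
     for h, b in patients]
print([round(c, 6) for c in ols(X, y)])  # [20.0, 1.5, 1.0, 0.4]
```

Because Aarhus has no dummy column, its adjusted mean is absorbed by the intercept, and the Aalborg and Copenhagen coefficients read directly as differences from Aarhus.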

Fig. 1

Oxford Knee Score at 1 year: a absolute score and b change score, with minimal important change (MIC) = 8 points

Table 4 Patient-reported outcomes at 1-year follow-up

At 1 year, 19% of patients in Aarhus, 13% in Aalborg and 14% in Copenhagen did not attain the MIC of 8 OKS points (p = 0.051). Because the 28 revised patients were excluded from this analysis, it was repeated with each of them assigned an imputed (hypothetical) change score below the MIC. The proportions of patients not attaining the MIC in the three hospitals were then 20%, 15% and 16%, respectively (p = 0.2). When the “last available” postoperative OKS change score was used for the 1414 patients with at least one postoperative response (including 17 revised patients), the proportions not attaining the MIC did not differ significantly between hospitals (21%, 16% and 16%, respectively, p = 0.07).
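The imputation described above amounts to counting every revised patient as not attaining the MIC, since their 1-year change score is missing. A minimal sketch, with a hypothetical function name and example values, assuming that “attaining the MIC” means a change of at least 8 points:

```python
MIC_OKS = 8  # minimal important change in OKS points


def share_below_mic(change_scores, n_revised=0, mic=MIC_OKS):
    """Proportion of patients not attaining the MIC.

    change_scores: 1-year OKS change for non-revised responders.
    n_revised: revised patients with no 1-year score; each is imputed
    as a change below the MIC (i.e., counted as not attaining it).
    """
    total = len(change_scores) + n_revised
    if total == 0:
        raise ValueError("no patients")
    below = sum(1 for c in change_scores if c < mic) + n_revised
    return below / total


# Hypothetical hospital with four responders and one revised patient
observed = [10, 12, 5, 9]
print(share_below_mic(observed))               # 0.25 (complete cases)
print(share_below_mic(observed, n_revised=1))  # 0.4 (with imputation)
```

This is a deliberately unfavourable assumption for the revised patients: it can only increase, never decrease, the proportion below the MIC at each hospital.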

When studied over time, OKS was higher at 6 weeks in Copenhagen (27.7 ± 7) than in Aarhus (25.6 ± 8) and Aalborg (26.1 ± 7) (p = 0.001, unadjusted) (Fig. 2a). This hospital difference followed a different pattern when MUKA and TKA patients were studied separately (Fig. 2b), and other PROMs did not differ between hospitals over time (Table 5 shows the complete sample). Throughout the study period, OKS differed between TKA and MUKA patients; e.g., 1-year OKS was 38.7 and 40.3, respectively (difference 1.6, CI 0.6–3).

Fig. 2

Oxford Knee Score during the first postoperative year in a all patients and b TKA and MUKA patients separately. Whiskers denote mean ± 2 × standard error of the mean

Table 5 Development of main PROMs over time after surgery (all hospitals)

Patient satisfaction and willingness to repeat surgery did not differ between hospitals at 1 year (Table 4). Aalborg patients gained more in general health (EQ-VAS change, p < 0.001). Aarhus patients had better knee extension at 1 year, but after adjustment for baseline motion, 1-year extension (and flexion) were independent of hospital. In contrast, compared with TKA, MUKA was associated with a greater increase in 1-year flexion after baseline adjustment (+ 0.34 CKRS flexion points, p < 0.001), but not with greater extension (p = 0.3) (Fig. 3a, b).

Fig. 3

Patient-reported a flexion and b extension after primary knee arthroplasty in the total sample, grouped by implant type (MUKA or TKA only) and assessed with the Copenhagen Knee ROM Scale. Whiskers denote mean ± 2 × standard error of the mean. Based on validation studies, flexion “4” corresponds to a mean of 101°, “5” to 121° and “6” to 131°. In extension, “3” corresponds to a mean of 7°, “4” to 5° and “5” to 1°

Hospital variation in results for comparable patients

When all patients were grouped by preoperative Ahlbäck or K–L classification, none of “willingness to repeat surgery”, 1-year OKS or “last postoperative OKS” varied significantly between hospitals (p = 0.09–1) (Fig. 4). The exception was the “K–L 4” group of 64 patients, in which the 17 Aarhus patients had 4–6 points lower 1-year OKS (CI 0.04–11) and 4–6 points lower “last postoperative OKS” (CI 0.03–10) than patients at the other two hospitals. Given the frequent use of MUKA in Aarhus, it should be emphasized that only two “K–L 4” patients in Aarhus had MUKA, and their 1-year OKS were 37 and 40, respectively. When patients were grouped by baseline OKS (0–20, 21–30 and 31–48), none of the three aforementioned outcomes varied among hospitals (p = 0.2–0.5) (total sample displayed in Fig. 5). For patients with equivalent 1-year OKS results (grouped in 10-point intervals), willingness to repeat surgery was independent of hospital (p = 0.2–0.8).
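Stratified comparisons like these group patients by a baseline measure and then tabulate an outcome per hospital within each group. The sketch below assumes that the upper OKS band starts at 31, so the bands do not overlap, and it dichotomizes the five-point “willingness to repeat surgery” answer into willing/not willing; both choices, the function names and the example records are hypothetical.

```python
from collections import Counter, defaultdict


def oks_band(score):
    """Baseline OKS band; the top band is assumed to start at 31 (no overlap)."""
    if score <= 20:
        return "0-20"
    if score <= 30:
        return "21-30"
    return "31-48"


def willing_by_band(patients):
    """patients: iterable of (hospital, baseline_oks, willing) with willing a bool.

    Returns {band: {hospital: proportion willing to repeat surgery}}.
    """
    counts = defaultdict(Counter)  # (band, hospital) -> counts of True/False
    for hospital, oks, willing in patients:
        counts[(oks_band(oks), hospital)][willing] += 1
    result = defaultdict(dict)
    for (band, hospital), c in counts.items():
        result[band][hospital] = c[True] / (c[True] + c[False])
    return dict(result)


# Hypothetical records
records = [("Aarhus", 15, True), ("Aarhus", 18, False),
           ("Aalborg", 25, True), ("Copenhagen", 31, True)]
print(willing_by_band(records))
```

Within each band, the per-hospital proportions could then be compared with the chi-square test described in the Statistics section.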

Fig. 4

Willingness to repeat surgery at 1 year postoperatively grouped by Kellgren–Lawrence classification of preoperative knee OA and hospital, displayed as proportions

Fig. 5

Willingness to repeat surgery at 1 year postoperatively per baseline Oxford Knee Score, displayed as proportions of patients (total sample). The overlaid histogram shows the number of patients with each preoperative OKS

Revision rate development

During the study period, the Danish Knee Arthroplasty Register observed a reduction in 2-year CRR variation between hospitals and regions (Table 6) [28].

Table 6 2-year cumulative revision rates in the study hospitals and their corresponding regions

Discussion

Patients who underwent primary KA surgery at three Danish high-volume centres with a history of differing CRRs had comparable postoperative results as measured by PROMs, patient-reported knee ROM, patient satisfaction and willingness to repeat surgery. Across the three hospitals, patients with comparable preoperative radiographic knee OA or symptoms (OKS) had similar postoperative OKS and were equally willing to repeat surgery, suggesting overall homogeneity in the quality of treatment. This contradicts the conclusion that could be drawn from implant survival statistics alone, in which high hospital CRRs are generally seen as indicative of inferior surgical outcomes. The CRRs provided by national KA registries are an efficient means of detecting poor performance of implants, techniques, hospitals or even surgeons, but they offer little information about treatment results in the vast majority of patients: those who are not revised [8, 19, 22]. The outcome of surgery is not a yes-or-no question but a wide spectrum, ranging from a satisfied patient with a perfectly functioning prosthesis to an ill, infected patient in definite need of revision surgery. To quantify and eventually improve surgical quality, outcome evaluation should reflect this.

Strengths and limitations

An observational cohort study was considered the most suitable design for exploring the clinical reality behind the wide variation in Danish regional KA revision rates. With this design, however, no conclusions can be drawn regarding causal relationships. The three hospitals were selected to represent their respective regions; nonetheless, the results may not fully reflect the regional context. Despite the intention to invite virtually all primary KA patients, the average participation rate was roughly 56%. Participants closely matched the demographics and implant distribution of the total surgical population, but socioeconomic information was missing, general health data were derived from patient reports alone (EQ-5D-5L, smoking, alcohol consumption, height and weight), and patients who were unable to respond electronically were allowed to participate during only about one-third of the study period [16, 7, 9]. In Aalborg, the larger gain in EQ variables remains unexplained, but inclusion bias cannot be ruled out, given a participation rate of only 38% there.

The study was strengthened by high response rates: 89% replied before surgery, and 97% of these participants responded after surgery (90% at 1 year).

The three hospitals differed in implant selection, an important confounder that could not be separated from the hospital effect. In the total sample, outcomes differed between MUKA and TKA patients, but although Aarhus used unicompartmental implants more than twice as often as the other two hospitals, implant-related differences were not readily apparent in the overall comparisons [13]. One exception was a tendency toward better 1-year ROM in Aarhus. After adjustment for baseline flexion, the greater 1-year flexion gain in MUKA patients overall was + 0.3 CKRS points compared with TKA patients, corresponding to about 5°; however, the clinical relevance of differences in patient-reported ROM was not quantified as part of the CKRS validation [14].
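The conversion from CKRS points to degrees can be checked against the validation means listed with Fig. 3 (flexion “4” ≈ 101°, “5” ≈ 121°, “6” ≈ 131°). The sketch below assumes linear interpolation between these anchor points, which the CKRS validation does not necessarily support, so it indicates only an order of magnitude; the function name is hypothetical.

```python
# Mean degrees per CKRS flexion score, as given in the Fig. 3 caption
CKRS_FLEXION_DEG = {4: 101.0, 5: 121.0, 6: 131.0}


def ckrs_to_degrees(score, table=CKRS_FLEXION_DEG):
    """Linearly interpolate degrees for a (possibly fractional) CKRS score.

    Linear interpolation between validation anchors is an assumption.
    """
    keys = sorted(table)
    if not keys[0] <= score <= keys[-1]:
        raise ValueError("score outside calibrated range")
    for lo, hi in zip(keys, keys[1:]):
        if score <= hi:
            frac = (score - lo) / (hi - lo)
            return table[lo] + frac * (table[hi] - table[lo])
    return table[keys[-1]]


gain_low = ckrs_to_degrees(4.3) - ckrs_to_degrees(4.0)   # about 6 degrees
gain_high = ckrs_to_degrees(5.3) - ckrs_to_degrees(5.0)  # about 3 degrees
```

Under this assumption, a 0.3-point gain corresponds to roughly 6° between scores 4 and 5 and roughly 3° between 5 and 6, bracketing the “about 5°” quoted above.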

At one low-revision hospital (Aarhus), there was a tendency toward lower OKS change scores and fewer patients reaching the MIC. Note that the study cannot answer whether particular patients with poor progress would have benefited from revision surgery. The slightly higher OKS (+ 1.9) in Copenhagen patients at 6 weeks postoperatively may indicate faster recovery, possibly related to more frequent use of physiotherapy during rehabilitation. However, no differences were observed in other parameters, such as ROM, and stratification by implant type revealed a different pattern (Fig. 2b), suggesting that the finding may represent sporadic and clinically insignificant variation [4].

Even though the SPARK study was motivated by regional differences in revision rates, revision surgery was not the main objective of the study, and the relatively few SPARK participants who underwent revision surgery during the first year were not expected to be representative of recent years’ practice. Yet, the contributions of the 28 patients who underwent revision were not disregarded, and efforts were made to compensate for the absence of 1-year PROMs in this group by use of transparent imputations.

The historical hospital differences in CRR were not confirmed in this cohort. This was anticipated given the sample size, but notably, variation in CRR decreased at both the hospital and regional levels in Denmark during the study period [28]. This may represent random variation or a general trend. It cannot be ruled out that awareness of the ongoing SPARK study altered revision thresholds and patterns. As surgical staff and procedures remained largely unchanged between previous years and the study period, it seems unlikely that the quality of primary KA changed at an uneven pace in the three hospitals.

Having conducted this necessary comparison of primary KA results across Danish hospitals and regions, the next logical step in the search for explanations of revision rate variation would be a nationwide investigation of both revision thresholds and the benefit to patients of revision surgery. Such studies, followed by discussions about revision indications and techniques, might serve patients with poor results as much as the ongoing efforts to refine primary knee replacement surgery.

Conclusions

Patient-reported results 1 year after primary knee arthroplasty were comparable across three high-volume centres whose revision rates had varied for a decade. It follows that hospital variation in revision rates does not necessarily reflect differences in the overall quality of primary surgery. Further studies focusing specifically on revision procedures should determine whether patients across regions and hospitals are offered revision surgery on the same clinical grounds.