Introduction

Assessment of the quality of knee arthroplasty (KA) surgery is traditionally based on cumulative revision rates (CRR) [29]. According to data from national arthroplasty registries, there are significant CRR differences between countries and large and statistically significant differences within countries and between hospitals [7, 33]. These observations are rarely discussed and attempts to explain the variation often focus on implant selection. Data from the Danish Knee Arthroplasty Register show a statistically significant variation across the five administrative regions in Denmark for 1-, 2-, 5- and 10-year cumulative revision rates (CRRs) [42]. The CRR of the Capital Region has persistently been the largest and lower rates have been seen with increasing distance from the capital, Copenhagen (Fig. 1). For instance, in 2015 when this study was initiated, the 2-year CRR was 5.0% in the Capital Region, 2.2% in Central Denmark Region and 1.0% in North Denmark Region [42]. Variations among regions or hospitals can occur by chance, but consistent differences in CRRs could indicate systematic differences in the indications for the primary procedure, patient demographics, the quality of surgery including implant selection, or indications for revisions – or combinations of these. Demographics, preoperative knee symptoms and the severity of radiographic knee osteoarthritis (OA) are all factors that are associated with the degree of postoperative patient satisfaction and the risk of revision [6, 11, 14, 15, 29, 35]. These variables, however, have not specifically been compared across hospitals with varying revision rates following KA.

Fig. 1

2-year revision rates, Danish regions. Overview of 2-year revision rates after primary knee arthroplasty in the Danish regions (The Danish Knee Arthroplasty Register, Annual Report 2016). The three study hospitals are mapped

For the three Danish regions in question, register data provided no explanation for the CRR differences, and apart from undocumented assertions of cultural differences between regions, there were no hypotheses regarding the factors that might be responsible. This motivated the initiation of the prospective observational cohort study, SPARK (“Variation in patient Satisfaction, Patient-reported outcome measures, radiographic signs of Arthritis, and Revision rates in Knee arthroplasty patients in three Danish regions”). The present part of the SPARK study aims to compare patient characteristics, knee radiographs, implant selection and patient-reported outcome measures (PROMs) obtained before primary KA in a large hospital of each region, and to investigate whether hospital variations in patient selection were associated with the CRR differences. Postoperative outcomes will be reported in a separate publication.

Materials and methods

The National Committee of Health Research Ethics provided ethical approval (Protocol no. 16038343, 2 September 2016) and all patients gave their written consent to participate. Reporting adheres to the STROBE guidelines for observational cohort studies.

Patient inclusion

This prospective observational cohort study invited the largest knee arthroplasty university hospital in each of the three Danish regions that differed most in revision rates after KA surgery: Aarhus University Hospital in the Central Denmark Region, Aalborg University Hospital Farsø in the North Denmark Region and Copenhagen University Hospital Herlev-Gentofte in the Capital Region. Revision rates for each of the three hospitals were comparable to those for the region as a whole (Table 1) [42]. All hospitals were public (94% of primary KAs were performed in public hospitals in 2017) [43].

Table 1 Latest cumulative revision rates in the study hospitals and their corresponding regions at study start in 2016 (means of preceding 3 years)

From 1 September 2016 to 31 December 2017, patients who were scheduled for primary KA, i.e., total (TKA), medial/lateral unicompartmental (MUKA/LUKA) or patellofemoral arthroplasty (PFA) were eligible for inclusion. Participation did not interfere with implant selection or surgical routines. Exclusion criteria were knee tumors, hemophilia, severe developmental lower limb deformities, dementia or language barriers that could not be overcome by help from relatives. Patients unable to answer questionnaires online were excluded, with the exception of the final 6 months of the inclusion period (July 2017-Dec 2017) during which participation via paper questionnaires was permitted.

Patients were recruited for the study by the surgeon (Aarhus and Aalborg) or by an employed medical student (Copenhagen). Two days later, patients received an email with a unique link to the preoperative PROM set or a letter with the same content. Up to two email reminders were sent, three days apart, if necessary. To avoid confusion among patients with bilateral knee trouble, the email specified that the knee scheduled for surgery was the object of the study. Patients planned for surgery on both knees could participate twice if the operations were conducted on separate occasions, while patients with simultaneous bilateral surgery were asked to choose in advance which knee to participate with [24]. Since PROMs were the cornerstone of this study, patients who failed to complete the questionnaire prior to surgery were excluded.

Post-hoc quantification of inclusion rates and demographic comparisons between participants and non-participants were conducted. As the time from inclusion to surgery varied, these analyses were based upon registered surgical activity during a certain time period (1 Jan to 31 Dec 2017) [36].

Patient-reported outcome measures (PROMs) at baseline

Knee-specific PROMs included Oxford Knee Score (OKS, 0–48, 48 best) as the primary outcome [3, 5, 24, 40], UCLA Activity Scale (0–10, 10 highest) [23, 37] and Copenhagen Knee ROM Scale (CKRS) assessing patient-reported passive range of knee motion (flexion 0–6 (6 max), extension 0–5 (5 max)) (Footnote 1) [21, 22]. All knee-specific questions were preceded by the generic EQ-5D-5L and EQ-VAS [41, 44] and a global knee anchor question, “How is your knee?” (Visual Analogue Scale (VAS) from “My knee does not work at all or is extremely painful” to “My knee is pain-free and functions normally”, 0–100, 100 best). Patients’ motivation for the surgery was evaluated by marking up to 5 of 13 common reasons provided, based on explorative interviews with 35 knee OA patients (unpublished) or adding one free-text reason.

Patients reported their height and weight as well as additional health and lifestyle information, including their degree of urbanization (“city/suburb”, “small town/village” or “countryside”), daily smoking (“yes”/“no”), and alcohol consumption (more or less than two standard drinks (12 g alcohol) per day). Patients were asked whether the knee was their main physical disability, and “How often do you take painkillers due to your knee?” with five answer options ranging from “more than once per day” to “rarely or never” (full wording in Table 3).

Radiographic classification of knee osteoarthritis

The severity of tibiofemoral OA was assessed in blinded preoperative weight-bearing postero-anterior knee radiographs with the knee flexed 15–30° [2]. Patients listed for PFA or LUKA and those with predominantly lateral OA on radiographs were excluded from this analysis because the radiographic basis for surgery could not be fairly assessed without tangential (Skyline, Merchant) or flexed (Rosenberg) views, respectively.

Two radiologists with expertise in musculoskeletal radiology viewed the radiographs in a random sequence. They first recorded the Ahlbäck classification (0–5, 5 severe) for each patient and then, in a new round of random order, the Kellgren-Lawrence classification (K-L, 0–4, 4 severe) [1, 13, 17]. In case of disagreement, both radiologists reevaluated each radiograph together and reached a consensus. Using a novel heuristic-based method, radiographs were evaluated free of classifications by 13 experienced knee arthroplasty surgeons from all five Danish regions. Each surgeon was presented with the knee radiographs in pairs and was asked to choose the radiograph that they expected would cause the most severe knee symptoms, not considering any formal grading system but instead using their personal experience and heuristics, i.e., “rule-of-thumb”. These thousands of comparisons resulted in a complete ranking of all radiographs [28].

Incidence of surgery and implant selection

The incidence of primary KA on a regional level was retrieved from the National Patient Register by NOMESCO procedure code KNGB (age > 40 years and subgroup 60–79 years). The CRRs for the hospitals (Table 1) were retrieved from the Danish Knee Arthroplasty Register (97% completeness). On an individual level, the medical record was consulted in case of a mismatch in laterality or implant type from inclusion to postoperative registration.

Statistics

Sample size and inclusion period were determined by clinical relevance and feasibility. Throughout the study period, around 1800 operations were anticipated and with a 75% inclusion rate and 80% response rate, 1080 responses would be ready for analysis. Any regional variations that were not detectable in a sample of this size were considered clinically irrelevant to the overall study question.
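The anticipated number of analysable responses follows directly from the three planning figures given above. A minimal sketch of that arithmetic is shown here; the function name is illustrative and is not taken from the study protocol.

```python
def expected_responses(operations: int, inclusion_rate: float, response_rate: float) -> int:
    """Expected analysable responses = operations x inclusion rate x response rate."""
    return round(operations * inclusion_rate * response_rate)


# Planning figures stated in the text: ~1800 operations, 75% inclusion, 80% response.
print(expected_responses(1800, 0.75, 0.80))  # 1080
```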

All analyses were based on the null hypothesis that patient selection was identical across the three hospitals. Due to the explorative nature of the study, additional data-driven analyses were allowed [30]. All tests were unpaired as though each knee belonged to a unique participant [32]. OKS and EQ-5D data were treated as numeric variables [24], as were knee flexion and extension [21], while Ahlbäck, K-L, surgeons’ ranking, and UCLA ratings were ordinal. A separate article describes the statistical details of heuristics-based assessment of radiographs using the Bradley-Terry model for paired comparisons [28].
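To illustrate how paired comparisons can be turned into a ranking, the following sketch fits a basic Bradley-Terry model with the standard iterative (minorization-maximization) update. It is a generic textbook version on made-up data, not the procedure documented in [28]; the function name, the toy labels "A", "B" and "C", and the fixed iteration count are assumptions made for illustration.

```python
from collections import defaultdict


def bradley_terry(comparisons, n_iter=200):
    """Fit Bradley-Terry strengths with the classic MM update.

    comparisons: list of (winner, loser) pairs; here "winner" stands for the
    radiograph judged more severe. Returns {item: strength}, scaled to sum to 1.
    Assumes every item wins at least once and all items are connected.
    """
    items = sorted({x for pair in comparisons for x in pair})
    wins = defaultdict(int)    # number of wins per item
    n_pair = defaultdict(int)  # number of comparisons per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1
        n_pair[frozenset((winner, loser))] += 1

    strength = {i: 1.0 for i in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = sum(
                n_pair[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items
                if j != i and frozenset((i, j)) in n_pair
            )
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(updated.values())
        strength = {i: v / total for i, v in updated.items()}
    return strength


# Hypothetical comparisons between three radiographs (not study data).
toy = [("A", "B"), ("A", "B"), ("B", "A"),
       ("A", "C"), ("A", "C"), ("A", "C"), ("C", "A"),
       ("B", "C"), ("B", "C"), ("C", "B")]
strengths = bradley_terry(toy)
ranking = sorted(strengths, key=strengths.get, reverse=True)
print(ranking)  # most severe first
```

Sorting the fitted strengths gives a complete ordering even though not every pair needs to be compared equally often, which is what makes the approach practical for large sets of radiographs.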

Unless otherwise specified, statistical tests compared all three centres, not one against the mean. The significance of difference tests depended on the type and structure of data: Chi-square test for dichotomized variables, unpaired t-test or one-way analysis of variance (ANOVA) for parametric variables and Mann–Whitney U or Kruskal–Wallis test (> 2 groups) for nonparametric (ordinal) data. General linear regression models were used to estimate the effects of independent numerical variables on dependent variables, and when adjustment for confounders was relevant, multiple linear regression analyses were conducted (noted in text). Aarhus was selected as the reference hospital as it was situated between the two other hospitals in terms of geography, urbanization and CRR, i.e., the disparities known prior to inclusion. The level of significance was set to 0.05 (two-sided) and 95% confidence intervals (CI) were supplied when relevant. Data collection and Case Report Forms etc. were handled by Procordo Software Aps, Copenhagen. In Mar 2019, analyses were conducted using R (RStudio) [31].
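As an illustration of the three-centre comparison of a dichotomized variable, the sketch below computes a Pearson Chi-square test for a 3 × 2 table. The study's analyses were run in R; this Python version is only a stand-in, the counts are invented, and the function name is hypothetical. It relies on the fact that for two degrees of freedom the Chi-square upper-tail probability equals exp(−x/2).

```python
import math


def chi_square_3x2(table):
    """Pearson Chi-square test for a 3x2 contingency table (df = 2).

    table: three rows of [count_yes, count_no], one row per hospital.
    Returns (statistic, p_value).
    """
    if len(table) != 3 or any(len(row) != 2 for row in table):
        raise ValueError("expected a 3x2 table")
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    # Survival function of the Chi-square distribution with 2 df.
    p_value = math.exp(-stat / 2)
    return stat, p_value


# Hypothetical counts (not study data), e.g. males/females in three hospitals.
stat, p = chi_square_3x2([[45, 55], [56, 44], [43, 57]])
print(round(stat, 2), round(p, 2))  # 3.93 0.14
```

The test compares all three centres jointly, in line with the statement above that tests were not run as one hospital against the mean.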

Results

Patient inclusion

Questionnaires were sent to 1704 patients (Fig. 2), 52 of them by letter. In 32 cases, the email address or laterality was wrong, or a technical error occurred, and 48 patients had their procedure cancelled or postponed beyond the research period. Consequently, 1624 patients received a questionnaire, and 1452 patients (89%) completed the PROM set at a mean of 29 days before surgery, spending an average of 12 min 30 s per patient. The 53 patients who participated with separate knees accounted for 7.3% of responses.
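The inclusion figures above can be checked with simple arithmetic. The sketch below assumes that each of the 53 patients who participated with separate knees contributed two responses; that reading is an assumption, and the variable names are illustrative.

```python
sent = 1704
failed_delivery = 32      # wrong email address or laterality, or technical error
cancelled = 48            # procedure cancelled or postponed
completed = 1452
bilateral_patients = 53   # assumed to contribute two responses each

received = sent - failed_delivery - cancelled
response_pct = round(100 * completed / received)
bilateral_pct = round(100 * 2 * bilateral_patients / completed, 1)

print(received, response_pct, bilateral_pct)  # 1624 89 7.3
```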

Fig. 2

Inclusion flowchart. See text for details

In the post-hoc inclusion analysis, 1924 patients underwent surgery in 2017 at the three hospitals, 1083 of whom (56%) provided PROM data for this study; 62% in Aarhus, 38% in Aalborg and 62% in Copenhagen (Table 2). Non-responding participants were evenly distributed across hospitals (Aarhus 7.0%, Aalborg 8.2% and Copenhagen 10.0%, p = 0.2, Chi-square). In the total 2017 patient population, Copenhagen patients were older (mean 68.8 y.) than those at the other two hospitals (Aarhus 67.1 and Aalborg 67.6 y., p = 0.006, ANOVA). In Aalborg, there were more male patients (48%) than in Aarhus (39%) and Copenhagen (38%), (p = 0.005, Chi-square). The proportion of male inhabitants aged 60–79 years in each region ranged from 47% (Central Denmark and Capital Region) to 49% (North Denmark Region) [36].

Table 2 Inclusion analysis based on complete surgical activity in 2017

Males and younger patients more often agreed to participate in the SPARK study than females and older patients (Table 2). Further analyses (not shown) found that the distribution of implant types within each hospital did not differ between participants and non-participants (p ≥ 0.2, Chi-square).

SPARK participants from Copenhagen had a mean age of 68.6 years, 1.4–2.0 years older than those in Aalborg (67.3 y.) and Aarhus (66.6 y.), respectively (p = 0.002, ANOVA) (Table 3). Male sex was more prevalent in Aalborg (56%) than in Copenhagen (43%) and Aarhus (45%) (p = 0.002, Chi-square). In Aalborg, males (68.8 y.) were 3.5 years older than females (65.3 y.) (CI 1–6, t test), whereas in the other hospitals, there was no significant difference. BMI (mean 29.5 ± 5 kg/m2) was lower in the elderly (− 0.13 kg/m2/year, CI − 0.16-(− 0.11), linear regression) and higher in females (+ 0.69 kg/m2, CI 0.2–1.2) as well as in Aalborg patients (+ 1.5–1.7 kg/m2, p < 0.001), even after adjusting for age and sex (+ 1.4–1.9 kg/m2, adjusted). There were no differences between hospitals for smoking, alcohol consumption, physical activity level (UCLA) or self-reported general health (EQ Index and VAS). Except for smoking, males reported significantly higher levels of these parameters (Table 3) as compared to females (significant on hospital level for UCLA and alcohol consumption only). In sub-analyses of EQ-5D-5L items, 76% of Aalborg patients were “neither anxious or depressed”, compared to 66% in Aarhus and 62% in Copenhagen (p = 0.01, Kruskal–Wallis). This hospital difference was only significant among females (p females = 0.03, males = 0.3). The 41 patients who responded by letter (75.9 y) were 8.1 years older than those who responded via email (67.8 y) (CI 6–10, t-test) and 29 (71%) were female (54% in the email group, p = 0.05, Chi-square).

Table 3 Preoperative data from 1452 responding patients

Patient-reported outcome measures (PROMs) at baseline

OKS at baseline did not differ among patients in the three hospitals (23.3 ± 7, p = 0.9, ANOVA) (Table 3, Fig. 3), even after adjusting for age, sex and BMI (multiple linear regression). The same was true for use of analgesics, knee flexion and the global knee anchor (Table 3). Extension deficits were more prevalent in Aalborg (62 vs. 45–46%, p = 0.007, Chi-square). Males scored 2.8 OKS points higher than females in all hospitals (CI 2–4, t test) and reported less frequent use of analgesics, while the sex difference in the overall perception of the knee condition (global knee anchor) was not significant (p = 0.1, Mann–Whitney U). OKS was significantly lower (− 2.6 points, CI − 3−(− 2), t-test) in obese patients (BMI > 30) and in smokers (− 1.5 points, CI − 3−(− 0.4), t test). There were no hospital differences in patients’ motivations for surgery (p ≥ 0.1, Chi-square), but stratification by implant and sex revealed significant variation (Table 4).

Fig. 3

Oxford Knee Score at baseline. Distribution of preoperative Oxford Knee Score per hospital (Kernel density plot)

Table 4 Patients’ motivations for surgery (total SPARK cohort)

Radiographic classification of knee osteoarthritis

Exclusions were made for 50 PFA, 7 LUKA patients and 167 patients with predominantly lateral OA. 177 radiographs were unavailable due to logistical matters unrelated to the patient, leaving 1051 radiographs (86% of those possible) ready for analysis. The two radiologists reached a moderate interobserver agreement of 0.59 (weighted Kappa) for both K-L and Ahlbäck [17]. Prior to consensus, they disagreed in 29% (K-L) and 41% (Ahlbäck) of cases, respectively. The surgeons’ heuristics-based evaluations (17,767 comparisons) ranked all radiographs from number 1 (most severe) to number 1051 [28].
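For readers unfamiliar with the agreement statistic, the sketch below computes Cohen's weighted kappa for two raters on an ordinal scale. The text does not state whether linear or quadratic weights were used, so linear weights are an assumption here; the ratings are invented and the function name is hypothetical.

```python
def weighted_kappa(rater1, rater2, n_cat, power=1):
    """Cohen's weighted kappa for ordinal grades 0..n_cat-1.

    power=1 gives linear disagreement weights, power=2 quadratic weights.
    """
    n = len(rater1)
    observed = [[0.0] * n_cat for _ in range(n_cat)]
    for a, b in zip(rater1, rater2):
        observed[a][b] += 1 / n
    marg1 = [sum(row) for row in observed]        # grade distribution, rater 1
    marg2 = [sum(col) for col in zip(*observed)]  # grade distribution, rater 2
    num = den = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            weight = (abs(i - j) / (n_cat - 1)) ** power
            num += weight * observed[i][j]        # observed weighted disagreement
            den += weight * marg1[i] * marg2[j]   # disagreement expected by chance
    return 1 - num / den


# Hypothetical K-L grades (0-4) from two raters for eight radiographs.
grades_a = [0, 1, 2, 3, 4, 2, 3, 1]
grades_b = [0, 2, 2, 3, 3, 1, 4, 1]
kappa = weighted_kappa(grades_a, grades_b, n_cat=5)
print(round(kappa, 2))  # 0.64
```

Weighting penalizes a one-grade disagreement less than a three-grade disagreement, which suits ordinal scales such as K-L and Ahlbäck.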

Knee OA severity was unevenly distributed across hospitals according to K-L classification and surgeons’ ranking (Table 3). Mild degrees of knee OA (K-L/Ahlbäck < 2) were less prevalent in Aarhus patients (p = 0.01 (K-L), p = 0.01 (Ahlbäck), Chi-square test), and surgeons’ ranking of OA was more severe in Aarhus (p < 0.001, Kruskal–Wallis test) (Table 3). Radiographic classifications and urbanization level were not associated (p > 0.4, Kruskal–Wallis tests). Males had significantly more advanced OA than females using all three radiographic evaluation methods (p ≤ 0.01, Mann–Whitney U). On a hospital level, this difference was significant in Copenhagen (p ≤ 0.03), partly significant in Aarhus (p = 0.009–0.09), and not significant in Aalborg (p = 0.9).

Incidence of surgery and implant selection

In the Capital Region, the incidence of primary KA surgery in patients aged 60–79 years in 2017 was 28% higher than in Central Denmark Region and 15% higher than in North Denmark Region (Table 5). Twenty-two surgeons treated the SPARK patients: 4 in Aarhus, 6 in Aalborg and 12 in Copenhagen. All surgeons were exclusively occupied with joint replacement surgery, except for five surgeons in training programs, who were responsible for fewer than six operations each and were evenly distributed among hospitals. With the exception of one surgeon in each hospital, the staff had remained stable over the preceding years.

Table 5 Incidence of primary knee arthroplasty per region in 2017

Implant selection varied widely across hospitals (Table 3). Overall, MUKA patients (67.0 ± 9 y) were 1.7 years younger (CI 0.6–3, t test) than TKA patients (68.8 ± 9 y), more likely to be male (52 vs. 44%, p = 0.01, Chi-square), had a lower BMI (28.1 vs. 29.2 kg/m2, i.e., − 1.1 kg/m2 CI − 1.7−(− 0.5), t-test) and reported 1.4 points higher OKS (24.3 vs. 22.9, CI diff. 0.6–2, t test) and 3.9 (CI 1–6, t test) points better general health (EQ-VAS 64.5 vs. 60.6). In Aarhus, which had the highest frequency of MUKA use (40% MUKA, 51% TKA), there was no difference in age, sex or BMI between the two patient groups (Table 6). In contrast, group differences were more pronounced in self-reported health (EQ-VAS), global knee anchor and patient-reported knee range of motion, e.g., preoperative flexion was 0.5 points better in MUKA patients (equivalent to approximately 5–10 degrees) [21].

Table 6 Characteristics of TKA vs. MUKA patients in hospitals grouped by frequency of MUKA use

Discussion

All hospitals had comparable preoperative PROM scores, indicating comparable symptom states prior to primary knee arthroplasty. Four findings in particular were unexpected in relation to commonly accepted revision risk factors: a very high percentage of patients from a low-revision hospital (Aarhus) were treated with unicompartmental implants, patients in both low-revision hospitals were younger than those in the high-revision hospital (Copenhagen), and the mean BMI and percentage of male patients were greater in one low-revision hospital (Aalborg) than elsewhere. Based on the literature, a higher risk of revision was expected in these four situations [9, 11, 29, 39]. In contrast, the more severe radiographic knee OA in a low-revision hospital (Aarhus) was consistent with previous findings [4, 6, 35]. The summarized findings show that the historical differences in revision rates among the three centres studied cannot easily be explained by variations in preoperative patient characteristics (Table 7).

Table 7 Summary of main findings

Strengths and limitations

Due to the observational nature of the study, causal conclusions cannot be drawn. Also, when a number of parameters are investigated, some significant differences will be discovered that are not necessarily reproducible or clinically important, as may be the case for e.g. the small difference in knee extension [21]. Similarly, the magnitude and clinical relevance of hospital differences in age or BMI may be debatable.

It is an important strength that the results were based on patients treated in routine clinical settings. Surgeons were not aware of any changes to patient selection practices during (or leading up to) the study period, so it was assumed that the study reflected standard hospital practice. However, the differences in treatment routines across hospitals introduce substantial bias that cannot be compensated for through analyses, the most important probably being implant selection. Aarhus offered unicompartmental implants to 49% of all patients, and only here did the choice between TKA and MUKA appear not to be influenced by age, sex or BMI, an approach supported by recent literature [20, 25].

Response rates were relatively high. Numerous PROMs from nine out of ten participants in conjunction with radiographic OA classifications should provide a valuable reference set for future comparisons [42]. However, not all potential candidates were included, inevitably resulting in bias. To make the inclusion process feasible, no information was collected regarding patients who were not invited or declined participation and the reasons why. The surgeons and medical students in charge of patient recruitment reported that inclusion was occasionally overlooked or not prioritized, but patients were eager to participate. One could argue that the electronic collection of PROMs posed a threat to patient representation. However, Danish citizens are among the most IT-literate in Europe (2 out of 3 Danish citizens > 65 years used the internet daily in 2017) [45] and knee OA patients have previously preferred electronic questionnaires over paper ones [10]. Though the demography of the SPARK cohort largely resembled the surgical population of 2017 and the underlying hospital differences in demography were reflected in the SPARK cohort, males and young patients were overrepresented in the study. Participants without an email address were 8 years older than others and were only allowed to participate in 6 of the 16 inclusion months. Therefore, it must be assumed that some of the oldest and possibly least resourceful patients were excluded, resulting in additional inclusion bias. Objective information regarding comorbidity or socioeconomic factors could have revealed important hospital differences in baseline health [40]. As a proxy of socioeconomic factors, 10% of men and 8% of women in age group 65–74 years reported daily smoking; this proportion was lower than the 17% and 14% reported in the National Health Profile 2018 [12]; however, smoking is associated with a lower risk of OA (Relative Risk 0.80) [16].

In Aalborg, the low inclusion rate threatened the generalizability of results. A low level of self-reported anxiety and depression (especially among females) here may be a reflection of daily practice or may result from inclusion bias. The high proportion of males among patients undergoing KA surgery was a general tendency in Aalborg.

In this study, urban–rural variations in radiographic classifications were minimal. This may be due to the relatively small geographical distances in Denmark: almost all citizens live within a 1.5 h drive of a KA centre [27, 34]. In Aarhus, which is located in a region with a KA incidence 18–22% lower than the Capital Region, fewer patients with mild degrees of radiological OA underwent surgery. This would suggest that not all patients in the Capital Region would have been offered (or accepted) primary KA surgery if they had lived in the Central (or North) Denmark Region. Utilization of primary KA is known to vary across economies and countries, for example by a factor of ten between countries in the Organization for Economic Co-operation and Development (OECD) alone [27, 29]. In welfare countries, the utilization of KA varies by a factor of two [26], and there are large regional variations within countries (Finland 1.6, Germany 1.8 and Spain factor of 27) [8, 19, 34]. In this light, the Danish variation in KA incidence by a factor of 1.3 is negligible. Regional variations in the threshold for primary KA surgery are not necessarily explained by the actions of knee surgeons alone [38]; expectations for surgery and risk aversion among patients, physicians and other caregivers (e.g. physiotherapists) also influence the number of patients admitted for orthopaedic evaluation [18]. Therefore, the optimal comparison of patient selection should also include knee OA patients treated outside of hospitals and with non-surgical methods.

Conclusions

The observed hospital variations in patient selection prior to primary knee arthroplasty were not associated with well-known revision risk factors to an extent that could reasonably explain the persistent differences in revision rates among three Danish high-volume hospitals. These baseline data provide the basis for comparing postoperative outcomes within the same cohort.