Introduction

In any human activity requiring learning, repetition improves results, especially in manual activities. Surgery is a clear example of this situation. Therefore, we assume that a surgical learning curve is always present and that surgical expertise can only be achieved after many years of clinical practice [1,2,3,4].

Computer-assisted navigated knee replacement provides surgeons with quantitative measurement tools for real-time assessment of lower limb alignment and kinematics [5, 6]. It is a powerful instrument for intraoperatively supporting and guiding the surgeon toward adequate postoperative soft tissue balance of the knee [7,8,9,10].

Patient-reported outcome measures (PROMs) are standardized, validated questionnaires completed by patients to measure their perception of their functional well-being and health status. No single instrument has established itself as the 'gold standard' for measuring patient status. Each tool measures different dimensions of health, uses different scoring levels, and references different periods [11]. The Forgotten Joint Score (FJS) is a measure for assessing joint-specific patient-reported outcomes [12]. This questionnaire focuses on patients' awareness of a specific joint in everyday life and picks up subtle differences between patients and follow-up time points.

This study aims to determine whether the use of a surgical navigation system enables beginner and intermediate surgeons to achieve long-term clinical PROM outcomes, postoperative implant positioning, and limb alignment as good as those achieved by an expert surgeon. We conducted a retrospective cohort study based on a consecutive case series. The alternative hypothesis was that the long-term FJS achieved by surgeons with less surgical experience is noninferior, within a prespecified margin, to that achieved by a skilled surgeon in navigated assisted total knee replacement. More specifically, the null hypothesis of inferiority states that the FJS achieved by a less experienced surgeon is worse than that of a skilled one by at least the prespecified acceptable margin of 18.5 points, corresponding to the average of the smallest and largest minimal clinically important difference (MCID) estimates given by Ingelsrud et al. [13].

The main objective is to assess the long-term FJS result by comparing three levels of surgeon experience: (1) no more than 30 previous knee replacements, (2) between 31 and 300, and (3) more than 300 knee replacements. Secondary objectives are to compare the postoperative Hip Knee Ankle Angle (HKAA), implant position, and survival rate between the groups.

Materials and Methods

We enrolled 100 consecutive patients who underwent navigated total knee arthroplasty (TKA) at our institution from 2008 to 2010. Seventeen patients died during the follow-up period, which ended on 12/31/2020. The inclusion criteria were primary osteoarthritis of the knee and a posterior stabilized total knee replacement (Columbus, B. Braun Aesculap) performed for substantial pain and loss of functionality, with any degree of deformity. Of the 83 living patients, five were excluded because they had undergone prosthesis revision surgery. We could not contact four other patients by phone or mail, resulting in a final sample of 74 patients (Fig. 1).

Fig. 1

Patient flow diagram. TKA total knee arthroplasty

All patients provided written informed consent, and the Hospital Committee for Medical and Health Research Ethics approved the study (Hospital General Universitario Gregorio Marañón, Madrid, Spain; Protocol 1-04, V-02). All procedures performed followed the 1964 Declaration of Helsinki and its later amendments.

It is estimated that prosthetic knee surgeons must perform about 30 procedures a year to maintain their skills [14]. We therefore divided the patients into three groups according to the principal surgeon's experience at the beginning of the study: (1) no more than 30 previous knee replacements, (2) between 31 and 300, and (3) more than 300 knee replacements. A total of 14 surgeons were involved in the study.

Demographic data collected on the cohort included gender, laterality, age, and body mass index (BMI).

Baseline characteristics of the study population are reported in Table 1.

Table 1 Study population baseline

Surgical Technique

The surgical procedure was performed following a navigated gap-balancing technique (Columbus PS, Orthopilot version 4.2; B. Braun Aesculap, Tuttlingen, Germany) in the standard fashion [5, 9]. The distal femoral cut was planned at 90° in the sagittal and coronal planes relative to the hip center. The tibial cut was planned at 90° in the coronal plane and with a 2° posterior slope in the sagittal plane relative to the ankle center. Femoral and tibial components were all cemented, and no patella was replaced.

Outcome Measures

Implant Positioning and Limb Alignment

The navigation system assessed the coronal and sagittal HKAA (Hip Knee Ankle Angle) at the beginning of the surgical procedure and again at its end, once the cementation process was completed and the tourniquet was deflated. The joint orientation angles in the frontal and sagittal planes were evaluated according to Paley [15].

Femoral and tibial component position, joint line height, and gaps at 0° and 90° were calculated after all bone cuts were done. The navigation system measured the final femoral implant rotation relative to the preoperative posterior condylar axis.

Knee Balancing

To allow comparison of final knee balance between the three groups of surgeons, the authors classified the relationship between the medial and lateral gap, both in extension and flexion.

According to the most restrictive criterion, a knee is adequately balanced when there is no more than 2 mm difference in any of the four gap measurements (Flexion Medial Gap, Extension Medial Gap, Flexion Lateral Gap, Extension Lateral Gap). Two less restrictive criteria were defined similarly, considering differences of three and four millimeters. Any value greater than five millimeters was regarded as inadequate knee balance.
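The tiered balance criteria can be illustrated with a short Python sketch. It assumes that "difference" means the spread between the largest and smallest of the four gap measurements, which the text does not state explicitly. The function name `balance_tier` and the example gap values are hypothetical and are not study data.

```python
from typing import Optional

# Thresholds (mm) for the three balance criteria described in the text,
# ordered from most to least restrictive.
THRESHOLDS_MM = (2.0, 3.0, 4.0)


def balance_tier(flex_medial: float, ext_medial: float,
                 flex_lateral: float, ext_lateral: float) -> Optional[float]:
    """Return the tightest threshold (mm) the knee satisfies, or None.

    Assumption: "difference" is taken as the spread (max - min) across the
    four gap measurements; the source does not define it more precisely.
    """
    gaps = (flex_medial, ext_medial, flex_lateral, ext_lateral)
    spread = max(gaps) - min(gaps)
    for threshold in THRESHOLDS_MM:
        if spread <= threshold:
            return threshold
    return None  # does not meet even the least restrictive criterion


# Hypothetical gap values in millimeters (not study data).
print(balance_tier(10.0, 11.0, 11.5, 10.5))  # spread of 1.5 mm
print(balance_tier(10.0, 13.5, 11.0, 10.5))  # spread of 3.5 mm
print(balance_tier(9.0, 15.0, 11.0, 10.0))   # spread of 6.0 mm
```

Returning the tightest satisfied threshold lets one knee be counted under all three criteria at once, which is how a per-group comparison of balance rates would likely be tabulated.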

Forgotten Joint Score

All eligible patients were asked to complete the FJS questionnaire at the end of the follow-up period. The FJS was assessed by the same author (NVS). The FJS is used to evaluate patients' ability to forget their artificial joints in daily life. It consists of 12 questions, and the score ranges from 0 to 100. The higher the score, the more favorable the outcome. The score was calculated as described in the original publication [12].

Survival Rate

Prosthetic failure is defined as any clinical circumstance that requires removal of the prosthesis, whether due to aseptic loosening or prosthetic infection. All patients were evaluated until the end of the follow-up period (12/31/2020).

Statistical Analysis

Traditional tests for differences were used to compare patients’ baseline characteristics, postoperative implant positioning, and limb alignment. For quantitative variables, one-way ANOVA was used under the assumption of Normality. For non-normally distributed data, the Kruskal–Wallis test was used instead. The premise of Normality was checked using the Shapiro–Wilk test.

For the primary quantitative outcome, one-sided tests were used to test for noninferiority between each group of less experienced surgeons and the group of more skilled surgeons. Due to the non-normality of the FJS values, these one-sided tests were based on a robust version of the two-sample Student t test that uses trimmed summary statistics, allowing for heterogeneity and deviations from Normality (the Yuen–Welch t test with a 5% trimming at both ends of the data). Given that there are two comparisons of interest (Group 1 versus Group 3 and Group 2 versus Group 3), Holm's sequential Bonferroni (HB) correction was used to control the family-wise error rate at the prespecified significance alpha level [16].

The HB method is a less conservative approach than the Bonferroni method. It compares the k-th smallest p value with the nominal significance level divided by (n − k + 1), where, in this case, n = 2 (the number of comparisons of interest) and k = 1, 2.
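The step-down rule can be written out in a few lines. The following Python sketch is a generic illustration rather than the code used in the study, and the function name `holm_reject` is hypothetical. It ranks the p values from smallest to largest and stops at the first one that fails its threshold.

```python
from typing import List


def holm_reject(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Holm's sequential Bonferroni procedure.

    Returns one reject/do-not-reject decision per hypothesis,
    in the same order as the input p values.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices by rank
    reject = [False] * n
    for k, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha / (n - k + 1):
            reject[idx] = True
        else:
            break  # every larger p value also fails once one fails
    return reject


# With n = 2 the thresholds are 0.05 / 2 = 0.025 and 0.05 / 1 = 0.05.
print(holm_reject([0.0185, 0.0354]))  # the two p values reported in the Results
print(holm_reject([0.0300, 0.0400]))  # smallest p value misses 0.025
```

In the second call both p values are below 0.05, yet neither hypothesis is rejected, because the smallest one fails the 0.025 threshold and the procedure stops there.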

The FJS-12 has a minimal clinically important difference (MCID) of 14–23 points, as estimated by Ingelsrud et al. [13]. Based on the average of these two extreme MCID estimates, we define the interval of equivalence for the difference in mean FJS scores as the range from −18.5 to 18.5 points; that is, the margin of noninferiority is given by delta = −18.5. For this interval of equivalence and a nominal alpha level of 5%, 15 patients are required per group to prove equivalence between each group of less experienced surgeons and the group of more skilled surgeons with a statistical power of 80%. This sample size was calculated assuming an expected population standard deviation of 16.72 on the FJS-12 scale.
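The arithmetic behind the margin and an approximate check of the sample size can be sketched as follows. This is a textbook normal-approximation formula for a two-sample equivalence (two one-sided tests) design with a true mean difference of zero. The source does not state which method or software produced the figure of 15, so the function below (hypothetical name `tost_n_per_group`) is only a plausibility check.

```python
import math
from statistics import NormalDist


def tost_n_per_group(sd: float, margin: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients per group for a two-sample equivalence design.

    Assumptions (not stated in the source): normal approximation,
    equal group sizes, common standard deviation, true difference of zero.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(1 - (1 - power) / 2)
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / margin ** 2
    return math.ceil(n)


# Margin: average of the smallest and largest MCID estimates.
margin = (14 + 23) / 2
print(margin)

# Standard deviation of 16.72 taken from the text.
print(tost_n_per_group(sd=16.72, margin=margin))
```

Under these assumptions the approximation gives 14 patients per group, one fewer than the 15 reported. A calculation based on the t distribution is typically slightly more conservative than the normal approximation, which would be consistent with the reported figure.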

Finally, Kaplan–Meier curves were obtained to describe the survival of the prostheses, according to the principal surgeon's surgical experience.
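For readers unfamiliar with the estimator, a minimal Python sketch of the Kaplan–Meier product-limit calculation is given below. It is illustrative only: the function name `kaplan_meier` and the follow-up data are hypothetical, and a revision is treated as the event while death or end of follow-up is treated as censoring.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return [(time, survival probability)] at each time with an event.

    times  : follow-up in years for each prosthesis
    events : True if the prosthesis was revised at that time,
             False if the observation was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        # Group all observations that share the same time point.
        while i < len(data) and data[i][0] == t:
            failures += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if failures:
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored cases both leave the risk set
    return curve


# Hypothetical follow-up data (not study data).
times = [1.0, 2.0, 2.0, 5.0, 10.0]
events = [True, False, True, False, False]
for t, s in kaplan_meier(times, events):
    print(f"year {t:.0f}: survival {s:.2f}")
```

Computing one such curve per surgeon-experience group gives the kind of comparison the analysis describes.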

The analyses were carried out using two statistical packages: SPSS version 25 and R version 4.0.4. A significance alpha level of 0.05 was set for all statistical tests.

Results

Limb Alignment

No statistically significant differences between the groups were demonstrated in the preoperative coronal and sagittal HKAA, making them comparable. There were no statistically significant differences in comparing postoperative coronal and sagittal HKAA between the groups.

Table 2 describes the preoperative and postoperative coronal and sagittal HKAA in the three groups.

Table 2 Comparison of preoperative and postoperative coronal and sagittal HKAA between the three different groups

Figure 2 shows, as a boxplot, the homogeneity of the coronal and sagittal alignment (green color) across the surgical experience groups. All the surgeons achieved close to neutral alignment in the coronal plane. In the sagittal plane, the HKAA was in slight recurvatum, especially among the most experienced surgeons, although the differences were not statistically significant.

Fig. 2

Coronal and sagittal alignment. The horizontal line in the box represents the median value. The height of the box is the interquartile range, Q1–Q3, i.e., where the central 50% of the values are found. The vertical lines outside the box represent the minimum and maximum of the non-outliers; when a value deviates from the top or bottom of the box by more than 1.5 or 3 times the interquartile range, it is identified as an outlier or an extreme outlier and shown as a circle or a star, respectively

There were no outliers in the coronal or sagittal postoperative alignment in any group.

Implant Positioning

Only femoral component rotation showed statistically significant differences. The more experienced surgeons group provided more external rotation to the femoral component with a non-clinically relevant mean difference of 1°. Table 3 shows the implant position description according to the different surgical experience groups globally.

Table 3 Implant positioning descriptive evaluation between the three different groups

Long-Term Forgotten Joint Score

Seventy-four patients completed the FJS at the end of the follow-up period. The FJS scores were not normally distributed. For a more robust statistical analysis of the FJS values, 5% trimmed summary statistics were considered (Table 4).

Table 4 FJS scores between the three different groups

The statistical analysis proved noninferiority (and equivalence) of Groups 1 and 2 relative to Group 3, meaning clinically that the beginner and intermediate surgeons achieved long-term PROM results not inferior (and equivalent) to those of a skilled surgeon. As can be seen in Fig. 3, for both comparisons of interest, the corresponding 90% CIs for the mean difference (derived from the Yuen–Welch t test) lie inside the interval of equivalence (−18.5, 18.5), with first-ranked p value = 0.0185 < 0.025 and second-ranked p value = 0.0354 < 0.05 (proving equivalence at a 5% significance level, based on the Holm–Bonferroni correction).

Fig. 3

Graphical representation of the noninferiority of the FJS scores, with Group 3 as the reference. Groups have been defined in “Materials and Methods”. The alternative hypothesis (H1) states that the scores on the FJS scale are not more than 18.5 points lower than in Group 3 (the group of skilled surgeons). The pre-established margin of noninferiority is defined as the mean of the smallest and largest estimates of the Minimal Clinically Important Difference (MCID) given by Ingelsrud et al. [13]. Thus, any value located in the blue area (H0) represents clinical inferiority with respect to Group 3

Note that the score on the FJS scale is a favorable or beneficial outcome and, consequently, the higher the values, the better.

The most experienced surgeons tend to achieve better FJS scores with less dispersion between values, representing a more consistent outcome (Fig. 4).

Fig. 4

FJS. The horizontal line in the box represents the median value. The height of the box is the interquartile range, Q1–Q3, i.e., where the central 50% of the values are found. The vertical lines outside the box represent the minimum and maximum of the non-outliers; when a value deviates from the top or bottom of the box by more than 1.5 or 3 times the interquartile range, it is identified as an outlier or an extreme outlier and shown as a circle or a star, respectively. Groups have been defined in “Materials and Methods”. For example, group B has the greatest dispersion between the values in the FJS, but it is also the one with the most assigned patients. Note that the least experienced surgeons achieve the worst median FJS score

Survival Rate

Seventeen patients died before the long-term assessment. It was impossible to contact another four patients. None of these twenty-one patients required knee revision, according to clinical records.

The overall prostheses survival rate with a follow-up greater than ten years is 93.7%. Five patients required revision surgery related to aseptic loosening during the follow-up period. There were no diagnosed prosthetic infections.

The mean follow-up was 11.10 ± 0.78, 10.86 ± 0.66, and 11.30 ± 0.74 years in Groups 1, 2, and 3, respectively.

Figure 5 shows the Kaplan–Meier survival curves. The need for revision surgery occurs mainly in the first three years after surgery.

Fig. 5

Kaplan–Meier graph showing prosthesis survival. Groups have been defined in “Materials and Methods”. Color code: 1 = blue, 2 = green, 3 = red. Five patients required knee revision, two of them in the first two years, corresponding to groups 1 and 2

Discussion

The most important finding of the present study is that long-term outcomes are equivalent between surgeons with different levels of clinical experience. The common denominator in all surgeries was the use of a surgical navigator to assist the surgeon during the procedure.

Our hospital opened in 2008 with a mix of surgeons with different levels of previous experience. All procedures in our department are performed with the assistance of a navigation system. This unusual situation has allowed us to determine whether a surgical navigator can equalize results among surgeons. We measured the outcomes of interest at two different moments. We assessed the implant placement, alignment, and prosthetic stability at the end of the surgical procedure, whereas the long-term outcome was measured through a PROM assessment after more than ten years of follow-up.

There are many ways to measure clinical outcomes [17, 18], and any evaluation of the effectiveness of TKA depends on the definition of "successful". Previous studies have shown that 10–25% of patients are dissatisfied with the outcome of knee replacement at one to three years [19, 20]. In this situation, it is necessary to focus on quantifying the success of these procedures using patient-reported outcome measures (PROMs). The most popular PROMs for assessing TKA outcomes are the WOMAC, the Knee Injury and Osteoarthritis Outcome Score (KOOS), and the Oxford Knee Score (OKS). Despite their many similarities, when PROMs are used to assess TKA outcomes, both the statistical significance (a p value) and the clinical importance, based on the minimal clinically important difference (MCID) reported in the literature, must be reported. The most crucial advantage of noninferiority and equivalence trials is that both designs allow comparison with a currently existing, clinically accepted treatment, even if there is a ceiling effect [18].

The Forgotten Joint Score is a valid and sensitive PROM with a low ceiling effect [12]. The ceiling effect limits other scores, such as the WOMAC and OKS, when detecting small clinical changes in patients who report good results.

Ingelsrud et al. reported an MCID of the FJS-12 in TKA between 14 and 23 points [13]. Defining the margin of equivalence as the average of these two values ((14 + 23)/2 = 18.5), we have proven equivalence of the beginner and intermediate surgeons with the expert group, with mean scores on the FJS scale of 80.86 ± 21.88, 81.36 ± 23.87, and 90.48 ± 14.65, respectively. In this sense, we believe our study demonstrates a similar outcome between beginners and experienced surgeons with a follow-up that exceeds ten years. The same observer (NVS) carried out all the patient interviews. The global results are slightly higher than those reported in the literature. This difference may be due to some positive bias. If such a bias in the use of the scale were present, it would be uniform across the three groups. Our work does not aim to compare the results obtained in our series with other published ones but rather to compare the surgeons in our study.

There were no statistically significant differences between the groups regarding the final alignment and the position of the femoral and tibial components. The overall result of a knee replacement depends on many factors, almost all related to the implant placement. The navigation system acts as a support tool, displaying real-time implant position and limb alignment, allowing the beginner surgeon to access consistent and relevant information through the procedure [5, 9, 10, 21, 22].

Although we have not found statistically significant differences between postoperative alignment, it should be noted that the sagittal HKAA is closer to neutral in the less experienced surgeon group.

There was no statistically significant correlation between implant alignment and position and the patient's subjective satisfaction measured on the FJS scale. This situation may indicate that both measures (objective clinical and subjective satisfaction) are complementary when evaluating the postoperative success of the prosthesis [18].

Good knee balancing is traditionally related to an excellent clinical outcome [23]. However, there is no direct correlation between a balanced prosthesis and an excellent clinical result [21, 24]. For objectivity, we used the principles of symmetry and congruence between the flexion and extension gaps to establish the comparison between the groups. It is remarkable that, regardless of the criterion used (two, three, or four millimeters of difference), an inexperienced surgeon assisted by a navigation system can achieve the same balance parameters as an experienced one, as shown in Fig. 3.

The overall prostheses survival rate with a follow-up greater than ten years is 94%. This value is slightly higher than that published in the registries [25] and may be based on the fact that, as mentioned, a surgical navigation system was used in all cases [26, 27]. As expected [20], most revision surgeries were indicated in the first three years.

The findings of this study will enable healthcare professionals to better understand the impact of implementing navigated assisted TKA on the surgical workflow, especially among less experienced surgeons or in centers with a low annual surgical volume. In the last decade, prosthetic knee surgery has changed from a technique performed exclusively by experienced arthroplasty surgeons to almost a basic requirement for any orthopedic surgeon. We must not forget that nearly all revision surgeries occur in the first two years after implantation and directly relate to the surgeon [20]. We have shown that relying on an external navigation system for intraoperative decision-making allows a less experienced surgeon to perform the procedure as safely and efficiently as an experienced surgeon. Hospital managers, who would finance these systems, should consider these findings, especially if the volume of prostheses per year is not very high.

There are several limitations to this study. First, cases were not randomly allocated to surgeons. The surgeries were performed consecutively and distributed according to the daily workload. We conducted extensive statistical analyses to compare demographic values and preoperative alignment between patients and found no statistically significant differences, which supports the comparability of the groups. Second, we have not considered each surgeon's learning curve for navigation [14]. At least one surgeon experienced in navigation was part of the team in all the surgeries. We sincerely believe that the presence of this surgeon does not invalidate the results, since his role was secondary and the leading surgeon made the decisions. We highlight this limitation through an express mention in the title and conclusion so that the reader can accurately judge the applicability of the study results. It has recently been reported that robotic-arm-assisted total knee arthroplasty has a learning curve of seven cases for integration into the surgical workflow but no learning curve effect for accuracy of implant positioning [28, 29]. The same may well be true of navigation-assisted surgery, since its data collection and surgical workflow have many similarities.

Conclusion

Navigated assisted TKA, under expert guidance, can be as effective when performed by beginner or intermediate surgeons as when performed by senior surgeons regarding the accuracy of implant positioning, limb alignment, and long-term clinical outcome.