Introduction

Anterior cruciate ligament (ACL) tears have a major impact on the individual as well as on society. Injuries of the ACL are common and often occur in the younger population. A meta-analysis including 4108 patients shows that around 20% of patients suffer from osteoarthritis (OA) 10 years after ACL reconstruction and around 50% do so 20 years after ACL reconstruction [11]. This highlights the importance of ACL reconstruction and thus also of correct graft selection.

For reconstruction of the ACL, different types of auto- and allografts are used. While 8–12% of transplants were allografts between 2006 and 2012, they are currently only used for specific indications. In the past decades, a change in the type of autograft used has been observed. While in 1992 bone-patellar tendon-bone (BPTB) represented the largest proportion of grafts with 90%, hamstring tendons gained importance over time due to fewer side effects such as anterior knee pain and represented the most used graft in 2010. During this time, the quadriceps tendon (QT) also gained importance, so that it represented 10% of the tendons used in 2020. In comparison, the hamstring tendon (HT) was used as a graft in half of the ACL reconstructions and BPTB in only one third of the cases [3].

There are numerous studies comparing BPTB and hamstring tendon (HT) in short- to mid-term outcomes [1, 4, 7, 13, 16, 23, 28, 44]. However, only a few studies have investigated the long-term outcome, and there is no consensus on which graft should be the graft of choice [35, 44].

Keays et al. showed that among other factors like meniscectomy, chondral damage and low quadriceps-to-hamstring strength ratios, the choice of graft for ACL reconstruction is a significant predictor for the development of OA [21].

In a 2011 Cochrane review by Mohtadi et al., neither BPTB nor HT could be shown to be superior at 2-year follow-up [30]. Patients who underwent ACL surgery with autologous BPTB showed a more stable knee in the clinical examination after 2 years compared to autologous HT, but also suffered more from anterior knee pain [30].

Studies with a follow-up of 6, 7 and 10 years have shown that ACL reconstruction with BPTB leads to a significantly higher incidence of OA [20, 34, 37].

On the other hand, large registry studies from Scandinavia and Denmark showed that patients with BPTB reconstruction have a statistically significantly lower risk of revision [15, 36].

To help resolve the controversy between ACL reconstructions with BPTB and HT in the long-term outcomes, this systematic review analyses studies comparing these two autografts with a follow-up of a minimum of 10 years. The focus of this systematic review lies on the three variables: patient-oriented outcome, radiographic outcome and clinical testing and measurement.

Materials and methods

The PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) statement and the PRISMA 2020 Explanation and Elaboration Document were used as guidance for our systematic review of the literature [32, 33]. The study was also registered with PROSPERO under CRD42022310607.

Search strategy

The medical databases, Embase and PubMed, were searched from inception through 31st of March 2022. The following search term was used: ((anterior AND cruciate) OR acl) AND (reconstruction OR surgery OR repair) AND (patella* OR bptb) AND (hamstring* OR semitendinosus OR gracilis). Additionally, we screened reference lists of the literature reviews on the same or similar topic for potential studies we missed in our systematic search of the databases but could not identify any new ones [2, 6, 10, 17, 19, 27, 29, 35, 44].
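As an illustration only, the Boolean search term above can be assembled from its four concept blocks. The following is a minimal sketch; the helper name `or_block` and the variable names are hypothetical, and database-specific field tags or syntax differences between Embase and PubMed are not modelled.

```python
def or_block(terms):
    """Join synonyms of one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"

# The four concept blocks of the search term reported in the text.
concepts = [
    "((anterior AND cruciate) OR acl)",                     # ligament
    or_block(["reconstruction", "surgery", "repair"]),      # intervention
    or_block(["patella*", "bptb"]),                         # BPTB graft
    or_block(["hamstring*", "semitendinosus", "gracilis"]), # HT graft
]

# All concept blocks must be present in a record, so they are joined with AND.
query = " AND ".join(concepts)
print(query)
```

Keeping each concept in its own block makes it easier to see that a record must match the ligament, the intervention and both graft types to be retrieved.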

Study selection

The literature selection obtained by the search term was first presorted using the abstract. If the suitability was unclear from the title and abstract, the full text of the article was obtained and checked for suitability. Studies were included that met the following criteria: comparison of ACL reconstruction using both HT and BPTB graft types in human patients, reporting on at least one of the three outcome variables (patient-oriented outcome, clinical tests and measurements of laxity as well as function, radiological outcome), and a follow-up period of at least 10 years. It was decided not to include studies with quadriceps tendon as graft because of the very limited number of long-term studies.

Studies were excluded for any of the following reasons: minimum duration of follow-up of less than 10 years, full text not available or in a language other than English, German, Spanish or French, literature reviews or letters, studies with exclusively paediatric patients, and studies on cadavers, animals or biomechanical in vitro studies. In addition, if there were multiple reports based on the same cohort, the one with the longest follow-up was included and the others were excluded (Fig. 1).

Fig. 1 PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram showing the selection process of studies included in this systematic review
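The inclusion and exclusion criteria described above can be expressed as a simple filter. The sketch below is illustrative only: the record structure `StudyRecord`, its field names and the category labels are assumptions made for this example and do not describe the tooling actually used for the selection.

```python
from dataclasses import dataclass, field

ACCEPTED_LANGUAGES = {"English", "German", "Spanish", "French"}
OUTCOME_VARIABLES = {"patient_oriented", "clinical_tests", "radiographic"}
# Hypothetical labels for the excluded publication and study types.
EXCLUDED_TYPES = {"review", "letter", "cadaver", "animal", "in_vitro"}

@dataclass
class StudyRecord:
    cohort_id: str               # reports on the same cohort share this id
    compares_ht_and_bptb: bool   # both graft types compared in human patients
    follow_up_years: float       # minimum follow-up of the report
    outcomes: set = field(default_factory=set)
    language: str = "English"
    full_text_available: bool = True
    study_type: str = "clinical"
    paediatric_only: bool = False

def is_eligible(r):
    """Apply the inclusion and exclusion criteria to one record."""
    return (
        r.compares_ht_and_bptb
        and r.follow_up_years >= 10
        and bool(r.outcomes & OUTCOME_VARIABLES)  # at least one outcome variable
        and r.full_text_available
        and r.language in ACCEPTED_LANGUAGES
        and r.study_type not in EXCLUDED_TYPES
        and not r.paediatric_only
    )

def select_studies(records):
    """Keep eligible records; per cohort, keep the longest follow-up only."""
    best = {}
    for r in filter(is_eligible, records):
        kept = best.get(r.cohort_id)
        if kept is None or r.follow_up_years > kept.follow_up_years:
            best[r.cohort_id] = r
    return list(best.values())
```

For two eligible reports on the same cohort with 10 and 20 years of follow-up, `select_studies` would return only the 20-year report.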

Data extraction

The following data were extracted from the full text of all included studies: year of publication, study type, single centre or multicentre, duration of follow-up, number of patients in the BPTB and the HT group, surgical technique, patient-oriented outcomes (Lysholm score, IKDC subjective knee form, KOOS, Tegner Activity Score, other measurements of activity level, kneeling pain, anterior knee pain, knee walking test), radiographic outcomes (Kellgren and Lawrence Score, IKDC), clinical tests (Lachman test, pivot shift test, KT-1000 arthrometer, single-legged hop test, extension and flexion deficit) and graft rupture rates and rates of contralateral ACL rupture.

Two authors independently reviewed the titles and abstracts of all articles identified in the literature search. When eligibility was unclear from the title and abstract, the full text of the article was obtained and evaluated for eligibility. The whole process of study selection and data extraction was carried out by these two independent authors. Disagreements were resolved by consensus discussion between the two reviewers, and a third author was consulted if a disagreement could not be resolved.

Study quality assessment

To assess the methodological quality of each study included in this review, a modified version of the Coleman Methodology Score was used. The Coleman Methodology Score was developed to assess the methodology of studies reporting surgical outcomes. It consists of a Part A with seven items (study size, mean duration of follow-up, number of different surgical procedures used, type of study, diagnostic certainty, description of surgical procedure and postoperative rehabilitation) and a Part B with three items (outcome criteria, procedure for assessing outcomes and description of subject selection process), and the result is reported as a score from 0 to 100 [12]. The original Coleman Methodology Score was modified by adapting the values for the first item, study size, as shown in Table 3 in the “Results” section.
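The structure of the score can be sketched as the sum of ten item scores. This is a minimal illustration under stated assumptions: the item keys are hypothetical labels for the items listed above, and the points awarded per item (including the modified study size values) are not specified here, so they must be supplied by the assessor.

```python
# Hypothetical keys for the seven Part A and three Part B items named in the text.
PART_A_ITEMS = (
    "study_size", "mean_follow_up", "number_of_procedures", "type_of_study",
    "diagnostic_certainty", "description_of_surgical_procedure",
    "postoperative_rehabilitation",
)
PART_B_ITEMS = (
    "outcome_criteria", "outcome_assessment_procedure",
    "subject_selection_process",
)

def coleman_total(item_points):
    """Sum the ten item scores and check that the total lies within 0 to 100."""
    expected = set(PART_A_ITEMS) | set(PART_B_ITEMS)
    if set(item_points) != expected:
        raise ValueError("exactly the ten Coleman items must be scored")
    total = sum(item_points.values())
    if not 0 <= total <= 100:
        raise ValueError("total score must lie between 0 and 100")
    return total
```

Requiring all ten items to be present prevents a study from receiving a misleadingly low total because an item was forgotten rather than scored as zero.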

Results

The search of PubMed and Embase with the above-mentioned search term yielded 959 records from PubMed and 1167 records from Embase. After additional screening of the reference lists and exclusion of duplicates, a total of 1299 records remained. A total of 1261 studies were excluded because they met the above exclusion criteria. Of the remaining 34 reports, we screened the full text and identified nine studies with a total of 1833 analysed patients that met the eligibility criteria determined for this systematic review (Fig. 1).

Study characteristics

The main characteristics of the nine studies included in this systematic review are presented in Table 1 in alphabetical order [18].

Table 1 Characteristics of the included studies

Operative techniques

Techniques used in the studies included in this systematic review are presented in Table 2 [9, 14, 25].

Table 2 Operative techniques used in included studies

Quality assessment of included studies

The included studies have been assessed for their methodologic quality using the modified Coleman Methodology Score as shown in Table 3.

Table 3 Study type characteristics and modified Coleman methodology score of nine studies addressing long-term outcomes of ACL reconstruction with patellar tendon versus hamstring tendon

Results of individual studies

An overview of the significant differences between the BPTB and HT groups can be found in Table 4.

Table 4 Overview of significant differences found between BPTB and HT in the included studies

Results for each outcome variable

Patient-reported outcome

A large variety of different scores have been used across the nine studies included in our systematic review. Four out of nine studies reported Lysholm Scores at the long-term follow-up, but none could find significant differences. The Tegner Activity Score has been reported by five studies; however, none of them could show a significant difference between the BPTB and the HT group, either (Table 5).

Table 5 Results of studies reporting Lysholm Score and Tegner Activity Score

Focussing on activity, we found five studies reporting results of activity levels other than Tegner Activity Score. Two of them showed significantly more patients with higher activity levels in the BPTB group (75.6% for BPTB versus 67.4% for HT with moderate to intense level of activity (IKDC C or D), P = 0.02 before inverse probability weighting treatment (IPWT) [24]; 73% for BPTB versus 48% for HT with weekly participation in sports, P = 0.05 [41]).

Four out of nine studies reported results of the IKDC-SKF at final follow-up. One study reported significant results favouring HT (mean ± SD: 90.7 ± 11 for BPTB versus 92.6 ± 11.3 for HT, P = 0.046) [24]. Two of the three studies reporting non-significant results showed slightly higher mean scores for HT [8, 40]. Only two studies reported KOOS; one of them reached the level of significance and showed higher scores for HT (mean ± SD: 81.9 ± 12.6 for BPTB versus 84.7 ± 14.4 for HT, P ≤ 0.0001) [24] (Table 6).

Table 6 Results of studies reporting IKDC-SKF and KOOS

Concentrating on donor site morbidity, we found five studies reporting either kneeling or anterior knee pain or both. A trend is noticeable towards more donor site morbidity in BPTB patients. However, only one study showed significantly more kneeling pain in BPTB (62% for HT vs 80% for BPTB with no or mild kneeling pain, P = 0.018) [40]. One study reported results of a knee walking test with significantly more BPTB patients having difficulties than HT patients (49% for BPTB vs 62% for HT patients reporting it to be OK, P = 0.049) [40] (Table 7).

Table 7 Results of studies reporting donor site morbidity (anterior knee pain, kneeling pain, knee-walking test)

Clinical tests and measurements

Four studies reported results of the Lachman test. None of the results reached the level of significance; however, three of the four studies showed non-significant results with more normal (grade 0 or negative) tests in BPTB patients. Results of pivot shift tests were reported by three studies. One of them showed significantly more patients in the HT group with normal tests when excluding reinjured patients and patients suffering a contralateral ACL rupture (51% normal in BPTB versus 71% normal in HT patients, P = 0.048) [8] (Table 8).

Table 8 Results of studies reporting clinical stability tests (Lachman test and Pivot-shift test)

More objective results on knee stability were reported as instrumented laxity testing with the KT-1000 arthrometer. Five studies reported results, and one of them showed significantly fewer patients in the BPTB group with increased laxity (92% < 3 mm and 8% with 3–5 mm mean side-to-side difference for BPTB vs 67% < 3 mm and 33% with 3–5 mm in the HT group, P = 0.03) [38]. Of the four studies showing non-significant results, three showed trends towards smaller side-to-side differences in BPTB patients (Table 9).

Table 9 Results of studies reporting instrumented laxity testing with KT-1000 Arthrometer

Four studies reported results of the functional test, the single-legged hop test, but none of the results were significant. Finally, four studies presented results on deficits in range of motion. All of them reported results of extension deficit testing and two of them results of flexion deficit testing. However, no study could show significant differences.

Radiographic evidence of osteoarthritis

To evaluate osteoarthritis of the knee, most studies used either the Kellgren and Lawrence or the IKDC classification. Therefore, only studies with results presented in the form of one of these classifications are considered in this review. Additionally, we screened for results on tunnel widening, but none of the included studies presented any.

Four studies reported results of radiographic analysis of osteoarthritis following the Kellgren and Lawrence classification. None of them could show significant differences when defining definite osteoarthritis as a score ≥ 2. The cut-off for definite osteoarthritis in the IKDC classification system was defined as C or D. Three studies reported results of radiographic evaluation of degenerative joint disease following the IKDC system, and two studies managed to show significantly more patients with grade C or D degenerative joint disease in BPTB (20% in BPTB vs 13% in HT patients, P = 0.008 [40]; 33% in BPTB vs 21% in HT patients, P = 0.003 for patellofemoral OA and P = 0.037 for medial OA [38]) (Table 10).

Table 10 Results of studies reporting rates of osteoarthritis using the IKDC or the Kellgren and Lawrence system

Graft rupture or contralateral ACL rupture (CACLR)

Five studies reported rates of graft rupture at final follow-up. None of them could show a significant difference between the two groups, but all five studies showed trends towards fewer graft ruptures in BPTB patients. Contralateral ACL rupture rates were reported by five studies. One study showed significantly lower survival of the contralateral ACL in the BPTB group compared to the HT group (70% in BPTB patients versus 84% in HT patients, hazard ratio of 2.2 (95% CI 1.2–4.3), P = 0.022 [40]). Three of the four studies with non-significant differences showed a trend towards more CACLR in BPTB (Table 11).

Table 11 Results of studies reporting graft rupture and contralateral ACL rupture

Discussion

In the analysis of the studies with a long-term follow-up of more than 10 years, neither of the two autologous tendons was significantly superior to the other in terms of outcome parameters. However, a detailed examination of the parameters studied reveals certain trends.

With the studies by Björnsson et al. and Thompson et al., we found two studies showing HT to be significantly superior to BPTB in terms of donor site morbidity [8, 40]. These results are consistent with those of previous analyses with shorter follow-up, some of which described donor site morbidity as the only difference [10, 23, 26, 31, 36, 43]. Therefore, when selecting a graft, the patient’s individual risk factors, such as an occupation involving kneeling, should be considered.

With regard to the clinical measurements, only Sajovic et al. found a significantly increased anterior translation in the HT group in the KT-1000 measurement [38]. However, in the same study, the Lachman test showed no significant difference, and the clinical outcome parameters in this cohort were equally good. None of the other studies included in our review showed a significant difference in anterior translation between the two graft types. Thompson et al. found more patients with laxity > 3 mm in the BPTB group, but this difference was not significant [40]. The findings of Sajovic et al. are consistent with the systematic reviews of Xie et al. [42] and Mohtadi et al. [30], who found significantly higher stability with the BPTB graft with respect to the Lachman test, the pivot shift phenomenon and the anterior translation measured with the KT-1000 arthrometer. In our evaluation, only Björnsson et al. showed a significantly more frequent positive pivot shift test in the BPTB group [8]. It has to be considered, however, that in their study the femoral tunnel was drilled with a trans-tibial technique for all BPTB grafts, whereas for HT grafts it was drilled either with a trans-tibial technique or through the medial portal. Overall, the stability advantage of BPTB described in the earlier reviews does not seem to be as clear in the long-term follow-up.

In our review, none of the studies observed a difference in range of motion (ROM) between the two tendon grafts at follow-up. In contrast, the Cochrane review by Mohtadi et al. described an increased extension deficit with BPTB [30]. However, the difference found in these data was 3°, which is probably of little clinical relevance. In addition, Mohtadi et al. included studies with a follow-up period of 2–8.5 years, so it is likely that minor ROM impairments were reduced by further training over a longer period [30]. Additionally, neither graft seems to influence the muscular strength of the leg in the long-term follow-up, as was already the case in the short-term follow-up, since the single-legged hop test showed no differences [8, 10, 22, 30, 38, 40].

In all included studies, the activity level was determined. Unfortunately, the scores used show an inhomogeneous spectrum and therefore cannot be directly compared.

Lecoq et al. showed significantly higher scores in KOOS and IKDC-SKF in the HT group [24]. However, a detailed examination reveals a difference of less than 10 points between BPTB and HT. The clinical relevance of this difference seems questionable, especially since the results in both groups were above the thresholds for the patient-acceptable symptom state identified by Muller et al. [31]. Regarding the postoperative activity level, two studies showed significantly higher values for BPTB, and two studies showed no significant difference in the Tegner Activity Score but at least a trend towards better results for BPTB. Because of the inhomogeneous tests used, BPTB cannot be generally favoured; however, these results should be kept in mind for individual graft selection, particularly since Xie et al. also showed a higher return to sport level with BPTB than with HT grafts in their 2-year follow-up review [42].

Among our nine studies, seven reported outcomes on radiographic osteoarthritis using either the Kellgren and Lawrence or the IKDC system. Two of these studies showed significant results, with more patients in the BPTB group developing osteoarthritis up to the final follow-up [38, 40]. Thompson et al. show a higher risk of OA with the BPTB graft. At first this is surprising, because this group had fewer meniscal resections in the further course [40], and meniscal resection is an additional risk factor for OA [5]. However, no data are given on the initial associated injuries besides the ACL rupture. In the second study, by Sajovic et al., the increased risk of OA with BPTB grafts also has to be interpreted with caution, because at least a partial resection of the meniscus was performed in 21 of the 24 patients of the BPTB cohort [38].

Also, in the previous reviews, the situation is not completely clear. Poehling-Monaghan et al. 2017 showed higher rates of OA in an average 8.9-year follow-up for BPTB [35]. In 2018, Belk et al. did not confirm these results and found no significant difference in a review with an average of 11.5 years of follow-up [6].

Two studies also compared the osteoarthritis rate of the injured knee with that of the contralateral knee. Neither ACL reconstruction with BPTB nor with HT was able to reduce the risk of osteoarthritis to that of the non-injured opposite side. None of the studies reporting radiographic outcomes presented results on tunnel widening.

Graft rupture and CACLR

Rahardia et al. and Grifsted et al. showed in their registry studies significantly higher rates of graft revision in patients with HT autograft compared to BPTB and significantly higher rates of contralateral ACL reconstruction in the BPTB group. The higher rate of contralateral rupture could also be an indicator of the higher physical activity of this patient population. Similar results from the Scandinavian registry, based on 45,998 patients with primary ACL reconstructions, were presented by Gifstad et al. [15]. With a study size of 17,436 patients, Maletis et al. showed a significant difference between BPTB and HT autografts, with HT requiring more revision ACL reconstructions and BPTB leading to more CACLR [26].

These results are similar to our findings; however, most of our studies could only show trends. None of the included studies could show significant differences in graft rupture rates, and only one study [40] could show a significant difference for contralateral ACL rupture, with higher rates in patients with BPTB autograft.

The influence of different rates of concomitant meniscus injuries seems to be unlikely, since Gifstad et al. [15] and Salmon et al. [39] could not identify this as a contributing factor.

The lack of significant differences in graft rupture rates in our included studies could be explained by their relatively small cohorts, since Salmon et al. estimated that a cohort size of 19,000 patients would be necessary to detect a significant difference in graft failure rates of 1–2% [39]. In accordance with that, other systematic reviews by Belk et al. [6], Chen et al. [10] and Poehling-Monaghan et al. [35] could not show any significant differences in graft failure rates either.
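To illustrate why small absolute differences in failure rates require very large cohorts, the following sketch uses the standard normal-approximation formula for comparing two proportions. It is not the calculation used by Salmon et al.; the function name, the two-sided significance level of 0.05, the power of 80% and any failure rates passed to the function are assumptions chosen for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients needed per group to detect rates p1 versus p2
    with a two-sided test (normal approximation, equal group sizes)."""
    if p1 == p2:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    p_bar = (p1 + p2) / 2                          # pooled rate under H0
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Hypothetical failure rates: the required size grows quickly
# as the difference between the two groups shrinks.
for p2 in (0.06, 0.055, 0.05):
    print(p2, n_per_group(0.04, p2))
```

Because the required number of patients scales with the inverse square of the difference in rates, halving the detectable difference roughly quadruples the cohort size, which is far beyond the size of the studies included here.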

Limitations

There are limitations to studies and reviews on procedures with such a long-term follow-up. Since surgical techniques have continued to evolve in the time since the interventions in the studies, the results can only show the outcome of surgeries performed 10 to over 20 years ago and can therefore only be applied with caution to surgeries performed today. Another limitation of our review is the many different ways in which patient-oriented outcomes in particular have been presented in the studies, with the consequence that often only a few studies report results on each outcome item.

We are aware that there are also some limitations to the methodology of our systematic review. First, we did not limit our review to randomized controlled trials. Second, we included studies that did not report results on all three outcome variables. These were conscious decisions, because only a very limited number of studies compare the two grafts for ACL reconstruction with such a long follow-up. The third limitation of our review is the variable methodology of the included studies. Different inclusion and exclusion criteria for patients were used, for example regarding concomitant injuries to the menisci, which could influence the long-term outcome. The last limitation is that we performed only a qualitative and no quantitative analysis, so that results of individual studies could not be pooled to reach statistical significance where the individual studies show only non-significant trends in one direction.

Conclusion

We regard patient-oriented outcomes as more relevant than stability tests or radiographic evidence of osteoarthritis, but even when prioritizing these outcomes, we cannot draw a final conclusion on which autograft is superior. Results on activity levels favour BPTB autograft, while donor site morbidity, IKDC-SKF and KOOS favour HT autograft. Radiographic evidence of osteoarthritis is more frequent in the BPTB group, and there is a trend towards more graft ruptures in the HT group and more contralateral ACL ruptures in the BPTB group. The significance of our results should be evaluated particularly in the light of the long-term follow-up of the included studies, which is their advantage over short- and medium-term studies. We see the need for more studies on this matter with long-term outcomes, preferably also considering quadriceps tendon autograft as a potential graft choice.