Introduction

Despite a high incidence of lateral ankle ligament injuries [45], only a small proportion of patients seek medical care [2, 38, 42, 43, 46]. When a patient has not responded favourably to conservative treatment (e.g. a prolonged course of physiotherapy and/or bracing), surgical stabilization may be an appropriate option to restore function, depending on the patient’s needs and expectations [16, 33].

Tenodesis, the oldest surgical technique, involves a non-anatomic reconstruction [16]. Currently, anatomic reconstruction or repair techniques are preferred in order to restore joint configuration and mechanics [1, 2, 14, 15, 33, 44]. The most recent technique is capsular shrinkage, which uses local heat application to induce shrinkage of the anterior talofibular ligament (ATFL) and capsule [11, 29].

Many studies have shown the success of these techniques in treating chronic ankle instability (CAI). Mabit et al. [31] were the first to compare anatomic repair with non-anatomic reconstruction, showing superior short-term results (pain, symptoms, function) for anatomic repair. Other studies confirmed these results [18, 26, 27, 28]. To date, only de Vries et al. [12] have published a Cochrane review on outcome and complications after different surgical stabilization techniques in patients with CAI. Despite comparing the effectiveness of the techniques, they concluded that there was insufficient evidence to support any surgical intervention over another. That review did not focus on patient-reported outcomes after surgical ankle stabilization. Additionally, more research has become available since then.

If it is known which technique provides the best post-operative technical and functional outcome, then patient benefit and surgical results can be simultaneously optimized. For this reason, the objective of this systematic review is to determine the most effective surgical treatment in patients with CAI by providing a review of published studies and comparing functional outcomes after surgical stabilization.

Materials and methods

Search strategy

The research question of this review was: ‘what is the best surgical treatment strategy for patients with CAI based on patient-reported functional outcome?’ To answer this question, a search was conducted in PubMed, EMBASE, Medline and the Cochrane Library from 1950 up to April 2016, including the terms ‘surgical treatment’, ‘lateral’, ‘ankle’, ‘instability’ or ‘outcome’ and their synonyms (Appendix).

Selection criteria

Articles were selected according to the following inclusion criteria: (1) patients were at least 18 years old at the time of surgery, (2) patients suffered from isolated lateral ankle instability for at least 6 months, characterized by the subjective reporting of symptoms such as pain, swelling, instability and/or giving way, (3) patients were treated by some form of surgical stabilization, (4) the study described at least one of the following functional outcome measures at follow-up: pain, swelling, function, sport or quality of life.

Studies were excluded if they: (1) consisted of (systematic) reviews or case reports, (2) were not published in English, (3) only covered treatment of acute instability, (4) included medial instability, (5) only included conservative treatment, or (6) included patients with concomitant injuries, deformities or previous surgical treatment for ankle instability.

Study selection

First, all articles were screened by title and abstract for eligibility by two independent researchers. Next, the full-texts of the included articles were checked to determine whether they met the inclusion criteria. Articles for which the full-text was unavailable were excluded. Subsequently, all full-texts were read by two independent researchers and included or excluded based on the selection criteria. In case of disagreement, consensus on inclusion was reached during a meeting.

The final selection of included articles was scored according to the modified Coleman Scale for Methodology [35]. Each article was scored on study type, patient selection, diagnostics, treatment and assessment. The Coleman Score ranges from 0 to 90 referring to the methodologic quality, with a higher score representing better methodologic quality. Points were scored for number of included patients (0–10 points); mean follow-up (0–5); number of different procedures studied (0–10); type of study (0–15); diagnostic certainty (0–5); description of given treatment (0–5); outcome criteria (0–10); procedure used for assessing outcomes (0–15); description of subject selection process (0–15). The Modified Coleman Score (MCS) does not specifically include the rehabilitation process. In current studies, mostly the aftercare in terms of cast/bandage, etc. has been described, but details of the rehabilitation protocol have often not been reported. As our focus was on treatment and functional outcome and to avoid scoring bias due to underreporting of the rehabilitation protocol we therefore chose to use the MCS.
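The scoring scheme above can be illustrated with a minimal sketch. The maximum points per criterion are taken from the text; the criterion names as Python keys, the function `total_mcs` and the example article are hypothetical and serve only to show how the nine sub-scores add up to the 0–90 total.

```python
# Maximum points per criterion, as listed in the text above.
# The key names are illustrative labels, not official MCS terminology.
MCS_MAX_POINTS = {
    "number_of_patients": 10,
    "mean_follow_up": 5,
    "number_of_procedures": 10,
    "type_of_study": 15,
    "diagnostic_certainty": 5,
    "description_of_treatment": 5,
    "outcome_criteria": 10,
    "outcome_assessment_procedure": 15,
    "subject_selection_description": 15,
}


def total_mcs(points):
    """Sum the per-criterion points after checking each lies in its allowed range."""
    if set(points) != set(MCS_MAX_POINTS):
        raise ValueError("points must cover exactly the nine criteria")
    for name, value in points.items():
        if not 0 <= value <= MCS_MAX_POINTS[name]:
            raise ValueError(f"{name}: {value} is outside 0-{MCS_MAX_POINTS[name]}")
    return sum(points.values())


# Hypothetical article, invented for illustration only (not data from this review).
example_article = {
    "number_of_patients": 4,
    "mean_follow_up": 5,
    "number_of_procedures": 10,
    "type_of_study": 0,
    "diagnostic_certainty": 5,
    "description_of_treatment": 5,
    "outcome_criteria": 7,
    "outcome_assessment_procedure": 6,
    "subject_selection_description": 5,
}

print(sum(MCS_MAX_POINTS.values()))  # maximum attainable score
print(total_mcs(example_article))    # total score of the hypothetical article
```

The range check guards against transcription errors when two researchers score the same article independently.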

Data extraction and statistical analysis

Two researchers reviewed all the included articles independently and extracted article characteristics, patient demographics, patient history, surgical treatment and questionnaires/scales used (including pre- and post-operative outcome).

To analyse baseline characteristics, the name of the main author, year of publication, study design, number of included patients and intervention were extracted.

To determine the best surgical procedure for treatment of CAI, outcome scores and outcomes (e.g. mean/median, SD/range) were extracted per procedure and article. When reported outcomes were only shown as graphs, the mean/median and SD/range were estimated from the graphs. If studies included merely post-operative questionnaire scores, these questionnaires were only included in the qualitative analysis. Studies reporting both pre- and post-operative scores were pooled based on their mean scores and their mean score improvement. Using these means, a weighted mean was calculated. Improvement per technique and superiority of a technique were evaluated using the independent t test. Questionnaires had to be used in at least two studies that assessed the same technique to make them eligible for pooling. If not, these articles were only used in the qualitative analysis. For pooling, Review Manager was used (RevMan [Computer program] version 5.3, Copenhagen: The Nordic Cochrane Centre, The Cochrane Collaboration, 2014). For statistical analysis, SPSS was used (version 23.0, IBM Corp. Armonk, NY, USA). A p value of < 0.05 was considered statistically significant.
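As an illustration of the pooling step, the sketch below computes a weighted mean of study-level scores. It assumes that each study is weighted by its number of patients; the weighting scheme is not specified in the text, so this is an assumption. The function name and the three studies are hypothetical.

```python
def weighted_mean(means, weights):
    """Weighted mean of study-level means (weights assumed to be patient counts)."""
    if len(means) != len(weights) or not means:
        raise ValueError("means and weights must be non-empty and of equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("the sum of the weights must be positive")
    return sum(m * w for m, w in zip(means, weights)) / total_weight


# Hypothetical studies: (number of patients, pre-operative mean, post-operative mean).
# These values are invented for illustration and are not data from this review.
studies = [(20, 60.0, 90.0), (30, 70.0, 95.0), (50, 65.0, 85.0)]

n = [s[0] for s in studies]
pooled_pre = weighted_mean([s[1] for s in studies], n)
pooled_post = weighted_mean([s[2] for s in studies], n)
pooled_improvement = weighted_mean([s[2] - s[1] for s in studies], n)

print(pooled_pre, pooled_post, pooled_improvement)
```

With identical weights for the pre- and post-operative means, the weighted mean improvement equals the difference between the two pooled means, which is a useful consistency check on extracted data.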

To assess heterogeneity between study populations (number of patients, age, gender distribution and follow-up period), I² was calculated [19]. In studies assessing the same technique using the same outcome scores, statistical pooling was performed. Pooling was only performed with patient-reported outcome measures (PROM) scores per technique.
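For readers unfamiliar with the statistic, the following sketch shows the commonly used definition of I² derived from Cochran's Q with inverse-variance weights. It is a general illustration and not necessarily the exact calculation performed in this review; the function name and the input values are hypothetical.

```python
def i_squared(effects, std_errors):
    """I² (in percent) from Cochran's Q, using inverse-variance weights."""
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("at least two studies with matching standard errors are needed")
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical effect estimates and standard errors, for illustration only.
print(i_squared([10.0, 20.0], [1.0, 1.0]))  # widely differing estimates
print(i_squared([15.0, 15.0], [1.0, 2.0]))  # identical estimates
```

I² expresses the share of the total variation across studies that is attributed to heterogeneity rather than chance, so values close to 100% indicate that pooled estimates should be interpreted with caution.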

Results

Study and patient characteristics

The initial search provided 658 articles. After exclusion of irrelevant articles by screening the abstracts and subsequently reading the remaining full-texts, a total of 19 articles were included, of which 11 were eligible for pooling of outcome data in the quantitative analysis (Fig. 1). Publication dates of included articles ranged from 2000 to 2015. The majority of the studies, 10 of 19, were retrospective cohort studies. For five of the 19 studies, it was unknown whether the study design was prospective or retrospective. Articles that used a patient-reported outcome measure that was used by fewer than 3 of the included studies (per surgical procedure) were not included in the pooled results (Table 1).

Fig. 1 Flow chart of included studies

Table 1 Patient characteristics

A total of 882 patients were included in the studies described in the 19 articles, with a mean of 44.4 patients per study (SD ± 59.3). Of the 882 included patients, 61% were male and 39% female. The mean age of included patients was 29.3 years (SD ± 4.2). The mean follow-up period was 76.0 months (SD ± 64.6) and varied greatly between articles (range 6–156 months). A total of 23 procedures were evaluated, including anatomic repair (n = 7), anatomic reconstruction (n = 6), tenodesis (n = 6) and capsular shrinkage (n = 4). Within the 23 procedures, 5 different variations of tenodesis reconstruction were described, 4 of anatomic reconstruction, 4 of anatomic repair and 2 of capsular shrinkage. In total, 6 studies performed additional procedures such as synovectomy, osteochondral debridement and microfracture, ossicle excision, loose body removal and bony spur resection. Only 5 articles (26%) mentioned the duration of symptoms, reporting a mean duration of 31.6 months (SD ± 26.2) with a minimum of 7 months and a maximum of 168 months.

Critical appraisal and heterogeneity

The included articles were scored using the Modified Coleman Methodology Scale with a maximum score of 90 points. The mean score was 49.6 points (SD ± 12.0) with scores ranging from 30 to 73 and no outliers, indicating that the included studies vary greatly in methodological quality (Fig. 2). The included articles mainly scored low on the Modified Coleman Scale because of the low number of included patients, short follow-up periods, retrospective study designs and an insufficient or absent description of the patient selection process.

Fig. 2 Quality assessment according to the Modified Coleman Methodology Score showing average quality of included articles with a large range in scores (30–73 on a scale of 0–90)

The I² on population heterogeneity (number of patients, mean age, male to female ratio, mean follow-up duration) was 19.9%, indicating no relevant heterogeneity in the composition of the population. However, the inclusion of the outcome scores used for the analyses led to 100% heterogeneity, reflecting the great number of different PROMs used. For the pooled data analyses, the heterogeneity varied from 93 to 95% (Fig. 3a, b).

Fig. 3 Forest plot pooled AOFAS (a) and Karlsson Scores (b). AR anatomic reconstruction, Arep anatomic repair, CS capsular shrinkage, TD tenodesis

Patient-reported outcome measures

To assess surgical outcome, a wide range of outcome measures was used, including radiographic outcome, muscle function, ankle range of motion and joint laxity. In total, 11 different questionnaires were used to assess 23 procedures. In the 19 included studies, a total of 44 questionnaire-based outcome scores were available for analysis. The most commonly used questionnaires were the Karlsson Score (n = 13; 30.9%) and the AOFAS (n = 11; 26.2%). Only 25 (56.8%) out of 44 measurements were performed both pre- and post-operatively. Only 15 out of the 19 included studies reported whether the change in the reported PROM score was significant.

The four studies that could not be pooled due to missing pre-operative scores reported overall good post-operative scores [5, 15, 26, 30]. The weighted mean of the post-operative Karlsson Score of these articles for anatomic repair was 83.7 (SD ± 10.4), for anatomic reconstruction 88.5 (SD ± 6.2) and for tenodesis 75.6 (SD ± 8.6). Other outcome scores were not reported frequently enough to calculate a weighted mean (Table 2).

Table 2 Outcome scores

After pooling data per stabilization technique, all outcome scores showed post-operative improvement. Only four outcome scores were used often enough to assess whether there was significant improvement comparing the pre- and post-operative PROM scores (Table 3), i.e. the AOFAS, Karlsson, Kaikkonen and Tegner Score.

Table 3 Score improvement per technique

Except for the mean post-operative AOFAS score of anatomic reconstruction compared to tenodesis (n.s.), all three techniques showed significant score changes between the pre-operative and post-operative outcome scores (p ≤ 0.001). The highest post-operative scores were shown for anatomic repair as assessed by the AOFAS (93.8; SD ± 2.7) and Karlsson Score (95.1; SD ± 3.6). All outcome scores also showed significant improvement comparing pre- and post-operative scores (p < 0.001). Comparing pre- and post-operative questionnaire scores, all four studied techniques showed post-operative score improvement (Fig. 3a, b). However, when comparing mean score improvement for anatomic repair, anatomic reconstruction and tenodesis, the greatest improvement was reported for anatomic reconstruction, followed by anatomic repair (p < 0.001 to p = 0.002) (Table 3; Fig. 3a, b).

Discussion

The most important finding of the present study was better functional outcome after anatomic reconstruction and anatomic repair compared to tenodesis for operative treatment of chronic lateral ankle instability. Such a comparison could not be conducted earlier because of a lack of data. Due to the high number of different outcome scores used among studies, only anatomic reconstruction, anatomic repair and tenodesis reconstruction techniques could be quantitatively compared. Comparing patient-reported outcomes after surgical stabilization of the lateral ankle ligaments, all techniques showed relief of symptoms after surgical stabilization and improvement in PROM scores compared to pre-operative reports. Anatomic repair showed the highest post-operative scores. Despite overall improvement, tenodesis reconstruction showed the lowest scores.

Anatomic repair not only provided higher post-operative outcome scores compared to anatomic reconstruction or tenodesis, it also showed higher pre-operative scores. This may be caused by selection bias. Even though anatomic repair is currently referred to as the ‘gold standard’, it can only be used when the tissue quality of the elongated ligament is sufficient for repair [6, 7, 10, 40]. In case of insufficient quality of the elongated ligaments, anatomic reconstruction is indicated. These cases might indicate a more severe instability on PROMs such as the Cumberland Ankle Instability Score. Higher initial scores of anatomic repair may reflect less severe instability. Techniques have changed over time, and so have surgical approaches and indications for treatment. Currently, tenodesis is mainly used as a salvage technique when other treatment choices are no longer viable options, compared to a few years ago when it was the primary treatment choice [4].

All techniques provide overall good results. For this reason, other factors may be taken into account when selecting the treatment. Patient preference may play a role in patient satisfaction [41]. The risk of complications and possible recurrence are other important factors to consider when choosing between treatment strategies. Anatomic repair may result in excellent post-operative outcomes, but its application is limited by the quality of remaining tissues [2, 37]. Tenodesis is often, as mentioned before, used as a salvage technique [4]. For this review, however, only studies were selected where patients had not yet undergone any form of surgical stabilization to filter out previous failed interventions and therefore avoid tenodesis being used for more severe indications.

While including outcome scores in the assessment, there was a high level of heterogeneity. This was caused by the number of different outcome scores used in the studies. When comparing the study populations a heterogeneity percentage of only 20% was calculated, meaning no important heterogeneity was present between the study populations. Hence, it was decided to pool the data with the aim to arrive at reliable conclusions, bearing in mind that the subgroups and high variety in used outcome scores affected study power.

The main limitation of this study is lack of power. There was a low number of studies per treatment type, and pre- and post-operative assessments were often lacking or reported without an SD or 95% CI, making data pooling impossible. Additionally, these studies used different outcome scores, again reducing power and increasing heterogeneity of the pooled data. Most studies were excluded because they included underage patients, performed multiple procedures at the same time or performed stabilization after failed initial surgery. To enable comparison of pre-operative assessments with post-operative assessments, minimizing bias due to unknown pre-operative scores, only the study outcomes that contained both outcome measures were pooled. This led to a high number of studies being excluded from pooling, again leading to a reduction of power [30]. An additional problem causing heterogeneity is patient selection for surgery, as patients may suffer from mechanical and/or functional ankle instability. As functional instability is neuromuscular by nature, multiple factors are responsible for the feeling of giving way, possibly limiting the effect of surgery [25]. These studies were only included in the qualitative analysis. The quality of all included studies was low. Although the reported Coleman Scores were mainly around 55% of the scale, the population sizes of the individual studies were overall small and the studies included too many outcome scores for the population size. This increased the chance of finding a coincidentally significant difference.

Despite these limitations and the different indications included in this meta-analysis, the strength of this review is the comparison of results per treatment modality. Comparability was enhanced by focusing only on first time surgery of CAI in adult patients. This may help treatment selection in case multiple treatment options are open.

In clinical practice, anatomic repair and anatomic reconstruction are preferred and should be the main treatment choice, possibly with a slight preference towards anatomic repair when the ligament remnants allow it, given the minimal difference in outcome compared with anatomic reconstruction. Additionally, if a repair fails, an anatomic reconstruction is still an option. Tenodesis reconstruction should be limited to salvage procedures only, when no other treatment option is open.

Future research should include more high-level studies, such as randomized controlled trials, on the outcome after different surgical stabilization procedures, with a specific description of the population and use of the minimum reporting standards advocated by the International Ankle Consortium [13, 17]. This may enhance comparability of both the indications and outcomes.

Conclusion

In conclusion, anatomic reconstruction and anatomic repair provide better functional outcome based on PROM scores in patients treated by surgical stabilization for their ankle instability complaints, compared to tenodesis reconstruction and capsular shrinkage.