Introduction

Ankle fractures are among the most common fractures, with a prevalence of 4–9% [1, 2]. Posterior malleolar fracture (PMF), also known as malleolus tertius, posterior tibial fracture, or Volkmann fragment, appears in up to 44% of ankle fractures [3,4,5]. If the posterior malleolus is affected, therapy results may be worse, and its involvement in ankle fractures is known to be of negative prognostic value [1, 4, 6,7,8,9,10].

The decision to fix a PMF is still highly debated and has traditionally often been based on fracture size measured on radiographs, a method that lacks accuracy and has poor reliability [11,12,13,14,15,16,17,18]. Nowadays, it is generally believed that the morphology of the fragment is more closely related to the fracture pattern and is, therefore, more important in classifying the fracture [14, 19,20,21]. Consequently, with regard to the proportion of the affected joint surface and the recommendation for surgical fixation of PMF, there is a shift away from the 1/3 dogma [7, 17, 22,23,24,25,26,27,28]. With increasing understanding of fracture morphology and the routine use of computed tomography (CT), efforts have been made in recent years to establish new classification systems based on CT imaging [14, 29, 30]. To date, there is no international consensus regarding classification and treatment of PMF [24, 31, 32].

A good classification system helps the orthopedic surgeon to identify and characterize a problem, suggest a potential prognosis, and offer guidance in determining the appropriate treatment method for a particular condition. To achieve optimal therapeutic results, a complete understanding of the morphology is indispensable.

Therefore, the aims of this systematic review were, first, to determine how many studies use a classification of the PMF; second, to identify and describe the existing classifications of PMF; third, to examine which classification system has the most reliable inter- and intraobserver scores; and fourth, to evaluate the predictive value of the classifications in terms of postoperative outcomes.

Materials and methods

Search strategy

The study protocol was registered in the PROSPERO database (CRD42021264268). The review was performed and reported according to the PRISMA 2020 checklist [33].

The electronic databases of the Cochrane Central Register of Controlled Trials, MEDLINE via PubMed, and Scopus were searched systematically. The search was performed on the 20th of March 2021. The following search algorithm was used: (posterior AND ankle AND fracture) OR (posterior AND (malleolus OR malleolar) AND fracture) OR (ankle AND volkmann) OR (trimalleolar AND fracture) OR (posterior AND pilon AND fracture). A final update of the search was conducted on the 12th of May 2022 using the same search string. Furthermore, reference lists of relevant reviews and included articles were screened for additional articles. A bidirectional citation search was used, including backward and forward citation search methods [34]. There were no limitations on journal or publication date of the article.
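The boolean logic of the search string can be made explicit in a short sketch. The Python function below is only an illustration: the name `matches_search` and the plain case-insensitive substring matching are assumptions, and the databases apply their own field tags, term mapping, and stemming, so the sketch does not reproduce their results.

```python
def matches_search(text: str) -> bool:
    """Return True if `text` satisfies the boolean search string of this review.

    Assumption: a term counts as present if it occurs anywhere in the
    lowercased text (so "fractures" also satisfies "fracture").
    """
    lowered = text.lower()

    def has(term: str) -> bool:
        return term in lowered

    return (
        (has("posterior") and has("ankle") and has("fracture"))
        or (has("posterior") and (has("malleolus") or has("malleolar")) and has("fracture"))
        or (has("ankle") and has("volkmann"))
        or (has("trimalleolar") and has("fracture"))
        or (has("posterior") and has("pilon") and has("fracture"))
    )
```

For instance, a hypothetical title such as "Outcome of trimalleolar fractures" would satisfy the fourth clause, whereas "Distal radius fracture" would satisfy none.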

Study selection

Studies reporting data on classification systems of trimalleolar ankle fractures were screened for the use of a PMF classification. Inclusion and exclusion criteria were cross-checked by three reviewers (HW, JT, EM), first by screening the title and abstract, second by reading the full text. Clinical studies were included for data extraction. Cadaveric studies, review articles, case reports with fewer than 10 cases, studies that did not include a posterior malleolus specific classification, and studies not written in English were excluded.

Data extraction

The study selection and data extraction were independently performed by two review authors (JT, EM). Disagreements were discussed in a consensus meeting and, if a disagreement persisted, a third reviewer (HW) made the final decision. Data were extracted from the included studies using a Microsoft Office® Excel spreadsheet. The following data were collected: study design, sample size and source, fragment characteristics (e.g., classification, displacement, treatment), reliability and validity scores, and additional data addressed by the classification, such as treatment allocation and prognostic value. The names of the classification systems used were listed and their frequency of use was counted.

Study quality assessment

The methodological quality of the included studies was quantified using a modified Coleman score [35]. The modified Coleman score was applied by two independent reviewers (HW, JT) (Online Resource 1). The score is composed of two parts. Part A assesses study size, average follow-up time, percentage of patients with follow-up, number of interventions, study type, diagnostic certainty, description of surgical method, and postoperative rehabilitation. Part B comprises outcome criteria, procedure for assessing outcomes, and description of the subject selection process. The maximum achievable score is 100 points.

Statistical analysis

The data were processed descriptively; therefore, no meta-analysis was performed. Patient demographic characteristics (number of patients/feet, patient age and sex) were summarized. Weighted median scores were calculated for the modified Coleman score and for the age of the evaluated patient cohort. Data analysis was performed using IBM SPSS Statistics Version 26.0 (IBM Corp., Armonk, NY, USA). The kappa values of inter- and intraobserver reliability were interpreted as defined by Landis and Koch (0.00–0.20: slight, 0.21–0.40: fair, 0.41–0.60: moderate, 0.61–0.80: substantial, 0.81–1.00: almost perfect) [36].
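The Landis and Koch bands can be written as a simple lookup, which makes the interpretation of the kappa values reported below reproducible. The following Python sketch is an illustration only: the function name `interpret_kappa` is hypothetical, and two details are assumptions rather than part of the text, namely that every value up to 0.20 (including negative values) is labeled "slight" and that each band is closed at its upper limit.

```python
def interpret_kappa(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch label used in this review.

    Assumptions: values up to 0.20 are "slight"; upper limits are inclusive.
    """
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1.0")
    # Upper limit of each band, paired with its label, in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper_limit, label in bands:
        if kappa <= upper_limit:
            return label
    return "almost perfect"  # not reached for kappa <= 1.0; kept for safety
```

Applied to values from the Results, a kappa of 0.64 falls in the "substantial" band and a kappa of 0.985 in the "almost perfect" band.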

Results

Included studies

Evaluation of the databases revealed 3377 studies potentially relevant for inclusion. After excluding duplicates, the titles and abstracts of the remaining studies were assessed, and 380 studies were eligible for full-text analysis. After applying the exclusion criteria (no clinical study, case reports < 10 patients, no classification/no PMF-specific classification), the 110 remaining relevant studies were included in this review. The selection process was performed according to “Preferred Reporting Items for Systematic Review and Meta-Analyses” (PRISMA) and is shown in Fig. 1 [33].

Fig. 1 PRISMA flow chart

Study characteristics

A total of 110 studies, published between 1965 and 2022, using 143 classification systems, were included. The studies comprised 12,614 patients with 12,633 ankle fractures and a weighted median age of 44.55 years (13–100). A total of 5963 patients were female and 5231 male; 11 studies did not report gender distribution [6, 20, 24, 37,38,39,40,41,42,43,44]. There were 22 prospective and 88 retrospective studies; 11 studies were multicenter and 99 were single-center studies (Table 1). Detailed information about patient demographics is given in Table 2. Four specific classifications for the PMF were found: a classification based on the fragment size in relation to the size of the tibial joint surface [45] (referred to as PMF classification according to fracture size) and 3 CT-based classifications according to Haraguchi, Bartoníček/Rammelt, and Mason [14, 29, 30].

Table 1 Overview of the included studies
Table 2 Patient demographics

PMF classification according to fracture size

Sixty-six studies that used the size of the PMF in relation to the joint surface as a classification were included. Of these, 35 studies used radiographs and 30 studies used CT to estimate size; one study did not provide a clear statement in this regard. Only one study examined inter- and intraobserver reliability, measuring substantial kappa values of 0.64 and 0.63, respectively [13]. The majority of these studies used either a cut-off value of 25% for fixation of the PMF (26 studies) or fixed the posterior malleolus regardless of size (28 studies). The remaining studies used either 20% (4 studies), 30% (2 studies), or > 1/3 of the joint area (5 studies) as the cut-off value; 1 study fixed the PMF from 10% in young patients or in the presence of subluxation, and 3 studies did not provide any information (Table 3). Nine studies reported a better outcome with reduction of smaller posterior malleolus fragments [4, 6, 7, 46,47,48,49,50,51], whereas seven studies reported no difference between fixation and no fixation of smaller posterior malleolus fragments [52,53,54,55,56,57,58].

Table 3 Studies reporting on the “Size Classification”

Haraguchi classification

The first CT-based classification found was developed in 2006 by Haraguchi et al., who classified PMF into 3 distinct types [14]. Type I is described as a posterolateral-oblique wedge-shaped fragment involving the posterolateral corner of the tibial plafond, type II as a transverse medial-extension fracture line extending from the fibular notch to the medial malleolus, and type III as a small-shell type fragment at the posterior lip of the tibial plafond (Fig. 2). So far, Haraguchi's classification has been mentioned in 101 studies and was applied in 44 of them, which were therefore included and are listed in Table 4. Three studies reported on the reliability of the classification, all showing substantial interobserver reliability (Fleiss kappa 0.70/Cohen’s kappa 0.799/Cohen’s kappa 0.797) and substantial to almost perfect intraobserver reliability (Fleiss kappa 0.77/Cohen’s kappa 0.985) [24, 32, 59]. Three modifications of the Haraguchi classification were found. Kumar et al. divided Haraguchi type II into subtype A, a single fracture line extending from the fibular notch of the tibia to the medial malleolus, and subtype B, posterior fracture lines forming 2 separate fragments; this modification was also applied by Sheikh et al. [60]. Wang et al. also modified Haraguchi type II by categorizing the fracture line into an anterolateral oblique line (subtype I) and a small avulsion (subtype II) [61]. Palmanovich et al. divided the posterior segment by a central line, perpendicular to the bimalleolar line, into medial and lateral sub-segments, creating a 4-quadrant grid; each posterior malleolar fracture was then categorized based on the fragment’s location into “postero-lateral”, “postero-medial” and “postero-central” [62]. In terms of predictive value, type II fractures were reported to show worse outcome [19, 59, 63], to have a higher presence of osteoarthritis [59], and to be more likely to require placement of 2 syndesmotic screws [41]. The use of a posteromedial approach for type II fractures has resulted in good Olerud and Molander ankle scores (OMAS) [64]. Mertens et al. observed an improving AOFAS score from type I to type III [65], Xie et al. found most intercalary fragments (more than 2/3) in type I fractures [28], and Kang et al. reported a better outcome with surgical treatment of type I fractures smaller than 25% [49].

Fig. 2 Overview of the Haraguchi classification based on CT images (axial views). a Haraguchi type I b Haraguchi type II c Haraguchi type III

Table 4 Studies reporting on the Haraguchi classification

Bartoníček/Rammelt classification

Another CT-based classification was presented by Bartoníček/Rammelt in 2015 [29]. Five different fracture types were defined: type 1 as an extraincisural fragment with an intact fibular notch, type 2 as a posterolateral fragment including the fibular notch, type 3 as a posteromedial two-part fragment extending to the medial malleolus, type 4 as a posterolateral fragment larger than one-third of the notch, and type 5 as irregular osteoporotic fragments (Fig. 3). The classification also includes a treatment algorithm. The Bartoníček/Rammelt classification has been found 46 times in the literature; of these, 21 studies used it as a classification system, were included in this study, and are shown in Table 5. One modification was made by Tucek et al., who divided Bartoníček type 4 into three subtypes: subtype 1 as a fracture line that passes laterally past the malleolar groove, subtype 2 as a fracture line that involves the malleolar groove, and subtype 3 as an intercollicular fracture line or a line involving the posterior colliculus [66]. Two studies reported reliability of the classification, both showing substantial interobserver reliability (Fleiss kappa 0.78/Cohen’s kappa 0.744) and almost perfect intraobserver reliability (Fleiss kappa 0.81/Cohen’s kappa 0.936) [24, 32]. Regarding the predictive outcome value, type 1 fractures were shown to have a better outcome than type 2 fractures [65], and a significantly improved clinical outcome was achieved in type 4 fractures when they were surgically fixed [54]. With increasing fracture type, clinical outcome became worse [1, 21, 63].

Fig. 3 Overview of the Bartoníček/Rammelt classification based on CT images (axial views). a Bartoníček type 1 b Bartoníček type 2 c Bartoníček type 3 d Bartoníček type 4

Table 5 Studies reporting on the Bartoníček/Rammelt classification

Mason classification

In 2017, Mason et al. developed a CT-based classification of PMF ascending in severity of injury [30]. Mason described type 1 as an extra-articular avulsion fracture following a rotational force applied to the foot when the ankle is in plantarflexion and the talus unloaded. Rotational forces applied to a loaded foot result in a type 2A fracture in the form of a primary triangular posterolateral fragment. A type 2B fracture with a secondary posteromedial fragment, usually angled at 45° to the primary fragment, occurs when the talus continues to rotate in the mortise. A type 3 fracture is characterized by a coronal fracture line that involves the entire posterior plafond due to axial loading of a plantarflexed talus (Fig. 4). To date, Mason's classification has been mentioned 22 times in the literature and used for classification in 12 studies, which were included and can be found in Table 6. One modification of the Mason type 2B fracture was found: Vosoughi et al. divided it into a large intra-articular pilon fragment and a small extra-articular fragment [67]. Interobserver reliability ranged from substantial to almost perfect values (Cohen’s kappa 0.919/Fleiss kappa 0.61/Cohen’s kappa 0.717), as did intraobserver reliability (Fleiss kappa 0.65/Cohen’s kappa 0.957) [24, 30, 32]. As for the predictive outcome value, type 3 fractures tend to show worse postoperative outcome [68].

Fig. 4 Overview of the Mason classification based on CT images (axial views). a Mason type 1 b Mason type 2A c Mason type 2B d Mason type 3

Table 6 Studies reporting on the Mason classification

Quality assessment of included studies

The median total Coleman score was 43.5 points (14–79), with a median of 26 points for Part A and 18 points for Part B. Weighted by the number of patients included, the median total Coleman score was 42.5. Coleman score points are shown in Table 1.
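How weighting by cohort size can shift a median is easiest to see in a small worked sketch. The Python function below uses one common definition of a weighted median (the smallest value at which the cumulative weight reaches half of the total weight); this definition, the function name `weighted_median`, and the example numbers are assumptions for illustration, and the procedure applied in SPSS for this review may differ in detail.

```python
def weighted_median(values, weights):
    """Return the smallest value whose cumulative weight reaches half the total.

    `values` are per-study scores; `weights` are per-study patient numbers.
    """
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and of equal length")
    total_weight = sum(weights)
    cumulative = 0
    # Walk through the scores in ascending order, accumulating patient numbers.
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= total_weight / 2:
            return value


# Hypothetical example: three studies with scores 30, 40, and 50 points.
scores = [30, 40, 50]
equal_cohorts = [1, 1, 1]
unequal_cohorts = [10, 10, 100]  # the third study is much larger
```

With equal cohort sizes the function returns the ordinary median of the hypothetical scores (40), whereas the unequal cohort sizes pull the weighted median toward the score of the largest study (50).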

Discussion

By reviewing the literature, 4 classifications were found describing PMF: a classification based on the fragment proportion in relation to the distal tibial joint surface [45] and the three CT-based classifications according to Haraguchi, Bartoníček/Rammelt, and Mason [14, 29, 30]. The earliest and most commonly used classification was the PMF classification according to fracture size as first specified by Nelson and Jensen, who recommended treatment of PMF with a fragment size exceeding 1/3 of the articular surface on lateral radiographs, based on a study sample consisting of 8 patients [45]. With 66 included studies, this classification accounts for the largest proportion of classifications used by surgeons in clinical practice. In the included studies, the most used cut-off value was 25%, but values of 20%, 30%, or 1/3 of the articular surface were also used.

There are still controversial opinions on osteosynthetic treatment of PMF [69]. McDaniel and Wilson demonstrated that leaving a PMF of less than 25% of the tibial joint area unreduced did not significantly affect the overall outcome [58]. De Vries et al. and Xu et al. found no evidence for fixing PMF smaller than 25%, as outcome scoring systems showed no significantly better outcome [52, 53]; Guo et al. reported the same for PMF in tibial spiral fractures [54]. Studies comparing fixation with no fixation of PMF smaller than 25% found no significant difference in the AOFAS score [55,56,57]. On the other hand, a trend toward better clinical and radiological outcome was observed in patients in whom the PMF was fixed, and the authors therefore recommend fixation of even smaller fragments that cannot be satisfactorily reduced by ligamentotaxis [6, 46, 47, 49, 50]. Baumbach et al. and Tosun et al. even postulated that, in PMF of all sizes, syndesmotic stability is significantly more likely to be restored if treated by open reduction and internal fixation [48, 51]. In relation to the total number of studies using this classification, the number of studies reporting predictive outcome values is rather limited. Evidence on inter- and intraobserver reliability is also meager: Büchler et al. were the only ones to study this, reporting substantial inter- and intraobserver reliability with kappa values of 0.64 and 0.63, respectively [13]. Of all studies that assess the PMF classification according to fracture size, all but two [6, 49] are of retrospective design. Especially in the earlier studies, the evaluation of the fracture was not optimal, since it was based mainly on lateral radiographs.

The use of radiographs was found to be limited for the accurate size estimation of PMF [12, 14, 18, 70]; therefore, computed tomography (CT) has recently been used increasingly in the diagnosis of trimalleolar ankle fractures [18, 47, 54]. Subsequently, the conviction has grown that not the size but the fracture morphology is crucial for improving outcome [19]. Factors such as syndesmotic stability, joint congruity, postoperative step-off, reconstruction of the incisura, intercalary fragments, and talar subluxation are thought to be of prognostic importance when treating PMF [7, 23, 48, 50, 51, 53, 58, 63, 71,72,73,74,75,76]. Hence, a paradigm shift has occurred [21, 24, 31, 77], as the systematic review by Odak et al. has also shown [22].

This is where the three CT-based classifications come to the fore. The classification used in the majority of studies is the one proposed by Haraguchi [14], most probably because it was the first CT-based classification and because of its simple and clear structure, dividing the fracture into three types. Since 2015, however, a preference for the Bartoníček/Rammelt classification has emerged, with the main strengths of this classification being its ascending severity and the derived therapy recommendations [29]. After noting that the Haraguchi classification did not map the mechanism of injury, Mason developed the most recent classification, which also considers the injury mechanism [30].

Some objections to Haraguchi’s classification have arisen over time. First, the classification is not based on severity and thus does not relate to functional outcome [78]. Second, the classification was based only on axial sectional images; fractures were therefore assessed in only one plane, and the vertical extent was not estimated [31]. Third, medial injuries were not evaluated, which may lead to misjudgments [17, 32]. Fourth, the extent of involvement of the tibial incisura was not specified, so that type I fractures include a wide range of both small and large posterolateral fragments [59]. Moreover, most multi-fragmentary fractures cannot be defined using this classification [79]. Also, the three modifications found [61, 62, 80] may suggest that Haraguchi’s classification is not sufficiently developed to represent all fracture types. Regarding the predictive value of the classification in terms of postoperative outcomes, some authors have shown that type II fractures have worse clinical outcomes [19, 59, 63], whereas Mertens et al. observed an improvement in the AOFAS score from type I to type III [65].

The Bartoníček/Rammelt classification was developed on the basis of a larger patient population. It ascends in severity and contains a therapy recommendation [29, 81]. Zhang et al. were able to show that the potency of the Bartoníček/Rammelt classification also applies to distal tibial spiral fractures with associated PMF [82]. One objection is the imprecise definition of type 5 fractures, which includes all fractures that cannot be classified as type 1–4. We were not able to find an image of such a type 5 fracture, neither in the original article nor in our own fracture database. Another objection is the difficulty of estimating 1/3 of the tibial incisura to distinguish between a type 2 and a type 4 fracture [32]. There is a consistent opinion on worse outcome with increasing fracture type [1, 63, 65]. Only Neumann et al. saw an increase in the AOFAS score and no difference in the Olerud and Molander ankle score (OMAS) [21].

The authors of the Mason classification see its advantage in the ascending degree of severity, taking the accident mechanism into account. They have also introduced treatment recommendations based on their classification. Gandham et al. even made a recommendation on the appropriate operative approaches [30, 68, 83]. However, they described the classification using schematic drawings and do not define the tibial incisura [32]. In addition, a multi-fragmentary fracture of the entire tibial plafond may be mistaken for a two-part posterolateral and posteromedial fracture (type 2B) [32]. With the exception of one study describing a worse outcome in Mason type 3 fractures [68], there are no further statements on predictive values. To date, Mason’s classification has not been able to establish itself in the literature, with only 12 included studies. In addition, half of all studies using Mason’s classification were conducted by the author's own research group, and it was Mason himself who found the highest interobserver reliability in his study (kappa 0.919), whereas other authors found considerably lower reliability scores (Fleiss kappa 0.61 / Cohen’s kappa 0.717) [24, 30, 32, 43, 44, 67, 68, 83].

Intra- and interobserver reliability are substantial to almost perfect for all classifications, with Mason scoring the lowest in comparison to the other classifications [24, 32]. However, none of the classifications can adequately describe the complexity of posterior malleolus fractures, as factors such as the extent of articular surface impaction, the degree of dislocation, or intercalary fragments, among others, are not taken into account [32, 79].

Several important classifications were excluded because they are not PMF-specific. This includes the AO classification originally published in 1987 by Müller/AO, a universal classification depicting all skeletal injuries. It is a valuable, international classification, which has its justification and which has been used for years [84, 85]. With the routine use of CT imaging to reliably diagnose and classify trimalleolar fractures [9], authors have shown that all fractures involve the articular surface of the distal tibia [14, 29, 81]. This is in contrast to the specification of the AO classification by Heim, which divides posterior malleolar fractures into extra- and intra-articular fractures [86]. The AO classification, based on standard plain radiographs, is therefore not suitable for considering the significance and morphology of PMF, nor is it applicable in addressing specific questions regarding PMF [24, 48, 87].

Classification systems of posterior pilon fractures were also considered to be non-PMF-specific. The differentiation of pilon fractures from trimalleolar ankle fractures still often causes difficulties in clinical practice [75, 88, 89]. This has led to the emergence of a subset of PMF, also known as the “posterior pilon” variant, which has recently gained popularity [61, 87, 90,91,92,93]. However, there is still no clear definition, and the understanding of it varies [75, 81, 94, 95]. In addition, there are studies showing that posterior pilon fractures are a separate entity due to morphological differences [61, 94].

Other excluded classifications addressed sub-entities of PMF, for example a classification of PMF in tibial shaft fractures (TSF) [96, 97] and one also involving talar subluxation [98].

A few more limitations are worth noting, chiefly the limited quality of the included studies. Limitations affecting the Coleman score include the predominantly retrospective nature of the included studies and small patient cohorts. Therefore, the results of this study could only be presented in a descriptive manner. Only studies written in English were considered, which excluded potentially useful contributions written in other languages.

In conclusion, this review demonstrates that there has been a shift from use of the PMF classification by fracture size to the newer CT-based classifications; however, none has been able to establish itself in the literature so far. Summarizing all of the previously described points, we believe that, to date, no classification is able to adequately describe the complexity of the PMF. Also, the classifications are weak in terms of a derivable treatment algorithm or prognosis of outcome. According to this review, the Bartoníček/Rammelt classification has the most potential to prevail in the literature and in clinical practice due to its treatment algorithm and its reliability in combination with consistent predictive outcome values.