Pseudotumor in metal-on-metal hip arthroplasty: a comparison study of three grading systems with MRI

Objective Pseudotumors, a well-known complication of metal-on-metal total hip arthroplasty (MoM THA), are well identified on metal artifact-reducing sequences magnetic resonance imaging (MARS-MRI). Several MRI grading systems are described in the orthopedic literature, but their validity has not been established in large clinical studies. Our study was undertaken to describe the classification of pseudotumors in a preselected cohort divided into high- and low-risk patients, using three pseudotumor grading systems applied on MARS-MRI, and to determine the interobserver reliability of the grading systems.
Patients and methods A retrospective study was performed on 377 consecutive patients (240 MRI scans) treated with an M2a-38 and Taperloc stem combination (Biomet, Warsaw, IN, USA). Patients were divided into a high-risk and a low-risk group based on previously published risk factors. Two observers determined the presence of pseudotumors using three different pseudotumor grading systems for classifying MARS-MRI results.
Results The prevalence of pseudotumors as determined with MARS-MRI was 59% in our high-risk group, 0% in the low-risk group and 43% in the control group. Serum cobalt values were increased in the high-risk group. The kappa values of the Anderson, Hauptfleisch and Matthies grading system scores were 0.43, 0.44, and 0.49 respectively.
Conclusions High-risk patients are at a high risk for pseudotumor development. No pseudotumor development was found in low-risk patients. Interobserver reliability scored best with the Matthies system, but all three grading systems showed only a moderate agreement.


Introduction
Concerns have been raised about the use of metal-on-metal total hip arthroplasty (MoM THA) because of high early revision rates [1], raised serum cobalt and chromium levels and their possible toxicological effects [2], and the occurrence of cystic and/or solid masses, so-called pseudotumors, in the periprosthetic tissue [3]. Recent studies show that the incidence of pseudotumors is comparable with that in other THA systems such as ceramic on polyethylene and metal on polyethylene [4,5]. MoM THA-induced pseudotumors can cause compression of the neurovascular bundle [6,7]. This has also been described for conventional THA, although this pseudotumor compression was associated with a broken or worn-out inlay, which induces a MoM reaction [8,9]. A possible explanation could be that the MoM reaction stimulates the formation of pseudotumors with a larger mass than those observed in conventional THA, although recent research rejected this suggestion [4]. Associations between the presence of a pseudotumor and serum cobalt levels have been described [10]. High local cobalt values may induce pseudotumor formation and are known to cause osteolysis [11].
Risk factors for the formation of a pseudotumor are cobalt >5 μg/l [10], female gender [12,13], pain [14], and a high inclination angle >55° [15]. Despite the observed associations and risk factors, the exact mechanism of THA-induced pseudotumors is still unclear. Pseudotumors are well defined on MRI and three grading systems have been described in the orthopedic literature by Anderson [16], Matthies [17], and Hauptfleisch [3]. Table 1 provides details of the scoring systems.
Van der Weegen et al. [18] described kappa values for all three classification systems (49 hips) and Chang et al. [14] described a kappa for the Anderson classification (192 hips). Importantly, other radiological studies on pseudotumors do not report a kappa value and therefore their results should be interpreted with caution [3,19-21]. The reproducibility of these grading systems is of clinical importance and may help to unravel the etiology of a pseudotumor.
The aims of this study were to describe the classification of pseudotumors in a preselected cohort by utilizing metal artifact-reducing sequences magnetic resonance imaging (MARS-MRI) in a high-risk and low-risk group for pseudotumor development, and to study the interobserver reliability of three different pseudotumor grading systems in a large single cohort of MoM THA (240 hips).

Patients and methods
Our investigation reviewed 377 uncemented MoM THA performed in our institution between February 2008 and January 2011. In all cases, a cobalt and chromium bearing couple consisting of a monoblock acetabular cup with a 38-mm fixed size head design was implanted (M2a-38; Biomet, Warsaw, IN, USA). The cup size ranged from 48 mm to 64 mm. All patients received a press-fit titanium femoral stem (Taperloc, Biomet, Warsaw, IN, USA).
All patients were subjected to a pre-defined screening protocol, which was initiated after the first concerns of the MoM THA. The clinical results of this screening were reported recently [22]. In the current study, all patients with an MoM THA and available MARS-MRI were selected.

Screening protocol
Patients received a standardized outpatient consultation. This included physical examination, patient-reported questionnaires, blood analysis for serum cobalt, radiographs of the hip and pelvis, and magnetic resonance imaging (MRI). Contrast-enhanced MRI of the hip region with MARS was performed in patients with osteolysis on the X-ray, elevated serum metal ion levels above 5 μg/l, or pain. Patients without these criteria received routine annual follow-up, which was the same as the first screening. Pain was defined as the presence or absence of any pain in the hip area reported by the patient.
Cobalt and chromium ion levels were determined in the serum with the use of an AAnalyst 800 Atomic Absorption Spectrophotometer (Perkin Elmer, Waltham, MA, USA). The blood samples were collected in a metal-free container. Serum cobalt levels ranging from 0.04 to 0.64 μg/l are considered normal in the general population [23]. Cobalt serum levels higher than 5 μg/l were defined as elevated in MoM THA [2]. Inclination was measured on the 6-week post-operative X-ray. The observer who performed the measurements had shown good reliability for measuring post-operative cup inclination angle (ICC = 0.74, p < 0.001) in a previous study (Smeekes et al., accepted for publication).
Patients were divided into three groups based on the likelihood of developing a pseudotumor. Patients in the high-risk group were expected, on the basis of the literature, to have a high risk of developing a pseudotumor. Selection criteria for the high-risk group were: serum cobalt >5 μg/l [10], female sex [12,13], hip pain [14], and a high cup inclination angle >45°, which is considered to be outside the safe zone in resurfacing prostheses [16,24]. Inclusion criteria for the low-risk group were the opposing criteria (male sex, no hip pain, cup inclination <45°, and serum cobalt <5 μg/l). All other patients were used as the control group.
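As an illustration only, the grouping rule described above can be written as a short Python sketch. It assumes that all four criteria must hold together for a hip to enter the high-risk or the low-risk group, which the text implies but does not state outright, and that values exactly at a threshold fall into the control group. The names `Hip` and `risk_group` are hypothetical and are not part of the study protocol.

```python
from dataclasses import dataclass

# Thresholds quoted in the text.
COBALT_THRESHOLD_UG_L = 5.0
INCLINATION_THRESHOLD_DEG = 45.0


@dataclass
class Hip:
    """Hypothetical record holding the four grouping variables."""
    serum_cobalt_ug_l: float
    female: bool
    hip_pain: bool
    cup_inclination_deg: float


def risk_group(hip: Hip) -> str:
    """Return 'high', 'low', or 'control'.

    Assumption: the criteria are combined with a logical AND.
    """
    high = (
        hip.serum_cobalt_ug_l > COBALT_THRESHOLD_UG_L
        and hip.female
        and hip.hip_pain
        and hip.cup_inclination_deg > INCLINATION_THRESHOLD_DEG
    )
    low = (
        hip.serum_cobalt_ug_l < COBALT_THRESHOLD_UG_L
        and not hip.female
        and not hip.hip_pain
        and hip.cup_inclination_deg < INCLINATION_THRESHOLD_DEG
    )
    if high:
        return "high"
    if low:
        return "low"
    return "control"


# Made-up example values, not study data.
print(risk_group(Hip(8.0, True, True, 50.0)))
print(risk_group(Hip(1.2, False, False, 40.0)))
print(risk_group(Hip(8.0, False, True, 50.0)))
```

Under the AND assumption, any hip that meets only some of the criteria is assigned to the control group, which is consistent with the control group being by far the largest of the three.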
For a description of the pseudotumors, the Anderson classification [16], the classification of Matthies [17], and the Hauptfleisch classification were used [3]. Also, a list of other findings on the MRI scan that are not reported in these scores is shown (Table 1). An experienced musculoskeletal radiologist and a musculoskeletal radiologist in training independently scored the MRI scans. The radiologists applied all grading systems at the same time. Kappa values were compared for each grading system. Discordant cases were discussed and consensus was obtained for the classification system with the best kappa values.
For the MRI scans, an MRI scanner with a field strength of 1.5 Tesla was used. The following MARS-MRI sequences were used: T1-weighted coronal plane with echo time (TE)

Statistical analyses
Descriptive analyses were performed on baseline data and final outcomes. The results are expressed as means with standard deviations or medians with ranges where relevant.
Interobserver reliability of the radiological measurements was evaluated using Cohen's kappa. The differences among the cobalt values in the pseudotumor classification groups were analyzed using post-hoc tests in a one-way ANOVA after logarithmic transformation of the cobalt values because of the positively skewed distribution of these values. The chi-squared test was used for the analyses of gender and pain between the pseudotumor classifications. Fisher's exact test was used to analyze the difference in the presence of a pseudotumor between the high- and low-risk groups. A t test was used for comparing the serum cobalt levels in patients in the control group with and without a pseudotumor. For all tests, a two-tailed significance level of 0.05 was used. SPSS software (version 20; IBM, Armonk, NY, USA) was used for the analyses.
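For readers unfamiliar with the agreement statistic, the following minimal Python sketch shows how an unweighted Cohen's kappa is obtained from two observers' grades. The analyses in this study were run in SPSS; the function name `cohens_kappa` and the example grades are hypothetical and are not study data, and the choice of the unweighted form is an assumption, since the text does not state whether a weighted kappa was used.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(rater_a)
    # Observed proportion of cases on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both raters used one and the same category for every case.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up grades for eight scans (illustrative only).
observer_1 = ["none", "1", "2a", "2a", "3", "3", "none", "2a"]
observer_2 = ["none", "1", "2a", "3", "3", "3", "1", "2a"]
kappa = cohens_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects the raw percentage of agreement for the agreement expected by chance, which is why two observers who agree on most scans can still obtain only a moderate value when a few categories dominate.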

Results
A total of 240 hips with an M2a-38 and Taperloc stem combination (Biomet, Warsaw, IN, USA) were eligible for analyses. Demographics and reason for surgery are listed in Table 2. Seventy-five patients had bilateral prostheses of which only one was of the M2a-38 type with a Taperloc stem; these 75 hips were included in the study (=75 M2a-38). The contralateral hip prostheses were of different types and were not included in the study. Twenty-three patients had a bilateral M2a-38 prosthesis and Taperloc stem combination (=46 M2a-38) and 119 patients had a unilateral MoM prosthesis of the M2a-38 type and a Taperloc stem (=119 M2a-38).
A bursa filled with fluid (without connection with the joint) was found in 25.8% of cases by observer 1 and in 24.2% of cases by observer 2. Atrophy of the gluteus medius and minimus muscles was scored by observer 1 in 90 cases, whereas observer 2 scored 39 cases as positive. Atrophy of the gluteus medius muscle was found in 72 cases by observer 1 and in 16 cases by observer 2.
After consensus, 106 pseudotumors were diagnosed with the use of the Matthies grading system. Type 2a pseudotumor was the most frequent classification (Fig. 3). Patients with a type 2a pseudotumor had a mean serum cobalt of 13.1 μg/l (SD 23.7). Thirty-eight of these patients had pain and 61 were female. After the logarithmic transformation of the cobalt values, a significant difference was observed between the group of patients without a pseudotumor and the group of patients with a type 1 or 2a pseudotumor (p < 0.05), and also between a type 1 pseudotumor and a type 2a pseudotumor (p < 0.05) or a type 3 pseudotumor (p < 0.05). No association between the pseudotumor classification and pain or gender could be detected (Table 5). Of the 106 pseudotumors, 52 were asymptomatic (49%). There was no difference in the presence of a pseudotumor between the symptomatic patients (54 out of 118) and the asymptomatic patients (52 out of 122; p = 0.62; Table 6). Regarding categories of pseudotumor wall thickness (no wall, <3 mm or ≥3 mm), there were no differences between the symptomatic and asymptomatic patients (Table 5).
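The comparison between symptomatic and asymptomatic patients can be illustrated with a small Python sketch of a 2x2 test on the counts given above. It assumes a Pearson chi-squared test without continuity correction; the text does not state which test was used for this particular comparison, and the function name `chi_squared_2x2` is hypothetical.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Returns the test statistic and the two-sided p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Counts from the text: pseudotumor present / absent per group.
symptomatic = (54, 118 - 54)
asymptomatic = (52, 122 - 52)
statistic, p_value = chi_squared_2x2(*symptomatic, *asymptomatic)
print(f"chi-squared = {statistic:.3f}, p = {p_value:.3f}")
```

With these counts the sketch gives a p-value well above the 0.05 significance level, in line with the absence of a difference reported in the text.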

High- and low-risk patient screening
After classification of pseudotumors according to the Matthies grading, patients were divided into a high-risk group for pseudotumor development, a low-risk group, and a control group. A significantly higher risk was found for pseudotumor development in the high-risk group (59%, 20 out of 34) versus the low-risk group (0%, 0 out of 5; p < 0.001). In the control group, 86 pseudotumors were diagnosed in 201 THAs (43%). The patients in the control group with a pseudotumor had a mean serum cobalt of 8.3 μg/l (SD 6.6 μg/l) and patients in the control group without a pseudotumor had a mean serum cobalt level of 7.8 μg/l (SD 7.5 μg/l; p = 0.61).
Serum cobalt values differed between high- and low-risk patients. No differences were found in the control group between patients with a pseudotumor and those without a pseudotumor with regard to serum cobalt levels. No significant differences in the type of pseudotumor, pain, and gender were found among the groups. Also, no differences in the control group were found between symptomatic and asymptomatic patients.
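As a quick arithmetic check, the group counts reported in this subsection can be added up and converted to percentages with a few lines of Python. The counts are taken from the text; the helper name `percent` is hypothetical.

```python
# (pseudotumors, hips) per group, as reported in the text.
groups = {
    "high-risk": (20, 34),
    "low-risk": (0, 5),
    "control": (86, 201),
}


def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


total_pseudotumors = sum(p for p, _ in groups.values())
total_hips = sum(h for _, h in groups.values())

for name, (pseudotumors, hips) in groups.items():
    print(f"{name}: {pseudotumors}/{hips} = {percent(pseudotumors, hips)}%")
print(f"all hips: {total_pseudotumors}/{total_hips} = "
      f"{percent(total_pseudotumors, total_hips)}%")
```

The three groups together account for the 106 pseudotumors and 240 hips of the whole cohort.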

Discussion
As far as we know, this is one of the largest series of MoM THA in which patients were screened for having a pseudotumor by utilizing MARS-MRI [25]. A total of 106 pseudotumors were diagnosed after consensus in 240 MoM THAs (44%), of which 49% had no symptoms. In the low-risk patient group (male sex, no hip pain, cup inclination <45°, and serum cobalt <5 μg/l), no pseudotumors were diagnosed, whereas a high percentage of pseudotumors (59%, 20 out of 34) was found in the high-risk patient group (serum cobalt >5 μg/l [10], female sex, hip pain, and a high cup inclination angle >45°). However, the control group also showed a high percentage of pseudotumors (43%, 86 out of 201). The high-risk group showed a significantly higher risk of developing a pseudotumor.
Fig. 3 Flowchart of patients after consensus on the Matthies classification
Serum cobalt levels also differed between pseudotumor types. One may hypothesize that a (local) higher cobalt level influences the formation of a pseudotumor; however, several studies show that cobalt values are a poor predictor [17,26,27]. In the literature, it is reported that the occurrence of pseudotumors varies by type of prosthesis. In large-head MoM prosthesis patients, 40-60% of the patients develop a pseudotumor [10,28,29]. Our cohort results agree with these findings. Other studies showed that the prevalence of pseudotumors in conventional THA (ceramic on polyethylene and metal on polyethylene) is comparable with that in MoM THA [4,5], but they are less symptomatic and revisions due to pseudotumors are rare in non-MoM THA. More patients with MoM THA report pain compared with patients with conventional THA [30], which may be caused by the local toxicity of the cobalt, hypersensitivity reactions to the metal release, subsequent osteolysis, and soft-tissue damage [22,31]. Chang et al. report that pain is associated with soft-tissue damage and not with the presence or size of a pseudotumor [14]. The behavior of pseudotumors in the long term is unclear.
A high complication rate (14%) in MoM revision surgery has been reported, with a 7% re-revision rate after 2 years (range: 26-52 months) [32] and a dislocation rate of up to 28% [33-35]. Three of the studies used a posterior approach and one study used a posterior approach in 80% of the cases and an anterolateral approach in 20%. The exact reason for the higher dislocation rate after these revisions is not clear, but all authors suggest that the extensive destruction of soft tissue caused by the MoM prosthesis might play an important role. Atrophy of the gluteal musculature and subsequent instability may contribute to the high dislocation rate. In symptomatic patients, MARS-MRI can be used as a preoperative tool to classify the soft-tissue damage before revision and to support the decision to choose a dual mobility cup to lower the dislocation rate [36].
To the best of our knowledge, only one study has reported the interobserver reliability among the three grading systems [18]. In that study, the highest kappa (0.58) was found for the Anderson grading system. In our study, the highest kappa (0.49) was found for the Matthies classification, with 0.44 for the Anderson classification and 0.43 for the Hauptfleisch classification. Although our study included a larger cohort than that of van der Weegen et al., a lower kappa value was measured. We do not have an explanation for this, but in both studies there was only moderate agreement, which calls into question the reproducibility and thus the clinical use of these systems. Another difference is that we only used the M2a-38 system, whereas three different systems were used in the other study. A shortcoming of our study is that the observers scored all grading systems on the MARS-MRI at the same time, which can create a “cross-contamination” of results. The low kappa value of the Anderson score can be partly explained by the inclusion of more parameters of periprosthetic tissue compared with the Matthies and Hauptfleisch scores, which only score the characteristics of the pseudotumor.
Compression of the neurovascular bundle by a pseudotumor, soft-tissue damage, and osteolysis all inform the clinical decision on whether to revise.
In conclusion, a higher occurrence of pseudotumor development in high-risk patients was found. No pseudotumor development was found in low-risk patients. However, patients in the control group also showed a high occurrence of pseudotumors. This means that every patient, except those defined as “low risk,” is at a substantial risk for developing a pseudotumor. No differences were found in the control group between patients with a pseudotumor and those without a pseudotumor with regard to serum cobalt levels. The Matthies score was the most reliable classification, but all three grading systems showed limited interobserver reliability. MARS-MRI is one of the tools that aids in clinical decision-making.

Compliance with ethical standards
Conflicts of interest The authors declare that they have no conflicts of interest.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.