Introduction

Metal-on-metal hip resurfacing arthroplasty (HRA) is a surgical option in the treatment of end-stage hip disease. Systemic levels of chromium (Cr) and cobalt (Co) ions in whole blood, serum, or urine reportedly correlate with levels measured in joint fluid [4] and with the linear [4] and volumetric [15] wear of the femoral component. During the run-in phase of metal-on-metal HRA, the ion levels rise to peak levels at approximately 9 to 12 months, followed by a leveling off or a slow decrease of the systemic Cr and Co concentrations once the lower wear steady-state phase is reached [11]. In patients with continuously elevated wear associated with edge loading resulting from component malpositioning (eg, steep acetabular component) [3] or with hip resurfacing designs with a lower coverage angle [16], metal ion levels increase further and are correlated with clinical symptoms and radiographic evidence of adverse reactions such as osteolysis [3, 7, 10, 16, 19]. Thus, once the running-in phase is completed, ie, after approximately 12 months, systemic Cr and Co concentrations are considered surrogate markers of in vivo wear and their measurement is advocated by regulatory bodies such as the Medicines and Healthcare products Regulatory Agency in the United Kingdom [21] and the Food and Drug Administration in the United States [6] as a screening tool for the malfunctioning of metal-on-metal hip arthroplasties.

Metal ion measurements allow the early detection of increased wear before extensive tissue destruction has occurred, which leads to a better outcome of revisions [5, 8]. The Medicines and Healthcare products Regulatory Agency recommends additional investigations, which may include cross-sectional imaging, when Cr and/or Co levels exceed 7 μg/L [21]. However, this limit is arbitrary and not supported by scientific data. Moreover, it only applies to unilateral metal-on-metal hip arthroplasty. For bilateral metal-on-metal hip arthroplasty, metal ion levels are loosely assumed to be approximately twice as high [10]. Rigorous safe upper limits of systemic Cr and Co concentrations with clinical importance have yet to be established for metal-on-metal hip arthroplasty in general and for metal-on-metal HRA in particular. Furthermore, the interpretation of systemic metal ion levels and their use in the diagnostic and therapeutic management of patients with metal-on-metal HRA have not been outlined and may be challenging in asymptomatic patients with elevated ion levels or symptomatic patients with low ion levels.

We therefore asked: (1) whether patients undergoing hip resurfacing with no clinical problems could be differentiated from those with clinical and/or radiographic problems (evidence of component malpositioning, migration, loosening, or periprosthetic bone loss) based on metal ion levels; (2) whether there was a threshold metal ion level that predicted the need for clinical intervention; and (3) whether patient and implant factors differed between these groups.

Patients and Methods

We retrospectively identified all patients with unilateral (n = 453) and bilateral (n = 139) HRAs who were operated on from 1998 to 2010 and in whom metal ion measurements were available more than 12 months postoperatively (after the running-in phase). During that period, a total of 3454 HRAs were implanted by a single surgeon. We included only those patients for whom we had complete clinical and radiographic data available at last followup at more than 12 months postoperatively and included metal-on-metal HRA of any design. We excluded patients with metal-on-metal THAs so as to avoid the additional ions from modular taper junctions. We also excluded patients with other possible sources of metal ions such as medication or food supplements containing Cr or Co, occupational exposure, or the presence of other metal implants such as THAs, TKAs, spinal hardware, or metal dental implants; we also excluded patients with renal insufficiency or even slightly elevated creatinine levels. The mean age at first surgery was 53 years (range, 29–70 years). The primary diagnosis was osteoarthritis in 91% of the patients, avascular necrosis in 6%, and congenital dysplasia, inflammatory arthritis, and trauma each in 1%. There were 260 men (57%) and 193 women (43%) in the unilateral group and 78 men (56%) and 61 women (44%) in the bilateral group. The minimum length of followup was 1 year (mean, 4.3 years; range, 1–12 years) for patients with unilateral HRA and 1 year (mean, 5 years; range, 1–13 years) after the last resurfacing procedure for patients with bilateral HRA. No patients were recalled specifically for this study; all data were obtained from medical records and radiographs.

Sixteen patients received their HRA as part of bilateral procedures performed simultaneously. In the unilateral group, there were eight different hip resurfacing designs (Table 1). The most commonly implanted prosthesis was the BHR (Smith & Nephew, Memphis, TN, USA) (n = 288 [64%]) followed by the Conserve Plus (Wright Medical Technology, Memphis, TN, USA) (n = 128 [28%]), the ASR (DePuy, Warsaw, IN, USA) (n = 20 [4%]), the Durom (Zimmer, Warsaw, IN, USA) (n = 9 [2%]), and four other designs (< 1%). In the bilateral group, there were seven different HRA implant combinations. The most commonly implanted prosthesis was the BHR (67%) and the most commonly encountered combination was a bilateral BHR (n = 85 [61%]) followed by a BHR-Conserve Plus combination (n = 23 [17%]) and a bilateral Conserve Plus (n = 22 [15%]); all other combinations occurred in less than 3%. The median femoral head size was 50 mm both in the unilateral (mean, 49.5 mm; range, 38–62 mm) and the bilateral group (mean, 49.6 mm; range, 38–58 mm).

Table 1 Demographics of the study population

Clinical followup evaluation for asymptomatic patients was typically done at 6 weeks and 1, 2, 5, and 10 years, but patients with pain were evaluated at the time of presentation to the clinic and more regularly as needed to monitor symptoms. The evaluation included reviewing patients’ pain, walking, and function; calculating a Harris hip score; performing a ROM examination of the hips; and obtaining standing AP and lateral radiographs of the pelvis and resurfaced hips.

Two of us (GG, AC) independently measured acetabular component inclination and anteversion using Einzel Bild Röntgen Analyse (EBRA) [18]. The theoretical contact patch to rim (CPR) distance was calculated using the method proposed by Langton et al. [17] based on hip contact forces [2] taking into account the acetabular component orientation, the femoral component radius, and the functional coverage arc as described by Griffin et al. [9] as an indication of the in vivo area of cover of the femoral head by the acetabular component. A CPR distance of less than 10 mm is associated with a high risk of edge loading and subsequent increased wear [17]. All radiographs were evaluated for radiolucent lines, osteolysis, component loosening, or migration by the two observers (GG, AC) independently. Except for reactive lines, any change, even minor, from the postoperative radiographs was considered pathological and was noted. The correlation coefficient for intra- and interobserver reliability was 0.9 (p < 0.01).

We began obtaining metal ion measurements in 1999 for clinical trials and subsequently on a routine basis at every followup visit since the end of 2005. All patients’ clinical, radiographic, and metal ion data are entered prospectively in a database designed for the followup of hip arthroplasties (Orthowave™; Aria Software Ltd, Arras, France). We collected blood samples for metal ion measurements in compliance with a recommended rigorous collection protocol [19] using an intravenous catheter (Becton Dickinson Insyte-W™, Sandy, UT, USA). After the catheter was introduced, the metal needle was withdrawn and the first 5 mL of blood was discarded to avoid possible metal contamination from the needle. We then collected a second 5 mL of blood using approved metal-free vacuum collection tubes for metal ion measurements (Terumo Venosafe VF-106SAHL; Terumo Europe NV, Leuven, Belgium). There is no consensus to date on which matrix (whole blood or serum) is superior [19, 24]. We routinely use serum measurements that are performed at the Laboratory of Toxicology, University Hospital, Ghent, Belgium, using an inductively coupled plasma mass spectrometry technique (ELAN DRC II; Perkin Elmer Life and Analytical Sciences, Shelton, CT, USA). The laboratory quotes its quantification limit as 0.5 μg/L with a reproducibility of 95%. The analyses are IQC and EQC controlled according to the QMEQAS (Quebec Multielement External Quality Assessment Scheme).

To address the first question, the cohort was divided into a well-functioning group and a poorly functioning group. The criteria to be fulfilled for allocation to the well-functioning group were rigorous: (1) no patient-reported hip complaints; (2) no surgeon-detected clinical findings; (3) Harris hip score higher than 95 points; (4) CPR distance greater than 10 mm; (5) no abnormal radiological findings; and (6) no further operation scheduled (Table 2). For an HRA to be considered well functioning and to be allocated to that group, all of the criteria had to be fulfilled, and bilateral patients had to fulfill all criteria for both hips. A patient with any deviation from these criteria, even for only one criterion in one hip, was allocated to the poorly functioning group. The well-functioning group consisted of 251 patients with a unilateral HRA (55%) and 58 patients with bilateral HRA (42%), whereas 202 patients with a unilateral and 81 patients with a bilateral HRA were identified to have clinical and/or radiographic problems that placed them in the poorly functioning group.
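The allocation rule described above can be sketched in a few lines of Python. This is only an illustration of the logic as stated in the text, not the software used in the study; the class name, field names, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HipAssessment:
    """Findings for one resurfaced hip (field names are illustrative)."""
    patient_reported_complaints: bool
    surgeon_detected_findings: bool
    harris_hip_score: float
    cpr_distance_mm: float
    abnormal_radiographic_findings: bool
    further_operation_scheduled: bool


def hip_is_well_functioning(hip: HipAssessment) -> bool:
    """Return True only if all six criteria listed in the text are met."""
    return (
        not hip.patient_reported_complaints
        and not hip.surgeon_detected_findings
        and hip.harris_hip_score > 95
        and hip.cpr_distance_mm > 10
        and not hip.abnormal_radiographic_findings
        and not hip.further_operation_scheduled
    )


def allocate_patient(hips: list[HipAssessment]) -> str:
    """Allocate a patient with one (unilateral) or two (bilateral) hips."""
    if not hips:
        raise ValueError("at least one hip assessment is required")
    # A single failed criterion in a single hip moves the patient
    # to the poorly functioning group.
    if all(hip_is_well_functioning(h) for h in hips):
        return "well-functioning"
    return "poorly functioning"


# Hypothetical example: the second hip fails only the CPR distance criterion.
good_hip = HipAssessment(False, False, 98, 14.0, False, False)
low_cpr_hip = HipAssessment(False, False, 98, 8.0, False, False)
print(allocate_patient([good_hip]))
print(allocate_patient([good_hip, low_cpr_hip]))
```

The sketch makes the asymmetry of the rule explicit: membership in the well-functioning group requires every criterion in every hip, so the poorly functioning group is heterogeneous by construction.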

Table 2 Criteria for allocation to the well-functioning hip resurfacing arthroplasty group*

The questions were examined as follows: (1) the differences in serum metal ion levels between the well-functioning and poorly functioning groups were examined using the Mann-Whitney U and the Kruskal-Wallis tests to determine if ion levels could be used to differentiate between these groups; (2) to establish a threshold metal ion level that predicted the need for intervention, the guideline upper limit ion level values for well-functioning implants were established as the highest values that were not considered outliers. The definition used for the upper limit was (75th percentile) + 1.5 × (interquartile range) = top margin of the box and whisker plot [12]. The sensitivity and specificity of these upper limits of both Cr and Co for diagnosing clinical problems were examined by performing receiver operating characteristic (ROC) analyses [26]; (3) finally, the patient and implant factors, including sex, component size, cup inclination, coverage arc, and CPR distance, of the well-functioning group were compared with corresponding values of poorly functioning patients using parametric or nonparametric tests as necessary. IBM SPSS Statistics Version 19 (SPSS, an IBM Company, Chicago, IL, USA) was used.
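As an illustration of the upper-limit definition above, the following minimal Python sketch computes both the outlier fence, (75th percentile) + 1.5 × (interquartile range), and the highest observed value that does not exceed it. The function name and the sample values are hypothetical, and the quartile interpolation used here (the "inclusive" method of Python's `statistics.quantiles`) is an assumption that may differ slightly from the quartile definition used by SPSS.

```python
import statistics


def upper_limit(levels_ug_per_l):
    """Return (fence, highest_non_outlier) for a list of ion levels in ug/L.

    fence = Q3 + 1.5 * IQR; highest_non_outlier is the largest observed
    value that does not exceed the fence (the top whisker of a box plot).
    """
    q1, _median, q3 = statistics.quantiles(
        levels_ug_per_l, n=4, method="inclusive"
    )
    fence = q3 + 1.5 * (q3 - q1)
    highest_non_outlier = max(v for v in levels_ug_per_l if v <= fence)
    return fence, highest_non_outlier


# Hypothetical serum Cr levels (ug/L); these are NOT study data.
sample = [1.0, 1.2, 1.5, 1.6, 1.8, 2.0, 2.4, 3.0, 9.5]
fence, whisker = upper_limit(sample)
print(f"fence = {fence:.2f} ug/L, highest non-outlier = {whisker:.2f} ug/L")
```

For this made-up sample the fence is about 3.75 μg/L and the highest non-outlier is 3.0 μg/L, so the single value of 9.5 μg/L would be treated as an outlier.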

Results

When the ion levels were compared between the two groups, higher ion levels were found in the poorly functioning group (p < 0.001) (Table 3). The median ion levels in the unilateral well-functioning group were Cr: 1.6 μg/L, Co: 1.5 μg/L versus Cr: 2.5 μg/L, Co: 2.1 μg/L in the other group (p < 0.001; Fig. 1A). Similarly in the bilateral patients, median levels were higher in the poorly functioning group (p < 0.001) (Fig. 1B). Patients with unilateral HRAs had lower ion levels compared with bilateral HRAs (p < 0.001).

Table 3 Serum chromium (Cr) and cobalt (Co) ion levels in the well-functioning and poorly functioning unilateral and bilateral HRA*
Fig. 1A–B

(A) Box plots of serum Cr and Co levels in the well-functioning and poorly functioning groups for unilateral HRA. Well-functioning group patients have lower (p < 0.001) levels compared with poorly functioning group patients. There is a predominance of females among the outliers. (B) Box plots of serum Cr and Co levels in the well-functioning and poorly functioning groups for bilateral HRA. As in the unilateral group, well-functioning group patients have lower (p < 0.001) levels compared with poorly functioning group patients and there is a predominance of females among the outliers.

The upper ion limits differentiating the two groups were established as Cr: 4.6 μg/L and Co: 4.0 μg/L for unilateral and Cr: 7.4 μg/L and Co: 5.0 μg/L for bilateral HRA (Fig. 2). Ion levels higher than these were correlated with clinical symptoms (p < 0.001) and with risk factors for high wear, ie, smaller component size (Cr: p < 0.001, Co: p = 0.002), smaller cup coverage arc (p < 0.001), smaller CPR distance (p < 0.001), and higher cup inclination (p < 0.001). From the ROC analyses, both Cr and Co levels were better than chance (p < 0.001) at diagnosing clinical problems (Fig. 3A–B). For unilateral HRA, sensitivity and specificity of the well-functioning group upper limits in predicting poor function were, respectively, 25% and 95% for Cr and 22% and 96% for Co (Fig. 3A). Odds ratios of having a poorly functioning HRA were 6.0 for Cr > 4.6 μg/L and 5.9 for Co > 4.0 μg/L. For levels higher than 10 μg/L, the specificity of predicting clinical problems was 100%. For bilateral HRA, sensitivity and specificity of the well-functioning group upper limits were, respectively, 43% and 93% for Cr and 38.6% and 91% for Co (Fig. 3B). Odds ratios of having a poorly functioning HRA were 8.7 for Cr > 7.4 μg/L and 6.0 for Co > 5.0 μg/L.
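To make the relationship between sensitivity, specificity, and the odds ratio concrete, the sketch below derives all three from a 2 × 2 table. The function name is hypothetical, and the counts are illustrative values chosen only to be roughly consistent with the percentages reported for Cr in unilateral HRA (202 poorly functioning and 251 well-functioning patients); they are not the study's actual counts.

```python
def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int):
    """Return (sensitivity, specificity, odds_ratio) from a 2 x 2 table.

    tp: poorly functioning, ion level above the upper limit
    fn: poorly functioning, ion level at or below the upper limit
    fp: well functioning, ion level above the upper limit
    tn: well functioning, ion level at or below the upper limit
    """
    if fp == 0 or fn == 0:
        raise ValueError("odds ratio is undefined when fp or fn is zero")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    odds_ratio = (tp * tn) / (fp * fn)
    return sensitivity, specificity, odds_ratio


# Illustrative counts only (assumed, not taken from the study tables).
sens, spec, odds = diagnostic_metrics(tp=50, fn=152, fp=13, tn=238)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}, OR = {odds:.1f}")
```

With these assumed counts the sketch gives a sensitivity near 25%, a specificity near 95%, and an odds ratio near 6.0. This shows how a threshold can carry a substantial odds ratio while still missing most poorly functioning hips.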

Fig. 2

Acceptable upper limits of Cr and Co levels for unilateral and bilateral HRA were established as the highest values, which were not considered as outliers for the well-functioning group patients. The definition used for the upper limit was (75th percentile) + 1.5 × (interquartile range) = top margin of the box and whisker plot [12]. Lower ion levels (p < 0.001) were found with unilateral HRA compared with bilateral HRA.

Fig. 3A–B

(A) ROC curve demonstrating the power (sensitivity and specificity) of serum Cr and Co for diagnosing clinically relevant problems in unilateral HRA. For Cr, area under the curve (AUC) was 0.67 (0.62–0.72), whereas for Co, AUC was 0.65 (0.59–0.70). Sensitivity and specificity of the upper limits in predicting poor function were, respectively, 25% and 95% for Cr and 22% and 96% for Co. (B) ROC curve demonstrating the power (sensitivity and specificity) of serum Cr and Co for diagnosing clinically relevant problems in bilateral HRA. For Cr, AUC was 0.79 (0.71–0.86); for Co, AUC was 0.76 (0.68–0.84). Sensitivity and specificity of the safe upper levels for bilateral HRA were, respectively, 43% and 93% for Cr and 38.6% and 91% for Co.
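The AUC values in the caption above can be read as the probability that a randomly chosen poorly functioning patient has a higher ion level than a randomly chosen well-functioning patient, with ties counted as one half. The short Python sketch below computes the AUC directly from that definition. It is an illustration rather than the SPSS procedure used in the study, and the function name and sample values are hypothetical.

```python
def auc_by_pairs(poor_levels, well_levels):
    """AUC as the fraction of (poor, well) pairs in which the poorly
    functioning patient has the higher ion level; ties count 0.5."""
    if not poor_levels or not well_levels:
        raise ValueError("both groups must contain at least one value")
    wins = 0.0
    for p in poor_levels:
        for w in well_levels:
            if p > w:
                wins += 1.0
            elif p == w:
                wins += 0.5
    return wins / (len(poor_levels) * len(well_levels))


# Hypothetical ion levels (ug/L); these are NOT study data.
poor = [1.4, 2.5, 3.1, 6.0, 12.0]
well = [0.9, 1.4, 1.6, 2.0, 3.5]
print(f"AUC = {auc_by_pairs(poor, well):.2f}")
```

An AUC of 0.5 corresponds to chance and 1.0 to perfect separation of the two groups.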

The well-functioning group contained more males (57% unilateral, 61% bilateral) and larger sized components than the poorly functioning group (Fig. 4A–B). A ≥ 50-mm femoral component had a 64% chance of being in the well-functioning group compared with 38% for a < 50-mm component. There was no sex difference in the < 50-mm size group, but for ≥ 50-mm sizes, females still had a higher chance of being in the poorly functioning group (59% versus 33% of males; p = 0.004). Female patients predominated in the ion level outliers (Fig. 1A–B). Implant designs were differently distributed between the two groups for unilateral but not for bilateral patients; 66% of unilateral patients with a Conserve Plus were in the well-functioning group versus 43.5% for ASR, 41.5% for BHR, and 11% for Durom (p = 0.014). Patients receiving the Conserve Plus and Durom also had lower (p < 0.001) ion levels and greater CPR distance compared with patients receiving ASR and BHR. For bilateral HRA, different implant combinations had similar ion levels (p = 0.25) and a similar risk ratio of being in the poorly functioning group (p = 0.45).

Fig. 4A–B

(A) Differences in sex and component size between the well-functioning and poorly functioning groups in unilateral HRA. The majority of males (57%) were in the well-functioning group compared with the majority of females (64%) who were in the poorly functioning group. Well-functioning group patients had larger sized components (mean femoral head size, 50.4 mm) compared with poorly functioning group patients (mean, 48.6 mm) (p < 0.001). (B) Differences in sex and component size between the well-functioning and poorly functioning groups in bilateral HRA. As with unilateral HRA, the majority of males (61%) were in the well-functioning group compared with the majority of females (72%) in the poorly functioning group. Well-functioning group patients had larger sized components (mean femoral head size, 51.5 mm) compared with poorly functioning group patients (mean, 48.0 mm) (p < 0.001).

Discussion

Metal ion levels have been used as a surrogate marker for wear of metal-on-metal hip arthroplasties [3, 4, 7, 10, 15, 16, 19, 21]. At our institution, metal ion measurements have been collected on a large number of patients undergoing HRA, initially only when a malfunctioning prosthesis was suspected [3]. Since 2006, metal ion measurements have been part of our routine clinical followup, which now also covers asymptomatic patients and those without apparent risk for high wear. All data are entered prospectively into a database. The purpose of this study was to use these data to find threshold levels of ions differentiating well-functioning and poorly functioning HRA and to determine which patient and implant factors were associated with those groups. The results confirm that ion measurements are an important diagnostic tool in the management of patients with HRA [4].

We acknowledge limitations to our study. First, it is not always straightforward to interpret ion levels in certain clinical situations such as symptomatic patients with low levels or asymptomatic patients with high levels. Metal ions are an adjunct to clinical and radiographic evaluations and should be considered as part of the whole clinical picture. For this reason, we have developed a clinical management algorithm that incorporates our findings with our clinical experience and wear measurements from explants (Fig. 5). Second, the strict criteria that were used to assign patients to the well-functioning or poorly functioning groups may have resulted in lower than expected sensitivity of the threshold levels to predict poorly functioning implants, as discussed subsequently.

Fig. 5

Diagnostic and therapeutic algorithm for the followup of a hip resurfacing arthroplasty is shown.

The findings of the present study confirm previous reports that both unilaterally and bilaterally resurfaced patients with well-functioning implants have low systemic metal ion levels [3, 4, 10, 16]. These levels are comparable to the 1-μg/L level proposed as the indicator of well-functioning small-diameter metal-on-metal THAs [19]. By contrast, metal ion levels were higher in poorly functioning implants as previously reported [3, 4, 13, 15]. Per our strict criteria, implants suspected to be loose were assigned to the poorly functioning group and although loosening may be associated with higher ions [4, 5], this is not always the case. For example, we identified seven patients with loose Durom acetabular components with low ions (median, Cr: 0.8 μg/L, Co: 0.7 μg/L) and retrieval analysis of the explanted components demonstrated low wear (Fig. 6).

Fig. 6A–B

(A) Coordinate measuring machine (CMM)-derived wear depth map of a retrieved Durom HRA, head size 44 mm, showing low wear (maximum, 8.5 μm). This was implanted in a 26-year-old female patient with congenital hip dysplasia and revised after 19 months for cup loosening. Prerevision cup position measured with EBRA was 50° inclination and 14° anteversion. Metal ions prerevision were Cr: 1.6 μg/L, Co: 0.5 μg/L. (B) Same retrieved Durom HRA: scanning electron microscopy (SEM) picture of the bearing area approximately 5000× displaying only occasional scratches and a smooth background surface.

We propose that the upper metal ion levels in the well-functioning group can be considered as acceptable upper limits for unilateral (Cr: 4.6 μg/L, Co: 4.0 μg/L) and bilateral HRA (Cr: 7.4 μg/L, Co: 5.0 μg/L) patients and that these can serve as threshold levels for clinical management decisions. These levels are lower than the 7-μg/L threshold recommended by the Medicines and Healthcare products Regulatory Agency [21], but this study had a very low tolerance for what was considered a poorly functioning hip. Seventeen patients undergoing unilateral HRA assigned to the poorly functioning group had levels < 7 μg/L but higher than the proposed acceptable limits. The high specificity and odds ratios of the proposed limits in predicting problematic HRA are consistent with reports [3, 4, 7, 10, 15, 16, 19] associating outlier values of metal ions with poorly functioning implants with high wear. However, because of low sensitivity, lower levels than the proposed limits may still be associated with poor function. Although the difference is statistically significant, the median ion levels of the poorly functioning group are only slightly higher than those of the well-functioning group. This is a result of our strict criteria, under which the poorly functioning group can include patients with infection, metal allergy, or a loose, low-wear bearing with low ions.

Certain patient and implant variables differed between well-functioning and poorly functioning implants. As previously reported [3, 4, 16, 20, 23], females and smaller head sizes (< 50 mm) are more at risk of having a problematic HRA associated with higher metal ion levels. These findings are consistent with Arthroplasty Registry reports [1, 22]. The effect of cup coverage angle on ion levels is illustrated by the comparison of different HRA designs. Implants with a larger functional articular arc (Durom, Conserve Plus) had lower ion levels compared with ASR and BHR designs with a smaller articular arc [15, 17], particularly in the smaller sizes. Cup malpositioning and impingement are associated with increased wear and higher metal ion levels [3, 10, 16]. Well-functioning bilateral HRAs have higher ion levels compared with well-functioning unilateral HRAs. Having bilateral HRAs increases the risk of having a problematic resurfacing with metal ion levels predominantly reflecting the wear condition of the worst implant.

To provide practical applications of our findings, we have developed an algorithm (Fig. 5). For ease of use, metal ion levels are subdivided into < 4 μg/L, 4 to 10 μg/L, 10 to 20 μg/L, or > 20 μg/L (Table 4). Symptomatic patients with low ion levels must be investigated thoroughly with blood tests and additional imaging for infection, soft tissue reactions possibly related to allergy, or component loosening. If a patient has elevated metal ion levels without clinical symptoms or radiographic changes, cross-sectional imaging should be performed, which may reveal a soft tissue reaction [13], necessitating revision surgery. In case of minor, asymptomatic abnormalities such as a thickened hip capsule or a small fluid collection (< 5 cm³), with metal ion levels < 20 μg/L, close clinical followup and sequential cross-sectional imaging are advocated. If all investigations fail to show any problem, the patient is followed yearly with a clinical, radiographic, and metal ion examination for 2 consecutive years. If no further increase or a decrease of ion levels is seen, the patient returns to the normal followup regime (1, 2, 3, 5, 10 years). In case of a continuous increase of metal ions, additional investigations are repeated.
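The four-band subdivision of ion levels described above can be expressed as a small lookup function. The sketch below is illustrative only. The function name is hypothetical, and the handling of values falling exactly on a boundary (4, 10, or 20 μg/L) is an assumption, because the text does not state to which band such values belong.

```python
def ion_level_band(level_ug_per_l: float) -> str:
    """Map a serum Cr or Co level (ug/L) to one of the four bands.

    Assumption: exactly 4 ug/L falls in "4 to 10", exactly 10 ug/L in
    "4 to 10", and exactly 20 ug/L in "10 to 20"; the source text does
    not specify boundary handling.
    """
    if level_ug_per_l < 0:
        raise ValueError("ion level cannot be negative")
    if level_ug_per_l < 4:
        return "< 4 μg/L"
    if level_ug_per_l <= 10:
        return "4 to 10 μg/L"
    if level_ug_per_l <= 20:
        return "10 to 20 μg/L"
    return "> 20 μg/L"


# Hypothetical measurements, not study data.
for level in (1.6, 4.6, 12.0, 25.0):
    print(level, "->", ion_level_band(level))
```

The band alone does not determine management; in the algorithm it is combined with symptoms, radiographic findings, and cross-sectional imaging.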

Table 4 Practical classification of the metal ions levels (Cr and Co) for use with the diagnostic and therapeutic algorithm (Fig. 5)

Co ion levels above 20 μg/L are a reason for serious concern because such levels have been associated with neurological, otological, and cardiac symptoms [25]. In those cases, a revision should be considered even with minor clinical symptoms. The use of metal ion measurements as a screening method for the early detection of increased wear before extensive soft tissue or bone destruction has occurred leads to a better outcome of HRA revisions [5, 14].