Figure 2 shows a flow chart of the complete experimental workflow. Ten human femora were retrieved post-mortem from seven donors. These comprised three female and seven male femora, with a mean age of 80.7 years (range 67–98). Dual energy X-ray absorptiometry (DXA) measurements performed prior to preparation yielded a median T-score of −0.7 (mean −1.4; range −4.9 to +1.1). A T-score of −1 or higher is considered normal, whereas clinical osteoporosis is defined by a T-score of −2.5 or lower [22]. All femora were preserved in formalin, and the surrounding soft tissue was removed. We fitted each femur with a polished, tapered cobalt-chrome Exeter size 42-2 stem (Stryker, Limerick, Ireland). For fixation we used radiopaque contrast-enhanced bone cement (Palacos, Biomet, Warsaw, IN, USA). The prostheses were implanted under the supervision of an experienced orthopedic surgeon (H.J.L.vdH.) following a standard cemented implantation protocol.
To scan each femur both with and without the metal prosthesis, the prostheses had to be removable. After cementing, each prosthesis was mechanically removed from the femur, leaving the cement mantle and femur intact (Figs. 3 and 4). This was possible because of the smooth polished surface and tapered shape of the Exeter stem.
Each femur was axially bisected through the cement mantle. We then used a rotary burr (Dremel) to create lesions both proximal and distal to the cut surface and at varying locations around the circumference (Fig. 4). In total, 27 cavities were created, with a mean volume of 2.4 ml (range 1.1–5.0 ml). Lesion volumes were measured by filling each cavity with water from a syringe graduated in 0.2 ml increments. The lesions were then drained and filled with a fibrous tissue substitute.
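As a rough indication of the precision of this volumetric ground truth, assume (this is not stated in the protocol) that each syringe reading is rounded to the nearest 0.2 ml graduation, so that the reading error is at most half a graduation. Relative to the smallest, mean, and largest lesion volumes, this bound works out as follows:

```latex
\[
\epsilon_{\max} = \frac{0.2\ \text{ml}}{2} = 0.1\ \text{ml}, \qquad
\frac{0.1}{1.1} \approx 9\%, \qquad
\frac{0.1}{2.4} \approx 4\%, \qquad
\frac{0.1}{5.0} = 2\%
\]
```

Under this assumption the relative reading error is largest for the smallest lesions.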
Previous studies used water [23], lean beef mince [8, 24], or an unspecified “soft-tissue equivalent” material to fill artificially created lesions [3]. In this study we specifically chose a tissue substitute whose radiological properties match those of the fibrotic zones. On four occasions, real periprosthetic fibrotic tissue was retrieved during hip implant revision surgery and its CT opacity was measured ex vivo. These tissues had a mean opacity of 72 Hounsfield Units (HU), with a standard deviation of 10 HU. This differs substantially from water (mean 0 HU) and from our measurements of lean beef mince (mean 50 HU). After evaluating several commercially available alternatives, we chose chicken liver, whose mean opacity of 77 HU (standard deviation 6 HU) we considered sufficiently similar.
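One way to make "differs substantially" and "sufficiently similar" concrete, offered here purely as an illustration and not as an analysis performed in this study, is to express each candidate's mean opacity as an offset from the fibrotic-tissue mean in units of the fibrotic-tissue standard deviation:

```latex
\[
z = \frac{\mu_{\text{candidate}} - 72\ \text{HU}}{10\ \text{HU}}: \qquad
z_{\text{water}} = \frac{0 - 72}{10} = -7.2, \qquad
z_{\text{beef}} = \frac{50 - 72}{10} = -2.2, \qquad
z_{\text{liver}} = \frac{77 - 72}{10} = +0.5
\]
```

On this scale only chicken liver lies within one standard deviation of the fibrotic-tissue mean.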
During scanning, an inserted prosthesis was needed to hold the two halves of each bisected femur together. For the scans without the metal prosthesis, we used a mould-cast resin substitute. The resin had a measured CT opacity of 150 HU, placing it above the opacity of soft tissues and blood (∼50 HU) but below that of bone (>300 HU) and far below that of metal (>3,000 HU) [25]. Because of its low radiopacity, the resin prosthesis contributed little to beam hardening, the main source of metal-induced artefacts, and therefore yielded artefact-free reference images for CT ground truthing.
Scans were acquired on a helical CT scanner (Aquilion 16, Toshiba Medical Systems, Japan) at 135 kVp and a tube current of 200 mA. The in-plane voxel spacing was 0.44 × 0.44 mm, with a slice thickness of 0.5 mm. Following the advice of Lee et al. [12] and Douglas-Akinwande et al. [26], we chose a standard smooth reconstruction filter (FC 12) to minimize metal artefacts.
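Because each segmentation is a set of voxels, its volume follows directly from the voxel spacing given above. The minimal sketch below assumes a boolean NumPy mask indexed as (slice, row, column); the function and constant names are hypothetical. It converts a mask to millilitres and shows roughly how many voxels lesions of the reported sizes span.

```python
import numpy as np

# Slice thickness and in-plane spacing from the acquisition above (mm),
# ordered (z, y, x) to match a NumPy volume indexed [slice, row, column].
VOXEL_SPACING_MM = (0.5, 0.44, 0.44)


def mask_volume_ml(mask: np.ndarray, spacing_mm=VOXEL_SPACING_MM) -> float:
    """Volume of a binary segmentation mask in millilitres (1 ml = 1000 mm^3)."""
    voxel_volume_mm3 = float(np.prod(spacing_mm))
    return np.count_nonzero(mask) * voxel_volume_mm3 / 1000.0


if __name__ == "__main__":
    voxel_mm3 = float(np.prod(VOXEL_SPACING_MM))
    print(f"voxel volume: {voxel_mm3:.4f} mm^3")
    # Approximate number of voxels covered by the smallest, mean and largest lesions.
    for lesion_ml in (1.1, 2.4, 5.0):
        print(f"{lesion_ml} ml ~ {lesion_ml * 1000 / voxel_mm3:.0f} voxels")
```

With 0.44 × 0.44 × 0.5 mm voxels (0.0968 mm³ each), a 1.1 ml lesion covers about 11,400 voxels and a 5.0 ml lesion about 51,700 voxels.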
For MAR we used the recent sinogram-interpolation method of Veldkamp et al. [27]. This algorithm is closely related to the original method of Kalender et al. [15] but uses the raw sinogram data to interpolate across the metal traces. Adding a fraction of the original metal signal back to the interpolated data plays a role similar to the nonzero “confidence parameter” of Oehler et al. [28] and keeps the implant visible in the final reconstruction.
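The core idea of sinogram-interpolation MAR can be sketched as follows. This is a generic linear-interpolation scheme in the spirit of Kalender et al. [15], not the algorithm of Veldkamp et al. [27]. It assumes the sinogram is available as a 2D NumPy array (projection angles × detector bins) and that a boolean metal trace has already been obtained, for example by forward-projecting a metal segmentation. The blending weight `keep_fraction` is a hypothetical stand-in for the fraction of the original metal signal that is added back.

```python
import numpy as np


def interpolate_metal_trace(sinogram, metal_trace, keep_fraction=0.0):
    """Linear-interpolation MAR in the sinogram domain (Kalender-style sketch).

    sinogram:      2D array, one row per projection angle, one column per detector bin.
    metal_trace:   boolean array of the same shape, True where rays pass through metal.
    keep_fraction: hypothetical weight in [0, 1]; this fraction of the original
                   (metal-affected) signal is blended back into the trace.
    """
    sinogram = np.asarray(sinogram, dtype=float)
    metal_trace = np.asarray(metal_trace, dtype=bool)
    corrected = sinogram.copy()
    bins = np.arange(sinogram.shape[1])
    for row in range(sinogram.shape[0]):
        inside = metal_trace[row]
        if not inside.any() or inside.all():
            continue  # nothing to replace, or no metal-free bins to interpolate from
        # Replace each metal-affected bin by linear interpolation between the
        # nearest metal-free bins of the same projection.
        interpolated = np.interp(bins[inside], bins[~inside], sinogram[row, ~inside])
        original = sinogram[row, inside]
        # Blend a fraction of the original metal signal back in.
        corrected[row, inside] = interpolated + keep_fraction * (original - interpolated)
    return corrected


if __name__ == "__main__":
    sino = np.tile(np.arange(10, dtype=float), (3, 1))
    trace = np.zeros_like(sino, dtype=bool)
    trace[:, 4:6] = True
    sino[trace] += 100.0  # simulated metal contribution
    print(interpolate_metal_trace(sino, trace)[0])
    print(interpolate_metal_trace(sino, trace, keep_fraction=0.1)[0])
```

With `keep_fraction=0` the metal trace is replaced entirely by interpolated values; a small positive value retains part of the metal signal so that the implant remains visible after reconstruction. Reconstructing the corrected sinogram (e.g., by filtered back-projection) is outside the scope of this sketch.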
Two experienced users (F.M. and G.K.) each manually and independently segmented all 27 fibrotic lesions using MITK, an interactive segmentation software tool [29]. Each user segmented the lesions in the resin-prosthesis scans as well as in the metal-prosthesis scans with and without MAR. F.M. and G.K. segmented the volumes sequentially and in randomized order, with 2 weeks separating their segmentation work.
The volumes of the segmented lesions were compared with the physically measured (water-fill) ground-truth volumes. The segmentations from the metal-affected and MAR images were registered to their metal-free counterparts using a 3D iterative closest point (ICP) algorithm, which corrected for translational and rotational offsets between scans. The geometric deviation of each metal or MAR segmentation was then measured against the corresponding segmentation from the metal-free (resin prosthesis) scan. To avoid interobserver bias when comparing segmentations from the metal, MAR, and resin scans, we always compared pairs of segmentations of the same lesion made by the same user. Measurements by F.M. and G.K. were treated separately and were not averaged.
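The registration step can be illustrated with a minimal point-to-point ICP in NumPy/SciPy. This is a generic textbook variant (nearest-neighbour correspondences plus a least-squares rigid fit via SVD), not the ITK-based implementation used in this study. The function names, iteration limit, and tolerance are hypothetical, and the inputs are assumed to be surface points (e.g., mesh vertices) of two segmentations of the same lesion.

```python
import numpy as np
from scipy.spatial import cKDTree


def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (Kabsch/SVD)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    h = (src - src_c).T @ (dst - dst_c)
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, dst_c - r @ src_c


def icp(moving, fixed, max_iter=50, tol=1e-8):
    """Rigidly align point set `moving` (N x 3) to `fixed` (M x 3) with basic ICP.

    Returns the accumulated rotation, translation and the aligned points.
    """
    tree = cKDTree(fixed)
    current = np.asarray(moving, dtype=float).copy()
    r_total, t_total = np.eye(3), np.zeros(3)
    prev_err = np.inf
    for _ in range(max_iter):
        dist, idx = tree.query(current)  # closest fixed point for each moving point
        r, t = best_rigid_transform(current, fixed[idx])
        current = current @ r.T + t
        # Compose with the transform accumulated so far: x -> r (R x + T) + t.
        r_total, t_total = r @ r_total, r @ t_total + t
        err = dist.mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return r_total, t_total, current
```

Point-to-point ICP only converges to the correct alignment when the initial offset is small relative to the shape's features, which is plausible here because all scans of a femur share the same set-up.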
The residual shape difference between the two segmentations of each pair was quantified by the Hausdorff distance, the mean Hausdorff distance, and the Dice coefficient. The Hausdorff distance is the largest of all minimum distances from points on either surface to the other surface, i.e., the worst-case local mismatch. The mean Hausdorff distance is the mean of these minimum distances. The Dice coefficient measures volumetric overlap and is defined by \( c = \frac{2|A \cap B|}{|A| + |B|} \), where \( |A| \) and \( |B| \) are the volumes enclosed by the two surfaces and \( |A \cap B| \) is the volume of their intersection. It takes values in the range [0, 1], where 1 represents complete overlap and 0 represents completely disjoint volumes. A perfectly matched segmentation pair would thus have a Hausdorff distance of zero and a Dice coefficient of one, whereas a poor match would have a large Hausdorff distance and a Dice coefficient approaching zero. The Dice coefficient and Hausdorff distance are well suited to evaluating differences between 3D segmentations and have been used for this purpose by, e.g., Van der Lijn et al. [30].
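These three measures can be sketched with SciPy's `cKDTree`, assuming each surface is represented by a dense set of points (e.g., mesh vertices) and each enclosed volume by a boolean voxel mask. Pooling the minimum distances of both directions for the mean Hausdorff distance is one common convention and an assumption here, as is returning 1.0 for two empty masks; the function names are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def surface_distances(a_pts, b_pts):
    """Minimum distance from every point of one surface to the other, in both directions."""
    d_ab, _ = cKDTree(b_pts).query(a_pts)  # each point of A to its nearest point of B
    d_ba, _ = cKDTree(a_pts).query(b_pts)  # each point of B to its nearest point of A
    return d_ab, d_ba


def hausdorff(a_pts, b_pts):
    """Global maximum of all minimum distances (symmetric Hausdorff distance)."""
    d_ab, d_ba = surface_distances(a_pts, b_pts)
    return max(d_ab.max(), d_ba.max())


def mean_hausdorff(a_pts, b_pts):
    """Mean of the minimum distances, pooled over both directions (assumed convention)."""
    d_ab, d_ba = surface_distances(a_pts, b_pts)
    return np.concatenate([d_ab, d_ba]).mean()


def dice(mask_a, mask_b):
    """Dice coefficient 2|A n B| / (|A| + |B|) of two boolean voxel masks."""
    a, b = np.asarray(mask_a, dtype=bool), np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / total if total else 1.0
```

Distances between vertex sets approximate true surface-to-surface distances; the approximation improves as the surfaces are sampled more densely.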
For each segmentation boundary we computed the median image gradient magnitude, as well as the Michelson contrast between the inner and outer regions defined by this boundary. The Michelson contrast for each lesion is defined as \( \frac{I_{out} - I_{in}}{I_{out} + I_{in}} \), where \( I_{in} \) and \( I_{out} \) represent the median image intensities in a 1 mm wide region symmetrically located inward and outward of the segmentation border.
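A minimal sketch of both boundary measures is given below, assuming NumPy/SciPy arrays in (slice, row, column) order with the voxel spacing of this study. Two details are assumptions, not taken from the text: the boundary is taken as the mask voxels removed by a one-voxel erosion, and the 1 mm region is interpreted as a band straddling the border (0.5 mm on each side), adjustable through the hypothetical `band_mm` parameter. Because HU values can be negative, the Michelson ratio is only well behaved when \( I_{out} + I_{in} \) is clearly positive.

```python
import numpy as np
from scipy import ndimage


def boundary_metrics(image, mask, spacing_mm=(0.5, 0.44, 0.44), band_mm=1.0):
    """Median gradient magnitude on the mask boundary and Michelson contrast across it."""
    image = np.asarray(image, dtype=float)
    mask = np.asarray(mask, dtype=bool)

    # Gradient magnitude in intensity units per mm, using the physical voxel spacing.
    grads = np.gradient(image, *spacing_mm)
    grad_mag = np.sqrt(sum(g ** 2 for g in grads))
    boundary = mask & ~ndimage.binary_erosion(mask)
    median_gradient = np.median(grad_mag[boundary])

    # Distance (mm) from each voxel to the nearest voxel on the other side of the border.
    dist_inside = ndimage.distance_transform_edt(mask, sampling=spacing_mm)
    dist_outside = ndimage.distance_transform_edt(~mask, sampling=spacing_mm)
    half = band_mm / 2.0  # assumption: band straddles the border, half on each side
    inner_band = mask & (dist_inside <= half)
    outer_band = ~mask & (dist_outside <= half)

    i_in = np.median(image[inner_band])
    i_out = np.median(image[outer_band])
    michelson = (i_out - i_in) / (i_out + i_in)
    return median_gradient, michelson
```

For a lesion that is darker than its surroundings (\( I_{out} > I_{in} > 0 \)) the contrast is positive, and it approaches zero as the lesion becomes indistinguishable from its neighbourhood.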
Image registration, distance metrics, and contrast metrics were computed using the Insight Segmentation and Registration Toolkit (ITK), Visualization Toolkit (VTK) and the Python programming language. All computations were performed on the DeVIDE image processing and visualization platform [31].
We did not assume that the measured differences in volume, edge gradient magnitude, Michelson contrast, pairwise Hausdorff distances, or Dice coefficients were normally distributed. The Shapiro-Wilk test supported this decision: it rejected normality for several of the measurement pairs, and several of the measurement distributions are also visibly asymmetric (e.g., see Figs. 6 and 8 below). Distributions of measurements and of differences between measurement pairs are therefore described by nonparametric measures such as the median and interquartile range. For the same reason, we used the Wilcoxon signed-rank test rather than Student’s paired t-test to compare measurements of the same quantities under metal-free, metal-containing, and MAR acquisition. We also chose not to assume linear relationships between variables when testing for correlation, using instead Spearman’s rank correlation coefficient, a nonparametric analogue of Pearson’s correlation.
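The tests named above are all available in `scipy.stats`. The sketch below is a hypothetical helper, not the analysis code of this study. It summarizes one set of paired measurements, for example lesion volumes from the resin scans and from the metal scans, using the median and interquartile range of the differences, the Shapiro-Wilk test, the Wilcoxon signed-rank test, and Spearman's rank correlation.

```python
import numpy as np
from scipy import stats


def compare_paired(reference, test, alpha=0.05):
    """Nonparametric summary of paired measurements (e.g. resin vs. metal scans)."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    diff = test - reference
    q1, median, q3 = np.percentile(diff, [25, 50, 75])
    shapiro_p = stats.shapiro(diff).pvalue               # normality of the differences
    wilcoxon_p = stats.wilcoxon(reference, test).pvalue  # paired signed-rank test
    rho, rho_p = stats.spearmanr(reference, test)        # monotonic association
    return {
        "median_diff": median,
        "iqr_diff": q3 - q1,
        "normality_rejected": shapiro_p < alpha,
        "wilcoxon_p": wilcoxon_p,
        "spearman_rho": rho,
        "spearman_p": rho_p,
    }
```

For example, `compare_paired(resin_volumes, metal_volumes)` with 27 paired values would return the statistics for one comparison; `resin_volumes` and `metal_volumes` are placeholder names, and the test below uses synthetic data only.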