Various diseases are associated with increased liver iron content (LIC), which may induce or contribute to liver damage [1–3]. Serial measurement of LIC during long-term follow-up and treatment is highly desirable, but repeated liver biopsies are not recommended because of the risk of complications. Surrogate biochemical markers, including serum ferritin and transferrin saturation, are widely used but have limited specificity. Accurate non-invasive MRI-based methods of LIC measurement are therefore used in clinical practice for patients with known or suspected increased LIC [4, 5].

Several types of MRI LIC measurement have been described in the literature. Straightforward in–out phase gradient echo (GRE) shows signal loss at the later echo time (TE) but is only qualitative and easily confounded by the presence of hepatic steatosis. Quantitative approaches include (i) signal intensity ratio (SIR) measurement (e.g., the Gandon method) and (ii) MR-relaxometry. The Gandon method (henceforth referred to as “SIR”) utilizes the liver-to-muscle SIR on differently weighted MRI-scans [6]. This method allows easy and free calculation of the LICSIR, by entering ROI values in an online tool [7]. Hence, assuming the acquisition and placement of regions-of-interest (ROIs) are performed correctly, the method is robust to observer influences. A major limitation is its upper limit of detection of 350 µmol/g (equal to 20 mg/g): changes above that threshold cannot be measured.

MR-relaxometry relies on the calculation of tissue relaxation rates (R 2 and R 2*, the inverse of relaxation times T 2 and T 2*), which increase as iron accumulates and are sensitive to changes in LIC values well above the SIR-threshold. One commercialized R 2 approach using single-echo spin-echo (SE) MRI is the FDA-approved St. Pierre method [FerriScan®], performed in 10 min in free-breathing [8]. The per-scan analysis price is ~$300, on top of the costs of the MRI-scan itself. Alternative free-of-charge approaches are available for R 2 using free-breathing or respiratory triggered SE-MRI and for R 2* using single breath-hold GRE MRI [9].

Recent developments in MR-relaxometry include multipeak fat corrections and the use of complex instead of magnitude-only data fitting [10], assessment of the effect of fat suppression on R 2* [11] and the comparison of advanced data fit models [12] and analysis approaches [13].

A comparative study of LICSIR, R 2, and R 2* in 94 patients with β-thalassemia reported high correlations [14]. However, success rates, interobserver agreement, and applicability to diseases other than β-thalassemia were not investigated, nor were serum markers assessed. The latter may be useful to screen for elevated LIC (i.e., >36 µmol/g), saving expensive and limited MRI time. We hypothesize that R 2* is preferable to SIR and R 2 in terms of success rate, acquisition time, and range of detection, and preferable to serum values in terms of accuracy in detecting elevated LIC.

In our center, the clinical LIC protocol has included SIR, R 2, and R 2* since 2005, with regular weekly clinical referrals since 2008. The SIR measurement is recommended by the national guideline for hemochromatosis [15]. It is supplemented by R 2 and R 2* measurements to fill the gap caused by the SIR method’s hard cut-off at 350 µmol/g. To investigate our hypothesis, we (i) assessed SIR, R 2, and R 2* LIC measurements and their success rates and interobserver agreement; and (ii) compared the diagnostic accuracies of LICSIR, R 2, and surrogate serum markers for correctly predicting elevated LIC based on increased R 2*.

Materials and methods

Ethical

All data used for this study were acquired in a clinical setting and were anonymized prior to analysis. Informed consent was waived by the Medical Research Ethics Committee of the AMC Amsterdam.

Patients

All MRI-based LIC measurements performed between January 1, 2008 and December 31, 2013 were retrospectively included in this study. As additional measurements were added to the protocol in 2014, only measurements up to the end of 2013 were included. Clinical diagnosis and, when available, serum markers of iron metabolism (total iron, transferrin, transferrin-saturation, ferritin) were collected and subsequently anonymized by a colleague not otherwise involved in this study.

MRI

MRI-scanning was performed supine, feet first on a 1.5T Avanto MRI-scanner (Siemens AG, Erlangen, Germany) using phased-array coils (body array and spine coil) for localizers and R 2 and R 2* measurements and the body coil for the SIR measurement [6]. Use of the body coil provided as homogeneous a B1 field as possible, reducing variation in SIR measurements due to variations of flip angles between patients. For R 2* and R 2, the B1 variation is eliminated via the data fit. Breath-hold imaging (localizers, SIR, and R 2*) was performed in expiration. Three 10-mm slices with a variable slice gap to cover the liver were positioned identically for all three LIC measurements. Especially for the GRE-based SIR and R 2* measurements, careful B0 shimming is important to achieve a homogeneous B0 field and thus correct measurements. Shimming was performed with a shim box covering the field-of-view in the feet-head direction and the contours of the abdomen (i.e., excluding the arms) in the left–right and anterior–posterior directions. The SIR measurement according to Gandon et al. requires five image weightings (T1, PD, T2, T2+, and T2++) with specific TR/TE combinations [6]. Table 1 contains an overview of the relevant scan parameters. Of note, the TE interval used for R 2* was shorter (1.41 ms) than the standard in- and out-of-phase interval (2.26 ms).

Table 1 MRI parameters

Data analyses

After inclusion, all measurements were checked for correct TRs, TEs, and RF coils using DICOM header information, because specific TR/TE combinations and the use of the body coil are mandatory for SIR measurements. Image quality was assessed by a research trainee (JHR, 4 years of experience) and an abdominal radiologist (JS, 20 years of experience) using a 3-point scale (good/adequate/inadequate). The type of artifact(s) was noted. Measurements with incorrect scan parameters or inadequate image quality were classified as unsuccessful.

ROI-placement

SIR, R 2, and R 2* data were processed using custom-made software that allowed ROI-placement, LICSIR calculation, and R 2 and R 2* data fitting. Three blinded observers (JHR, MAT, and EMA) with 4, 0.5, and 9 years of experience, respectively, independently placed regions-of-interest (ROIs) for three slices per scan. First, the liver parenchyma was masked on R 2* source data, excluding a rim near the liver edge (Fig. 1 A). Next, non-liver voxels (e.g., vessels, gall bladder) inside the liver contour were masked (Fig. 1 B). By subtracting ROI-2 from ROI-1, only liver parenchyma remained (Fig. 1 C). Liver ROIs were copied from the R 2* data for SIR analysis, with two additional ROIs in both paraspinal muscles, carefully avoiding areas of signal intensity loss close to the lung (Fig. 1 D). This also allowed a check of whether patients had moved between R 2* and SIR measurements, in which case new ROIs were placed. Ghosting artifacts caused by aortic blood flow were present in SIR measurements before November 2012 (when saturation slabs were added). Separate ROIs were placed to remove these artifacts from the liver and muscle ROIs (Fig. 1 E, F). Some reports indicate that susceptibility artifacts may affect R 2* measurements when using a single ROI in liver segments VII or VIII [16]. Due to the limited number of slices, we did not formally assess segmental variations of R 2, R 2*, or LICSIR in this study.

Fig. 1

Placement of ROIs. A–F The placement of ROIs on the data. A–C How the ROIs for the total liver parenchyma (A) and intrahepatic vasculature and/or gall bladder (B) are drawn and the result of subtraction in (C). D The ROI-placement on the paraspinal muscles for SIR calculations. E, F The placement of a ghosting artifact ROI (E) and the final liver parenchyma ROI (F) obtained by subtracting (E) from (C)

The respiratory triggering applied for R 2 data acquisition resulted in slight changes in slice positioning so that new ROIs were placed using R 2 source data as described above.
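The ROI subtraction described in this section can be sketched in a few lines. This is only an illustration, not the custom-made software used in the study: the function names, the representation of an ROI as a set of (row, column) voxel indices, and the 3×3 toy image are all assumptions made for the example.

```python
def parenchyma_roi(liver_contour, non_liver, artifact=()):
    """Voxels inside the liver contour that are neither non-liver structures
    (vessels, gall bladder) nor part of a ghosting-artifact ROI."""
    return set(liver_contour) - set(non_liver) - set(artifact)


def roi_mean(image, roi):
    """Mean signal of `image` (a list of rows) over the (row, col) voxels in `roi`."""
    if not roi:
        raise ValueError("ROI is empty")
    return sum(image[r][c] for r, c in roi) / len(roi)


# Hypothetical 3x3 slice with one bright vessel voxel in the centre.
image = [[100, 102, 98],
         [101, 300, 99],
         [100, 100, 100]]
roi_1 = {(r, c) for r in range(3) for c in range(3)}  # liver contour (as in Fig. 1 A)
roi_2 = {(1, 1)}                                      # vessel mask (as in Fig. 1 B)
parenchyma = parenchyma_roi(roi_1, roi_2)             # remaining parenchyma (as in Fig. 1 C)
mean_signal = roi_mean(image, parenchyma)             # value used for average-then-fit
```

Averaging the signal inside the final ROI before fitting matches the average-then-fit approach used for the relaxometry data.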

LICSIR

The calculations published by Gandon et al. were entered into the aforementioned program [7, 17], which automatically chooses the most reliable SIR (i.e., T1, PD, T2, T2+, or T2++) which is converted to LICSIR. The mean LICSIR of three slices was used and, when one or more values exceeded the 350 µmol/g threshold, the final value was noted as >350 µmol/g. In two subanalyses, the R 2 and R 2* values and the individual SIR ratios in patients with LICSIR >350 µmol/g were evaluated.
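As an illustration of the reporting rule for LICSIR, the sketch below averages the per-slice values and reports the result as ">350 µmol/g" as soon as any slice exceeds the threshold. The function names are hypothetical, and we assume here that a saturated slice is passed in as any number above 350 (for example `float("inf")`); how the online tool flags such slices is not specified in this section.

```python
from statistics import mean

SIR_UPPER_LIMIT = 350.0  # µmol/g, upper detection limit of the SIR method


def summarize_lic_sir(slice_values):
    """Return (value, censored) for per-slice LIC_SIR values in µmol/g.

    If any slice exceeds the upper limit, the result is censored at the
    limit and no mean is computed.
    """
    if not slice_values:
        raise ValueError("need at least one slice value")
    if any(v > SIR_UPPER_LIMIT for v in slice_values):
        return SIR_UPPER_LIMIT, True
    return mean(slice_values), False


def format_lic_sir(slice_values):
    """Human-readable report string for the summarized LIC_SIR."""
    value, censored = summarize_lic_sir(slice_values)
    prefix = ">" if censored else ""
    return f"{prefix}{value:.0f} µmol/g"
```

For example, `format_lic_sir([80.0, 85.0, 90.0])` gives the three-slice mean, whereas `format_lic_sir([120.0, float("inf"), 200.0])` is reported as above the threshold.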

R 2*

In magnitude images, the noise is distributed in a non-Gaussian manner. This is known as Rician noise [18]. At high signal levels, the non-zero mean has a negligible effect on the average signal, but near the noise level, a noise bias exists which needs to be taken into account when fitting R 2*. We explored three different fit routines: a truncated exponential fit (A) [19, 20], an exponential + constant fit (B) [9, 21], and an exponential + Rician noise (C).

The truncated exponential method A is considered the reference standard but is time-consuming, whereas methods B and C do not require further manual input. We compared methods B and C with method A as reference using Bland–Altman analysis and R 2* data from a single reader (EMA). Based on this comparison (mean paired difference (\( \bar{d} \)) was 0.8 Hz for A–C and 33.6 Hz for A–B), we employed method C (Rician noise bias) for the remaining analyses [22, 23].

R 2* calculation was thus performed with a monoexponential model (Eq. 1) with a Rician noise factor. In Eq. 1, E R describes the Rice distribution (Online Resource 1), where σ is a noise parameter and \( S_{0} \times {\text{e}}^{{ - {R_{2}} ^{*} \times {\text{TE}}}} \) reflects the true magnitude value. Data were averaged inside the ROI before data fitting (average-then-fit).

$$ S\left({\text{TE}} \right) = E_{\text{R}} \left( {S_{0} \cdot e^{{ - {R_{2}}^{*} \cdot {\text{TE}}}} ,\sigma } \right) $$
(1)
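To make the forward model of Eq. 1 concrete, the following sketch evaluates the expected magnitude signal under Rician noise. It is an illustration under stated assumptions, not the fitting code used in the study: we assume that E R is the mean of a Rice distribution, written in its standard closed form with modified Bessel functions, and all function names are hypothetical. The Bessel functions are evaluated by simple numerical integration so that only the Python standard library is needed; TE is taken in ms and R 2* in Hz.

```python
import math


def _scaled_bessel_i(order, x, steps=2000):
    """exp(-x) * I_order(x) for x >= 0, using the integral representation
    I_n(x) = (1/pi) * int_0^pi exp(x*cos(t)) * cos(n*t) dt (midpoint rule)."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += math.exp(x * (math.cos(theta) - 1.0)) * math.cos(order * theta)
    return total * h / math.pi


def rician_mean(true_signal, sigma):
    """Expected magnitude for a noise-free signal `true_signal` and noise
    parameter `sigma` (mean of a Rice distribution)."""
    if sigma <= 0.0:
        return abs(true_signal)
    snr = abs(true_signal) / sigma
    if snr > 30.0:
        # High-SNR asymptote: the bias shrinks to sigma^2 / (2 * signal).
        return abs(true_signal) + sigma ** 2 / (2.0 * abs(true_signal))
    k = snr ** 2 / 4.0
    laguerre_half = ((1.0 + 2.0 * k) * _scaled_bessel_i(0, k)
                     + 2.0 * k * _scaled_bessel_i(1, k))
    return sigma * math.sqrt(math.pi / 2.0) * laguerre_half


def r2star_model(te_ms, s0, r2star_hz, sigma):
    """Eq. 1: monoexponential decay passed through the Rician expectation."""
    true_signal = s0 * math.exp(-r2star_hz * te_ms / 1000.0)
    return rician_mean(true_signal, sigma)
```

At high signal the model follows the pure exponential, while at long TE it levels off at the noise floor σ·√(π/2) instead of decaying to zero. That plateau is what biases an uncorrected exponential fit at high R 2*.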

The effect of intrahepatic fat on R 2* was assessed by applying a biexponential model in a subset (n = 10) with definite presence of fat, as identified by an oscillating signal intensity decay over time. R 2* values with and without correction were compared using Bland–Altman analysis. The mean paired difference (\( \bar{d} \)) was 0.1 Hz, indicating low overall fat content in this cohort, and was deemed negligible compared to the subset mean of 70 Hz. Monoexponentially fitted R 2* values were used for all comparisons.

R 2

For R 2 calculation an average-then-fit routine was applied using a biexponential model as shown in Eqs. 2 and 3. In Eq. 2, S T (TE) is the signal intensity without noise at time TE, S 0 is the signal intensity at TE = 0, and R 2 is the relaxation rate. The subscripts a and b indicate fast and slow relaxation components, respectively. For R 2, Rician noise bias was approximated by the Pythagorean addition of an extra fit parameter, the noise factor ‘ν’ in Eq. 3.

$$ S_{\text{T}} \left( {\text{TE}} \right) = S_{{{\text{0}},a}} \cdot e^{{ - R_{2,a} \cdot {\text{TE}}}} + S_{{{\text{0}},b}} \cdot e^{{ - R_{2,b} \cdot {\text{TE}}}} $$
(2)
$$ S\left( {\text{TE}} \right) = \sqrt {S_{\text{T}} \left( {\text{TE}} \right)^{2} + \nu^{2} } $$
(3)

In the biexponential model, an iron-dense and an iron-sparse component are assumed, with fast and slow R 2 relaxation (i.e., short and long T 2), respectively. For further comparisons with LICSIR and R 2*, the bulk R 2 was calculated (Eq. 4) in accordance with the literature [8, 9, 14].

$$ R_{2} = \frac{{S_{{{\text{0}},a}} \cdot R_{2,a} + S_{{{\text{0}},b}} \cdot R_{2,b} }}{{S_{{{\text{0}},a}} + S_{{{\text{0}},b}} }} $$
(4)
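The biexponential model and the bulk R 2 can be written out as a minimal sketch. The function names are hypothetical, TE is taken in ms and R 2 in Hz, and the Pythagorean addition of the noise factor is read as the square root of the sum of squares of the noise-free signal and ν.

```python
import math


def biexp_signal(te_ms, s0_a, r2_a, s0_b, r2_b):
    """Eq. 2: noise-free biexponential decay (a = fast, b = slow component)."""
    te_s = te_ms / 1000.0
    return s0_a * math.exp(-r2_a * te_s) + s0_b * math.exp(-r2_b * te_s)


def biexp_signal_with_noise(te_ms, s0_a, r2_a, s0_b, r2_b, nu):
    """Eq. 3: noise factor nu added in quadrature to the noise-free signal."""
    return math.hypot(biexp_signal(te_ms, s0_a, r2_a, s0_b, r2_b), nu)


def bulk_r2(s0_a, r2_a, s0_b, r2_b):
    """Eq. 4: amplitude-weighted mean of the two relaxation rates."""
    return (s0_a * r2_a + s0_b * r2_b) / (s0_a + s0_b)
```

As a worked example with made-up fit results, a fast component with S 0 = 30 and R 2 = 100 Hz combined with a slow component with S 0 = 70 and R 2 = 20 Hz gives a bulk R 2 of (30·100 + 70·20)/100 = 44 Hz.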

Comparison with the literature

The relations between the LICSIR, R 2, and R 2* were compared to published regression analysis results based on either biopsy-proven LIC (LICBIOPSY) [8, 9, 19–21] or LICSIR [14].

Statistical analyses

Data are described as number (%) or median (interquartile range, IQR). Results of observers were compared using a Friedman test, with the Wilcoxon Signed-Rank test as post hoc test. Success rates are defined as the number of correctly acquired scans of at least “adequate” quality divided by the total number of measurements. These were compared using a McNemar test. Correlations were assessed with Spearman’s correlation coefficients (r S), and interobserver agreement with two-way random, absolute-agreement intraclass correlation coefficients (ICCs). Both were graded according to Landis et al. [24]. Bland–Altman analysis was performed to compare accuracy between the three MRI methods for a single observer and to compare the performance of the three observers [22]. In a separate analysis, the calculated R 2 and R 2* values were converted to \( {\text{LIC}}_{R_{2}(\ast)} \) values in µmol/g using the formulas provided by St. Pierre et al. and Garbowski et al. [8, 20], as these were established with image analysis protocols similar to ours.
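For readers unfamiliar with Bland–Altman analysis, the core computation is the mean paired difference (\( \bar{d} \)) and its 95% limits of agreement. The sketch below is a generic illustration and not the statistical software used in this study; the function name and the example numbers are made up.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean paired difference; the limits of agreement are
    bias +/- 1.96 times the standard deviation of the differences.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical paired R2* values (Hz) from two fit routines.
fit_a = [10.0, 20.0, 30.0, 40.0]
fit_b = [9.0, 21.0, 28.0, 41.0]
bias, lower, upper = bland_altman(fit_a, fit_b)
```

A bias close to zero with narrow limits indicates that two methods can be used interchangeably; a large bias points to a systematic offset between them.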

ROC-analyses were performed for LICSIR, R 2, and serum values with significant correlation with R 2* to establish their diagnostic accuracy to identify increased R 2*, i.e., ≥44 Hz [9]. R 2* was chosen as a reference value as it had the best success rate and shortest acquisition time. The optimal cut-off value for R 2 was found by optimizing the Youden index, while for LICSIR we used the established cut-off value of >36 µmol/g. P values of <0.05 were accepted as statistically significant. Statistical analyses were performed using SPSS Version 22 (IBM Corp, Armonk, NY), MedCalc Statistical Software version 16.2.0 (MedCalc Software bvba, Ostend, Belgium; https://www.medcalc.org; 2016), and GraphPad Prism 5.0 (GraphPad Software, La Jolla, CA).
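The cut-off optimization can be illustrated with a small sketch that scans all candidate thresholds and keeps the one with the highest Youden index (J = sensitivity + specificity − 1). This is not the statistical software used in the study; the function name and the toy data are hypothetical, and a test is counted as positive when the value is at or above the cut-off.

```python
def youden_cutoff(values, is_positive):
    """Return (cutoff, J, sensitivity, specificity) maximizing the Youden index
    for the decision rule `value >= cutoff`."""
    n_pos = sum(1 for p in is_positive if p)
    n_neg = len(is_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative reference cases")
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, p in zip(values, is_positive) if p and v >= cutoff)
        tn = sum(1 for v, p in zip(values, is_positive) if not p and v < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Hypothetical paired measurements; the reference is positive when R2* >= 44 Hz.
r2_values = [14, 16, 17, 19, 22, 30, 45, 15]
r2star_values = [30, 35, 40, 50, 60, 120, 300, 48]
reference = [r >= 44 for r in r2star_values]
cutoff, j, sens, spec = youden_cutoff(r2_values, reference)
```

In this toy data set the best cut-off is 19 Hz, with a sensitivity of 0.8 and a specificity of 1.0.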

Results

Patients

Between January 1, 2008 and December 31, 2013, a total of 114 patients (M/F: 74/40) underwent 144 MRI-scans for routine LIC measurement. Patient characteristics and clinical indications for LIC measurement are described in Table 2. Thirty patients had multiple measurements. To prevent a repeated measurements effect on correlation assessment between LICSIR, R 2, and R 2*, only the 114 baseline measurements were used. SIR, R 2, and R 2* data were available for 108/114 (95%), 72/114 (63%), and 113/114 (99%) baseline measurements.

Table 2 Patient characteristics

MRI success rates

Five SIR measurements were classified as unsuccessful because a surface coil was used, and one because of erroneous TR/TE combinations. Furthermore, image quality was inadequate (respiration artifacts) in a single patient (only R 2 and R 2* acquired). Hence, SIR was successful in 102/114 (89%), R 2 in 71/114 (62%), and R 2* in 112/114 (98%) subjects. The success rate of R 2 was lower than that of SIR and R 2* (P < 0.0001, each). Missing datasets were presumed not to have been scanned, with time constraints and respiratory triggering problems as the major causes of the low success rate of the R 2 measurement. For subsequent analyses, only successful baseline measurements were used.

Interobserver agreement

LICSIR and R 2 values differed between observer 1 and the other observers (Table 3). However, these differences (median values: 80–85 µmol/g and 33–34 Hz for R 2) would be negligible in clinical practice. This was confirmed by high ICCs for SIR, R 2, and R 2* of 0.998, 0.997, and 0.999, respectively. Bland–Altman analysis between pairs of observers showed a single outlier for SIR, while R 2 and R 2* showed differences up to 5% for higher values, reflecting the uncertainties in the data fit at very high LIC (Online Resource 1).

Table 3 MRI interobserver agreement: median (IQR) values

LICSIR, R 2, and R 2*

Median (IQR) LICSIR, R 2, and R 2* (given for observer 1 and LICSIR <350 µmol/g) were 84 (30–205) µmol/g, 33 (23–48) Hz, and 123 (56–321) Hz, respectively. LICSIR correlated positively with R 2 and R 2* with r S of 0.90 (95% confidence interval (CI) 0.84–0.94, P < 0.0001, n = 57) and 0.98 (95% CI 0.97–0.99, P < 0.0001, n = 87), respectively. R 2 correlated positively with R 2*: r S of 0.95 (95% CI 0.93–0.97, P < 0.0001, n = 71). Figure 2 A, B shows scatter plots of (SIR-based or biopsy-proven) LIC against R 2 and R 2*. Solid lines indicate regression analysis results (95% CI bands as dashed lines). In our patient cohort, R 2 increased linearly with LICSIR (Eq. 5), while R 2* appeared to have a clear non-linear relationship with LICSIR, well described by a quadratic polynomial (Eq. 6).

Fig. 2

LICSIR or LICBIOPSY against R 2 and R 2*. A, B Scatter plots of LICSIR against R 2 (A, top) and R 2* (B, bottom) for all successful baseline measurements. Data points are grouped by SIR LIC type: T1; PD; T2; T2+; and T2++. Regression results (equations given in the figures) are shown by solid lines, with dotted 95% CI bands indicating the goodness of the fit. Additional dotted regression lines are based on regression analyses reflecting LICBIOPSY [8, 9, 19–21] or LICSIR [14]

$$ R_{2} = 15.5 + 0.107 \cdot {\text{LIC}}_{\text{SIR}} $$
(5)
$$ {R_{2}}^{*} = 42.7 + 0.142 \cdot {\text{LIC}}_{\text{SIR}} + 4.02 \times 10^{ - 3} \cdot {{\text{LIC}}_{\text{SIR}}}^{2} $$
(6)
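For convenience, Eqs. 5 and 6 can be evaluated with the small sketch below. The function names are hypothetical, and the range check is our own assumption: because LICSIR cannot be measured above 350 µmol/g, the regressions are not extrapolated beyond that value here.

```python
SIR_UPPER_LIMIT = 350.0  # µmol/g


def _check_range(lic_sir):
    """Reject inputs outside the measurable LIC_SIR range (assumed guard)."""
    if not 0.0 <= lic_sir <= SIR_UPPER_LIMIT:
        raise ValueError("LIC_SIR must lie between 0 and 350 µmol/g")


def r2_from_lic_sir(lic_sir):
    """Eq. 5: linear regression of R2 (Hz) on LIC_SIR (µmol/g)."""
    _check_range(lic_sir)
    return 15.5 + 0.107 * lic_sir


def r2star_from_lic_sir(lic_sir):
    """Eq. 6: quadratic regression of R2* (Hz) on LIC_SIR (µmol/g)."""
    _check_range(lic_sir)
    return 42.7 + 0.142 * lic_sir + 4.02e-3 * lic_sir ** 2
```

At a LICSIR of 100 µmol/g, the regressions give R 2 = 15.5 + 10.7 = 26.2 Hz and R 2* = 42.7 + 14.2 + 40.2 = 97.1 Hz.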

The LICSIR upper threshold of 350 µmol/g was reached in 15/102 (15%) measurements. In these measurements, only the T1W SIR correlated with R 2*, with r S of −0.72 (95% CI −0.9 to −0.31, P = 0.003, n = 15). Figure 3 shows the T1W SIR against R 2*, indicating that for LICSIR >350 µmol/g, the discriminatory value of the T1W SIR becomes progressively smaller.

Fig. 3

T1W liver-to-muscle SIR against R 2 *. This shows a scatter plot of R 2* values (x-axis) against the liver-to-muscle SIR (y-axis) of successful baseline T1W SIR measurements. Data are grouped into the following: LICSIR <350 and LICSIR >350 µmol/g

Comparison with the literature

Figure 2 A, B also shows published regression lines between either LICSIR or LICBIOPSY and R 2 (Fig. 2 A) and R 2* (Fig. 2 B). Contrary to our finding, these lines indicate a linear increase of R 2* as LIC increases, and a non-linear increase of R 2 as LIC increases. To assess whether this is caused by LICSIR or by R 2 or R 2*, we applied established conversion formulae to convert our R 2 (Eq. 7) and R 2* (Eq. 8) values to LIC values [8, 20]. We then compared these LIC R2* and LIC R2 values to our LICSIR values.

$$ {\text{LIC}}_{{R_{2} }} \, (\upmu {\text{mol/g}}) = 17.91 \cdot \left( {29.75 - \sqrt {\left( {900.7 - 2.283 \cdot R_{2} } \right)} } \right)^{1.424} $$
(7)
$$ {\text{LIC}}_{{R_{2}}^{*}} \, (\upmu {\text{mol/g}}) = \frac{0.029 \cdot \left( {R_{2}}^{*} \right)^{1.014}}{5.585 \cdot 10^{-2}} $$
(8)

These established conversion formulae show a non-linear relation between R 2 and true LIC (Eq. 7) and linear relation between R 2* and true LIC (Eq. 8). Hence, the scatter plot between LIC R2* and LICSIR also revealed a quadratic relation, and that between LICSIR and LIC R2 a linear one (data not shown).
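The two conversions can be sketched as follows. The function names are hypothetical. For the R 2 conversion we assume the physically meaningful (smaller) root of the underlying quadratic calibration curve, i.e., a minus sign before the square root, so that LIC increases with R 2. The factor 17.91 and the divisor 5.585 · 10⁻² both convert mg/g to µmol/g.

```python
import math

MG_PER_G_TO_UMOL_PER_G = 17.91  # conversion factor for iron, as in Eq. 7


def lic_from_r2(r2_hz):
    """Eq. 7: LIC (µmol/g) from R2 (Hz), taking the smaller root."""
    discriminant = 900.7 - 2.283 * r2_hz
    if discriminant < 0.0:
        raise ValueError("R2 is above the range covered by the calibration")
    base = 29.75 - math.sqrt(discriminant)
    if base < 0.0:
        raise ValueError("R2 is below the range covered by the calibration")
    return MG_PER_G_TO_UMOL_PER_G * base ** 1.424


def lic_from_r2star(r2star_hz):
    """Eq. 8: LIC (µmol/g) from R2* (Hz)."""
    return 0.029 * r2star_hz ** 1.014 / 5.585e-2
```

Because the exponent 1.014 is close to 1, the R 2* conversion is almost proportional: an R 2* of 100 Hz corresponds to roughly 55 µmol/g, and doubling R 2* roughly doubles the estimated LIC.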

Diagnostic accuracies of LICSIR, R 2, and serum values

Serum total iron, transferrin, transferrin-saturation, and ferritin were available for 56, 56, 54, and 96 out of 114 measurements. All four correlated significantly with R 2*, with best correlation for ferritin at r S = 0.80 (P < 0.0001, n = 94).

Increased R 2* (≥44 Hz) was present in 91 subjects. Of the MRI and serum methods, R 2 and ferritin had the best diagnostic accuracies to detect increased R 2* (Table 4). Figure 4 A–C shows true and false positive and negative results of R 2 (Fig. 4 A), LICSIR (Fig. 4 B), and ferritin (Fig. 4 C) for establishing increased R 2*.

Table 4 Diagnostic accuracy values to correctly identify increased R 2* (≥44 Hz)
Fig. 4

R 2, LICSIR, and ferritin against R 2*. A–C Scatter plots between R 2* (x-axes) and R 2, LICSIR, and serum ferritin (y-axes). Dotted lines at x = 44 and at y = 18.3 (A), y = 36 (B), and y = 524 (C) indicate the thresholds for R 2, LICSIR, and serum ferritin to identify increased R 2* (Table 4). Data points are grouped by SIR LIC type: T1; PD; T2; T2+; T2++; >350; and no LICSIR available. Regression results are shown by the solid lines with dotted 95% CI bands indicating the goodness of the fit. Shaded areas indicate true positive, true negative, false positive, and false negative results

Discussion

This study shows that for routine clinical MRI-based LIC measurements SIR and R 2* are more often successful than R 2. Interobserver agreement was near perfect (ICC > 0.9) for all methods. R 2 and R 2* methods provided relaxation rates when the SIR-threshold (>350 µmol/g) was already exceeded. This gives them an advantage over SIR in subjects with transfusional hemosiderosis (at least 55% of our population), when LIC values can easily surpass 350 µmol/g. The combination of high success rate, high interobserver agreement, ability to detect changes in LIC over a wide range of LIC values, and single breath-hold acquisition favors the R 2* method for LIC measurement.

In our study, the relationship between R 2* and LICSIR was quadratic and remained quadratic when R 2* was expressed as a LIC value using a previously published (biopsy-proven) conversion formula. Other authors report linear relationships. Given the physics of the R 2*–iron relationship, which is basically linear [25], this discrepancy arises either from our R 2* acquisition and analysis or from the reference standard. To rule out the former, we compared three fit routines. The exponential + Rician noise factor fit provided virtually identical results to the established and widely applied, but labor-intensive, method of manual truncation before exponential fitting, in a fraction of the time.

With respect to reference standard, St. Pierre et al. [8], Wood et al. [9], Hankins et al. [19], Garbowski et al. [20], and Anderson et al. [21] all used biopsy-determined LICBIOPSY as reference standard, whereas we and Christoforidis et al. [14] used the LICSIR according to Gandon. Given the similarity of our MRI protocols, it is unsurprising that Christoforidis’ and our data points show considerable overlap. Arguably, their linear relation between LICSIR and R 2* could also be described by a quadratic polynomial.

Apart from the linear relationship, the other authors report a much steeper increase of R 2* as LIC increases [9, 19–21]. Anderson et al.’s very steep increase could be due to a long TE1 of 2.2 ms compared to all other studies (range of TE1: 0.8–0.99 ms), which hampers the ability to accurately estimate high R 2* values. The fact that the control values of R 2* in subjects without iron overload hover around 40 Hz both in those studies and in this paper is a further argument that the observed difference in the LIC–R 2* relationship does not arise from the R 2* acquisition or analysis but from the reference standard.

Hence, the most likely cause of the deviating quadratic relation between R 2* and estimated LIC is the piecewise sampling of the LIC range with five differently weighted GRE-sequences for LICSIR. This has artificially imposed a quadratic behavior on the actually linear relationship between R 2* and true LICBIOPSY. If one takes the fundamental GRE signal equation (Eq. 9), where PD is proton density and α is the flip angle, and applies it to the liver-to-muscle signal intensity ratio, the PD and sin(α) terms drop out. Taking the natural logarithm yields Eqs. 10 and 11. The latter shows that the relationship between R 2* and SIR is logarithmic. Indeed, plotting Fig. 3 with a log-scale for the signal intensity ratio on the y-axis linearized the relationship (data not shown).

$$ S\left( {\text{TE}} \right) = \frac{{{\text{PD}} \cdot \sin \left( \alpha \right) \cdot \left( {1 - e^{{ - {\text{TR}}/T_{1} }} } \right)}}{{\left( {1 - \cos \left( \alpha \right) \cdot e^{{ - {\text{TR}}/T_{1} }} } \right)}} \cdot e^{{ - {R_{2}}^{*} \cdot {\text{TE}}}} $$
(9)
$$ \ln \left( {\frac{{S_{\text{LIVER}} }}{{S_{\text{MUSCLE}} }}} \right) = f\left( {{\text{TR}},\alpha ,T_{1} } \right) - {\text{TE}} \cdot \left( {{R_2}^{*}_{{,\,{\text{LIVER}}}} - {{R_2}^{*}_{,\,\text{MUSCLE}}}} \right) $$
(10)
$$ {{R_2}^{*}_{,\,\text{LIVER}}} = \frac{{f\left( {{\text{TR}},\alpha ,T_{1} } \right) - \ln \left( {\frac{{S_{\text{LIVER}} }}{{S_{\text{MUSCLE}} }}} \right)}}{\text{TE}} + {{R_2}^{*}_{,\,\text{MUSCLE}}} $$
(11)
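The logarithmic relation can be checked numerically with a short sketch. It takes the ratio of two signals that each follow Eq. 9, so the liver signal decays faster than the muscle signal when the liver R 2* is higher. The function names, the lumped offset `f_offset` (standing in for f(TR, α, T1)), and the example values (muscle R 2* of 30 Hz, TE of 4 ms) are assumptions chosen for illustration only.

```python
import math


def liver_to_muscle_sir(r2star_liver, r2star_muscle, te_ms, f_offset=0.0):
    """Signal intensity ratio of two GRE signals following Eq. 9.

    f_offset lumps together all TR, flip-angle and T1 dependent terms on a
    logarithmic scale; R2* is in Hz and TE in ms.
    """
    te_s = te_ms / 1000.0
    return math.exp(f_offset - te_s * (r2star_liver - r2star_muscle))


def r2star_liver_from_sir(sir, r2star_muscle, te_ms, f_offset=0.0):
    """Invert the ratio: liver R2* is linear in the logarithm of the SIR."""
    te_s = te_ms / 1000.0
    return (f_offset - math.log(sir)) / te_s + r2star_muscle


# Equal steps in liver R2* give equal steps in ln(SIR), not in SIR itself.
log_sirs = [math.log(liver_to_muscle_sir(r, 30.0, 4.0)) for r in (100.0, 200.0, 300.0)]
```

Because the SIR falls exponentially with R 2*, its absolute change per unit R 2* becomes progressively smaller at high iron load, in line with the flattening seen for the T1W SIR above the 350 µmol/g threshold.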

For R 2, single- and multiecho SE acquisitions are possible. Multiecho SE decreases R 2 due to residual signal of stimulated echoes at a given TE, whereas single-echo SE increases R 2 because long TEs cause increased sensitivity to diffusion and hence increased signal loss at a given TE. Reported single-echo SE R 2 values [8, 9] were concordantly higher for the same estimated LIC than the multiecho SE results in this study and in [14]. In terms of R 2 data fitting, we, like many others, applied a biexponential model; we did not assess non-exponential decay models such as that proposed by Jensen et al. [26].

The main limitation of our study is the lack of biopsy confirmation. In our center, liver biopsy for iron determination is seldom performed. The national, European, and American guidelines all recommend restraint in performing biopsy and underline the high sensitivity of MRI [15, 27, 28]. Moreover, differing processing steps to obtain LICBIOPSY are reported, compromising generalizability. In Gandon’s method, paraffin-embedded liver biopsy specimens are dewaxed using a protocol with a triple xylene wash to remove lipid solids from the sample. This approach was shown to have an elevating effect on the dry weight liver iron calculation compared to processing fresh tissue samples [29]. Another limitation is that we did not perform multipeak fat-correction on complex data [10]. This was not feasible with only magnitude data available. Comparison to other literature is further hampered by the use of different image acquisition and postprocessing protocols, which directly influence the calibration curves between the reference standard and the index test. We have opted to compare our findings to calibration curves obtained with similar postprocessing protocols.

ROC-analyses showed that R 2 and ferritin have the highest diagnostic accuracy to identify increased R 2* (≥44 Hz). Both ferritin (≥524 µg/L) and R 2 (≥18.3 Hz) had positive predictive values of 100%, but the wide distribution of ferritin levels for R 2* ≥ 44 Hz indicates that ferritin cannot be used confidently to follow up treatment or to determine the LIC accurately. In contrast, R 2 values are closely distributed around the regression line. In addition, ferritin lacks the spatial information that MRI provides, which allows segmental LIC measurement and follow-up.

R 2 datasets were missing (i.e., not scanned) in 42/114 (37%) subjects. As R 2 is part of our routine scan protocol, this illustrates that the long and artifact-prone R 2 series is skipped first by the radiographer. This makes the R 2 series less suited as first choice for LIC measurement.

Our results favor the use of R 2* measurements for daily clinical practice with the use of an exponential + Rician noise fit method to save time in analysis. The recommendation to (only) use R 2* comes with cautions. It requires careful consideration of scan parameters which should be kept equal for all measurements. Ideally, routine quality control with phantom testing should be performed.

In conclusion, as R 2* can be obtained in a single breath-hold with excellent success rates, high interobserver agreement, and ability to detect changes over a wide range of LIC values and is available from all major vendors without additional per-scan costs, it is our first choice for LIC measurement.