Non-invasive measurement of liver iron concentration using 3-Tesla magnetic resonance imaging: validation against biopsy
To evaluate the performance and limitations of the R2* and signal intensity ratio (SIR) methods for quantifying liver iron concentration (LIC) at 3 T.
A total of 105 patients who underwent a liver biopsy with biochemical LIC (LICb) were included prospectively. All patients underwent a 3-T MRI scan with a breath-hold multiple-echo gradient-echo sequence (mGRE). LIC calculated by 3-T SIR algorithm (LICSIR) and by R2* (LICR2*) were correlated with LICb. Sensitivity and specificity were calculated. The comparison of methods was analysed for successive classes.
LICb was strongly correlated with R2* (r = 0.95, p < 0.001) and LICSIR (r = 0.92, p < 0.001). In comparison to LICb, LICR2* and LICSIR detect liver iron overload with a sensitivity/specificity of 0.96/0.93 and 0.92/0.95, respectively, and a bias ± SD of 7.6 ± 73.4 and 14.8 ± 37.6 μmol/g, respectively. LICR2* presented the lowest differences for patients with LICb values under 130 μmol/g. Above this value, LICSIR has the lowest differences.
At 3 T, R2* provides precise LIC quantification for lower overload but the SIR method is recommended to overcome R2* limitations in higher overload. Our software, available at www.mrquantif.org, uses both methods jointly and selects the best one.
• Liver iron can be accurately quantified by MRI at 3 T
• At 3 T, R2* provides precise quantification of slight liver iron overload
• At 3 T, SIR method is recommended in case of high iron overload
• Slight liver iron overload present in metabolic syndrome can be depicted
• Treatment can be monitored with great confidence
Keywords: Iron, Liver, Magnetic resonance imaging, Haemosiderosis, Haemochromatosis
AUC: area under the curve
BMI: body mass index
DIOS: dysmetabolic iron overload syndrome
LIC: liver iron concentration
LICb: LIC assessed by biopsy using biochemical analysis
LICR2*: LIC calculated by T2* conversion
LICSIR: LIC calculated by SIR method
mGRE: multiple-echo gradient-echo sequence
MRI: magnetic resonance imaging
SIR: signal intensity ratio
Liver iron content (LIC) is a surrogate marker of whole-body iron load. In overload diseases such as primary or secondary haemochromatosis, LIC measurement is mandatory for guiding therapeutic decisions. Liver iron overload may also be present in non-alcoholic steatohepatitis (NASH) and dysmetabolic iron overload syndrome (DIOS), which are both highly prevalent in the Western population. The main complications are cirrhosis and hepatocellular carcinoma. Many studies [2, 3] have suggested a close correlation between iron deposition and carcinogenesis.
The gold standard for detecting and quantifying liver iron overload is histopathological analysis of a liver sample collected by biopsy with biochemical analysis of the core fragment. The biopsy procedure is both invasive and painful and carries some risk of complications. In addition, the very small liver sample may not be representative of the whole liver in cases of heterogeneous iron distribution.
Non-invasive, quantitative assessment of LIC by 1.5-T magnetic resonance imaging (MRI) has been extensively validated against histology by calculating the relaxation rates R2 and R2* [6, 7, 8, 9, 10, 11] and/or the signal intensity ratio (SIR) between the liver and paraspinal muscles [12, 13, 14]. MRI is thus now used in routine clinical practice to diagnose, quantify and monitor iron overload.
In recent years, 3-T MRI has become more widespread. In view of the shift in magnetic field strength, acquisition parameters need to be adapted and new reference values proposed.
Better sensitivity and accuracy can be expected at 3 T, improving diagnosis of DIOS with low iron burden. Conversely, quantification of high overload cases may prove more difficult.
Recently, the SIR method, based on several single-echo GRE sequences, has been validated against histology at 3 T.
The purpose of our study was to evaluate the ability of the R2* method to detect and quantify liver iron at 3 T using biochemical quantification as the reference method. Our secondary goal was to compare, at the 3-T field strength, two major LIC quantification methods: R2* and SIR.
Materials and methods
Between January 2007 and January 2013, all patients referred for liver biopsy and in whom liver iron overload was suspected according to their disease were prospectively recruited. All patients provided written informed consent to participate in this prospective single-centre clinical trial. In addition to usual care, an MRI scan was scheduled to assess hepatic iron stores. Age, sex and body mass index were recorded.
Biochemical liver iron concentration
Liver biopsy was indicated as per the guidelines of the American Association for the Study of Liver Diseases [18, 19]. A biopsy sample was taken from the right lobe of the liver using a 16-gauge needle (Hepafix 16G, Braun, Melsungen, Germany) under ultrasound guidance. Biochemical liver iron concentration (LICb) was measured using Barry and Sherlock’s method for biopsy samples taken from paraffin-embedded blocks. Liver iron overload was defined as a LICb greater than 35 μmol/g (dry liver). Biochemical analysis was blinded to MRI results.
Magnetic resonance imaging protocol
The study was performed with two 3-T MRI scanners: first with Achieva (Philips, Best, Netherlands) and then with Magnetom Verio (Siemens Healthcare, Erlangen, Germany). The body coil was used as the receive coil to achieve homogeneous signal intensity in the imaged section and avoid signal depth fall-off. Only the Siemens scanner had a compensation method for better B1 homogeneity. There was a slight difference in resonance frequency (127.79 vs. 123.24 MHz) between the two scanners. Using the body coil, we performed one multi-echo gradient echo (mGRE) sequence, with 11 echoes. The selected TEs were slightly different depending on the scanner: a multiple of 1.15 ms for the Philips system and 1.23 ms for the Siemens system. Pixel bandwidth was 1161 Hz for the Philips system and 1048 Hz for the Siemens system. The remaining parameters were identical for both machines: 400 × 400 mm² field of view; 128 × 121 acquisition matrix; 256 × 256 reconstruction matrix with a pixel size of 1.56 × 1.56 mm²; 120 ms repetition time; 20° flip angle; 7 mm slice thickness; one excitation. The breath-hold acquisition lasted 15 s.
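As a quick check of the echo train described above, the following minimal sketch lists the echo times for each scanner. It assumes the 11 TEs are the first 11 integer multiples of the stated spacing; the function name `echo_times` is illustrative and does not come from the study software.

```python
def echo_times(spacing_ms, n_echoes=11):
    """Return echo times (ms), assumed to be integer multiples of the echo spacing."""
    return [round(k * spacing_ms, 2) for k in range(1, n_echoes + 1)]


philips = echo_times(1.15)  # Philips spacing from the protocol
siemens = echo_times(1.23)  # Siemens spacing from the protocol

print("Philips first/last TE (ms):", philips[0], philips[-1])
print("Siemens first/last TE (ms):", siemens[0], siemens[-1])
```

Under this assumption the longest echo is 12.65 ms on the Philips system and 13.53 ms on the Siemens system, so neither echo train reaches 14 ms.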
MRI data analysis
Measurements were conducted using an in-house Java program integrating ImageJ functions (NIH, Bethesda, USA). All data were analysed by a radiologist (with 10 years’ experience in abdominal radiology) who was blind to clinical information and to the biopsy result.
Before performing fitting, we applied a noise subtraction algorithm to subtract the mean background noise from the liver signal. Then T2* values were automatically calculated using a simplex non-linear algorithm to fit the magnitude of the complex signal from all echoes or only from in-phase echoes when the signal of the first out-of-phase echo was lower than the signal of the first in-phase echo.
R2* was calculated as follows: R2* = 1/T2*, and we used the linear correlation with LICb to determine LICR2*.
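The fitting steps above can be sketched as follows. This is not the authors' Java/ImageJ code: for brevity it swaps the simplex non-linear fit for a log-linear least-squares fit of the mono-exponential model S(TE) = S0·exp(−TE·R2*), and it assumes that echoes alternate out-of-phase (1st, 3rd, ...) and in-phase (2nd, 4th, ...), as would be expected at 3 T with a spacing near 1.2 ms. All function names and the synthetic signal values are hypothetical.

```python
import math


def select_echoes(te_ms, signal):
    """Keep only in-phase echoes if the first out-of-phase echo is the weaker one.

    Assumption: echoes alternate out-of-phase (index 0, 2, ...) and
    in-phase (index 1, 3, ...).
    """
    if signal[0] < signal[1]:
        return te_ms[1::2], signal[1::2]
    return te_ms, signal


def fit_r2star(te_ms, signal, noise=0.0):
    """Estimate R2* (s^-1) and S0 after subtracting the mean background noise.

    Log-linear least squares is used here as a simplified stand-in for
    the simplex non-linear fit described in the text.
    """
    te_ms, signal = select_echoes(list(te_ms), list(signal))
    pts = [(te / 1000.0, math.log(s - noise))
           for te, s in zip(te_ms, signal) if s - noise > 0]
    if len(pts) < 2:
        raise ValueError("need at least two echoes above the noise level")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    slope = sxy / sxx
    return -slope, math.exp(mean_y - slope * mean_t)


# Synthetic decay (made-up values): R2* = 77 s^-1, S0 = 1000, noise floor = 5
te = [1.23 * k for k in range(1, 12)]
sig = [1000.0 * math.exp(-t * 77.0 / 1000.0) + 5.0 for t in te]
r2star, s0 = fit_r2star(te, sig, noise=5.0)
t2star_ms = 1000.0 / r2star  # R2* = 1/T2*
print(round(r2star, 1), round(t2star_ms, 1))
```

A log-linear fit weights low-signal echoes more heavily than a non-linear fit on the magnitude signal, so the two approaches can diverge when the later echoes approach the noise floor.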
The liver-to-muscle signal intensity ratio (SIR) method was used to calculate LICSIR with the algorithm derived from the same patient series using five single-echo GRE sequences. Only the first four echoes of this formula were used to calculate LICSIR since the longest fifth echo (14 ms) was not obtained in the mGRE acquisition.
Statistical analyses were performed using SAS 9.4 (SAS Institute, Cary, NC).
Qualitative variables were expressed as numbers and percentages. Quantitative data were expressed as means ± standard deviations (SD) if normally distributed and medians (Q1–Q3) if not normally distributed.
Given that LIC quantification variables were not normally distributed, we calculated non-parametric rank correlation coefficients (Spearman) to estimate the strength of the monotonic relationship between LICR2* or LICSIR and LICb.
Similarly, in order to compare measurements using the Philips or Siemens scanner, generalized Poisson mixed models (GLIMMIX procedure) were used with or without adjustment for sex, BMI and age.
Agreement between LIC quantifications was assessed using the Bland–Altman method, calculating the mean difference (estimated bias, d), the standard deviation of the differences (precision, SD) and the limits of agreement (d ± 1.96SD). Student’s t test was used to determine whether the bias between measurement methods was significant.
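The Bland–Altman quantities defined above can be computed with a few lines of standard-library Python. This is a minimal sketch: the sign convention (reference minus MRI estimate) is an assumption, and the three data pairs are made-up values used only to exercise the function.

```python
import statistics


def bland_altman(reference, estimate):
    """Return bias d, SD of the differences, and the limits of agreement d ± 1.96 SD."""
    diffs = [r - e for r, e in zip(reference, estimate)]  # assumed sign convention
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    return bias, sd, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical values in μmol/g, not study data
lic_reference = [10.0, 20.0, 30.0]
lic_estimate = [8.0, 21.0, 27.0]
bias, sd, (lower, upper) = bland_altman(lic_reference, lic_estimate)
print(round(bias, 2), round(sd, 2), round(lower, 2), round(upper, 2))
```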
Optimal cut-off values for the threshold of LICb at 36 μmol/g were obtained by optimisation of the Youden index from area under the receiver operating characteristic curve (AUROC) analysis.
The area under the curve, sensitivity, specificity, positive and negative predictive values were calculated for both LICR2* and LICSIR.
In order to compare the two methods at different levels, the cohort was divided into equal successive classes according to the values of LICb. Then, LICb − LICR2* and LICb − LICSIR were calculated and compared at the different LICb levels. A similar comparison, corresponding more to the practical intent to diagnose, was also done by using LICR2* classes.
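The class-wise comparison can be sketched as below. The text does not state the class boundaries, so the sketch assumes equal-width classes of a hypothetical width and summarises each class by the mean absolute difference; both choices, the function name and the sample values are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean


def mean_abs_diff_by_class(reference, estimate, width):
    """Group cases into successive classes of the reference value and
    return the mean absolute difference per class, keyed by (low, high)."""
    groups = defaultdict(list)
    for ref, est in zip(reference, estimate):
        groups[int(ref // width)].append(abs(ref - est))
    return {(k * width, (k + 1) * width): mean(v)
            for k, v in sorted(groups.items())}


# Hypothetical values in μmol/g, not study data
licb = [10, 40, 120, 300]
lic_mri = [12, 30, 100, 200]
by_class = mean_abs_diff_by_class(licb, lic_mri, width=130)
for bounds, diff in by_class.items():
    print(bounds, round(diff, 1))
```

Passing the MRI estimate as the grouping variable instead of LICb gives the second comparison described above, where classes are defined by LICR2*.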
P values less than 0.05 were considered statistically significant.
Patient characteristics (n = 105)
Mean age (SD): 52.1 (± 13)
MRI scanner manufacturer (Siemens/Philips)
Body mass index, mean (SD)
LICb μmol/g, median [interquartile range]
LICSIR μmol/g, median [interquartile range]
LICR2* μmol/g, median [interquartile range]
Our analysis without/with the adjustment for BMI, sex and age yielded no difference in the distribution of LICb (p = 0.65/p = 0.19), LICR2* (p = 0.49/p = 0.14) and LICSIR (p = 0.50/p = 0.27) results between the two groups using MRI machines from different manufacturers.
R2* and LICR2* measurements
The Spearman correlation coefficient (r = 0.95, p < 0.001) indicates a strong positive correlation between LICb and R2*.
Figure 3b shows the Bland–Altman plot of the difference vs. mean values of LICb and LICR2* measurements. The bias (SD) or average difference between the results of the two methods was 7.6 (73.4) μmol/g and the 95% limits of agreement were − 136.4 μmol/g and 151.5 μmol/g. The bias was not statistically significantly different to zero (p = 0.74).
With the reference threshold established at LICb = 36 μmol/g, ROC curves obtained with LICR2* results showed an area under the curve (AUC) of 0.987. The best threshold was given for LICR2* at 32 μmol/g, corresponding to an R2* of 77 s⁻¹ and a T2* of 13 ms, with 47 true positives, 4 false positives, 52 true negatives and 2 false negatives. The sensitivity was 0.96 (95% CI 0.9; 1.01) and the specificity was 0.93 (95% CI 0.86; 1.0).
The best threshold was then given for LICR2* at 27 μmol/g, corresponding to an R2* of 89 s⁻¹ and a T2* of 11 ms, with 44 true positives, 2 false positives, 25 true negatives and 5 false negatives. The sensitivity was 0.90 (95% CI 0.81; 0.98) and the specificity was 0.93 (95% CI 0.83; 1.0).
Linear regression between LICb and LICSIR is shown in Fig. 3c. The Spearman correlation coefficient (r = 0.92, p < 0.001) indicates a strong positive correlation between LICSIR and LICb. Figure 3d shows the Bland–Altman plot of the difference vs. mean values of LICb and LICSIR measurements. The bias (SD) or average difference between the results of the two methods was 14.8 (37.6) and the 95% limits of agreement were − 59.0 and 88.5 μmol/g. The bias was statistically significantly different to zero (p < 0.0001).
With the reference threshold established at LICb = 36 μmol/g, the ROC curves obtained with LICSIR results showed an AUC of 0.965. The best threshold was given for LICSIR = 20 μmol/g with 45 true positives, 3 false positives, 53 true negatives and 4 false negatives. The sensitivity was 0.92 (95% CI 0.84; 0.99) and the specificity was 0.95 (95% CI 0.89; 1.0).
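The sensitivity and specificity figures in this section follow directly from the reported counts. The sketch below recomputes them, along with the predictive values and the Youden index (sensitivity + specificity − 1) that was optimised to choose each threshold; the function name is illustrative.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic accuracy measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "youden": sensitivity + specificity - 1.0,
    }


# Counts as reported in the text for the whole cohort (n = 105)
r2star = diagnostic_metrics(tp=47, fp=4, tn=52, fn=2)  # LICR2* threshold 32 μmol/g
sir = diagnostic_metrics(tp=45, fp=3, tn=53, fn=4)     # LICSIR threshold 20 μmol/g

for name, m in (("LICR2*", r2star), ("LICSIR", sir)):
    print(name, {k: round(v, 2) for k, v in m.items()})
```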
Comparison between LICR2* and LICSIR measurements
With a shortest TE of 1.2 ms, liver iron overload can be reliably quantified by MRI at 3 T with the R2* for patients with biopsy-proven LIC under 130 μmol/g, but the SIR method appears more robust for higher iron overload.
The R2* calculation is well known and its clinical use is well established at 1.5 T. In the literature, there are five main publications validating R2* against LIC determined by biopsy [6, 8, 9, 10, 11]. Conversion formulas have been proposed to estimate LIC from R2* (s⁻¹) with a slope of 0.025 to 0.032 to obtain the LIC value in milligrams per gram. Pooling the data from the main publications, Henninger et al. found a mean slope of 0.029. Then, to obtain the LIC in micromoles per gram instead of milligrams per gram, we multiplied this mean slope by 18 to obtain 0.52. So, at 1.5 T, simply by dividing the value of R2* expressed per second by 2, we have a correct approximation of LIC expressed in micromoles per gram.
No such validation with biopsies has been done at 3 T. Theoretical calculations suggest a doubling of R2* from 1.5 to 3 T. Then the mean slope to obtain the LIC value in micromoles per gram should be divided by 2 and should be approximately 0.26. Anwar et al.’s results in five patients seem to confirm this hypothesis but with a significant delay between MRI and biopsy. However, in our series we obtained a slope of 0.316, slightly higher than the slope expected by extrapolation of 1.5-T pooled data but close to half the higher slope proposed at 1.5 T by Garbowski et al., who used the same laboratory reference. In our series, the background noise subtraction leads to a higher value of R2* and partly explains the residual difference with Garbowski et al.’s results. This emphasises the need for a standardised protocol to obtain more comparable results.
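The unit conversion behind these slopes can be checked with a short calculation. The sketch below uses the molar mass of iron (55.845 g/mol), which gives a factor of about 17.9 μmol per mg, close to the rounded factor of 18 used in the text. It only converts slopes: the complete R2*-to-LIC calibration is not restated in this section, so no LIC values are derived here. The function and constant names are illustrative.

```python
FE_MOLAR_MASS_G_PER_MOL = 55.845  # standard atomic weight of iron


def slope_mg_to_umol(slope_mg_per_g):
    """Convert a slope in (mg Fe/g) per s^-1 to (μmol Fe/g) per s^-1."""
    return slope_mg_per_g * 1000.0 / FE_MOLAR_MASS_G_PER_MOL


slope_15t = slope_mg_to_umol(0.029)  # pooled 1.5-T slope quoted in the text
slope_3t_expected = slope_15t / 2.0  # assumes R2* doubles from 1.5 T to 3 T

print(round(slope_15t, 2), round(slope_3t_expected, 2))
```

The observed 3-T slope of 0.316 is therefore roughly 20% above the value extrapolated from the pooled 1.5-T data.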
The higher magnetic susceptibility observed at 3 T introduced limitations in the R2* calculation. For high overloads, there is a strong decrease in liver signal intensity. It is then difficult to obtain a correct exponential curve fit.
The SIR method is also widely recognised and used for hepatic iron quantification at 1.5 T. Our study evaluated this method at 3 T using the algorithm defined from single-echo sequences. The results we obtained with an mGRE sequence showed good correlations but with a slight overall overestimation and a slight underestimation for low values because the longest TE, around 14 ms, was not included in the mGRE sequence. For slight to moderate overloads, below 130 μmol/g, almost exclusively patients with DIOS, our study showed a better correlation of the R2* method than the SIR method to LICb. However, in patients with high LICb above 130 μmol/g, corresponding exclusively, in our study, to patients with genetic haemochromatosis, the SIR method provides a better correlation to LICb. At 1.5 T, quantification was possible by SIR up to 350 μmol/g by using the shortest in-phase TE of 4 ms. Rose et al. overcame this limit by using a shorter first TE of 1.8 ms. At 3 T, a first TE of 1.2 ms is short enough to give a liver signal over the signal noise and to allow a SIR estimation in high overload.
Our study is the largest series calibrating R2* versus LICb, for any magnetic field strength. It validates the use of 3-T MRI for hepatic iron quantification. In comparison to the biopsy with biochemical determination of iron, we propose a formula to convert R2* at 3 T to LICb. Despite variation in technical characteristics, there was no significant difference between the two machines used. The use of 3-T MRI is becoming more widespread and some centres use this magnetic field for routine abdominal imaging. Thus, there is a strong need for reference values at 3 T. Moreover, the use of a 3-T magnetic field allows for more accurate quantification of slight to moderate overloads. Improving sensitivity is clinically relevant given the increasing incidence of DIOS with low iron overload.
Our study has certain limitations. First, the shortest TE was about 1.2 ms, a value which is also the first TE usually proposed by MR vendors in most built-in protocols dedicated to hepatic iron and fat quantification. Obviously, this TE is not short enough at 3 T to correctly calculate R2* in the case of high overload. It is technically difficult to use a first TE of 0.4 ms, which is half the shortest TE of 0.8 ms proposed by Wood et al. at 1.5 T. Very short TEs will be available using ultrashort echo time (UTE) imaging. In the meantime, the main risk is not the inability to quantify a high overload correctly, which has only a small impact on patient management, but miscalculating R2* and hence underestimating liver iron overload. For example, a patient with an R2* of 512 s⁻¹, corresponding to a LICR2* of 130 μmol/g, actually had a LICb of 480 μmol/g. This type of error explains how the difference between LICR2* and LICb increases faster with LICR2* classes than with LICb classes. So, to overcome this limitation, we propose either greatly reducing the shortest TE or combining both SIR and T2* methods.
Second, we used two different machines with a slight magnetic field difference (3%). Acquisition parameters were as close as possible. However, there were also slight TE differences (8%). This could have produced errors, particularly for the SIR method, which does not take into account TE differences between the two units. The absence of B1 heterogeneity correction with the first machine may also lead, in some cases, to an overestimation of LICSIR through reduction of the paraspinal muscle signal, as described with single-echo sequences.
Third, we used the body coil for both methods. This coil is necessary for the SIR method. A surface coil would give a higher signal for the R2* calculation, but the lower signal of the body coil is offset by the larger voxels (17 mm³) and by fitting T2* to the entire ROI instead of producing a pixel-wise map.
Fourth, we only used four of the five echoes used by the 3-T SIR algorithm based on single-echo sequences. This explains the bias observed for the low values of LICb with a LICSIR cut-off of 20 μmol/g for determining overloaded patients. A new version of the algorithm taking into account the reduction in the number of echoes obtained has now been incorporated into our dedicated software. Nevertheless, this has no practical impact since at that level of overload R2* is the most precise method.
This study validates hepatic iron quantification by MRI at 3 T, with a conversion formula to LICb obtained from biopsy material. With the selected TEs, the R2* method is more accurate for slight to moderate hepatic iron overload, whereas the SIR method is more accurate for high overloads. Shorter TEs are needed to improve performance for quantifying massive iron overload by R2*. In the meantime, both methods should be used simultaneously with a breath-hold mGRE sequence acquired using the body coil. The sequence protocol we propose can be applied to the majority of MRI scanners without the need to purchase a specific option. Detailed sequence parameters and a dedicated DICOM software program, incorporating both calculations with cross-checks, are available at www.mrquantif.org.
We received support from the national clinical research program for public hospitals of France. Thanks to Tracey Westcott for the language help. Thanks to all the MRI team of University Hospital of Rennes.
Compliance with ethical standards
The scientific guarantor of this publication is Prof Yves Gandon
Conflict of interest
The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.
Statistics and biometry
One of the authors (MJ) is a senior biostatistician and has significant statistical expertise.
Written informed consent was obtained from all subjects (patients) in this study.
The study protocol (Clinical trial NCT00401336) was approved by the local institutional review board (ref. 05/17-544).
Study subjects or cohorts overlap
This series of patients has previously been used to define a liver-to-muscle signal intensity ratio (SIR) algorithm from five different monoecho sequences. Here we report the R2* results calculated from a multiecho sequence. We also calculated SIR results from this single sequence using the previously reported algorithm based on monoecho sequences.
• diagnostic or prognostic study
• performed at one institution
24. Krafft AJ, Loeffler RB, Song R et al (2017) Quantitative ultrashort echo time imaging for assessment of massive iron overload at 1.5 and 3 Tesla. Magn Reson Med. https://doi.org/10.1002/mrm.26592