Performance of different Dixon-based methods for MR liver iron assessment in comparison to a biopsy-validated R2* relaxometry method

Objectives To prospectively evaluate a 3D-multiecho-Dixon sequence with inline calculation of proton density fat fraction (PDFF) and R2* (qDixon), and an improved version of it (qDixon-WIP), for the MR quantification of hepatic iron in a clinical setting.
Methods Patients with increased serum ferritin underwent 1.5-T MRI of the liver for the evaluation of hepatic iron overload. The imaging protocol for R2* quantification included the following: (1) a validated 2D multigradient-echo sequence (initial TE 0.99 ms, R2*-ME-GRE), (2) a 3D-multiecho-Dixon sequence with inline calculation of PDFF and R2* (initial TE 2.38 ms, R2*-qDixon), and optionally (3) a prototype (works-in-progress, WIP) version of the latter (initial TE 1.04 ms, R2*-qDixon-WIP) with improved water/fat separation and noise-corrected parameter fitting. For all sequences, three manually co-registered regions of interest (ROIs) were placed in the liver. R2* values were compared, and linear regression analysis and Bland-Altman plots were calculated.
Results Forty-six out of 415 patients showed a fat/water (F/W) swap with qDixon and were excluded. A total of 369 patients (mean age 52 years) were included; in 203/369, the optional qDixon-WIP was acquired, which showed no F/W swaps. A strong correlation was found between R2*-ME-GRE and R2*-qDixon (r2 = 0.92, p < 0.001), with Bland-Altman analysis revealing a mean difference of −3.82 1/s (SD = 21.26 1/s). Correlation between R2*-ME-GRE and R2*-qDixon-WIP was r2 = 0.95 (p < 0.001), with Bland-Altman analysis showing a mean difference of −0.125 1/s (SD = 30.667 1/s).
Conclusions The 3D-multiecho-Dixon sequence is a reliable tool to quantify hepatic iron. Results are comparable with established relaxometry methods. Improvements to the original implementation eliminate occasional F/W swaps and limitations regarding maximum R2* values.
Key Points
• The 3D-multiecho-Dixon sequence for 1.5 T is a reliable tool to quantify hepatic iron.
• Results of the 3D-multiecho-Dixon sequence are comparable with established relaxometry methods.
• An improved version of the 3D-multiecho-Dixon sequence eliminates minor drawbacks.


Introduction
In recent years, magnetic resonance imaging (MRI) has been established for the evaluation of hepatic iron overload [1,2]. The benefits of MRI are evident: it is non-invasive, nowadays widely available, and free of relevant risks; it provides additional information on iron overload of the spleen and pancreas; and it reduces sampling errors to a minimum [3][4][5].
Nevertheless, there are also some limitations that have been addressed in recent years. One problem is the wide range of different techniques, such as R2 or R2* relaxometry or the signal-intensity-ratio method [6][7][8][9]. Further, most of these approaches do not have regulatory approval for iron quantification, which limits their use in larger multicenter studies or clinical trials. Small institutions or private practices without access to experts in the field mostly do not provide iron quantification because of the seemingly complex sequences and post-processing procedures. In addition, the variety of measurement sequences and software solutions complicates the comparability of the various methods. Consensus is still missing, which makes it even more difficult for each institution to find the best approach.
Most vendors of MR scanners have recently developed Dixon-based solutions with integrated post-processing in which PDFF and R2* are calculated simultaneously [10] and which may be used for quantifying iron [11]. The corresponding products are known under the following brand names: "IDEAL-IQ" from General Electric, "StarQuant" (or mDixon-Quant) from Philips, and "LiverLab" (or qDixon) from Siemens Healthcare. These techniques promise to fulfill the requirements for an accurate evaluation of iron, albeit at rather high purchase prices. In the literature, there are only limited data on the clinical usefulness and accuracy of these approaches [11][12][13]. The qDixon sequence used in our institution is based on a 3D multigradient-echo acquisition and uses controlled aliasing undersampling [14], which allows acquisition in a single breath-hold. Further, advanced inline processing via a multistep adaptive fitting approach facilitates evaluation without further post-processing [11]. Any image-viewing software that allows region of interest (ROI)-based signal intensity measurements can be used for measuring R2* and proton density fat fraction (PDFF) values.
As studies evaluating clinical applications of commercial Dixon-based sequences for hepatic iron quantification are rare, it was the purpose of our study to evaluate qDixon and an improved (works-in-progress) version of this sequence (qDixon-WIP) for the assessment of hepatic iron overload in daily clinical routine to enhance confidence in these methods. For this purpose, we compared results from qDixon/qDixon-WIP with an established, biopsy-calibrated 2D multiecho R2* relaxometry method [9].

Materials and methods
This prospective study was approved by our Institutional Review Board (Medical University of Innsbruck). Written informed consent was obtained from each patient.

Patients
All patients were referred to our department (Department of Radiology, Medical University of Innsbruck) for the evaluation of hepatic iron overload between December 2015 and September 2019. The inclusion criteria were as follows: (1) increased serum ferritin (> 300 μg/L in male patients and > 200 μg/L in female patients), (2) age > 18 years, and (3) acquisition of our MRI protocol for the evaluation of diffuse liver disease as listed below, in which qDixon-WIP was available only from November 2017 and was therefore an optional sequence. General contraindications to MRI were used as exclusion criteria. Further, patients who showed a complete fat/water swap (F/W swap) with the qDixon sequence were not included in our study.

MR examination and image analysis
All patients were examined with a 1.5-T whole-body MR scanner (MAGNETOM Avanto fit, Siemens Healthcare). Patients were scanned in supine position using an 18-element body matrix coil and 12-16 elements of the integrated 32-channel spine matrix coil. The technicians carefully instructed the patients to suspend respiration at end expiration and to be consistent in their breath-holds. Our protocol for diffuse liver disease is provided in Table 1. We aimed at evaluating three sequences that are relevant for the quantification of hepatic iron: qDixon, qDixon-WIP, and our reference sequence R2*-ME-GRE. Each sequence was acquired in breath-hold and in transversal orientation. For the comparison between the sequences, R2*-ME-GRE was considered the reference because it had already been evaluated in a clinical setting and correlated with biopsy data in earlier studies [9]. The qDixon sequence automatically calculates PDFF and R2* maps during image reconstruction without the need for further post-processing. Although the sequence is focused on the quantification of liver fat fraction, the sequence parameters suggested by the vendor (in particular the long initial echo time) were not changed for this study, as would also be the case in small institutions or private practices without special technical expertise in the field. qDixon-WIP is a prototype version with the same MR sequence part as the qDixon product sequence, but with several improvements integrated into the inline image reconstruction. Global fat/water (F/W) swaps during the initial Dixon water/fat separation stage of the multistep fitting approach [10] are detected using an AI-based classifier [15] and reversed if necessary. To mitigate noise bias in the subsequent magnitude fitting stage, a noise map is calculated.
It is based on the system's built-in adjustment functionality, which measures noise for the given receive coil setup, in combination with knowledge about the noise propagation through the individual image reconstruction steps as described in [16]. First-moment noise-corrected parameter fitting is then performed analogously to the approach described in [17], but with the noise level being a value known via the noise map rather than a free parameter of the signal model. Also, the fat signal dephasing term is retained in the signal model, which then reads $|s_n| = E_\sigma\{\,|(w + c_n f)\, e^{-R_2^* \cdot TE_n}|\,\}$, where $|s_n|$ is the magnitude signal measured at echo time $TE_n$, $w$ and $f$ are the (unknown) water and fat signal components, respectively, and $c_n$ is the complex-valued fat signal dephasing factor at echo time $TE_n$. $E_\sigma\{\ldots\}$ denotes the expectation value of the term in brackets given the (known) noise level $\sigma$. Finally, an additional inline calculation of liver iron concentration (LIC) maps was implemented, which allows ROI measurements in iron units. In addition to the modified inline image reconstruction, the initial TE and ΔTE were reduced for qDixon-WIP to 1.04 ms and 1.17 ms, respectively, without changes of receive bandwidth. The reduced TE values in turn lead to a decrease of TR, which could be exploited to reduce the total acceleration factor while still obtaining a slightly shorter acquisition time (Table 1). R2* maps for the R2*-ME-GRE sequence were calculated using a custom-written plugin for ImageJ (Wayne Rasband, National Institutes of Health) by fitting on a pixel-wise basis with a truncation model [18]. For image analysis of qDixon and qDixon-WIP, our local picture archiving and communication system (PACS) was used (IMPAX; Agfa-Gevaert). Image analysis was performed independently by a radiologist (P.M.) with 9 years of experience in liver MRI (ROI placement) and by a physicist (C.K.) with 14 years of experience in liver MRI post-processing (calculation of the R2* maps).
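The magnitude signal model described above for qDixon-WIP can be illustrated with a minimal Python sketch. It is not the vendor's inline reconstruction: it evaluates only the noise-free term |(w + c_n f)| · exp(−R2* · TE_n) and omits the noise expectation E_σ{…}. The single-peak fat model at −3.4 ppm, the gyromagnetic constant, the number of echoes, and all function names are assumptions made for illustration, whereas the text describes multi-peak fat modeling. Only the initial TE (1.04 ms) and ΔTE (1.17 ms) are taken from the text.

```python
import cmath
import math

# Assumption: a single fat peak at about -3.4 ppm relative to water.
# The reconstruction described in the text uses multi-peak fat modeling.
FAT_SHIFT_PPM = -3.4
GAMMA_MHZ_PER_T = 42.577  # proton gyromagnetic ratio divided by 2*pi


def echo_times(first_te_s, delta_te_s, n_echoes):
    """Equally spaced echo times in seconds."""
    return [first_te_s + k * delta_te_s for k in range(n_echoes)]


def fat_dephasing(te_s, b0_tesla=1.5):
    """Complex fat dephasing factor c_n at echo time te_s (single-peak assumption)."""
    freq_hz = FAT_SHIFT_PPM * GAMMA_MHZ_PER_T * b0_tesla
    return cmath.exp(2j * math.pi * freq_hz * te_s)


def magnitude_signal(w, f, r2s, te_s, b0_tesla=1.5):
    """Noise-free magnitude |(w + c_n * f)| * exp(-R2* * TE_n).

    w, f : water and fat signal components (arbitrary units)
    r2s  : R2* in 1/s
    te_s : echo time in seconds
    """
    c_n = fat_dephasing(te_s, b0_tesla)
    return abs(w + c_n * f) * math.exp(-r2s * te_s)


if __name__ == "__main__":
    # Initial TE and delta TE as stated for qDixon-WIP; six echoes is a
    # hypothetical choice because the echo count is not given in the text.
    tes = echo_times(1.04e-3, 1.17e-3, 6)
    for te in tes:
        print(f"TE = {te * 1e3:.2f} ms, |s| = {magnitude_signal(90.0, 10.0, 200.0, te):.2f}")
```

A noise-corrected fit in the sense of the text would replace `magnitude_signal` by its expectation value under the known noise level before comparing it with the measured magnitudes.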
First, the liver was reviewed concerning possible focal liver lesions or artifacts. Then, three manually co-registered regions of interest (ROIs) were placed within the liver for all sequences, two in the right lobe and one in the left lobe. Major vessels were avoided. The diameter was 10-13 mm with an area of 0.72-1.15 cm². The mean R2* value (1/s) was calculated using the available three ROI measurements.
Further, we calculated the LIC for qDixon using a cross-calibration with the reference R2*-ME-GRE sequence and additionally correlated the obtained results using different available calibration equations from studies by Wood et al, Henninger et al, Hankins et al, and Garbowski et al [6,9,19,20]. Agreement between all LIC results was calculated based on direct LIC values and based on two different evaluation criteria: (1) a simple iron yes/no classification defined by a LIC of > 36 μmol/g (2 mg/g) and (2) the classification system proposed by the EASL [21].
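The conversion from the mean ROI R2* value to LIC and the simple yes/no classification can be sketched as follows. The slopes and intercepts are the linear calibration equations quoted in the table footnotes of this article, keyed by their reference numbers; the equation from reference [9] is not quoted in the text and is therefore left out. The function and dictionary names are hypothetical, and the EASL severity classes are not reproduced because their limits are not stated here.

```python
# Linear calibrations Fe (umol/g) = slope * R2* (1/s) + intercept, as quoted
# in the table footnotes of the article. Keys are illustrative names only.
CALIBRATIONS = {
    "ref_6": (0.455, -3.617),
    "ref_20": (0.573, -2.507),
    "ref_19": (0.502, -8.145),
    "cross_calibration": (0.434, 6.135),
}

# Threshold for the simple iron yes/no classification (36 umol/g = 2 mg/g).
LIC_THRESHOLD_UMOL_PER_G = 36.0


def mean_r2s(roi_values):
    """Mean R2* (1/s) over the available ROI measurements."""
    return sum(roi_values) / len(roi_values)


def lic_umol_per_g(r2s, calibration):
    """Convert R2* (1/s) to LIC (umol/g) with one of the linear calibrations."""
    slope, intercept = CALIBRATIONS[calibration]
    return slope * r2s + intercept


def has_iron_overload(lic):
    """Simple yes/no classification: LIC above 36 umol/g."""
    return lic > LIC_THRESHOLD_UMOL_PER_G


if __name__ == "__main__":
    # Hypothetical ROI values for one patient (two right lobe, one left lobe).
    r2s = mean_r2s([92.0, 101.0, 107.0])
    for name in CALIBRATIONS:
        lic = lic_umol_per_g(r2s, name)
        print(f"{name}: LIC = {lic:.1f} umol/g, overload = {has_iron_overload(lic)}")
```

Running all calibrations on the same R2* value shows how a patient close to the threshold can change class when the equation is changed.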

Statistical analysis
Statistical calculations were performed using the R Project for Statistical Computing [22]. To analyze the correlation and agreement between the different methods, the mean value of the three measured ROIs within the liver was used for each patient. Linear regression analysis was performed by fitting a linear model to the data, and Bland-Altman plots were calculated to visualize the agreement between the respective methods. In addition to Bland-Altman plots, Lin's concordance correlation coefficient [23] was calculated to assess the degree of agreement between methods using the epiR package for R [24]. Concordance correlation coefficients were rated as follows: < 0.9: poor agreement; 0.9-0.95: moderate agreement; 0.95-0.99: substantial agreement; > 0.99: almost perfect agreement.
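The agreement measures named above can be sketched in plain Python. This is not the epiR implementation used in the study: it assumes differences are taken as method A minus method B, uses 1.96 standard deviations for the limits of agreement, and computes Lin's coefficient from population (1/n) moments. All function names and the sample values are hypothetical.

```python
import math


def bland_altman(a, b):
    """Mean difference (a - b), its sample SD, and 95% limits of agreement."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    sd = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    return mean_diff, sd, (mean_diff - 1.96 * sd, mean_diff + 1.96 * sd)


def lin_ccc(a, b):
    """Lin's concordance correlation coefficient from population moments."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    return 2.0 * cov / (var_a + var_b + (mean_a - mean_b) ** 2)


def rate_ccc(ccc):
    """Rating scale as given in the text."""
    if ccc < 0.9:
        return "poor agreement"
    if ccc <= 0.95:
        return "moderate agreement"
    if ccc <= 0.99:
        return "substantial agreement"
    return "almost perfect agreement"


if __name__ == "__main__":
    # Hypothetical per-patient mean R2* values (1/s) from two sequences.
    reference = [45.0, 80.0, 120.0, 210.0, 330.0]
    test = [48.0, 77.0, 125.0, 204.0, 338.0]
    print(bland_altman(test, reference))
    ccc = lin_ccc(test, reference)
    print(ccc, rate_ccc(ccc))
```

Unlike the correlation coefficient, Lin's coefficient is lowered by a systematic offset between methods, which is why it complements the linear regression analysis.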
To determine the agreement of iron classification based on different published calibration data, contingency tables between pairs of these calibrations were generated and Cohen's kappa coefficient with equal weights was calculated using the rel package for R [25].
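The contingency-table step can be sketched as follows. This is not the rel package; it assumes that "equal weights" means every disagreement counts the same, that is, the unweighted Cohen's kappa. The function names and the example labels are hypothetical.

```python
def contingency_table(labels_a, labels_b, categories):
    """Square table: rows follow calibration A, columns follow calibration B."""
    index = {c: i for i, c in enumerate(categories)}
    table = [[0] * len(categories) for _ in categories]
    for la, lb in zip(labels_a, labels_b):
        table[index[la]][index[lb]] += 1
    return table


def overall_agreement(table):
    """Fraction of patients placed in the same class by both calibrations."""
    n = sum(sum(row) for row in table)
    return sum(table[i][i] for i in range(len(table))) / n


def cohens_kappa(table):
    """Unweighted Cohen's kappa from a square contingency table."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical yes/no iron classifications from two calibration equations.
    calib_a = ["yes", "yes", "no", "no", "yes", "no", "no", "yes"]
    calib_b = ["yes", "yes", "no", "no", "no", "no", "no", "yes"]
    table = contingency_table(calib_a, calib_b, ["no", "yes"])
    print(table, overall_agreement(table), cohens_kappa(table))
```

Kappa corrects the overall agreement for the agreement expected by chance, so it can be clearly lower than the raw percentage when one class dominates.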

Results
Forty-six out of 415 patients showed an F/W swap with qDixon and were therefore excluded. A total of 369 patients (283 males, 86 females, mean age 52 years, range 18-82 years) were prospectively included in our study. In 203/369 patients, the optional qDixon-WIP sequence was also acquired. No F/W swap was encountered with the qDixon-WIP in any of the 203 patients. A drawback of the qDixon sequence is that it seems to be limited to a maximum R2* value of around 400 1/s. For the qDixon-WIP sequence, no such limitation was observed.
Results of the LIC-based analysis for qDixon are provided in Tables 2, 3, 4, and 5. Based on a simple yes/no decision (Table 2) as well as EASL classification (Table 3) concerning pathologic LIC, we found strong  Table 4; overall agreement 83-100%, Cohen's kappa: 0.83-1). Only for the EASL classification, the overall agreement between the calibration of Garbowski and Hankins was < 90%, while in all the other cases, an agreement of > 90% was found. In particular, regarding EASL classification, maximum disagreement was always at most one severity class. For direct LIC quantification, the concordance correlation coefficient (

Discussion
In this study, the qDixon sequence proved to be a reliable approach for the calculation of hepatic iron in daily clinical routine. In general, our results showed an excellent agreement between qDixon and our reference sequence. This excellent agreement cannot be automatically assumed, as the methods used differ in several technical aspects, such as 2D versus 3D acquisition mode, number of acquired echoes, significantly different echo times (especially initial TE), and the post-processing algorithms used (inline Dixon water/fat separation with multifat peak modeling vs. offline truncated exponential fit). Further, we showed that the improved version qDixon-WIP delivered far more robust results than the original sequence: we encountered no F/W swap with qDixon-WIP, and our results were not limited to a maximum R2* value. The R2* values of qDixon-WIP also had an excellent agreement with values from our reference sequence (r = 0.95).
Table 3 LIC analysis with overall agreement based on the EASL classification [21]
In contrast to the qDixon-WIP, the current version of qDixon does not deliver maps in LIC units; the operator is still required to use a formula from the literature [9,20] to convert R2* to LIC, a value that is frequently requested by the referring clinician. In addition to cross-calibration with our reference sequence, we compared different calibration equations from the literature to obtain LIC values based on the qDixon sequence. We found the highest agreement between the calibrations by Wood et al and Henninger et al. Based on a simple pathologic iron yes/no decision, only the overall agreement between the calibration of Hankins and our cross-calibration for qDixon was < 90%. For all other calibration equations, agreement was always > 90%. The agreement for EASL severity classes was < 90% only between the calibrations of Hankins et al and Garbowski et al and between the calibration of Hankins and our cross-calibration. It was > 90% for all other cases. If no cross-calibration is available, our LIC-based results cannot give a direct recommendation for the ideal calibration equation, but they show that agreement among the different equations is very high and the differences in the various LIC results are small. This was also reflected in the fact that, using the EASL classification, only changes of at most one severity grade were found. Therefore, any of the calibration curves applied in this work can reliably be used for LIC quantification with the qDixon sequence, but it should be kept in mind that changing the equation during follow-up under therapy can lead to wrong decisions in clinical management.
The study by Serai et al evaluated a 3D multiecho Dixon-based imaging sequence (mDixon) in a pediatric and young adult population [27]. They compared a commercially available mDixon sequence with a conventional GRE-based relaxometry method. In agreement with our study, they found no statistically significant difference in T2* values between the two sequences. The main differences from our study are the patient population and size and the different sequence parameters. Further, in contrast to our study, the reference sequence used was not calibrated by liver biopsy, and no correlation analysis concerning the LIC and the use of different calibration curves was applied. Jhaveri et al compared an R2* sequence, similar to our qDixon-WIP, with the R2 FerriScan method [12]. They observed that both provide equivalent quantification of the LIC within the limits of random uncertainty and concluded that iron heterogeneity is the primary source of the uncertainty. One limitation of this study was that ROIs could not be co-registered between the two techniques, which led to uncertainties. In our study, we used a different reference sequence and manually co-registered ROIs between the different sequences. We observed an excellent agreement among all three sequences.
2 Used formula: Fe (μmol/g) = 0.455 * R2* − 3.617 [6]
3 Used formula: Fe (μmol/g) = 0.573 * R2* − 2.507 [20]
4 Used formula: Fe (μmol/g) = 0.502 * R2* − 8.145 [19]
5 Used formula: Fe (μmol/g) = 0.434 * R2* + 6.135 (see "Results")
Surprisingly, we also found an excellent agreement between qDixon and qDixon-WIP, although the initial TE of the two sequences differs markedly, with a long TE of 2.38 ms for qDixon and a short TE of 1.04 ms for qDixon-WIP. This may be an indication of the appropriateness of the combined signal model containing both PDFF and R2*, which should minimize the impact of acquisition settings on the results. The longer TEs in qDixon are likely the cause of the observed upper R2* limit of approximately 400 1/s. Further, both the qDixon-WIP and our reference sequence R2*-ME-GRE have an almost identical initial TE, which could be the reason for the slightly better correlation between these two sequences.
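The effect of the initial TE on strongly iron-loaded livers can be made concrete with a short calculation. The sketch below assumes a pure mono-exponential decay exp(−R2* · TE) with no fat and no noise, which is a simplification of the signal model used by the sequences; the function name is hypothetical. The R2* value of 400 1/s and the three initial echo times are taken from the text.

```python
import math


def remaining_signal_fraction(r2s_per_s, te_ms):
    """Fraction of the initial signal left at echo time te_ms for a given R2*.

    Assumes mono-exponential decay without fat or noise contributions.
    """
    return math.exp(-r2s_per_s * te_ms / 1000.0)


if __name__ == "__main__":
    r2s = 400.0  # approximate upper R2* limit observed with qDixon (1/s)
    for name, te_ms in [("qDixon", 2.38), ("qDixon-WIP", 1.04), ("R2*-ME-GRE", 0.99)]:
        fraction = remaining_signal_fraction(r2s, te_ms)
        print(f"{name}: TE = {te_ms} ms, remaining signal = {fraction:.1%}")
```

Under these assumptions, roughly 39% of the signal is left at the first qDixon echo for an R2* of 400 1/s, compared with roughly 66% at the first qDixon-WIP echo, so later echoes of the longer-TE protocol approach the noise floor sooner.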
One limitation of our study is the reference sequence employed. Its implementation, using fat saturation and a particular fitting procedure, is only one of many options, but this is also the case for most other R2* relaxometry methods that were correlated with histopathology. In this context, it has to be pointed out that the reference method was calibrated by means of biopsy in an earlier study [9] and has been used successfully at our hospital for years in daily clinical routine. Confidence in the method has reached such a level that our clinical partners usually do not perform liver biopsies anymore. In this respect, biopsy of the liver with histopathology is no longer considered justifiable because of its known drawbacks [1,[28][29][30]. Another limitation is that we could evaluate only one vendor solution, which may raise the question of vendor bias. Since only MR scanners from a single vendor are used in our hospital, a multicenter study would be necessary to compare the different vendor solutions, including "IDEAL-IQ" from General Electric, "StarQuant" (or mDixon-Quant) from Philips, and "LiverLab" (or qDixon) from Siemens Healthcare. As this was far beyond the scope of this study, inter-scanner reproducibility was not investigated. Further, we did not focus on the evaluation of fat, which is also possible with qDixon and is the original focus of this sequence.
Conclusion qDixon at 1.5 T is a reliable and exact method to quantify hepatic iron. Improvements to the implementation promise to eliminate its minor drawbacks: occasional F/W swaps, the limitation to R2* values of about 400 1/s, and the missing inline LIC calculation.
Funding Open access funding provided by University of Innsbruck and Medical University of Innsbruck.

Compliance with ethical standards
Guarantor The scientific guarantor of this publication is Dr. Michaela Plaikner.
Conflict of interest One of the authors is an employee of Siemens Healthcare; he developed the prototype sequence and had no control of any data and was not involved in the execution of the study.
All others declare no relationships with any companies whose products or services may be related to the subject matter of the article.
Statistics and biometry One of the authors has significant statistical expertise.
Informed consent Written informed consent was obtained from all subjects (patients) in this study.
Ethical approval Institutional Review Board approval was obtained.

Methodology
• Prospective
• Cross-sectional study
• Performed at one institution
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.