Comparative diagnostic performance of ultrasound shear wave elastography and magnetic resonance elastography for classifying fibrosis stage in adults with biopsy-proven nonalcoholic fatty liver disease

Objectives To compare the diagnostic accuracy of US shear wave elastography (SWE) and magnetic resonance elastography (MRE) for classifying fibrosis stage in patients with nonalcoholic fatty liver disease (NAFLD).
Methods Patients from a prospective single-center cohort with clinical liver biopsy for known or suspected NAFLD underwent contemporaneous SWE and MRE. AUCs for classifying biopsy-determined liver fibrosis stages ≥ 1, ≥ 2, ≥ 3, and = 4, and their respective performance parameters at cutoffs providing ≥ 90% sensitivity or specificity were compared between SWE and MRE.
Results In total, 100 patients (mean age, 51.8 ± 12.9 years; 46% males; mean BMI 31.6 ± 4.7 kg/m²) with fibrosis stage distribution (stage 0/1/2/3/4) of 43, 36, 5, 10, and 6%, respectively, were included. AUCs (and 95% CIs) for SWE and MRE were 0.65 (0.54–0.76) and 0.81 (0.72–0.89), 0.81 (0.71–0.91) and 0.94 (0.89–1.00), 0.85 (0.74–0.96) and 0.95 (0.89–1.00), and 0.91 (0.79–1.00) and 0.92 (0.83–1.00), for detecting fibrosis stage ≥ 1, ≥ 2, ≥ 3, and = 4, respectively. The differences were significant for detecting fibrosis stage ≥ 1 and ≥ 2 (p < 0.01) but not otherwise. At ≥ 90% sensitivity cutoff, MRE yielded higher specificity than SWE at diagnosing fibrosis stage ≥ 1, ≥ 2, and ≥ 3. At ≥ 90% specificity cutoff, MRE yielded higher sensitivity than SWE at diagnosing fibrosis stage ≥ 1 and ≥ 2.
Conclusions In adults with NAFLD, MRE was more accurate than SWE in diagnosing stage ≥ 1 and ≥ 2 fibrosis, but not stage ≥ 3 or 4 fibrosis.
Key Points
• For detecting any fibrosis or mild fibrosis, MR elastography was significantly more accurate than shear wave elastography.
• For detecting advanced fibrosis and cirrhosis, MRE and SWE did not differ significantly in accuracy.
• For excluding advanced fibrosis and potentially ruling out the need for biopsy, SWE and MRE did not differ significantly in negative predictive value.
• Neither SWE nor MRE had sufficiently high positive predictive value to rule in advanced fibrosis.
Supplementary Information The online version contains supplementary material available at 10.1007/s00330-021-08369-9.


Introduction
With an estimated global prevalence of 25%, nonalcoholic fatty liver disease (NAFLD) is the most common chronic liver disease worldwide [1]. NAFLD comprises both nonalcoholic fatty liver and nonalcoholic steatohepatitis (NASH), the latter of which is characterized by hepatocellular injury, inflammation, and higher potential for developing fibrosis. The severity of liver fibrosis in NASH strongly predicts long-term outcomes including liver transplantation and overall mortality [2]. Left untreated, liver fibrosis may progress to cirrhosis, conferring increased risk of hepatocellular carcinoma and liver-related mortality. Early therapeutic intervention in patients with NASH-related fibrosis may stabilize or even reverse fibrosis [3,4]. Accurate diagnosis and staging of liver fibrosis enable risk stratification, monitoring for progression, and targeting interventions in these patients.
Histology is the current clinical standard for assessing fibrosis stage but liver biopsy is invasive, costly, and associated with non-negligible complication risk [5]. These drawbacks make histology impractical for screening patients with NAFLD, and noninvasive methods for assessing liver fibrosis are needed. To address this need, several elastography techniques have been developed for detecting and staging fibrosis noninvasively. Two-dimensional shear wave elastography (SWE, also known as sonoelastography), an advanced ultrasound-based technique, has comparable to superior accuracy for diagnosing fibrosis in NAFLD patients compared to older ultrasound-based techniques such as transient elastography (TE) and point shear wave elastography (pSWE) [6,7]. Magnetic resonance elastography (MRE), an MR-based technique, has shown excellent performance for diagnosing and staging fibrosis in NAFLD patients [8,9]. While comparative evidence is accumulating, the optimal selection of SWE versus MRE remains unclear in the context of NAFLD. NAFLD may pose specific challenges to noninvasive techniques due to its association with steatosis (which alters sonographic echoes and MR signals) and obesity (which increases the abdominal wall thickness leading to potentially less reliable results). Two recent studies comparing the diagnostic performance of MRE and SWE showed either significant difference at staging cirrhosis only or no difference between the two methods in cohorts where the majority had at least significant fibrosis (stage ≥ 2) [10,11]. These studies provide important comparative data on the performance of these methods in assessing fibrosis in patient populations with relatively advanced fibrosis stage distributions. 
However, studies are lacking that compare the diagnostic accuracy of MRE and SWE at staging the full spectrum of liver fibrosis severity in patients with NAFLD, particularly those who might have earlier stages of fibrosis, using histopathology as the reference standard.
The purpose of this study is to compare the diagnostic performance of SWE versus MRE for staging fibrosis in a well-characterized cohort of American adults with suspected or known NAFLD using histopathology as the reference standard. Secondarily, we explored the impact of obesity and hepatic steatosis on performance.

Study design
This is a cross-sectional study of a prospectively recruited cohort of patients with known or suspected NAFLD who underwent liver biopsy for clinical care and contemporaneous SWE and MRE for research within 180 days of liver biopsy between July 2016 and June 2019. Confounder-corrected chemical-shift-encoded (CSE)-MRI was performed as part of the MRE exam in order to estimate proton-density fat fraction (PDFF), which was used to stratify the cohort in the exploratory analyses. This study was approved by the Institutional Review Board and is compliant with the Health Insurance Portability and Accountability Act. Study participants were recruited at the University of California, San Diego (UCSD), NAFLD Research Center. The screening process consisted of a standardized clinical evaluation which included a detailed physical examination, biochemical profiling, and an alcohol history assessment performed using the Alcohol Use Disorders Identification Test and Skinner Lifetime Drinking questionnaires. Eligible participants provided written informed consent to undergo SWE and MRE. Participants were instructed to fast for at least 8 h prior to SWE and MRE exams.

Histologic analysis
For this research, a single experienced hepatopathologist (M.A.V., > 10 years of experience) reviewed the clinically obtained biopsy specimen and scored the histologic features using the NASH Clinical Research Network histologic scoring system [12]. Fibrosis was scored from 0 to 4, steatosis from 0 to 3, lobular inflammation from 0 to 3, and hepatocellular ballooning from 0 to 2.
Details of eligibility criteria, clinical assessments, liver biopsy protocol, and histology interpretation are available as Supplemental Methods.

SWE exam
SWE exams were performed on a clinical ultrasound system (GE Logiq E9, GE Healthcare) provided by GE for this research through an equipment loan agreement. The ultrasound system was equipped with the transducer and software required for SWE.
One of four certified diagnostic medical sonographers (each with > 10 years of clinical experience in abdominal US exams and at least 1 year of research experience in SWE) performed SWE using a convex transducer (C1-6). Sonographers were scheduled for each exam based on availability.
For SWE, participants were imaged in the dorsal decubitus position with the right arm fully abducted to facilitate a right intercostal approach. The transducer was oriented perpendicular to the liver capsule to optimize the acoustic window. Then, SWE was activated and, once a real-time colorized stiffness map of the right liver parenchyma had stabilized during an 8-10-s breath hold at shallow expiration, the sonographer recorded the stiffness map with a button press. The sonographer then placed a circular ROI at least 1 cm below the liver capsule but no more than 8 cm from the skin surface that overlaid as much of the homogeneous color map as possible while avoiding large blood vessels, portal tracts, and rib shadowing. The mean and standard deviation of shear wave speed values within the ROI were recorded.
The above steps were repeated until 10 sequential shear wave speed (SWS) measurements were acquired per participant (out of a maximum of 20 attempts), as recommended by the manufacturer. A study was considered adequate if the IQR for the 10 measurements was less than 30% of the median (IQR/median < 0.30) [13,14]. The entire SWE exam lasted about 10 min.
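The adequacy criterion above can be written out as a short calculation. The following is a minimal sketch, not the study's software: the function name is hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption, since the text does not state how the IQR was computed.

```python
from statistics import median, quantiles


def swe_exam_is_adequate(speeds_m_per_s, max_ratio=0.30):
    """Return (median, IQR/median, adequate) for one SWE exam.

    speeds_m_per_s: the 10 shear wave speed measurements (m/s).
    The quartile method is an assumption; other conventions can
    give slightly different IQR values for small samples.
    """
    if len(speeds_m_per_s) != 10:
        raise ValueError("expected 10 shear wave speed measurements")
    q1, _, q3 = quantiles(speeds_m_per_s, n=4, method="inclusive")
    med = median(speeds_m_per_s)
    ratio = (q3 - q1) / med
    # Adequate when the IQR is less than 30% of the median.
    return med, ratio, ratio < max_ratio


if __name__ == "__main__":
    # Illustrative values only, not study data.
    exam = [1.40, 1.42, 1.45, 1.38, 1.50, 1.47, 1.41, 1.44, 1.39, 1.46]
    print(swe_exam_is_adequate(exam))
```

An exam with tightly clustered measurements passes, while one whose measurements split between two distinct speeds would exceed the 0.30 ratio and be flagged as inadequate.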

MR exam: MRE and chemical-shift-encoded MRI
MR exams were performed using a 3-T research scanner with a 60-cm bore (GE Discovery MR750; GE Healthcare) and a 32-channel torso radiofrequency coil array. The scanner was fitted with MRE hardware and software licensed for research (Resoundant) [15,16]. The entire MR exam including participant positioning, MRE driver placement, and MRE and CSE-MRI acquisition lasted about 20 min.

MRE sequence and analysis
An active acoustic driver set to the standard frequency of 60 Hz delivered vibrations via a passive pneumatic driver that was centered over the liver and secured snugly to the abdominal wall by an elastic band. A two-dimensional (2D) gradient-recalled-echo (GRE) MRE sequence modified with bipolar motion encoding gradients synchronized to the applied vibration imaged the shear wave displacement. Four 10-mm contiguous axial slices were acquired through the widest transverse section of the liver, each with a 16-s breath hold performed at relaxed end-expiration. Acquisition parameters are listed in Supplemental Methods. Using MRE reconstruction software, the MR scanner automatically processed the wave images into cross-sectional 2D shear-stiffness maps; unreliable pixels (goodness-of-fit R² < 95%) were cross-hatched to exclude them from analysis [17].
One of two trained image analysts (each with > 1 year of experience in MRE analysis) downloaded the raw and processed MRE data for offline analysis. Using MRE analysis software ("MRE Quant", Resoundant), the analyst manually drew free-form ROIs on portions of the right hepatic lobe on the wave images while avoiding the liver edge (outer 1 cm), major vessels, and areas of nonplanar or low amplitude wave propagation [4,18,19]. The ROIs were drawn on all four slices and colocalized to the shear-stiffness maps. The mean of liver stiffness in the ROIs (shear stiffness, in kilopascals) and cumulative ROI size over four slices (in pixels) were automatically reported by the software. An MRE exam was considered adequate if the total number of pixels over four slices acquired in a participant was greater than or equal to 700 pixels [20].
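To make the per-exam summary concrete, the sketch below combines per-slice ROI results into one stiffness value and applies the 700-pixel adequacy threshold. It is an illustration under stated assumptions: the function name and input layout are hypothetical, and the pixel-weighted mean is an assumption about how a mean over all ROI pixels would be formed, not a description of how the analysis software computes it.

```python
def summarize_mre_exam(slice_rois, min_total_pixels=700):
    """Combine per-slice ROI results into one exam-level summary.

    slice_rois: list of (mean_stiffness_kpa, n_pixels) tuples,
    one per acquired slice (four slices in the protocol above).
    Returns (weighted_mean_kpa, total_pixels, adequate).
    """
    total_pixels = sum(n for _, n in slice_rois)
    if total_pixels == 0:
        # No usable ROI on any slice: no stiffness estimate.
        return None, 0, False
    # Pixel-weighted mean (assumption): larger ROIs contribute more.
    weighted_mean = sum(m * n for m, n in slice_rois) / total_pixels
    return weighted_mean, total_pixels, total_pixels >= min_total_pixels


if __name__ == "__main__":
    # Illustrative values only, not study data.
    print(summarize_mre_exam([(2.4, 900), (2.6, 850), (2.5, 800), (2.7, 800)]))
```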

Chemical-shift-encoded MRI acquisition and analysis
A 2D multi-echo spoiled gradient-recalled-echo sequence with magnitude reconstruction was performed through the entire liver. Using a previously described custom algorithm, the MR scanner automatically processed the source images into cross-sectional PDFF maps [21-24], which were analyzed offline to calculate mean liver PDFF values. Acquisition and analysis details are described in Supplemental Methods.

Blinding
The pathologist was blinded to imaging data. Sonographers were blinded to clinical, histological, and MRI data. MR analysts were blinded to clinical, histological, and ultrasound data. Sample size was based on feasibility. The target enrollment was set to ≥ 100 participants who complied with the study protocol and completed SWE and MRE.

Diagnostic performance
Analyses of diagnostic performance were performed in participants in whom both SWE and MRE were adequate, as defined earlier. Spearman's correlation was used to evaluate the relationship between SWE, MRE, and fibrosis stages.
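For readers unfamiliar with the rank correlation named above, the following minimal sketch computes Spearman's coefficient as the Pearson correlation of average ranks, which handles the many ties that arise with ordinal fibrosis stages. The function names are hypothetical, the example values are illustrative, and no p value is computed.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    stages = [0, 0, 1, 2, 4]               # illustrative fibrosis stages
    stiffness = [2.0, 2.2, 2.5, 3.1, 5.0]  # illustrative kPa values
    print(round(spearman_rho(stages, stiffness), 3))
```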
ROCs and AUCs with DeLong 95% confidence intervals (CIs) were computed for SWE and MRE for classifying each dichotomized fibrosis stage. AUCs were compared using the DeLong test for dependent ROCs. The shear wave speed cutoffs (SWE) or stiffness cutoffs (MRE) providing at least 90% sensitivity or at least 90% specificity for each dichotomization were identified. Performance parameters at those cutoffs were compared using McNemar's test for paired proportions. The Bonferroni correction was applied to each grouped comparison of AUC, sensitivity at 90% specificity, and specificity at 90% sensitivity. A p value less than 0.05 (or individual p value < 0.05/3 = 0.017 after the Bonferroni correction) was considered statistically significant. We chose a priori not to formally compare additional performance metrics (PPV, NPV, total accuracy) to reduce the number of comparisons.
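The AUC and cutoff-selection steps described above can be sketched as follows. This is a simplified illustration, not the study's analysis code: the function names are hypothetical, a measurement at or above the cutoff is assumed to count as test-positive, only observed values are considered as candidate cutoffs, and the DeLong confidence intervals and test are not reproduced.

```python
def empirical_auc(values, positive):
    """Empirical AUC: chance that a diseased case has a higher value
    than a non-diseased case, counting ties as one half."""
    pos = [v for v, p in zip(values, positive) if p]
    neg = [v for v, p in zip(values, positive) if not p]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(values, positive, cutoff):
    """Sensitivity and specificity when value >= cutoff is test-positive."""
    tp = sum(1 for v, p in zip(values, positive) if p and v >= cutoff)
    fn = sum(1 for v, p in zip(values, positive) if p and v < cutoff)
    tn = sum(1 for v, p in zip(values, positive) if not p and v < cutoff)
    fp = sum(1 for v, p in zip(values, positive) if not p and v >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def cutoff_for_min_sensitivity(values, positive, target=0.90):
    """Highest observed cutoff that keeps sensitivity >= target,
    which is the qualifying cutoff with the best specificity."""
    for c in sorted(set(values), reverse=True):
        sens, spec = sens_spec(values, positive, c)
        if sens >= target:
            return c, sens, spec
    return None


def cutoff_for_min_specificity(values, positive, target=0.90):
    """Lowest observed cutoff that reaches specificity >= target,
    which is the qualifying cutoff with the best sensitivity."""
    for c in sorted(set(values)):
        sens, spec = sens_spec(values, positive, c)
        if spec >= target:
            return c, sens, spec
    return None  # target not reachable with observed cutoffs


if __name__ == "__main__":
    # Illustrative values only, not study data.
    vals = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8]
    dis = [False, False, False, True, False, False, True, True, True, True]
    print(empirical_auc(vals, dis))
    print(cutoff_for_min_sensitivity(vals, dis))
    print(cutoff_for_min_specificity(vals, dis))
```

Because sensitivity can only fall and specificity can only rise as the cutoff increases, scanning the observed values in one direction is enough to find each cutoff.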

Exploratory analyses
To evaluate the impact of obesity and steatosis on both techniques, the above analyses were repeated separately in obese (BMI ≥ 30) and nonobese (BMI < 30) participants and in those with none-to-mild and moderate-to-severe steatosis as determined noninvasively by published PDFF cutoffs (none-to-mild: PDFF < 17.43%; moderate-to-severe: PDFF ≥ 17.43%) [21].

Diagnostic performance
Mean shear wave speed and stiffness values are shown in Fig. 2. Mean shear wave speed and stiffness values increased monotonically with fibrosis stages (Spearman's correlation coefficient for shear wave speed values and fibrosis stages is 0.392 (p < 0.01), and for stiffness values and fibrosis stages is 0.654 (p < 0.01)). Representative SWE and MRE images are shown in Fig. 3. The mean (± SD) area of the captured SWE ROIs was 1.19 (± 0.39) cm², and the mean (± SD) cumulative ROI size over 4 slices of MRE for each participant was 3350 (± 1498) pixels (469 cm² ± 210 cm²). The AUC point estimates of MRE were nominally higher than those of SWE for stage ≥ 3 and stage = 4 fibrosis, but the differences were not significant (Table 2).
Cutoffs and performance parameters for the classification of dichotomized fibrosis stages given predefined sensitivity or specificity ≥ 90% are summarized in Tables 3 and 4.
At sensitivity of at least 90%, the SWE cutoffs were 1.27, 1.49, 1.46, and 1.59 m/s for stage ≥ 1 fibrosis, stage ≥ 2 fibrosis, stage ≥ 3 fibrosis, and stage 4 fibrosis, respectively; the MRE cutoffs were 2.01, 2.77, 2.77, and 2.77 kPa, respectively. MRE had higher specificity than SWE for all stages of fibrosis, and the difference was significant for stage ≥ 1, ≥ 2, and ≥ 3 (p < 0.001). The point estimate for PPV was higher for MRE than for SWE for all stages of fibrosis among this particular cohort, though formal statistical comparisons were not performed.
At specificity of at least 90%, the SWE cutoffs were 1.75, 1.79, 1.78, and 1.81 m/s for stage ≥ 1 fibrosis, stage ≥ 2 fibrosis, stage ≥ 3 fibrosis, and stage 4 fibrosis, respectively; the MRE cutoffs were 2.60, 3.06, 3.17, and 3.42 kPa, respectively. MRE had higher sensitivity than SWE for all stages of fibrosis, and the difference was significant for fibrosis stages ≥ 1 and ≥ 2 (p ≤ 0.01). The point estimate for NPV was higher for MRE than for SWE for all stages of fibrosis except cirrhosis (stage 4) among this particular cohort, though formal statistical comparisons were not performed.
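The paired comparisons of sensitivity and specificity reported above rely on McNemar's test, which uses only the participants on whom the two techniques disagree. The sketch below shows an exact (binomial) two-sided version together with the Bonferroni-adjusted threshold of 0.05/3 stated in the methods. The function name is hypothetical, and the choice of the exact rather than the chi-square form is an assumption, since the text does not specify which variant was used.

```python
from math import comb


def mcnemar_exact(a_correct, b_correct):
    """Exact two-sided McNemar test for paired binary outcomes.

    a_correct, b_correct: per-participant booleans saying whether
    technique A and technique B classified the participant correctly
    (e.g., restricted to diseased participants to compare sensitivity).
    Returns (n_only_a_correct, n_only_b_correct, p_value).
    """
    only_a = sum(1 for a, b in zip(a_correct, b_correct) if a and not b)
    only_b = sum(1 for a, b in zip(a_correct, b_correct) if b and not a)
    n = only_a + only_b
    if n == 0:
        return only_a, only_b, 1.0  # no discordant pairs, no evidence
    # Binomial(n, 0.5) tail probability of the smaller discordant count.
    tail = sum(comb(n, k) for k in range(min(only_a, only_b) + 1)) / 2 ** n
    return only_a, only_b, min(1.0, 2 * tail)


if __name__ == "__main__":
    # Illustrative paired results only, not study data.
    mre = [True] * 10
    swe = [False] * 8 + [True] * 2
    only_mre, only_swe, p = mcnemar_exact(mre, swe)
    alpha = 0.05 / 3  # Bonferroni threshold for three grouped comparisons
    print(only_mre, only_swe, p, p < alpha)
```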

Discussion
Noninvasive imaging methods for estimating fibrosis in NAFLD patients have been suggested both for initial detection and staging and for longitudinal monitoring, a scenario in which invasive tests like biopsy are not feasible. Patients with NAFLD pose several challenges (e.g., obesity, steatosis) that may impact imaging study performance. Hence, the optimal test or combination of tests has yet to be defined.
Our study aimed to compare MRE and SWE against a histological reference standard in a NAFLD population. While MRE was significantly more accurate than SWE for diagnosing lower stages of fibrosis (stage ≥ 1 and ≥ 2), the two techniques did not differ significantly at higher stages of fibrosis (stage ≥ 3 and = 4). In exploratory analyses, MRE also showed a trend towards better performance than SWE in all participant subgroups regardless of the presence of obesity or the severity of steatosis, though the differences between subgroups were sometimes significant only in the lower fibrosis stages. Our study is the first to detect a significant difference in performance between SWE and MRE at diagnosing lower stages of fibrosis. A previous study by Furlan et al. on American adults with NAFLD examined the diagnostic performance of SWE and MRE at detecting significant fibrosis (stage ≥ 2) and advanced fibrosis (stage ≥ 3) and did not find a statistically significant difference, while a recent study by Imajo et al. on Japanese adults with NAFLD examined the diagnostic performance of SWE and MRE at detecting the full spectrum of fibrosis and found that MRE offered superior performance at staging cirrhosis only [10,11]. Conversely, the small number of participants in our study with cirrhosis (6 out of 100) likely reduced our power to detect differences in performance for diagnosing cirrhosis, and may explain why our results do not replicate the finding by Imajo et al. that MRE is superior to SWE for diagnosing cirrhosis. Despite the small number of participants with cirrhosis, our overall cohort was comparatively large, which allowed for exploratory analysis of NAFLD patients stratified by fibrosis severity and obesity, two potential confounders for noninvasive techniques.
Compared to published studies on the diagnostic performance of SWE for fibrosis staging in NAFLD patients, we found lower diagnostic accuracy as assessed with AUCs [7,10,25-27]. Differences in stage distribution may account in part for the discrepancy. A majority of participants (58.3% to 78.4%) in published studies had stage ≥ 2 fibrosis compared to a minority (21%) in our cohort. The higher proportion of patients with more severe fibrosis in published studies is expected to increase the observed AUC, since greater separation between shear wave speed or shear stiffness values is observed at higher fibrosis stages [28]. Compared to study cohorts that skew towards the more severe spectrum of liver fibrosis, our results may be most applicable to the outpatient NAFLD hepatology clinic from which we enrolled our participants.
The diagnostic performance of MRE across all dichotomized fibrosis stages in our study was consistent with prior studies on NAFLD patients and overweight-to-obese patients, which included patients with similar fibrosis stage distribution [8,9,29]. We intentionally reported two sets of cutoff values for MRE and SWE (one set that would yield at least 90% sensitivity and one set that would yield at least 90% specificity) for each dichotomized fibrosis stage instead of the Youden index. While the Youden index maximizes the combination of sensitivity and specificity for a particular test, it is not as helpful in informing clinical application and interpretation. For instance, recent guidelines from the American Association for the Study of Liver Diseases (AASLD) suggest the use of noninvasive tests to detect patients with high likelihood of advanced stage fibrosis, i.e., those patients who may have the greatest benefit-to-risk ratio for biopsy [30]. This context of use requires high sensitivity and NPV to rule out fibrosis in order to appropriately direct biopsy to those at high risk. For this purpose, SWE and MRE did not differ in performance: SWE can accurately exclude stage ≥ 3 fibrosis with sensitivity of 94-100% and NPV of 97-100% while MRE can do so with sensitivity of 94-100% and NPV of 99-100%.
As opposed to ruling out disease, ruling in disease requires high specificity and PPV. Although we identified high-specificity (≥ 90%) cutoffs, our cohort was assembled from an outpatient setting, where the pre-test probability of advanced fibrosis tends to be low. In this situation, despite applying high-specificity cutoffs, the PPVs for ruling in advanced fibrosis (62% PPV for MRE, 56% PPV for SWE) are not sufficient to avoid biopsy altogether. Our results are consistent with those reported by Loomba et al., where an MRE stiffness cutoff of 3.63 kPa yielded a high specificity (91%) and a high NPV of 97% for excluding stage ≥ 3 fibrosis in NAFLD patients, but a PPV of only 68% for ruling it in [31]. Furlan et al. reported a similar MRE stiffness cutoff of 3.4 kPa for excluding stage ≥ 3 fibrosis with a specificity of 91.7%, but a lower NPV of 91.7% and a much higher PPV of 87% compared to our results [10]. The higher prevalence of stage ≥ 3 fibrosis in Furlan et al. compared to this study (39% versus 16%) contributed at least in part to the differences in reported NPV and PPV. Thus, if confirmation of advanced fibrosis is desired, then further evaluation possibly including a liver biopsy may be needed. Combining noninvasive tests with clinical decision support tools such as the NAFLD fibrosis score or the FIB-4 test might also improve the PPV [32,33].
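The dependence of PPV and NPV on prevalence noted above follows directly from Bayes' rule. The sketch below holds the test's sensitivity and specificity fixed and varies only the prevalence of stage ≥ 3 fibrosis between 16% and 39%, the two values cited in the text. The function name is hypothetical and the sensitivity of 0.80 is an assumed value chosen purely for illustration; it is not reported in the text.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence              # true positives per person tested
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    sens = 0.80   # assumed for illustration only
    spec = 0.917  # specificity quoted above for the 3.4 kPa cutoff
    for prevalence in (0.16, 0.39):
        ppv, npv = predictive_values(sens, spec, prevalence)
        print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

With identical test characteristics, the lower-prevalence setting yields a lower PPV and a higher NPV, which is the direction of the difference between this cohort and the higher-prevalence cohort discussed above.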
Our study has several limitations. First, the small sample sizes of obese and nonobese subsets as well as the nonuniform stratification of steatosis severity by PDFF cutoff values limited our assessment of obesity and steatosis and their confounding effects on SWE and MRE. Future studies are needed to verify our preliminary finding from the exploratory analyses that MRE is superior to SWE regardless of body habitus and steatosis severity. Also, the distribution of liver fibrosis in our cohort is skewed towards the milder end of the spectrum. Although this may increase the applicability of our results to common clinical contexts such as fibrosis screening, the relatively low number of participants with fibrosis stage ≥ 2 compared to fibrosis stage 0-1 limits discrimination between adjacent advanced fibrosis stages. For instance, for a predefined sensitivity ≥ 90%, the MRE cutoff is the same for fibrosis stages 2, 3, and 4 (2.77 kPa) while the SWE cutoff for fibrosis stage 3 is lower than for fibrosis stage 2. For the purposes of comparing SWE and MRE, fibrosis distribution affects both techniques equally and would not introduce a bias in favor of one method. Second, this study was conducted using US and MRI systems from a single manufacturer at a single subspecialty center focused on NAFLD research, which may limit the generalizability of its results to other settings such as community centers or sites with systems from other vendors. Third, as technology advances rapidly, it is possible that newer technologies would have provided more accurate performance. For SWE, this might include the use of software that provides real-time feedback on the quality of shear wave propagation and the use of time-harmonic elastography techniques in obese
In conclusion, this prospective study provided direct comparison of SWE versus MRE for staging fibrosis in a cohort of participants with known or suspected NAFLD and clinically indicated liver biopsy. We showed that in patients in whom both methods are adequate, MRE had significantly higher accuracy than SWE for diagnosing earlier (≥ 1 and ≥ 2) fibrosis stages. For purposes of directing biopsy to detect advanced fibrosis, SWE and MRE performed equally well, both demonstrating high NPV for excluding disease. Future studies that aim to evaluate the relative reproducibility of these modalities for longitudinal monitoring and the cost-effectiveness of various diagnostic approaches using combinations of SWE, MRE, biopsy, and clinical decision support will further inform optimal usage of both methods for clinical care and clinical trials.
Statistics and biometry Tanya Wolfson, MA, kindly provided statistical advice for this manuscript and is one of the authors.
Informed consent Written informed consent was obtained from all patients in this study.
Ethical approval Institutional Review Board approval was obtained.
Table 4 Diagnostic performance of SWE and MRE at classifying dichotomized fibrosis stages for predefined specificity ≥ 90%
MRE magnetic resonance elastography, SWE shear wave elastography, PPV positive predictive value, NPV negative predictive value; kPa kilopascals, unit for shear stiffness as measured by MRE; m/s meters per second, unit for shear wave speed as measured by SWE
* Sensitivity of MRE is significantly higher than that of SWE based on two-tailed McNemar's test, p ≤ 0.01. Using Bonferroni correction, individual p value < 0.05/3 (for grouped AUC, sensitivity, and specificity) is considered significant
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.