Introduction

With an estimated global prevalence of 25%, nonalcoholic fatty liver disease (NAFLD) is the most common chronic liver disease worldwide [1]. NAFLD comprises both nonalcoholic fatty liver and nonalcoholic steatohepatitis (NASH), the latter of which is characterized by hepatocellular injury, inflammation, and higher potential for developing fibrosis. The severity of liver fibrosis in NASH strongly predicts long-term outcomes including liver transplantation and overall mortality [2]. Left untreated, liver fibrosis may progress to cirrhosis, conferring increased risk of hepatocellular carcinoma and liver-related mortality. Early therapeutic intervention in patients with NASH-related fibrosis may stabilize or even reverse fibrosis [3, 4]. Accurate diagnosis and staging of liver fibrosis enable risk stratification, monitoring for progression, and targeting interventions in these patients.

Histology is the current clinical standard for assessing fibrosis stage but liver biopsy is invasive, costly, and associated with non-negligible complication risk [5]. These drawbacks make histology impractical for screening patients with NAFLD, and noninvasive methods for assessing liver fibrosis are needed. To address this need, several elastography techniques have been developed for detecting and staging fibrosis noninvasively. Two-dimensional shear wave elastography (SWE, also known as sonoelastography), an advanced ultrasound-based technique, has comparable to superior accuracy for diagnosing fibrosis in NAFLD patients compared to older ultrasound-based techniques such as transient elastography (TE) and point shear wave elastography (pSWE) [6, 7]. Magnetic resonance elastography (MRE), an MR-based technique, has shown excellent performance for diagnosing and staging fibrosis in NAFLD patients [8, 9]. While comparative evidence is accumulating, the optimal selection of SWE versus MRE remains unclear in the context of NAFLD. NAFLD may pose specific challenges to noninvasive techniques due to its association with steatosis (which alters sonographic echoes and MR signals) and obesity (which increases the abdominal wall thickness leading to potentially less reliable results). Two recent studies comparing the diagnostic performance of MRE and SWE showed either significant difference at staging cirrhosis only or no difference between the two methods in cohorts where the majority had at least significant fibrosis (stage ≥ 2) [10, 11]. These studies provide important comparative data on the performance of these methods in assessing fibrosis in patient populations with relatively advanced fibrosis stage distributions. However, studies are lacking that compare the diagnostic accuracy of MRE and SWE at staging the full spectrum of liver fibrosis severity in patients with NAFLD, particularly those who might have earlier stages of fibrosis, using histopathology as the reference standard.

The purpose of this study is to compare the diagnostic performance of SWE versus MRE for staging fibrosis in a well-characterized cohort of American adults with suspected or known NAFLD using histopathology as the reference standard. Secondarily, we explored the impact of obesity and hepatic steatosis on performance.

Materials and methods

Study design

This is a cross-sectional study of a prospectively recruited cohort of patients with known or suspected NAFLD who underwent liver biopsy for clinical care and contemporaneous SWE and MRE for research within 180 days of liver biopsy between July 2016 and June 2019. Confounder-corrected chemical-shift-encoded (CSE)-MRI was performed as part of the MRE exam in order to estimate proton-density fat fraction (PDFF), which was used to stratify the cohort in the exploratory analyses. This study was approved by the Institutional Review Board and is compliant with the Health Insurance Portability and Accountability Act.

Study participants were recruited at the University of California, San Diego (UCSD), NAFLD Research Center. The screening process consisted of a standardized clinical evaluation which included a detailed physical examination, biochemical profiling, and an alcohol history assessment performed using the Alcohol Use Disorders Identification Test and Skinner Lifetime Drinking questionnaires. Eligible participants provided written informed consent to undergo SWE and MRE. Participants were instructed to fast for at least 8 h prior to SWE and MRE exams.

Histologic analysis

For this research, a single experienced hepatopathologist (M.A.V., > 10 years of experience) reviewed the clinically obtained biopsy specimen and scored the histologic features using the NASH Clinical Research Network histologic scoring system [12]. Fibrosis was scored from 0 to 4, steatosis from 0 to 3, lobular inflammation from 0 to 3, and hepatocellular ballooning from 0 to 2.

Details of eligibility criteria, clinical assessments, liver biopsy protocol, and histology interpretation are available as Supplemental Methods.

SWE exam

SWE exams were performed on a clinical ultrasound system (GE Logiq E9, GE Healthcare) provided by GE for this research through an equipment loan agreement. The ultrasound system was equipped with the transducer and software required for SWE.

One of four certified diagnostic medical sonographers (each with > 10 years of clinical experience in abdominal US exams and at least 1 year of research experience in SWE) performed SWE using a convex transducer (C1-6). Sonographers were scheduled for each exam based on availability.

For SWE, participants were imaged in the dorsal decubitus position with the right arm fully abducted to facilitate a right intercostal approach. The transducer was oriented perpendicular to the liver capsule to optimize the acoustic window. Then, SWE was activated and, once a real-time colorized stiffness map of the right liver parenchyma had stabilized during an 8–10-s breath hold at shallow expiration, the sonographer recorded the stiffness map with a button press. The sonographer then placed a circular ROI at least 1 cm below the liver capsule but no more than 8 cm from the skin surface that overlaid as much of the homogeneous color map as possible while avoiding large blood vessels, portal tracts, and rib shadowing. The mean and standard deviation of shear wave speed values within the ROI were recorded.

The above steps were repeated until 10 sequential shear wave speed (SWS) measurements were acquired per participant (out of a maximum of 20 attempts), as recommended by the manufacturer. A study was considered adequate if the IQR for the 10 measurements was less than 30% of the median (IQR/median < 0.30)[13, 14]. The entire SWE exam lasted about 10 min.

MR exam: MRE and chemical-shift-encoded MRI

MR exams were performed using a 3-T research scanner with a 60-cm bore (GE Discovery MR750; GE Healthcare) and a 32-channel torso radiofrequency coil array. The scanner was fitted with MRE hardware and software licensed for research (Resoundant) [15, 16]. The entire MR exam including participant positioning, MRE driver placement, and MRE and CSE-MRI acquisition lasted about 20 min.

MRE sequence and analysis

An active acoustic driver set to the standard frequency of 60-Hz delivered vibrations via a passive pneumatic driver that was centered over the liver and secured snugly to the abdominal wall by an elastic band. A two-dimensional (2D) gradient-recalled-echo (GRE) MRE sequence modified with bipolar motion encoding gradients synchronized to the applied vibration imaged the shear wave displacement. Four 10-mm contiguous axial slices were acquired through the widest transverse section of the liver, each with a 16-s breath hold performed at relaxed end-expiration. Acquisition parameters are listed in Supplemental Methods. Using MRE reconstruction software, the MR scanner automatically processed the wave images into cross-sectional 2D shear-stiffness maps; unreliable pixels (goodness-of-fit R2 < 95%) were cross-hatched to exclude them from analysis [17].

One of two trained image analysts (each with > 1 year of experience in MRE analysis) downloaded the raw and processed MRE data for offline analysis. Using MRE analysis software (“MRE Quant”, Resoundant), the analyst manually drew free-form ROIs on portions of the right hepatic lobe on the wave images while avoiding the liver edge (outer 1 cm), major vessels, and areas of nonplanar or low amplitude wave propagation [4, 18, 19]. The ROIs were drawn on all four slices and colocalized to the shear-stiffness maps. The mean of liver stiffness in the ROIs (shear stiffness, in kilopascals) and cumulative ROI size over four slices (in pixels) were automatically reported by the software. An MRE exam was considered adequate if the total number of pixels over four slices acquired in a participant was greater than or equal to 700 pixels [20].

Chemical-shift-encoded MRI acquisition and analysis

A 2D multi-echo spoiled gradient-recalled-echo sequence with magnitude reconstruction was performed through the entire liver. Using a previously described custom algorithm, the MR scanner automatically processed the source images into cross-sectional PDFF maps [21,22,23,24], which were analyzed offline to calculate mean liver PDFF values. Acquisition and analysis details are described in Supplemental Methods.

Blinding

The pathologist was blinded to imaging data. Sonographers were blinded to clinical, histological, and MRI data. MR analysts were blinded to clinical, histological, and ultrasound data.

Statistical analyses

Statistical analysis was performed by a postdoctoral fellow (Y.N.Z., 2 years of experience) under the supervision of a biostatistician (T.W., with 25 years of experience) using “R” statistical computing software (R version 3.4.2 [2016]; R Foundation for Statistical Computing).

Sample size was based on feasibility. The target enrollment was set to ≥ 100 participants who complied with the study protocol and completed SWE and MRE.

Diagnostic performance

Analyses of diagnostic performance were performed in participants in whom both SWE and MRE were adequate, as defined earlier. Spearman’s correlation was used to evaluate the relationship between SWE, MRE, and fibrosis stages.

ROCs and AUCs with DeLong 95% confidence intervals (CIs) were computed for SWE and MRE for classifying each dichotomized fibrosis stage. AUCs were compared using the DeLong test for dependent ROCs. The shear wave speed cutoffs (SWE) or stiffness cutoffs (MRE) providing at least 90% sensitivity or at least 90% specificity for each dichotomization were identified. Performance parameters at those cutoffs were compared using McNemar’s test for paired proportions. The Bonferroni correction was applied to each grouped comparison of AUC, sensitivity at 90% specificity, and specificity at 90% sensitivity. A p value less than 0.05 (or individual p value < 0.05/3 = 0.017 after the Bonferroni correction) was considered statistically significant. We chose a priori not to formally compare additional performance metrics (PPV, NPV, total accuracy) to reduce the number of comparisons.

Exploratory analyses

To evaluate the impact of obesity and steatosis on both techniques, the above analyses were repeated separately in obese (BMI ≥ 30) and nonobese (BMI < 30) participants and in those with none-to-mild and moderate-to-severe steatosis as determined noninvasively by published PDFF cutoffs (none-to-mild: PDFF < 17.43%; moderate-to-severe: PDFF ≥ 17.43%) [21].

Results

Participants

Between July 2016 and June 2019, 118 patients with liver biopsies to evaluate known or suspected NAFLD met inclusion criteria, of whom 18 were excluded (Fig. 1), leaving 100 participants in the final analysis.

Fig. 1
figure 1

Flowchart of participant selection. NAFLD, nonalcoholic fatty liver disease; NASH, nonalcoholic steatohepatitis; SWE, shear wave elastography; MRE, magnetic resonance elastography

Table 1 summarizes the baseline demographic, biochemical, histological, and imaging data for these 100 participants. Sixty-four participants (64%) were obese (BMI ≥ 30.0). The median PDFF was 14.2%. Median time intervals were 27 days between SWE and biopsy, 28 days between MRE and biopsy, and 0 days between SWE and MRE. The average (± standard deviation [SD]) of biopsy size and number of portal triads were 21.8 (± 7.3) mm and 14.5 (± 3.7), respectively. In total, 43, 36, 5, 10, and 6 had fibrosis stages 0, 1, 2, 3, and 4, respectively.

Table 1 Demographic, biochemical, histological, and imaging characteristics of study participants

Diagnostic performance

Mean shear wave speed and stiffness values are shown in Fig. 2. Mean shear wave speed and stiffness values increased monotonically with fibrosis stages (Spearman’s correlation coefficient for shear wave speed values and fibrosis stages is 0.392 (p < 0.01), and for stiffness values and fibrosis stages is 0.654 (p < 0.01). Representative SWE and MRE images are shown in Fig. 3. The mean (± SD) area of the captured SWE ROIs was 1.19 (± 0.39) cm2, and the mean (± SD) cumulative ROI size over 4 slices of MRE for each participant was 3350 (± 1498) pixels (469 cm2 ± 210 cm2).

Fig. 2
figure 2

Distribution of shear wave speed measurements by shear wave elastography (a) and liver stiffness measurements by magnetic resonance elastography (b) stratified by biopsy-determined fibrosis stage (Nonalcoholic Steatohepatitis Clinical Research Network)

Fig. 3
figure 3

Transverse colorized MR elastograms (3-T GE 750 scanner using 2D GRE technique, top) and ultrasound-based SWE images with colorized elasticity in the ROIs (GE Logiq E9 with C1-6 transducer, bottom) demonstrate increasing shear stiffness estimates (kPa) or shear wave speed estimates (m/s) as histologically determined liver fibrosis stage (Nonalcoholic Steatohepatitis Clinical Research Network) increases in patients with nonalcoholic fatty liver disease. From left to right: stage 0 in a 46-year-old woman; stage 1 in a 56-year-old woman; stage 2 in a 44-year-old man; stage 3 in a 42-year-old woman; stage 4 in a 68-year-old woman. Magnitude of complex modulus in kPa, ROIs, and an automated confidence grid set to 95% are overlain on the MR elastograms. ROIs depicted in the MRE imaging examples for stages 0, 1, 2, 3, and 4 are 158 cm2, 63 cm2, 80 cm2, 90 cm2, and 126 cm2, respectively. Shear wave speed estimates are overlain on the SWE images. ROIs depicted in the SWE imaging examples for stages 0, 1, 2, 3, and 4 are 1.2 cm2, 0.7 cm2, 0.9 cm2, 1.1 cm2, and 0.9 cm2, respectively

The AUCs of MRE were significantly higher than those of SWE for diagnosing any fibrosis (0.81 [95% CI: 0.72–0.89] vs. 0.65 [95% CI: 0.54–0.76], p = 0.005) and stage ≥ 2 fibrosis (0.94 [95% CI: 0.89–1.00] vs. 0.81 [95% CI: 0.71–0.91], p = 0.009). The AUC point estimates of MRE were nominally higher than those of SWE for stage ≥ 3 and stage = 4 fibrosis, but the differences were not significant (Table 2).

Table 2 AUCs and AUC comparisons for SWE and MRE by dichotomized fibrosis stage

Cutoffs and performance parameters for the classification of dichotomized fibrosis stages given predefined sensitivity or specificity ≥ 90% are summarized in Tables 3 and 4.

Table 3 Diagnostic performance of SWE and MRE at classifying dichotomized fibrosis stages for predefined sensitivity ≥ 90%
Table 4 Diagnostic performance of SWE and MRE at classifying dichotomized fibrosis stages for predefined specificity ≥ 90%

At sensitivity of at least 90%, the SWE cutoffs were 1.27, 1.49, 1.46, and 1.59 m/s for stage ≥ 1 fibrosis, stage ≥ 2 fibrosis, stage ≥ 3 fibrosis, and stage 4 fibrosis, respectively; the MRE cutoffs were 2.01, 2.77, 2.77, and 2.77 kPa, respectively. MRE had higher specificity than SWE for all stages of fibrosis, and the difference was significant for stage ≥ 1, ≥ 2, and ≥ 3 (p < 0.001). The point estimate for PPV was higher for MRE than for SWE for all stages of fibrosis among this particular cohort, though formal statistical comparisons were not performed.

At specificity of at least 90%, the SWE cutoffs were 1.75, 1.79, 1.78, and 1.81 m/s for stage ≥ 1 fibrosis, stage ≥ 2 fibrosis, stage ≥ 3 fibrosis, and stage 4 fibrosis, respectively; the MRE cutoffs were 2.60, 3.06, 3.17, and 3.42 kPa, respectively. MRE had higher sensitivity than SWE for all stages of fibrosis, and the difference was significant for fibrosis stages ≥ 1 and ≥ 2 (p ≤ 0.01). The point estimate for NPV was higher for MRE than for SWE for all stages of fibrosis except cirrhosis (stage 4) among this particular cohort, though formal statistical comparisons were not performed.

Exploratory analyses

In stratified analysis of obese (n = 64) and nonobese (n = 36) participants, MRE was superior to SWE at diagnosing stage ≥ 1 fibrosis among obese participants (p = 0.008). In stratified analysis of participants with none-to-mild steatosis using PDFF cutoff < 17.43% (n = 68) and moderate-to-severe steatosis using PDFF cutoff ≥ 17.43% (n = 32), MRE was superior to SWE at diagnosing stage ≥ 2 fibrosis among participants with none-to-mild steatosis (p = 0.009), and at diagnosing stage ≥ 1 fibrosis among participants with moderate-to-severe steatosis (p = 0.024).

Discussion

Noninvasive imaging methods for estimating fibrosis in NAFLD patients have been suggested both for initial detection and staging and for longitudinal monitoring, a scenario in which invasive tests like biopsy are not feasible. Patients with NAFLD pose several challenges (e.g., obesity, steatosis) that may impact imaging study performance. Hence, the optimal test or combination of tests has yet to be defined. Our study aimed to compare MRE and SWE against histological reference standard in a NAFLD population. While MRE was significantly more accurate than SWE for diagnosing lower stages of fibrosis (stage ≥ 1 and ≥ 2), the two techniques did not differ significantly at higher stages of fibrosis (stage ≥ 3 and = 4). In exploratory analyses, MRE also showed a trend towards better performance than SWE in all participant subgroups regardless of the presence of obesity or the severity of steatosis, though the differences between subgroups were sometimes significant only in the lower fibrosis stages.

Our study is the first to detect a significant difference in performance between SWE and MRE at diagnosing lower stages of fibrosis. A previous study by Furlan et al.on American adults with NAFLD examined the diagnostic performance of SWE and MRE at detecting significant fibrosis (stage ≥ 2) and advanced fibrosis (stage ≥ 3) and did not find a statistically significant difference, while a recent study by Imajo et al. on Japanese adults with NAFLD examined the diagnostic performance of SWE and MRE at detecting the full spectrum of fibrosis and found that MRE offered superior performance at staging cirrhosis only [10, 11]. The small number of participants in both studies who had no liver fibrosis (1 in Furlan et al. and 9 in Imajo et al. had no liver fibrosis) may have limited their statistical power for comparing the diagnostic performance of SWE and MRE for detecting any fibrosis. Similarly, the relatively small number of participants in both studies who had mild disease (fibrosis stage < 2) (18 in Furlan et al. and 59 in Imajo et al., compared to 79 in our study) may have reduced the power to detect differences in performance at diagnosing earlier stages of fibrosis. At advanced fibrosis (stage ≥ 3), we—like Furlan et al.and Imajo et al.—found that there is no statistically significant difference in performance between SWE and MRE. Conversely, the small number of participants in our study with cirrhosis (6 out of 100) likely reduced our power to detect differences in performance for diagnosing cirrhosis, and may explain why our results do not replicate the finding by Imajo et al. that MRE is superior to SWE for diagnosing cirrhosis. Despite the small number of participants with cirrhosis, our overall cohort was comparatively large, which allowed for exploratory analysis of NAFLD patients stratified by fibrosis severity and obesity, two potential confounders for noninvasive techniques.

Compared to published studies on the diagnostic performance of SWE for fibrosis staging in NAFLD patients, we found lower diagnostic accuracy as assessed with AUCs [7, 10, 25,26,27]. Differences in stage distribution may account in part for the discrepancy. A majority of participants (58.3% to 78.4%) in published studies had stage ≥ 2 fibrosis compared to a minority (21%) in our cohort. The higher proportion of patients with more severe fibrosis in published studies is expected to increase the observed AUC, since greater separation between shear wave speed or shear stiffness values are observed at higher fibrosis stages [28]. Compared to study cohorts that skew towards the more severe spectrum of liver fibrosis, our results may be most applicable to the outpatient NAFLD hepatology clinic from which we enrolled our participants.

The diagnostic performance of MRE across all dichotomized fibrosis stages in our study was consistent with prior studies on NAFLD patients and overweight-to-obese patients, which included patients with similar fibrosis stage distribution [8, 9, 29]. We intentionally reported two sets of cutoff values for MRE and SWE—one set that would yield at least 90% sensitivity and one set that would yield at least 90% specificity—for each dichotomized fibrosis stage instead of the Youden index. While the Youden index maximizes the combination of sensitivity and specificity for a particular test, it is not as helpful in informing clinical application and interpretation. For instance, recent guidelines from the American Association for the Study of Liver Diseases (AASLD) suggest the use of noninvasive tests to detect patients with high likelihood of advanced stage fibrosis—i.e., those patients who may have the greatest benefit-to-risk ratio for biopsy [30]. This context of use requires high sensitivity and NPV to rule out fibrosis in order to appropriately direct biopsy to those at high risk. For this purpose, SWE and MRE did not differ in performance: SWE can accurately exclude stage ≥ 3 fibrosis with sensitivity of 94–100% and NPV of 97–100% while MRE can do so with sensitivity of 94–100% and NPV of 99–100%.

As opposed to ruling out disease, ruling in disease requires high specificity and PPV. Although we identified high-specificity (≥ 90%) cutoffs, our cohort was assembled from an outpatient setting, where the pre-test probability of advanced fibrosis tends to be low. In this situation, despite applying high-specificity cutoffs, the PPVs for ruling in advanced fibrosis (62% PPV for MRE, 56% PPV for SWE) are not sufficient to avoid biopsy altogether. Our results are consistent with those reported by Loomba et al., where an MRE stiffness cutoff of 3.63 kPa yielded a specific result (91%) and a high NPV of 97% for excluding stage ≥ 3 fibrosis in NAFLD patients, but a PPV of only 68% for ruling it in [31]. Furlan et al.reported similar MRE stiffness cutoff of 3.4 kPa for excluding stage ≥ 3 fibrosis with a specificity of 91.7%, but a lower NPV of 91.7% and a much higher PPV of 87% compared to our results [10]. The higher prevalence of stage ≥ 3 fibrosis in Furlan et al. compared to this study (39% versus 16%) contributed at least in part to the differences in reported NPV and PPV. Thus, if confirmation of advanced fibrosis is desired, then further evaluation possibly including a liver biopsy may be needed. Combining noninvasive tests with clinical decision support tools such as the NAFLD fibrosis score or the FIB-4 test might also improve the PPV [32, 33].

Our study has several limitations. First, the small sample sizes of obese and nonobese subsets as well as the nonuniform stratification of steatosis severity by PDFF cutoff values limited our assessment of obesity and steatosis and their confounding effects on SWE and MRE. Future studies are needed to verify our preliminary finding from the exploratory analyses that MRE is superior to SWE regardless of body habitus and steatosis severity. Also, the distribution of liver fibrosis in our cohort is skewed towards the milder end of the spectrum. Although this may increase the applicability of our results to common clinical contexts such as fibrosis screening, the relatively low number of participants with fibrosis stage ≥ 2 compared to fibrosis stage 0–1 limits discrimination between adjacent advanced fibrosis stages. For instance, for a predefined sensitivity ≥ 90%, MRE cutoff is the same for fibrosis stages 2, 3, and 4 (2.77 kPa) while SWE cutoff for fibrosis stage 3 is lower than for fibrosis stage 2. For the purposes of comparing SWE and MRE, fibrosis distribution affects both techniques equally and would not introduce a bias in favor of one method. Second, this study was conducted using US and MRI systems from a single manufacturer at a single subspecialty center focused on NAFLD research, which may limit the generalizability of its results to other settings such as community centers or sites with systems from other vendors. Third, as technology advances rapidly, it is possible that newer technologies would have provided more accurate performance. For SWE, this might include the use of software that provides real-time feedback on the quality of shear wave propagation and the use of time-harmonic elastography techniques in obese patients. For MRE, this might include a spin-echo echo planar imaging sequence rather than a GRE sequence and/or the use of thin flexible blanket-like torso phased array coils. Finally, we did not test the longitudinal reproducibility of the two modalities, a factor that would be important for determining the best test for monitoring treatment response. The comparative performance of SWE and MRE for longitudinal monitoring remains a gap in knowledge for which future research is needed.

In conclusion, this prospective study provided direct comparison of SWE versus MRE for staging fibrosis in a cohort of participants with known or suspected NAFLD and clinically indicated liver biopsy. We showed that in patients in whom both methods are adequate, MRE had significantly higher accuracy than SWE for diagnosing earlier (≥ 1 and ≥ 2) fibrosis stages. For purposes of directing biopsy to detect advanced fibrosis, SWE and MRE performed equally well, both demonstrating high NPV for excluding disease. Future studies that aim to evaluate the relative reproducibility of these modalities for longitudinal monitoring and the cost-effectiveness of various diagnostic approaches using combinations of SWE, MRE, biopsy, and clinical decision support will further inform optimal usage of both methods for clinical care and clinical trials.