Correlation of histologic, imaging, and artificial intelligence features in NAFLD patients, derived from Gd-EOB-DTPA-enhanced MRI: a proof-of-concept study

Bastati, Nina; Perkonigg, Matthias; Sobotka, Daniel; Poetter-Lang, Sarah; Fragner, Romana; Beer, Andrea; Messner, Alina; Watzenboeck, Martin; Pochepnia, Svitlana; Kittinger, Jakob; Herold, Alexander; Kristic, Antonia; Hodge, Jacqueline C.; Traussnig, Stefan; Trauner, Michael; Ba-Ssalamah, Ahmed; Langs, Georg

doi:10.1007/s00330-023-09735-5

Magnetic Resonance
Open access
Published: 26 June 2023

Volume 33, pages 7729–7743, (2023)
European Radiology

Nina Bastati¹,
Matthias Perkonigg²,
Daniel Sobotka²,
Sarah Poetter-Lang¹,
Romana Fragner¹,
Andrea Beer³,
Alina Messner¹,
Martin Watzenboeck¹,
Svitlana Pochepnia¹,
Jakob Kittinger¹,
Alexander Herold¹,
Antonia Kristic¹,
Jacqueline C. Hodge¹,
Stefan Traussnig⁴,
Michael Trauner⁴,
Ahmed Ba-Ssalamah¹,⁵ (ORCID: orcid.org/0000-0002-3527-404X) &
…
Georg Langs¹,²

Abstract

Objective

To compare unsupervised deep clustering (UDC) to fat fraction (FF) and relative liver enhancement (RLE) on Gd-EOB-DTPA-enhanced MRI to distinguish simple steatosis from non-alcoholic steatohepatitis (NASH), using histology as the gold standard.

Materials and methods

A derivation group of 46 non-alcoholic fatty liver disease (NAFLD) patients underwent 3-T MRI. Histology assessed steatosis, inflammation, ballooning, and fibrosis. UDC was trained to group different texture patterns from MR data into 10 distinct clusters per sequence on unenhanced T1- and Gd-EOB-DTPA-enhanced T1-weighted hepatobiliary phase (T1-Gd-EOB-DTPA-HBP), then on T1 in- and opposed-phase images. RLE and FF were quantified on identical sequences. Differences in these parameters between NASH and simple steatosis were evaluated with χ²- and t-tests for categorical and continuous data, respectively. Linear regression and a Random Forest classifier were used to identify associations between histological NAFLD features, RLE, FF, and UDC patterns, and then to determine predictors able to distinguish simple steatosis from NASH. ROC curves assessed the diagnostic performance of UDC, RLE, and FF. Finally, we tested these parameters on a validation cohort of 30 patients.

Results

For the derivation group, UDC-derived features from unenhanced and T1-Gd-EOB-DTPA-HBP, plus from T1 in- and opposed-phase, distinguished NASH from simple steatosis (p ≤ 0.001 and p = 0.02, respectively) with 85% and 80% accuracy, respectively, while RLE and FF distinguished NASH from simple steatosis (p ≤ 0.001 and p = 0.004, respectively), with 83% and 78% accuracy, respectively. On multivariate regression analysis, RLE and FF correlated only with fibrosis (p = 0.040) and steatosis (p ≤ 0.001), respectively. Conversely, UDC features, using Random Forest classifier predictors, correlated with all histologic NAFLD components. The validation group confirmed these results for both approaches.

Conclusion

UDC, RLE, and FF could independently separate NASH from simple steatosis. UDC may predict all histologic NAFLD components.

Clinical relevance statement

Using gadoxetic acid–enhanced MR, fat fraction (FF > 5%) can diagnose NAFLD, and relative liver enhancement can distinguish NASH from simple steatosis. Adding AI may let us non-invasively estimate the histologic components, i.e., fat, ballooning, inflammation, and fibrosis, the latter the main prognosticator.

Key Points

• Unsupervised deep clustering (UDC) and MR-based parameters (FF and RLE) could independently distinguish simple steatosis from NASH in the derivation group.

• On multivariate analysis, RLE could predict only fibrosis, and FF could predict only steatosis; however, UDC could predict all histologic NAFLD components in the derivation group.

• The validation cohort confirmed the findings for the derivation group.

Introduction

Non-alcoholic fatty liver disease (NAFLD) has become a significant public health problem as its incidence continues to increase [1, 2]. NAFLD comprises simple steatosis, with relatively low liver-related morbidity, and non-alcoholic steatohepatitis (NASH), which may lead to progressive hepatic dysfunction and liver-related mortality [3]. While simple steatosis typically improves with lifestyle changes, NASH may require additional pharmacotherapy [1, 2]. The sequelae of NASH, i.e., end-stage liver cirrhosis, liver failure, hepatocellular carcinoma (HCC), and/or eventual liver transplantation, can be mitigated through early diagnosis and management [4, 5].

Currently, NASH is still routinely diagnosed by liver biopsy, an invasive procedure which increases the risk of bleeding in patients already prone to coagulopathy. Thus, patient acceptance is poor, restricting its utility for long-term monitoring. Further limitations include sampling errors due to uneven distribution of steatosis and high inter-observer variability [6, 7]. Moreover, universal liver biopsy is not feasible in a high-prevalence disease such as NAFLD. In addition, serum markers are largely nonspecific and conventional imaging, including US, CT, and gadolinium chelate-enhanced MRI, cannot differentiate between NASH and simple steatosis [8, 9]. Thus, a non-invasive diagnostic test, with both high sensitivity and specificity for detection and monitoring of NASH, is urgently needed [10].

Multiparametric magnetic resonance imaging (MRI) has emerged as a powerful tool. It can quantify proton density fat fraction (PDFF) using a gamut of techniques, such as dual-echo chemical shift imaging (CSI), i.e., in- and opposed-phase [11, 12], multi-echo technique, or MR proton spectroscopy (MRS) [13], and it can detect fibrosis and inflammation with MR elastography [14].

Gd-EOB-DTPA-MRI, initially used to detect and characterize focal liver lesions, such as HCC complicating NAFLD, has been shown to distinguish between simple steatosis and NASH from the calculated relative liver enhancement (RLE) [15]. Also, CSI was able to differentiate between both entities using the fat fraction (FF) [16]. Furthermore, artificial intelligence (AI), including deep learning, may shed light on the imaging features of NAFLD. Recently, an unsupervised predictive texture discovery, proposed by Perkonigg et al, was introduced [17]. This approach is based on the deep clustering networks (DCN) [18] and uses random forests to link the histologically-relevant information to the texture patterns extracted by this approach [19]. Therefore, the aim of this study was to investigate in a derivation group whether a hybrid unsupervised and supervised deep learning approach could identify predictive patterns that could differentiate simple steatosis from NASH using the CSI technique, as well as unenhanced T1 and Gd-EOB-DTPA-MR images in the hepatobiliary phase (T1-Gd-EOB-DTPA-HBP). Furthermore, we compared the ability of UDC with that of RLE and FF, all data derived from identical MR sequences, to distinguish between NASH and simple steatosis in NAFLD patients. Histopathology was used as the gold standard. After identifying simple steatosis vs NASH predictors in the derivation group, we applied this model to a validation group.

Materials and methods

Patients

Written informed consent was obtained from all patients, and the study protocol was approved by the local ethics committee for this single-center study. Whereas the derivation cohort was enrolled prospectively, the validation cohort, imaged on another scanner with different software and exam parameters, was gathered retrospectively.

Patients with clinical features suspicious for fatty liver on ultrasound and elevated serum levels of aspartate and alanine aminotransferase were recruited from the Division of Gastroenterology and Hepatology of our tertiary academic institution. Inclusion criteria included histologic proof of simple steatosis or NASH and use of a standardized complete Gd-EOB-DTPA-enhanced MR protocol. Exclusion criteria, according to current American and European guidelines [1, 2], were age < 18 years, pregnancy, alcohol consumption of ≥ 20 g/day, hepatitis B or C infection, autoimmune liver diseases, hemochromatosis, Wilson’s disease, α-1 antitrypsin deficiency, toxic liver diseases, primary biliary cirrhosis, and primary sclerosing cholangitis. Initially, there were 49 derivation and 30 validation patients. We excluded three derivation-group patients, two with incomplete MRI and one who refused biopsy. The final cohorts comprised 46 derivation and 30 validation patients, all with complete MRI and histology reports.

Reference standard: biopsy and histopathological analysis

All liver biopsy specimens were evaluated by an experienced pathologist using the Steatosis Activity Fibrosis (SAF) scoring system as the gold standard [20], including steatosis grade (mild, moderate, and severe), and two of these three features: (1) necro-inflammation with mononuclear cells and/or polymorphonuclear leukocytes, (2) ballooning degeneration of hepatocytes, and (3) perisinusoidal and/or bridging fibrosis.

Blood markers

For blood markers, we considered common biochemical parameters, including levels of total bilirubin, aspartate aminotransferase, alanine aminotransferase, alkaline phosphatase, γ-glutamyl transpeptidase, triglycerides, high-density lipoprotein cholesterol, and glucose. In all patients, the serum markers were measured in the same laboratory within 1 week of MR imaging. Furthermore, we used the FIB-4 score, the NAFLD Fibrosis Score (NFS), the ALBI score, and the APRI score as established non-invasive biomarkers for accurate stratification of patients at higher risk of NASH and advanced fibrosis.

MRI protocol

All derivation-group MR examinations were performed on a 3-T scanner (Magnetom Trio, A Tim; Siemens Healthineers), and all validation-group examinations on a different 3-T scanner (Magnetom Prisma Fit; Siemens Healthineers). The MRI protocol included a chemical shift imaging (CSI) technique, with in-phase and opposed-phase transverse T1-weighted, dual gradient-echo sequences acquired before contrast medium (CM) administration. Furthermore, unenhanced and dynamic contrast-enhanced, three-dimensional, breath-hold, T1-weighted spoiled gradient-echo volumetric (VIBE) sequences, including the hepatobiliary phase, i.e., 20 min after CM injection, diffusion-weighted images (DWI), and conventional T2-weighted images, were acquired. A standard dose of Gd-EOB-DTPA (0.025 mmol/kg; Primovist® in Europe and Eovist® in the USA; Bayer Healthcare, Berlin, Germany) was administered as an intravenous bolus to all patients of both groups, using a power injector at a rate of 1.0 mL/s, immediately followed by a 20-mL saline flush. MR acquisition parameters are given in Table 1 and Table 1S.

Table 1 Derivation group. MR protocol with exam parameters

Image analysis

Computational image analysis and UDC had two main steps combining supervised and unsupervised machine learning, as follows [17]:

• First, in a pre-processing step, the liver was automatically segmented on MR sequences in all image volumes using a convolutional neural network architecture called U-Net which is particularly well-suited for image segmentation tasks [21].

• Then, unsupervised machine learning, using a combined deep learning and clustering method, identified a set of image patterns frequent on liver MRI across NAFLD patients. For our 46 NAFLD patients, 50,000 2D patches in the axial orientation were randomly extracted [22]. The clusters of every liver were also linked to the histological target variables for that liver.

• Then an autoencoder network, trained to reconstruct its input accurately from a low-dimensional latent representation, encoded the liver image patches with three convolutional layers and rebuilt them with three upsampling operations.

• Simultaneously, the DCN method assigned patches with similar appearances in this latent space into 10 distinct clusters.

• Lastly, we had the trained network use a sliding window to parse (i.e., search) the entire axial liver slice of all 46 NAFLD patients. At each position, it extracted, processed, and assigned the patch to one of the 10 clusters derived during the training. The UDC signature of each liver was the relative proportion of that liver image that belonged to each of the 10 clusters, i.e., a histogram. An overview of the method is illustrated in Fig. 1.

• Then, we created 46 × 3 UDC signatures, one for each MRI sequence of each patient: unenhanced T1-, T1-Gd-EOB-DTPA-HBP, and [unenhanced T1-in-phase and unenhanced T1-opposed phase]. To combine information from unenhanced T1- and Gd-EOB-DTPA-HBP scans, we created a 10-component UDC signature for each and concatenated them, resulting in a 20-component UDC signature for each patient. UDC signatures for T1 in- and opposed-phase images were calculated independently from the Gd-EOB-DTPA-enhanced images and resulted in an additional 10-dimensional feature vector per patient.

• In the second step, the UDC signatures of liver scans were used as feature vectors to perform supervised machine learning with a Random Forest regression model [19]. Then, those feature vectors were tested to see if and how accurately they could predict histologically-relevant features and grades of steatosis, inflammation, fibrosis, and ballooning to classify the patient as simple steatosis or NASH. In other words, this cross-validation tested the model’s performance.
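The signature construction described in the steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the cluster assignment of every patch is already available as an integer label from 0 to 9, and the function names and example labels are hypothetical.

```python
from collections import Counter

N_CLUSTERS = 10  # clusters per sequence, as stated in the text


def udc_signature(cluster_labels, n_clusters=N_CLUSTERS):
    """Return the relative proportion of patches assigned to each cluster."""
    if not cluster_labels:
        raise ValueError("at least one patch label is required")
    if any(not 0 <= label < n_clusters for label in cluster_labels):
        raise ValueError("cluster labels must lie in [0, n_clusters)")
    counts = Counter(cluster_labels)
    total = len(cluster_labels)
    # Normalized histogram: one entry per cluster, entries sum to 1.
    return [counts.get(k, 0) / total for k in range(n_clusters)]


def combine_signatures(*signatures):
    """Concatenate per-sequence signatures into a single feature vector."""
    combined = []
    for signature in signatures:
        combined.extend(signature)
    return combined


# Hypothetical cluster assignments for the patches of one liver.
unenhanced_labels = [0, 0, 3, 3, 3, 7, 9, 9]
hbp_labels = [1, 1, 1, 1, 4, 4, 8, 8]

sig_unenhanced = udc_signature(unenhanced_labels)
sig_hbp = udc_signature(hbp_labels)

# 10 + 10 components, mirroring the 20-component signature in the text.
features = combine_signatures(sig_unenhanced, sig_hbp)
print(len(features))
```

A vector of this kind is what would then be passed, one row per patient, to a Random Forest model in the supervised step.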

Conventional MRI quantification analysis used signal intensity (SI) measurements performed on a commercially available workstation (PACS system, AGFA-Healthcare, version 5.2) by two independent observers: a fellowship-trained radiologist with more than 8 years of experience (N.B.) in abdominal MR imaging, and a technologist with 3 years’ MR experience (R.F.). Both observers were blinded to patients’ clinical history, laboratory data, and histopathology characteristics.

• The liver parenchymal SI was measured on unenhanced images (PreSI), then on contrast-enhanced images obtained 20 min after contrast medium administration (PostSI) [15]. Measurements were performed by positioning nine separate circular regions of interest (ROIs) ≥ 1 cm in diameter, one in each Couinaud liver segment, with segments 4a and 4b measured separately (Fig. 2). ROIs were drawn to avoid vascular motion and abdominal wall artifacts and were positioned far from visible vascular and biliary structures. Liver enhancement was expressed relative to the SI on the unenhanced images, according to the formula: Relative Liver Enhancement (RLE) = (PostSI − PreSI)/PreSI, as previously described in detail [15].

• The hepatic fat fraction (FF) was calculated by both observers independently. Again, they placed the ROIs as described above in all liver segments on the in- and opposed-phase sequences. Liver fat was quantified as FF = [(SIin − SIopp)/(2 × SIin)] × 100, i.e., the percentage of relative signal intensity loss of the liver parenchyma on opposed-phase images, where SIin and SIopp are the liver parenchyma signal intensities on in-phase and opposed-phase images, respectively [23].

• Finally, we calculated the mean RLE and mean FF for each liver by averaging the values obtained across all Couinaud segments.
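As a worked illustration of the two conventional measurements, the sketch below computes RLE and FF per ROI and then averages across ROIs. It reads the FF formula with the usual dual-echo grouping, (SIin − SIopp)/(2 × SIin) × 100, which is an assumption about the intended parentheses. The function names and signal intensities are hypothetical and not taken from the study.

```python
from statistics import mean


def relative_liver_enhancement(pre_si, post_si):
    """RLE = (PostSI - PreSI) / PreSI for a single ROI."""
    if pre_si <= 0:
        raise ValueError("PreSI must be positive")
    return (post_si - pre_si) / pre_si


def fat_fraction(si_in, si_opp):
    """FF in percent = (SIin - SIopp) / (2 * SIin) * 100 for a single ROI."""
    if si_in <= 0:
        raise ValueError("SIin must be positive")
    return (si_in - si_opp) / (2 * si_in) * 100


# Hypothetical ROI signal intensities (three ROIs shown for brevity;
# the study placed nine, one per Couinaud segment with 4a and 4b separate).
pre_si = [200.0, 210.0, 190.0]
post_si = [400.0, 420.0, 380.0]
si_in = [300.0, 250.0, 280.0]
si_opp = [240.0, 200.0, 224.0]

rle_per_roi = [relative_liver_enhancement(a, b) for a, b in zip(pre_si, post_si)]
ff_per_roi = [fat_fraction(a, b) for a, b in zip(si_in, si_opp)]

# Per-liver values are the averages over all ROIs.
mean_rle = mean(rle_per_roi)
mean_ff = mean(ff_per_roi)
print(round(mean_rle, 3), round(mean_ff, 1))
```

In this toy example every ROI doubles its signal after contrast and loses 20% of its in-phase signal on the opposed-phase image, so the mean RLE is 1.0 and the mean FF is 10%.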

Statistical analysis

Categorical variables are presented as numbers and percentages, and continuous variables as means and standard deviations. Differences between NASH and simple steatosis were evaluated with the χ² test for categorical data and Student’s t-test for continuous data. Mean RLE and mean FF were each tested first with univariate and then with multiple regression analysis to determine whether they were associated with NAFLD’s histologic features and to identify independent imaging predictors that distinguish NASH from simple steatosis. For UDC signatures, we used a Random Forest classifier to link those features to histology and evaluate their predictive values. To assess how accurately the two methods (UDC features and conventional MRI quantification, i.e., mean FF and mean RLE) separated NASH from simple steatosis, a receiver-operating characteristic (ROC) curve analysis was performed, and optimal cutoff values for predicting NASH were chosen by maximizing the Youden index. Subsequently, sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV) at these cutoffs, and the area under the curve (AUC), were calculated for both methods. Inter-rater variability was assessed with a two-way mixed intraclass correlation coefficient (ICC) with absolute agreement [24]. The DeLong test was performed to compare the AUC for the combined UDC, RLE, and FF features for the derivation and validation groups [25]. All statistical analyses for the derivation and validation groups were performed in SPSS 25.0 (SPSS Inc) or Python v3.7.0. Statistical significance was set at a p value of less than 0.05.
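The cutoff selection step can be illustrated with a small sketch. The Youden index is J = sensitivity + specificity − 1, and the optimal cutoff is the threshold that maximizes J. The code assumes that higher scores indicate NASH; for a marker that is lower in NASH, the scores would be negated first. The function name, scores, and labels are hypothetical.

```python
def youden_optimal_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity, J) for the cutoff that
    maximizes J = sensitivity + specificity - 1.

    A case is called positive when its score is >= cutoff.
    labels: 1 = disease present (e.g., NASH), 0 = absent.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best = None
    # Every observed score is a candidate threshold.
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        sensitivity = tp / positives
        specificity = tn / negatives
        j = sensitivity + specificity - 1
        if best is None or j > best[3]:
            best = (cutoff, sensitivity, specificity, j)
    return best


# Hypothetical classifier scores and histology labels.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1, 1, 1]

cutoff, sensitivity, specificity, j = youden_optimal_cutoff(scores, labels)
print(cutoff, sensitivity, specificity, round(j, 3))
```

Ties are resolved in favor of the lowest cutoff here, which is a design choice of this sketch and not something the source specifies.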

Results

Derivation group

Characteristics

Forty-six patients were prospectively enrolled, of whom 29 (63%) were men; the mean age was 49 years (range, 18–78 years). The mean age for women was 44.62 years (range, 18–64 years), and for men 51.52 years (range, 23–81 years). Histologically, 28 (61%) met the criteria for NASH, leaving 18 classified as simple steatosis.

There were more men than women in the NASH group, but the differences in sex, age, and BMI between the two groups were not statistically significant (Table 2). The interval between MRI and liver biopsy was 1 to 3 days.

Table 2 Derivation group. Anthropometric, clinical, and laboratory characteristics of 46 patients of the two groups of NAFLD (simple steatosis, and NASH)

The liver enzymes were generally higher in NASH than in simple steatosis patients. However, the difference was not statistically significant for most of these parameters (Table 2). Established clinical scores, including AST/ALT ratio, and APRI, ALBI, NFS, and FIB-4 scores, were also higher in NASH patients. However, only the NFS reached statistical significance (Table 2).

Table 3 Derivation group. Histological characteristics of NAFLD patients according to SAF score

The final liver histology diagnosis and the distribution of fatty infiltration, lobular inflammation, ballooning, and fibrosis stage according to the SAF score (i.e., for NASH: S ≥ 1, A with lobular inflammation ≥ 1 and ballooning ≥ 1, and any F score) were used as the gold standard (Table 3). The NASH group had a significantly higher number of patients with increased lobular inflammation (p < 0.0001), steatosis (p = 0.002), and ballooning (p = 0.005), as well as fibrosis (p = 0.001), compared to those with simple steatosis.

Results of liver segmentation

The U-Net used for liver segmentation was trained on the derivation liver cohort. We randomly sampled 7 of the 46 patients and created ground truth labels for the evaluation of the segmentation accuracy on both T1-Gd-EOB-DTPA-HBP and unenhanced T1 sequences. We found an increased accuracy for T1-Gd-EOB-DTPA-HBP (Dice: 0.960, recall: 0.945, precision: 0.976) compared to unenhanced T1 sequence (Dice: 0.897, recall: 0.961, precision: 0.842).
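For reference, the three segmentation scores reported above can be computed from a predicted and a reference mask as follows. This is a minimal sketch that assumes both masks are flattened sequences of 0/1 voxel labels; the function name and the toy masks are hypothetical.

```python
def segmentation_scores(predicted, reference):
    """Return (dice, recall, precision) for two binary masks.

    Both masks are flat sequences of 0/1 values of equal length.
    """
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of voxels")

    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)
    fn = sum(1 for p, r in zip(predicted, reference) if not p and r)
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("both masks must contain at least one foreground voxel")

    dice = 2 * tp / (2 * tp + fp + fn)  # overlap relative to both mask sizes
    recall = tp / (tp + fn)             # share of the reference liver that was found
    precision = tp / (tp + fp)          # share of the prediction that is truly liver
    return dice, recall, precision


# Toy masks: three overlapping voxels, one missed, one spurious.
reference_mask = [0, 1, 1, 1, 1, 0, 0, 0]
predicted_mask = [0, 0, 1, 1, 1, 1, 0, 0]

dice, recall, precision = segmentation_scores(predicted_mask, reference_mask)
print(dice, recall, precision)
```

Read this way, the unenhanced T1 result (high recall, lower precision) suggests a tendency to over-segment, although the source does not state this explicitly.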

Results of UDC

In the derivation group overall, we found UDC features that distinguished NASH from simple steatosis using Student’s t-test (p ≤ 0.001) (Table 4). Random Forest regression was then applied to the UDC features derived from unenhanced T1- and T1-Gd-EOB-DTPA MRI in the hepatobiliary phase (T1-Gd-EOB-DTPA-HBP) to predict fibrosis, steatosis, lobular inflammation, and hepatocyte ballooning. Using the UDC in the derivation group, we could predict variables differentiating between low- (grade 0, 1) and high-grade steatosis (p < 0.001), low- (grade < 3) and high-grade fibrosis (p = 0.0005), and also gradations of lobular inflammation (p = 0.001) and ballooning (p = 0.04).

Table 4 Derivation group. MR imaging and UDC parameters demonstrating the differences between simple steatosis and NASH of 46 patients with NAFLD according to the SAF score for both readers (R1 and R2) using the t-test

Furthermore, Random Forest classifier was able to differentiate NASH from simple steatosis patients with an accuracy of 85.2% [AUROC = 0.854 (95% CI: 0.76–0.98)], a sensitivity of 89.2%, a specificity of 72.2%, a PPV of 83.3%, and a NPV of 81.3% (Fig. 3a).

In the derivation group, UDC signatures derived from CSI (T1-weighted chemical shift imaging, i.e., in- and opposed-phases) were able to differentiate between NASH and simple steatosis using Student’s t-test (p = 0.02) (Table 4). Using Random Forest regression, we could distinguish only between low- and high-grade of steatosis (p = 0.02) and inflammation (p = 0.01). UDC based on CSI failed to capture features that could reliably separate the various grades of fibrosis (p = 0.13) or hepatocyte ballooning (p = 0.65).

The Random Forest classifier allowed us to distinguish NASH from simple steatosis patients with an accuracy of 80.4% [AUROC = 0.792 (95% CI: 0.76–0.98)], a sensitivity of 89.3%, a specificity of 66.6%, a PPV of 80.6%, and an NPV of 80.0%. The ROC curve is depicted in Fig. 3b.

The Random Forest classifier, based on unenhanced T1- and T1-Gd-EOB-DTPA-HBP combined with CSI, was able to differentiate NASH from simple steatosis patients with an accuracy of 78.3% [AUROC = 0.84], a sensitivity of 75.0%, a specificity of 83.3%, a PPV of 87.5%, and a NPV of 68.2%. The combined ROC curve is depicted in Fig. 3c.
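To make the relationship between these figures concrete, the sketch below derives all five metrics from a 2 × 2 confusion matrix. The counts are an assumption: they are not given in the source but were chosen to be consistent with the 28 NASH and 18 simple steatosis patients of the derivation group and with the percentages reported for the combined classifier. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute standard diagnostic metrics from confusion-matrix counts."""
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("each row and column of the matrix must be non-empty")
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),  # NASH cases correctly identified
        "specificity": tn / (tn + fp),  # simple steatosis correctly identified
        "ppv": tp / (tp + fp),          # predicted NASH that is truly NASH
        "npv": tn / (tn + fn),          # predicted steatosis that is truly steatosis
        "accuracy": (tp + tn) / total,
    }


# Assumed counts (not from the source): 28 NASH = 21 TP + 7 FN,
# 18 simple steatosis = 15 TN + 3 FP.
metrics = diagnostic_metrics(tp=21, fn=7, tn=15, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these assumed counts the function yields 75.0% sensitivity, 83.3% specificity, 87.5% PPV, 68.2% NPV, and 78.3% accuracy, matching the values quoted in the paragraph above.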

Results of MR-derived measurements (RLE and FF)

MRI parameters, derived from the same images as those used in the UDC, i.e., unenhanced T1- and Gd-EOB-DTPA-enhanced MRI (T1-Gd-EOB-DTPA-HBP) and CSI sequences, were significantly different in NASH compared to simple steatosis patients for both readers. Moreover, there was excellent inter-reader agreement for these measurements, with high ICC (0.8–0.9) values (Table 4).

Univariate and multivariate analyses of the relationship between RLE, FF, and histopathologic parameters are summarized in Table 5. In the univariate analysis, RLE was negatively correlated with the degree of liver steatosis (Beta = − 0.422, p = 0.004), lobular inflammation (Beta = − 0.408, p = 0.005), and degree of fibrosis (Beta = − 0.500, p ≤ 0.001), but not with the activity score for ballooning (Beta = − 0.282, p = 0.059).

Table 5 Derivation group. Correlation of conventional MR parameters using RLE/FF and histologic parameters according to univariate and multiple regression analyses for reader 1

In the multiple regression analysis using backward elimination, only fibrosis (Beta = − 0.397, p = 0.040, Beta − 0.574, p ≤ 0.001) remained a significant predictor of NASH. Likewise, in the univariate analysis, FF was positively correlated with steatosis (Beta = 0.733, p = 0.001) and inflammation (Beta = 0.367, p = 0.012), but not with ballooning (Beta = 0.105, p = 0.485) or fibrosis (Beta = 0.069, p = 0.647). In the multiple regression analysis using backward elimination, only steatosis (Beta = 0.723, p ≤ 0.001) remained significant.

ROC analysis of RLE, derived from unenhanced T1 and Gd-EOB-DTPA-HBP-T1 sequences, and FF quantification, yielded the diagnostic performance of differentiating between NASH and simple steatosis. For RLE, accuracy was 83.1% [AUROC = 0.808 (95% CI: 0.76–0.98)], sensitivity 85.7%, specificity 83.3%, PPV 88.9%, and NPV 78.9% (Fig. 4a).

The FF was able to differentiate NASH from simple steatosis patients with an accuracy of 78.3% [AUROC = 0.778 (95% CI: 0.81–0.98)], a sensitivity of 85%, a specificity of 66.7%, a PPV of 80.0%, and an NPV of 75.0% (Fig. 4b).

Results of combined UDC, RLE, and FF

Finally, with the DeLong method, we compared the efficacy of UDC, using unenhanced T1 and T1-Gd-EOB-DTPA-HBP, combined with CSI based on the Random Forest classifier, as well as RLE and FF [25]. The combined ROC curves, as well as the DeLong p values, can be found in Fig. 4c. While none of the p values reached the nominal threshold of statistical significance (p < 0.05), there was a trend showing an improvement in classification accuracy when combining RLE and FF with UDC features from both in-phase and opposed-phase images and unenhanced T1-weighted images/Gd-EOB-DTPA-enhanced HBP images against the UDC features alone (AUC UDC features combined + RLE + FF = 0.94, AUC UDC features combined = 0.83, DeLong p value = 0.06).