Introduction

Focal nodular hyperplasia (FNH) is the second most common benign tumor of the liver, with an autopsy prevalence of 0.3–3% [1, 2]. It is considered a hyperplastic reaction to an arterial malformation; 60–80% of cases are asymptomatic and discovered incidentally [3, 4]. On computed tomography (CT), magnetic resonance imaging (MRI), and ultrasonography (US), a typical FNH appears as a solitary, well-defined, unencapsulated mass with a characteristic “spoke-wheel” central scar containing dystrophic arterial vessels [5]. MRI is more sensitive than US and CT and has a specificity of almost 100% for the diagnosis of FNH; however, its absolute sensitivity is only 70–80%, especially for small FNHs, in which the central scar is often missing. Previous reports show that 35–70% of FNHs lack this imaging feature, and atypical findings, including strong hyperintensity on T2-weighted imaging, a pseudocapsule mimicking a true capsule, and washout, can lead to confusion with hepatocellular carcinoma (HCC) [6,7,8]. Because of such atypical radiological features, a correct diagnosis of FNH may not be possible on CT or MRI in about 30% and 20% of cases, respectively [3, 9]. The hepatobiliary phase (HBP) of gadoxetic acid-enhanced MRI (Gd-EOBDTPA-MRI) provides valuable information for differentiating FNH from HCC. However, because 10–15% of HCCs, as well as approximately 73–90% of FNHs, appear iso- or hyperintense on the HBP, the overlapping features still make the differential diagnosis difficult [10].

Nevertheless, these difficult cases must be diagnosed accurately because HCC and FNH require entirely different management. HCC is the most common primary liver cancer and the third most common cause of cancer death worldwide. Once HCC has been diagnosed, treatment must be initiated. Surgical resection is a recommended option for patients with resectable HCC in the absence of clinically significant portal hypertension. Other treatments, including ablation, transarterial embolization, radiotherapy, transplantation, and systemic pharmacological therapy, also benefit some patients with HCC. Chronic hepatitis B (CHB) is the leading etiology of HCC worldwide, and most cases of hepatitis B virus (HBV)-related HCC (70–90%) occur in patients with cirrhosis [11, 12]. In contrast, FNH usually occurs in non-cirrhotic livers. Because the need to differentiate HCC from FNH in a cirrhotic liver is therefore very rare, our study population was limited to patients with non-cirrhotic livers. Compared with other liver lesions, FNH remains stable in diameter in most patients, and complications are extremely rare [13, 14]; the American College of Gastroenterology (ACG) Clinical Guidelines suggest that asymptomatic FNH does not require intervention [15]. Consequently, atypical lesions in a non-cirrhotic liver that cannot be diagnosed confidently on imaging require biopsy [7], which is invasive and carries a risk of pain and other complications [16]. Other liver lesions, such as hepatocellular adenoma (HCA), should also be considered in the differential diagnosis. The reported prevalence of HCA is 0.001–0.004%, and it is approximately 10 times less common than FNH; few reports have addressed the differential diagnosis between HCA and other liver lesions. HCA was therefore not included in our study because there were too few cases. In summary, a non-invasive method that can reliably distinguish HCC from FNH is urgently needed.

Radiomics, which uses a large number of quantitative features that are not discernible to the naked eye, has been applied to tumor molecular classification, differential diagnosis, treatment selection, assessment of therapeutic response, and prognosis evaluation. To the best of our knowledge, few studies have used MRI-based radiomics to differentiate HCC from FNH. This study aimed to develop and validate a non-invasive, highly accurate radiomics model for differentiating HCC from FNH in non-cirrhotic livers.

Materials and methods

Patients

In this single-center retrospective study, medical records were reviewed to identify all consecutive cases seen between May 2015 and May 2019. The inclusion criteria for HCC were (a) diagnosis confirmed by postoperative pathology, (b) no radiological features of liver cirrhosis, (c) no previous history of hepatectomy or radiotherapy, and (d) no radiological evidence of vascular or bile duct invasion or distant metastasis, because such findings strongly support the diagnosis of HCC and remove the need to distinguish it from benign disease. The inclusion criteria for FNH were (a) pathological confirmation from surgical specimens or liver biopsy, or (b) a typical FNH diagnosis according to the European Association for the Study of the Liver (EASL) Clinical Practice Guidelines [7] (to improve the applicability of the model). The exclusion criteria for both HCC and FNH were (a) absence of high-quality pretreatment Gd-DTPA contrast-enhanced MRI (ceMRI) performed at our center, (b) MRI performed at least 2 months before pathological confirmation, and (c) incomplete medical records or unavailable clinical data. Up to three imaging studies per patient were included, provided that the studies were more than 6 months apart.

This study was approved by the Institutional Ethics Committee of our hospital, and written informed consent was obtained from all study participants. The studies were performed in accordance with the ethical standards outlined in the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.

MR image acquisition and image processing

MRI examinations were performed on a 3.0 T scanner (Magnetom Verio, Siemens Healthcare, Erlangen, Germany) with an 8-channel phased-array body coil and a high-pressure injector. The contrast agent was Gd-DTPA (BeiLu Pharmaceutical Co., Ltd., Beijing, China), administered at a dose of 0.2 mL/kg and a rate of 2.5 mL/s, followed by a 20 mL normal saline flush. Preparation before the scan included fasting and abstaining from drinking for more than 4 h, psychological guidance, and breathing training (calm breath-holding at the end of a breath). Contrast-enhanced axial T1-weighted images (CE-T1) were acquired with a three-dimensional volumetric interpolated breath-hold examination (3D-VIBE) sequence (TR=4.16 ms, TE=2.01 ms, FOV=380×308 mm, matrix=320×320×75%, slice thickness=3 mm, spacing=3 mm, FA=16°, and NEX=1) with multiphase contrast. Arterial phase (AP), portal venous phase (PVP), and delayed phase images were acquired 20–30, 60–70, and 120–180 s after contrast administration, respectively, with breath-holding in all phases.

Radiologists 1 and 2 manually segmented the FNH and HCC lesions using 3D Slicer (version 4.10.2; http://www.slicer.org). Volumes of interest (VOIs) were delineated on the AP and PVP T1-weighted images by drawing the tumor outline slice by slice, avoiding bile ducts and vessels. If multiple lesions were present, only the largest lesion was segmented. The PyRadiomics (version 2.1; http://www.radiomics.io/) implementation in 3D Slicer was used for preprocessing and radiomics feature extraction. Before feature calculation, images were resampled to a voxel size of 1×1×1 mm³ by trilinear interpolation [17], and a fixed bin width of 25 was used for gray-level discretization. Derived images were generated by wavelet decomposition and by Laplacian of Gaussian filtering with sigma values of 0.5, 1.0, and 1.5 mm. Seven common feature groups were extracted in three dimensions from the original and filtered images: first-order statistics, shape, gray-level dependence matrix (GLDM), gray-level co-occurrence matrix (GLCM), gray-level run length matrix (GLRLM), gray-level size zone matrix (GLSZM), and neighboring gray tone difference matrix (NGTDM) (Fig. 1).
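PyRadiomics performs resampling and discretization internally. Purely as an illustration, the following Python sketch shows how a fixed bin width of 25 maps voxel intensities to gray levels, using the fixed-bin-width rule X_b = ⌊X/W⌋ − ⌊min(X)/W⌋ + 1 as we understand it from the PyRadiomics documentation. The function name is hypothetical and is not part of the PyRadiomics API; resampling and filtering are not shown.

```python
import math
from typing import Iterable, List


def discretize_fixed_bin_width(intensities: Iterable[float], bin_width: float = 25.0) -> List[int]:
    """Map VOI voxel intensities to 1-based gray-level bins of a fixed width.

    Hypothetical helper implementing X_b = floor(X / W) - floor(min(X) / W) + 1:
    bin edges sit on multiples of W and the lowest occupied bin is always 1.
    """
    values = list(intensities)
    if not values:
        raise ValueError("the VOI contains no voxels")
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    offset = math.floor(min(values) / bin_width)
    return [math.floor(v / bin_width) - offset + 1 for v in values]


if __name__ == "__main__":
    # Hypothetical intensities from a tiny VOI (illustrative, not study data).
    voi = [103.0, 110.5, 124.9, 125.0, 180.2, 260.0]
    print(discretize_fixed_bin_width(voi, bin_width=25))  # [1, 1, 1, 2, 4, 7]
```

Because the bin width is fixed rather than the number of bins, the number of gray levels grows with the intensity range of each VOI, which is why the absolute intensity scale of the images matters for this setting.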

Fig. 1

Workflow of this study. First, manual segmentation was performed on arterial and portal venous phase MR images. Second, image preprocessing and feature extraction were carried out in the volumes of interest (VOIs), covering seven common feature groups: first order, shape, GLDM, GLCM, GLRLM, GLSZM, and NGTDM. Third, in the training set, the random forest and mRMR algorithms were used for pre-screening, followed by correlation analysis and LASSO regression to select the key features for modeling. Finally, three models (Clinical Model, Radiomics Model, and Combined Model) were established, and model performance was evaluated in the validation set. Note: GLDM gray-level dependence matrix, GLCM gray-level co-occurrence matrix, GLRLM gray-level run length matrix, GLSZM gray-level size zone matrix, NGTDM neighboring gray tone difference matrix, mRMR Max-Relevance and Min-Redundancy, LASSO least absolute shrinkage and selection operator

Inter-observer and intra-observer agreement

Intraclass correlation coefficients (ICCs) were used to evaluate the inter-observer reliability and intra-observer reproducibility of feature extraction [18]. Thirty samples were randomly chosen and delineated by two radiologists. Radiologist 1 delineated the VOIs on the AP and PVP T1-weighted images twice within 1 week, following the same standard, to assess intra-observer reproducibility. Radiologist 2 independently delineated the VOIs once, and the radiomics features extracted from these VOIs were compared with those from Radiologist 1 to assess inter-observer agreement [18]. Features with an ICC >0.8 were retained. Radiologist 1 then segmented the remaining samples.
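The text does not state which ICC form was used. The following minimal sketch assumes the two-way random-effects, absolute-agreement, single-measurement ICC(2,1) of Shrout and Fleiss; it computes the ICC for one feature from an n × k table (subjects × readings) and applies the >0.8 retention rule. Both function names are hypothetical.

```python
from typing import Dict, List, Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement (assumed form).

    ratings[i][j] holds one feature value for subject i from reading j
    (for example, j=0 for Radiologist 1 and j=1 for Radiologist 2).
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need an n x k table with at least 2 subjects and 2 readings")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects (rows), readings (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when every value is identical")
    return (ms_rows - ms_error) / denominator


def reproducible_features(tables: Dict[str, Sequence[Sequence[float]]],
                          threshold: float = 0.8) -> List[str]:
    """Return the names of features whose ICC exceeds the threshold (0.8 in the text)."""
    return [name for name, table in tables.items() if icc_2_1(table) > threshold]
```

One plausible reading of the protocol is to run `reproducible_features` on the intra-observer tables and on the inter-observer tables and keep the intersection. Note that an absolute-agreement ICC penalizes a constant offset between readers: the table `[[1, 2], [2, 3], [3, 4]]` gives 2/3 even though the readings are perfectly correlated.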

Radiomics feature selection and model construction

Before radiomics feature selection, z-score normalization was applied to remove differences in feature scale by transforming each feature to a mean of 0 and a standard deviation of 1 [18]. The samples were then randomly assigned to a training set (n=156) and a validation set (n=68). The training set was used for feature selection and construction of the three models, and the validation set was used to evaluate their diagnostic performance. Two machine learning algorithms were used for pre-screening: (i) Max-Relevance and Min-Redundancy (mRMR) and (ii) random forests (RF); each selected its top 20 features by score (mRMR) or importance (RF). A correlation analysis was then performed to exclude highly correlated features. Least absolute shrinkage and selection operator (LASSO) regression was used for the final selection step, with the penalty parameter tuned by 10-fold cross-validation, to construct a radiomics signature [18,19,20,21,22,23,24]. The optimal radiomics signature was used to build the radiomics model.
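The LASSO step was run with glmnet in R. As a language-neutral sketch only, the Python code below z-scores the features and fits an L1-penalized logistic regression (the model glmnet fits with `family = "binomial"` and `alpha = 1`, whose objective is the mean negative log-likelihood plus λ times the L1 norm) by proximal gradient descent. That solver is an assumption and differs from glmnet's coordinate descent, the 10-fold cross-validation over λ is omitted, and all names are hypothetical. Coefficients shrunk exactly to zero correspond to discarded features.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def zscore_columns(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each column to mean 0 and sample standard deviation 1."""
    stats = [(mean(col), stdev(col)) for col in zip(*X)]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in X]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _soft_threshold(v: float, t: float) -> float:
    # Proximal operator of t*|v|: shrinks v toward 0 and sets small values to exactly 0.
    return math.copysign(max(abs(v) - t, 0.0), v)


def lasso_logistic(X: Sequence[Sequence[float]], y: Sequence[int], lam: float,
                   n_iter: int = 5000) -> Tuple[float, List[float]]:
    """Minimize mean logistic loss + lam * ||w||_1 by proximal gradient (ISTA).

    Sketch under stated assumptions; the intercept is not penalized.
    Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    # Safe step size: the gradient's Lipschitz constant is at most 0.25 * trace(Z^T Z / n),
    # where Z is X with a leading column of ones for the intercept.
    lipschitz = 0.25 * (1.0 + sum(x * x for row in X for x in row) / n)
    step = 1.0 / lipschitz
    b0, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        resid = [_sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, row))) - yi
                 for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b0 -= step * sum(resid) / n  # plain gradient step for the intercept
        w = [_soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad)]
    return b0, w
```

A large λ drives every coefficient to zero, while a smaller λ lets the most informative features enter first; choosing λ along that path is what the cross-validation step does.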

Construction of the clinical model and combined model

Univariate analysis was used to compare clinical factors (including clinical information and MR features) between the two groups, and multivariable logistic regression, with the significant variables from the univariate analysis as inputs, was used to build the clinical model. Odds ratios (ORs) with 95% confidence intervals (CIs) were obtained for each risk factor as estimates of relative risk. The combined model was built using the clinical factors in the clinical model together with the Rad score from the radiomics model.
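To show how the reported quantities relate to a fitted logistic model, the sketch below converts a coefficient β with standard error SE into an OR of exp(β) and a 95% interval of exp(β ± 1.96·SE). It assumes Wald intervals; the text does not say how the CIs were computed, and profile-likelihood intervals produced by some software can differ. `predicted_probability` illustrates how a combined model would turn clinical factors and the Rad score into a probability; no fitted coefficients are reported here, so any values passed to it are hypothetical, as are both function names.

```python
import math
from statistics import NormalDist
from typing import Mapping, Tuple

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96


def odds_ratio_with_ci(beta: float, se: float, z: float = Z_975) -> Tuple[float, float, float]:
    """Return (OR, lower, upper) for a logistic-regression coefficient via a Wald interval."""
    if se < 0:
        raise ValueError("standard error must be non-negative")
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def predicted_probability(intercept: float, coefficients: Mapping[str, float],
                          values: Mapping[str, float]) -> float:
    """Logistic-model probability (e.g. of HCC) from named predictors such as a Rad score."""
    eta = intercept + sum(coef * values[name] for name, coef in coefficients.items())
    if eta >= 0:  # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)
```

An OR below 1, such as the one reported for enhancement pattern, simply corresponds to a negative coefficient.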

Statistics

Statistical analysis was performed using R (version 3.6.3; R Foundation for Statistical Computing, Vienna, Austria). Categorical variables were compared using the χ2 test or Fisher’s exact test. Continuous variables were expressed as the median [Q1, Q3] and compared using Student’s t test or the Mann-Whitney U test. Variables that were significant in the univariate analysis were entered into the multivariable binary logistic regression model. mRMR, RF, and LASSO were implemented with the R packages “mRMRe,” “randomForest,” and “glmnet,” respectively. The DeLong test was used to compare the ROC curves [25]. P<0.05 was considered statistically significant.
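The text names the DeLong test but not the R package used for it. The following pure-Python sketch implements the DeLong comparison of two correlated AUCs (two models scored on the same patients) from the Mann-Whitney placement values; it is illustrative rather than the authors' code, and the function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    # Mann-Whitney kernel: 1 if the case outranks the control, 0.5 for a tie, else 0.
    return 1.0 if x > y else (0.5 if x == y else 0.0)


def _placements(pos: Sequence[float], neg: Sequence[float]) -> Tuple[List[float], List[float]]:
    # DeLong structural components: V10 (one per case) and V01 (one per control).
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a: Sequence[float], scores_b: Sequence[float],
                labels: Sequence[int]) -> Tuple[float, float, float]:
    """Compare two correlated AUCs; labels are 1 for HCC (case) and 0 for FNH (control).

    Returns (auc_a, auc_b, two-sided p value); p is NaN when the variance is zero.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two cases and two controls")
    v10a, v01a = _placements([scores_a[i] for i in pos], [scores_a[i] for i in neg])
    v10b, v01b = _placements([scores_b[i] for i in pos], [scores_b[i] for i in neg])
    auc_a, auc_b = sum(v10a) / len(pos), sum(v10b) / len(pos)
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / len(pos)
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / len(neg))
    if var <= 0:
        return auc_a, auc_b, float("nan")  # e.g. identical score vectors
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, 2.0 * (1.0 - NormalDist().cdf(abs(z)))
```

The covariance terms are what distinguish this test from comparing two independent AUCs: because both models score the same patients, their errors are correlated, and ignoring that correlation would typically misstate the variance of the difference.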

Results

Patient characteristics

Our study selection process is shown in Fig. 2. A search of our medical records identified 1261 HCCs and 36 FNHs with pathological evidence between May 2015 and May 2019. Of the HCC candidates, 1058 cases with liver cirrhosis, 3 with previous treatment, 15 with vascular or bile duct invasion or distant metastasis, and 36 with unsatisfactory images or incomplete records were excluded. In addition to the 36 pathologically confirmed FNHs, 45 typical FNHs diagnosed according to the EASL Clinical Practice Guidelines were added, and 6 cases were excluded because of unsatisfactory images or incomplete records.

Fig. 2

Flow chart of patient recruitment in this study. Note: HCC hepatocellular carcinoma in noncirrhotic liver, FNH focal nodular hyperplasia

Finally, 224 patients were enrolled: 149 with HCC (124 men and 25 women; mean age, 56.8±11.9 years) and 75 with FNH (30 men and 45 women; mean age, 37.0±12.1 years). The clinical factors of FNHs and HCCs in the training and validation sets are shown in Table 1; none differed significantly between the two sets (P>0.05).

Table 1 Clinical factors of the training and validation sets

Inter- and intra-observer agreement

We extracted 2260 radiomics features per patient from the AP and PVP images of the ceMRI. Intra-observer agreement was 85.3% (mean ICC=0.90), and inter-observer agreement for all 2260 features reached 78.4% (mean ICC=0.85). In total, 731 features were excluded.

Radiomics feature selection and model construction

After the top 20 features ranked by each of the mRMR and RF algorithms were combined, 33 features were identified from the training set, seven of which were selected by both algorithms (Fig. 3). Pearson correlation analysis of the 33 features showed that 11 pairs of features were highly correlated (coefficients >0.80), leaving 22 features for LASSO regression. Eight features were selected with the optimal regularization parameter λ of 0.041, chosen under the 1-SE criterion by 10-fold cross-validation. The radiomics model was built from the selected features using the following formula: Rad score = −6.68 * (PVP-glcm-wavelet-HHL-InverseVariance) − 3.87 * (AP-firstorder-original-10Percentile) − 2.81 * (PVP-glcm-log-sigma-1-5-mm-3D-MaximumProbability) − 1.65 * (PVP-glcm-MaximumProbability) + 0.08 * (AP-glcm-log-sigma-1-0-mm-3D-ClusterShade) + 0.11 * (PVP-firstorder-wavelet-HLL-Median) + 0.54 * (AP-firstorder-log-sigma-0-5-mm-3D-Median) + 1.81 * (AP-shape-original-Elongation). The Rad score was compared between HCCs and FNHs with the Mann-Whitney U test, and the difference was statistically significant in both the training and validation sets (Fig. 4).
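The text reports that 11 highly correlated pairs were removed to leave 22 features, but not which member of each pair was dropped. A minimal sketch, assuming a greedy rule that walks the features in priority order and keeps a feature only if its absolute Pearson correlation with every already-kept feature is at most 0.8; the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return sab / math.sqrt(saa * sbb)


def prune_correlated(features: Dict[str, Sequence[float]], threshold: float = 0.8) -> List[str]:
    """Greedy pruning (assumed rule): dict order is the priority order.

    A feature is kept only if |r| <= threshold against every feature kept so far.
    """
    kept: List[str] = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

The result depends on the priority order, so ranking features first (for example by mRMR score or RF importance) determines which member of a correlated pair survives.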

Fig. 3

Dimensionality reduction and Radiomics Model construction. a The 20 features selected by the mRMR algorithm according to feature score. b The 20 features selected by the random forest algorithm according to feature importance. c Correlation heatmap of the 33 features screened by the two algorithms above (seven overlapping features were removed). d LASSO regression analysis of 33 features; the vertical line shows the optimal value of λ=0.041 and the 8 corresponding features with non-zero coefficients. e AUC curve plotted against the tuning parameter (λ), selected by 10-fold cross-validation. The vertical lines on the left and right denote the minimum criterion and the 1-standard-error (1-SE) criterion, respectively; the 1-SE criterion was applied.
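The 1-SE criterion in panel e can be stated as follows: take the λ with the best mean cross-validated AUC, then choose the largest λ (the sparsest model) whose mean AUC is still within one standard error of that best value. A short sketch of this selection step, assuming per-λ mean AUCs and standard errors have already been computed by cross-validation; the function name and the grid values in the usage line are hypothetical, not the study's.

```python
from typing import Sequence, Tuple


def select_lambda(lambdas: Sequence[float], mean_auc: Sequence[float],
                  se_auc: Sequence[float]) -> Tuple[float, float]:
    """Return (lambda_best, lambda_1se) from cross-validated AUC summaries."""
    if not lambdas or not (len(lambdas) == len(mean_auc) == len(se_auc)):
        raise ValueError("inputs must be non-empty and of equal length")
    best = max(range(len(lambdas)), key=lambda i: mean_auc[i])
    threshold = mean_auc[best] - se_auc[best]
    # A larger lambda means a stronger penalty and fewer non-zero coefficients.
    lambda_1se = max(lam for lam, auc in zip(lambdas, mean_auc) if auc >= threshold)
    return lambdas[best], lambda_1se


if __name__ == "__main__":
    # Hypothetical cross-validation summary, for illustration only.
    print(select_lambda([0.2, 0.1, 0.05, 0.02, 0.01],
                        [0.80, 0.90, 0.93, 0.94, 0.935], [0.03] * 5))  # (0.02, 0.05)
```

The 1-SE choice trades a statistically negligible loss in cross-validated AUC for a sparser, usually more stable signature.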

Fig. 4

Rad score of NC-HCC (HCC in non-cirrhotic liver) and FNH in the training (a) and validation (b) sets. NC-HCC has a higher Rad score than FNH in both the training and validation sets. Rad score=-6.68*(PVP-glcm-wavelet-HHL-InverseVariance)-3.87*(AP-firstorder-original-10Percentile)-2.81*(PVP-glcm-log-sigma-1-5-mm-3D-MaximumProbability)-1.65*(PVP-glcm-MaximumProbability)+0.08*(AP-glcm-log-sigma-1-0-mm-3D-ClusterShade)+0.11*(PVP-firstorder-wavelet-HLL-Median)+0.54*(AP-firstorder-log-sigma-0-5-mm-3D-Median)+1.81*(AP-shape-original-Elongation)
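The Rad score formula is a weighted sum of eight features with no reported intercept. A direct transcription in Python follows. It assumes each feature value has already been z-score normalized with the training-set parameters, as described in the Methods; the dictionary keys reuse the feature names from the formula, and the helper name is hypothetical.

```python
from typing import Mapping

# Coefficients transcribed from the Rad score formula (no intercept is reported).
RAD_SCORE_COEFFICIENTS = {
    "PVP-glcm-wavelet-HHL-InverseVariance": -6.68,
    "AP-firstorder-original-10Percentile": -3.87,
    "PVP-glcm-log-sigma-1-5-mm-3D-MaximumProbability": -2.81,
    "PVP-glcm-MaximumProbability": -1.65,
    "AP-glcm-log-sigma-1-0-mm-3D-ClusterShade": 0.08,
    "PVP-firstorder-wavelet-HLL-Median": 0.11,
    "AP-firstorder-log-sigma-0-5-mm-3D-Median": 0.54,
    "AP-shape-original-Elongation": 1.81,
}


def rad_score(features: Mapping[str, float]) -> float:
    """Weighted sum of the eight (assumed z-scored) radiomics features."""
    missing = set(RAD_SCORE_COEFFICIENTS) - set(features)
    if missing:
        raise KeyError(f"missing features: {sorted(missing)}")
    return sum(coef * features[name] for name, coef in RAD_SCORE_COEFFICIENTS.items())
```

Because the inputs are z-scored, a patient whose features all equal the training-set means receives a Rad score of 0, and each coefficient is the change in score per standard deviation of its feature.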

Construction of the clinical model and combined model

Univariate analysis showed that age, sex, HBsAg, MRI tumor size, MRI tumor number, location, lesion homogeneity, and enhancement pattern differed significantly between the two groups (P<0.05). Multivariable analysis identified age (OR=11.09 [3.13–49.40], P<0.001), sex (OR=5.57 [1.74–19.85], P=0.005), HBsAg (OR=14.75 [4.43–60.94], P<0.001), and enhancement pattern (OR=0.21 [0.07–0.52], P=0.001) as independent predictors for differentiating HCC from FNH, and these four factors were used to build the clinical model. A combined model was also built by logistic regression using the four clinical factors and the Rad score.

Diagnostic performance of the three models

The clinical, radiomics, and combined models all performed well in the training set, with areas under the curve (AUCs) of 0.937 (95% CI 0.887–0.970), 0.960 (95% CI 0.916–0.985), and 0.984 (95% CI 0.949–0.997) and classification accuracies of 0.853, 0.917, and 0.956, respectively. In pairwise comparisons of AUCs, the combined model was significantly better than the clinical model (P=0.002), whereas the difference between the clinical and radiomics models was not statistically significant.

Consistent results were obtained in the validation set, in which the clinical, radiomics, and combined models achieved AUCs of 0.903 (95% CI 0.807–0.962), 0.931 (95% CI 0.843–0.978), and 0.972 (95% CI 0.900–0.997) and classification accuracies of 0.853, 0.868, and 0.941, respectively. Again, the combined model was significantly better than the clinical model (P=0.032), whereas the difference between the clinical and radiomics models was not statistically significant (Table 2).

Table 2 Model performance in the training and validation sets

Discussion

In this study, we established three models to distinguish HCC from FNH in non-cirrhotic livers using four clinical factors and a Rad score derived from eight radiomics features selected from AP and PVP MR images. Compared with the clinical model, the combined model performed better in terms of accuracy, sensitivity, specificity, and AUC in both the training and validation sets (Table 2, Fig. 5). Adding radiomics features to the clinical factors improved diagnostic performance, although the radiomics model alone did not significantly outperform the clinical model.

Fig. 5

ROC curves comparing the three models in the training (a) and validation (b) sets. The hollow point on each curve marks the optimal cut-off value.
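The caption does not say how the optimal cut-off was chosen. Assuming the commonly used Youden index J = sensitivity + specificity − 1, the sketch below scans the observed predicted probabilities as candidate thresholds and reports sensitivity, specificity, and accuracy at a given cut-off. The function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def confusion_metrics(probs: Sequence[float], labels: Sequence[int],
                      cutoff: float) -> Tuple[float, float, float]:
    """Sensitivity, specificity and accuracy when probs >= cutoff is called HCC (label 1)."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(labels)


def youden_cutoff(probs: Sequence[float],
                  labels: Sequence[int]) -> Tuple[Optional[float], float]:
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1 (first maximum wins)."""
    best_cut, best_j = None, -1.0
    for cut in sorted(set(probs)):
        sens, spec, _ = confusion_metrics(probs, labels, cut)
        j = sens + spec - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j
```

Accuracy at the chosen cut-off depends on the class mix of the sample, whereas J does not, which is one reason J is often preferred for picking the operating point.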

Previous studies have proposed several approaches to differentiating HCC from FNH. Li et al. [26] enrolled 38 patients with HCC and 65 with FNH to assess contrast-enhanced US (ceUS) and microflow (MF) imaging, and found that MF imaging had excellent diagnostic performance in differentiating atypical HCC from FNH compared with routine ceUS. Yu et al. [27] performed spectral CT during the arterial and portal venous phases in 42 HCCs and 16 FNHs and found that CT spectral imaging improved the detectability and accuracy of differentiation between HCC and FNH. Nie et al. [28] developed and validated a CT-based radiomics nomogram for the preoperative differentiation of FNH from HCC in non-cirrhotic livers, achieving an AUC of 0.917 in the validation group. Several studies [29,30,31] have indicated that Gd-EOBDTPA-MRI is helpful for diagnosing FNH, because most FNHs show high or iso-signal intensity (SI) relative to the liver parenchyma in the HBP. However, Lee et al. [32] found that although 85% of well-differentiated HCCs were hypointense in the HBP, about 15% were iso- or hyperintense, indicating that the two lesions still overlap even in the HBP. In this study, we established a combined model for the differential diagnosis of HCC and FNH in non-cirrhotic livers. The model is non-invasive and easy to implement, and it achieved excellent performance, with an AUC of 0.972 in the validation set.

In our study, the clinical model did not achieve the best AUC (0.937 and 0.903 in the training and validation sets, respectively), but its performance was still relatively high. Although we included as many as possible of the radiological features mentioned in the EASL Clinical Practice Guidelines for benign liver tumors to help distinguish the two diseases, such as liver hemangioma and steatosis in the lesion and the liver, these turned out not to be strong predictors. A central scar is a typical feature of FNH and is identified on MRI in approximately 30–50% of FNH cases [8]. However, a central scar is also detectable by MRI in about 50% of non-cirrhotic HCCs, especially fibrolamellar carcinoma [33]. In our study, 32% of FNHs had a central scar, consistent with previous studies, and the central scar was ultimately not included in the model.

Our results were consistent with those reported by Nie et al. [28], whose model, like ours, included only one radiological feature: enhancement pattern. The epidemiological and clinical characteristics of the two diseases are also important references for differential diagnosis. FNH occurs mainly in women (up to 90% of cases), with a mean age between 35 and 50 years, whereas HCC occurs mainly in older men and is usually accompanied by hepatitis B virus infection. The clinical factors in our models (age, sex, and HBsAg) are consistent with these epidemiological differences, supporting the interpretability of our models.

Radiomics data are high-dimensional, so extracting the key features from such a large amount of data is essential. To ensure the reproducibility of the selected features and avoid interference from subjective factors, we combined rigorous feature selection with machine learning. First, inter- and intra-observer agreement was evaluated, and only features with an ICC >0.8 were retained. Second, two machine learning algorithms, mRMR and RF, were used for feature filtering. Third, a correlation analysis of the features screened by the two algorithms was performed to exclude highly correlated features. Finally, LASSO regression, one of the most commonly used dimensionality reduction methods in radiomics, was used to obtain the optimal radiomics signature.
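For readers unfamiliar with mRMR, the following sketch shows the greedy max-relevance, min-redundancy rule in its difference form: at each step, pick the feature whose relevance to the label minus its mean redundancy with the already-selected features is largest. It assumes mutual information is approximated from Pearson correlation as I = −½ ln(1 − r²), a Gaussian approximation; the study used the R package mRMRe, whose estimator and scoring details may differ, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def _pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sab / math.sqrt(saa * sbb)


def _gaussian_mi(r: float) -> float:
    # Mutual information of two jointly Gaussian variables with correlation r (assumption);
    # r**2 is capped just below 1 so perfectly correlated inputs stay finite.
    return -0.5 * math.log(1.0 - min(r * r, 1.0 - 1e-12))


def mrmr_select(features: Dict[str, Sequence[float]], target: Sequence[float],
                k: int) -> List[str]:
    """Greedy mRMR (difference form): maximize relevance minus mean redundancy."""
    relevance = {name: _gaussian_mi(_pearson(vals, target)) for name, vals in features.items()}
    selected: List[str] = []
    remaining = list(features)

    def score(name: str) -> float:
        if not selected:
            return relevance[name]
        redundancy = sum(_gaussian_mi(_pearson(features[name], features[s])) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

The redundancy term is what separates mRMR from simply ranking features by their individual association with the label: a feature nearly identical to one already chosen adds little and is pushed down the list.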

Our model differs from existing diagnostic techniques in several respects. Existing techniques mainly rely on contrast-enhanced CT, contrast-enhanced MRI, or contrast-enhanced ultrasound to assess the imaging findings of lesions, and patients' baseline data, such as sex, age, AFP, and cirrhotic background, are also important references. The LI-RADS criteria are used for diagnosing HCC, and the EASL Clinical Practice Guidelines for diagnosing FNH [7, 34]. Because imaging diagnosis depends on subjective judgment, not all HCCs or FNHs show typical imaging findings, and inter-observer variability is high, atypical cases can be confirmed only by invasive pathological examination (surgery or biopsy). In contrast, we established an MR-based radiomics model that combines radiomics features extracted from MR images with patient baseline characteristics and makes a diagnosis with a mathematical model based on objective parameters, achieving AUCs of 0.984 and 0.972 in the training and validation sets, respectively. Furthermore, to our knowledge, this is the first study to use an MR-based radiomics model for the differential diagnosis of HCC and FNH.

Our study had several limitations. First, the sample size was still small relative to the large number of features. A large-scale clinical study enrolling more samples would help validate and improve the applicability of our model as an effective tool for differentiating between FNH and HCC. Second, external validation is needed to further verify the accuracy and clinical practicability of the model. Finally, sample selection bias was unavoidable in this retrospective study, so a prospective study should be conducted to confirm the practicability of the model.

In conclusion, our novel MR-based radiomics model showed excellent diagnostic performance and may provide a useful reference for differentiating HCC from FNH in clinical studies.