Introduction

According to the World Cancer Report published by the International Agency for Research on Cancer (IARC), hepatocellular carcinoma (HCC), the third leading cause of cancer-related mortality worldwide, accounts for approximately 80% of primary liver cancers. In 2020, there were an estimated 410,038 new cases and 391,152 deaths from HCC in China [1]. Cirrhosis secondary to hepatitis B virus (HBV) infection is a major risk factor for HCC and predominantly affects middle-aged and elderly Chinese men [2]. Other major risk factors for HCC observed worldwide include exposure to toxins (aflatoxin, alcohol) and hepatitis C virus (HCV) infection. However, the early manifestations of the progression from dysplastic nodules (DNs) to small hepatocellular carcinoma (SHCC), which is defined by the very early stage of the Barcelona Clinic Liver Cancer (BCLC) criteria as a single HCC nodule less than 2 cm in diameter, are difficult to detect clinically. Thus, a large number of Chinese patients are diagnosed at an advanced stage and receive only palliative care [3]. The 5-year survival rate for HCC is only 33% across all races [4], but in China, it drops to 14.1% [5]. Therefore, a new method is needed to detect these changes at an early stage for diagnosis and intervention.

Medical imaging technologies, including contrast-enhanced computed tomography (CE-CT), contrast-enhanced magnetic resonance imaging (CE-MRI), and contrast-enhanced ultrasound imaging (CE-US), can achieve satisfactory diagnostic results for HCC. If the imaging profile on dynamic MR is specific for HCC (intense contrast agent uptake in the arterial phase followed by extracellular contrast wash-out in the venous and/or delayed phase), a diagnosis can be made even without histological confirmation [6,7,8]. However, very early lesions remain difficult to identify by medical imaging, even for an experienced radiologist. Although these lesions may have already become malignant, they remain microscopic on dynamic MRI and lack typical radiological markers [9].

The term “radiomics” was first proposed by Lambin et al. [10]. It is a field that focuses on improving image analysis through the high-throughput extraction of a vast number of quantitative features from medical images. The underlying premise of radiomics is that the analysis of these quantitative features can provide more and better information than physicians can obtain by visually analysing the images. It has been reported that radiomics features of the tumour area present significant predictive efficacy in the classification of HCC [11,12,13,14,15]. Li Yang et al. [16] and Huang et al. [17] reported best areas under the receiver operating characteristic curve (AUROCs) of 0.861 and 0.784 in the validation cohort, respectively. However, these previous studies were conducted on the assumption that HCC lesions could be observed on MR images, and few studies have focused on early or very early HCC lesions with no imaging changes.

Hence, we aimed to develop a multiphase MRI-based radiomics model to evaluate the risk of microscopic pre-HCC lesions in patients with HCC and/or other liver-related diseases.

Materials and methods

Patient cohort

The institutional review board of our institution approved this retrospective study. The requirement for informed consent was waived.

From September 2018 to June 2021, we retrieved the data of hospitalized patients screened for HCC due to cirrhosis secondary to HBV infection from Shandong Provincial Hospital affiliated to Shandong First Medical University.

The inclusion criteria for SHCC patients were as follows: (1) patients were diagnosed with SHCC by a radiologist with more than ten years of experience; (2) patients underwent two MR examinations, with SHCC not apparent on the former but diagnosed on the latter, and the interval between the two examinations was not more than one year; (3) the newly developed SHCC lesion had never been treated with transcatheter arterial chemoembolization (TACE); (4) MR imaging at the site of the newly developed SHCC lesion was consistent with the ‘wash-in and wash-out’ phenomenon described in the guidelines; and (5) patients had complete T1-weighted, arterial-phase, portal venous–phase, and delayed-phase MR images at both timepoints. The sites of the SHCC lesions on the first MR scan of eligible SHCC patients were included in the SHCC cohort; a normal volume of interest (VOI) from the first MR scan of each eligible SHCC patient was included in the internal-control cohort.

Moreover, we set up an external-control cohort to ensure the robustness of the research findings. The inclusion criteria for this cohort were as follows: (1) patients were diagnosed with hepatic cyst or haemangioma by a radiologist with more than ten years of experience; (2) there were one to three lesions, each measuring less than 2.3 cm in diameter; (3) patients had no history of hepatitis, drug-induced liver damage, or alcohol abuse; and (4) patients had complete T1-weighted, arterial-phase, portal venous–phase, and delayed-phase MR images.

The exclusion criteria were as follows: (1) the tumour outline was unclear on MR images in the SHCC cohort; (2) the quality of the MR images was poor in the internal- or external-control cohort. Figure 1 shows the whole experimental design.

Fig. 1

The flow chart of the whole experiment includes several steps of data acquisition, registration, outlining, feature extraction and selection, model construction, and prediction

MR image acquisition and registration and delineation of volumes of interest (VOIs)

The MR image acquisition parameters are detailed in Table 1. The MR images, including non-contrast-enhanced T1-weighted (T1WI), arterial phase, portal venous phase, and delayed phase images, were retrieved from the picture archiving and communication system (PACS). In patients with SHCC, we collected two MR scans within an interval of less than 12 months. Imaging characteristics consistent with SHCC lesions could be detected in the 2nd scan, but no changes were visible to the naked eye at the exact lesion location in the 1st scan.

Table 1 Details of the MR image acquisition

To ensure accurate VOI delineation, we used the Elastix toolbox [18] in the open-source software 3D Slicer [19] version 4.1.1 (http://www.slicer.org) to register the images successively. The arterial phase images of the 2nd scan were used as the template, and the arterial and other phase images of the 1st scan were the targets, which were registered to the template in turn.

Then, a radiologist with 20 years of experience manually delineated the SHCC tumour VOIs on the template images using ITK-SNAP open-source software version 3.8.0 (Yushkevich P and Gerig G). The tumour delineation covered the entire SHCC tumour lesion in all slices. For the internal-control cohort, the radiologist delineated the normal liver VOIs of these patients on MR images. A normal liver VOI was defined as a region with no imaging abnormalities associated with SHCC on ten successive levels on both scans. The VOIs for the external-control cohort were delineated in the same way as for the internal-control cohort. Consequently, we obtained 68 SHCC VOIs, 54 internal-control VOIs, and 70 external-control VOIs on the MR images from each phase.

Feature extraction

We performed a rescaling operation to normalize the MR images. The resampled voxel sizes were set to 2 × 2 × 2 mm³ to standardize the slice thickness. We mapped the VOIs onto the registered image to extract radiomics features using the PyRadiomics package [20] version 3.0.1. The radiomics features were generated from the original, wavelet-filtered, and Laplacian of Gaussian (LoG)-filtered images. The features included shape, intensity (‘First-order statistics’), and texture. Texture features included grey-level co-occurrence matrix (GLCM), grey-level size zone matrix (GLSZM), grey-level run length matrix (GLRLM), neighbouring grey-tone difference matrix (NGTDM), and grey-level dependence matrix (GLDM) features.

Feature robustness and reproducibility

The robustness and reproducibility of the features were assessed with the intraclass correlation coefficient (ICC) [21,22,23]. Thirty patients from the SHCC cohort were chosen randomly for VOI re-delineation by another experienced radiologist two weeks after the first delineation. Features with an ICC greater than 0.9 were retained and considered to have excellent robustness and reproducibility. Moreover, we concatenated the four-phase MR image features to evaluate whether the joint-phase (all-phase) features provided better discriminability than the single-phase features.
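
The ICC screening step can be illustrated with a minimal sketch. The text does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), is assumed here; the function names and the toy feature values are hypothetical and serve only to show how a 0.9 threshold separates a reproducible feature from an unstable one.

```python
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per VOI, one column per delineation.
    The choice of ICC(2,1) is an assumption; the study does not name the form.
    """
    n = len(ratings)       # number of VOIs
    k = len(ratings[0])    # number of delineations (raters)
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def robust_features(feature_table, threshold=0.9):
    """Return the names of features whose ICC exceeds the threshold."""
    return [name for name, ratings in feature_table.items()
            if icc_2_1(ratings) > threshold]


# Hypothetical values of two features from two delineations of five VOIs.
stable = [[1.0, 1.1], [2.0, 2.0], [3.0, 3.1], [4.0, 3.9], [5.0, 5.0]]
noisy = [[1.0, 3.0], [2.0, 1.0], [3.0, 5.0], [4.0, 2.0], [5.0, 4.0]]
print(robust_features({"stable": stable, "noisy": noisy}))
```

In this toy example only the feature whose values barely change between delineations passes the threshold.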

Feature selection

We implemented two types of analysis: intra-group classification and inter-group classification. The intra-group classification involved the 68 SHCC VOIs and 54 internal-control VOIs, and the inter-group classification involved the 68 SHCC VOIs and 70 external-control VOIs.

We randomly divided the intra-group classification (n = 122) and the inter-group classification (n = 138) data into a training set (n = 92 for intra-group classification, n = 104 for inter-group classification) and testing set (n = 30 for intra-group classification, n = 34 for inter-group classification) at a 3:1 ratio. These datasets came from the four sets of single-phase features and the single set of all-phase features. In total, there were ten datasets.
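
The set sizes reported above follow from assigning one VOI in four to the testing set. The sketch below reproduces only the counts; the seed, the helper name, and the use of an unstratified shuffle are assumptions, since the text does not say whether the split was stratified by class.

```python
import random


def split_3_to_1(items, seed=0):
    """Shuffle the items and split them into training and testing sets (3:1)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = len(shuffled) // 4          # one part in four goes to testing
    return shuffled[n_test:], shuffled[:n_test]


intra_train, intra_test = split_3_to_1(range(122))   # 68 SHCC + 54 internal
inter_train, inter_test = split_3_to_1(range(138))   # 68 SHCC + 70 external
print(len(intra_train), len(intra_test))   # 92 30
print(len(inter_train), len(inter_test))   # 104 34
```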

The least absolute shrinkage and selection operator (LASSO) regression model was used for feature selection. Employing regularization, the LASSO regression model adjusts the penalty coefficient value (λ), compresses most of the coefficients to zero, and retains the features with nonzero coefficients. Consequently, the retained features are nonredundant and sparse, potentially preventing the classifier from overfitting. By adjusting the parameter (λ) with the training set, the optimal features were screened via the minimum criterion with tenfold cross-validation.
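
How the penalty drives coefficients to exactly zero can be seen in a small from-scratch sketch. It assumes a squared-error loss with centred columns and uses cyclic coordinate descent; the study most likely relied on a library implementation with cross-validated λ, so the function names and the toy data below are illustrative assumptions only.

```python
def soft_threshold(value, lam):
    """Shrink value towards zero by lam; return 0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    X: list of sample rows, y: list of targets. Columns are assumed centred,
    so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)                    # equals y - X @ beta for beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes beta_j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Toy data: only the first column carries signal (y = 3 * column 0).
X = [[-2, 1, -1], [-1, -1, 1], [0, 1, 1], [1, -1, -1], [2, 0, 0]]
y = [-6, -3, 0, 3, 6]
print(lasso_coordinate_descent(X, y, lam=0.5))    # [2.75, 0.0, 0.0]
print(lasso_coordinate_descent(X, y, lam=10.0))   # [0.0, 0.0, 0.0]
```

With a moderate λ the uninformative columns receive coefficients of exactly zero while the informative one is shrunk but retained; with a very large λ every coefficient is removed.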

Classifier and assessment of the performance of different models

We chose the radial basis function kernel-based support vector machine (RBF-SVM) as the classifier. First, the most valuable features filtered by the LASSO regression model from the four single-phase and one combined-phase (all-phase) MR image feature sets were used to train the corresponding classification models with the training set (intra-group and inter-group classification). The optimal parameters of the SVM models were selected via tenfold cross-validation. Second, the efficacy of all models was tested with the testing set (intra-group and inter-group classification). Five models were established based on the RBF-SVM classifier: four according to the individual image phases (T1WI, arterial phase, portal venous phase, and delayed phase), and an all-phase model according to the integration of the four single-phase MR image features. The area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), sensitivity, specificity, and accuracy were calculated to evaluate classifier performance.
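
For reference, the evaluation measures named above can be computed from classifier scores as in the following sketch. The 0.5 decision threshold, the function names, and the example labels and scores are assumptions for illustration; the AUROC is computed through its rank interpretation (the probability that a positive case scores higher than a negative one).

```python
def auroc(labels, scores):
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(labels, scores, threshold=0.5):
    """Sensitivity, specificity, and accuracy at a fixed decision threshold."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(labels, predicted))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
    }


# Hypothetical testing-set labels (1 = SHCC VOI) and classifier scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
print(auroc(labels, scores))
print(confusion_metrics(labels, scores))
```

The sketch assumes both classes are present in the labels; otherwise the ratios are undefined.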

Statistical analysis

SPSS version 26.0 (http://www.ibm.com/, IBM) and R version 4.0.1 (https://www.r-project.org/) were used for statistical analysis. The chi-square test was used to assess categorical data. The Shapiro–Wilk test was used to assess the normality of continuous data. The t test was used if the continuous data conformed to a normal distribution; otherwise, the Mann–Whitney U test was used. Normally distributed data are described as the mean ± SD; otherwise, the median (IQR) was used. Differences in AUROC values between different models on the testing set were estimated using the DeLong test. A two-sided p < 0.05 was regarded as significant.

Results

Baseline characteristics

A total of 124 patients were enrolled in this study, including 54 SHCC patients (ranging from 36 to 73 years of age, with a mean of 56.9 ± 9.35 years) and 70 hepatic cyst or haemangioma patients (ranging from 36 to 73 years of age, with a mean of 48.5 ± 14.88 years). In the SHCC group, 4 patients were women and 50 were men. In the hepatic cyst or haemangioma group, 30 patients were women and 40 were men. There were statistically significant differences in age (p < 0.001 [t value: 3.65]) and gender (p < 0.001 [χ² value: 16.99]) between the two groups. The interval between the two MRI scans was 4.81 ± 2.67 months. Details of the VOI numbers, volumes, and diameters are provided in supplementary Table 2.

Optimal radiomics signature

We extracted 1132 radiomics features from the VOIs at each phase, consisting of 14 shape, 18 intensity, 68 texture, 688 wavelet, and 344 LoG features. Through the first round of screening using the ICC, 179, 325, 277, and 266 features remained among the T1WI and arterial, portal venous, and delayed phase image features, respectively (Fig. 1a of the supplementary material).
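
The total of 1132 features per phase can be checked with simple arithmetic. The sketch assumes that shape features are computed once on the original image only, that the 3D wavelet decomposition yields eight sub-bands, and that four LoG sigma values were used; the last figure is inferred from 344 / 86 and is not stated in the text.

```python
N_SHAPE = 14
N_FIRST_ORDER = 18
N_TEXTURE = 68
N_WAVELET_SUBBANDS = 8   # 3D wavelet sub-bands: LLL, LLH, ..., HHH
N_LOG_SIGMAS = 4         # assumption, inferred from 344 / 86

per_filtered_image = N_FIRST_ORDER + N_TEXTURE            # 86
original_features = N_SHAPE + per_filtered_image          # 100
wavelet_features = N_WAVELET_SUBBANDS * per_filtered_image
log_features = N_LOG_SIGMAS * per_filtered_image
total = original_features + wavelet_features + log_features
print(wavelet_features, log_features, total)   # 688 344 1132
```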

Subsequently, after filtration with LASSO regression, the T1WI, arterial phase, portal venous phase, delayed phase, and all-phase models were built with 9, 6, 6, 23, and 36 optimal radiomics features, respectively (Fig. 1b1, b2 of the supplementary material). The retained features of each model were considered the radiomics signatures in the intra-group classification. The all-phase model consisted of 8 T1WI, 12 arterial phase, 8 portal venous phase, and 8 delayed phase image features (Fig. 1c1 of the supplementary material).

In addition, in the inter-group classification, 16, 13, 12, 10, and 18 optimal radiomics features (Fig. 1b3, b4 of the supplementary material) were selected by LASSO regression for the models built from the T1WI, arterial phase, portal venous phase, delayed phase, and all-phase imaging features, respectively. The all-phase model was constructed from 7 T1WI, 3 arterial phase, 1 portal venous phase, and 7 delayed phase imaging features (Fig. 1c2 of the supplementary material). The optimal radiomics signature of each model is detailed in Table 1 of the supplementary material.

Best performance of the models in intra-group and inter-group classification

For intra-group classification, the performance of the all-phase model was significantly greater than that of the single-phase classifiers. The optimal performance was achieved with an AUROC of 1.00 (95% CI, 1.00–1.00) with the training set and 0.93 (95% CI, 0.85–1.00) with the testing set (Fig. 2) and corresponding AUPRCs of 1.00 and 0.94 (Fig. 3), respectively. Basic first-order statistics (‘Range’ and ‘Maximum’) and high-dimensional texture features (Grey-Level Co-occurrence Matrix [‘DifferenceAverage’] and Grey-Level Size Zone Matrix [‘ZoneVariance’]) contributed to the model construction.

Fig. 2

Comparison of the AUROC on the training set (a) and testing set (b) of the different models from the intra-group classification, and training set (c), testing set (d) from the inter-group classification

Fig. 3

Comparison of the AUPRC on the training set (a) and testing set (b) of the different models from the intra-group classification, and training set (c), testing set (d) from the inter-group classification

Likewise, the all-phase model achieved good performance in inter-group classification with the 18 optimal radiomics features. The RBF-SVM classifier achieved an AUROC of 0.99 (95% CI, 0.99–1.00) with the training set and 0.97 (95% CI, 0.92–1.00) with the testing set (Fig. 2) and corresponding AUPRCs of 1.00 and 0.98 (Fig. 3), respectively. In model construction, the most important contributing features were first-order statistics (‘Maximum’ and ‘Variance’) and high-dimensional texture features (Grey-Level Size Zone Matrix [‘ZoneVariance’] and Grey-Level Dependence Matrix [‘HighGrayLevelEmphasis’]). The performance details of the different models are shown in Table 2.

Table 2 The performance details of different models

Differences in the AUROCs between intra-group and inter-group classification

With the testing set, we compared the performance of the same model in intra-group and inter-group classification according to the AUROC values. The DeLong test revealed no significant differences between the values for the two classifications (p > 0.05; Table 3). In addition, the top 3 radiomics features from the weighted ranking of the all-phase model were analysed to explore whether there were significant differences between normal tissue and undetectable SHCC lesions (Table 4). The results showed a significant difference (p < 0.05) in the values of the radiomics features between normal tissue areas and undetectable SHCC lesions (Fig. 4). Figure 5 depicts expression heatmaps for the top two weighted features of the all-phase model for intra-group and inter-group classification. The heatmaps reveal heterogeneity within the tumour that is not apparent on the MR images themselves, which increases the interpretability of the radiomics features.

Table 3 Comparison of the AUROC between inter- and intra-group classification on the testing set
Table 4 Weights of the top 3 radiomics features for inter- and intra-group classification in the all-phase model
Fig. 4

Statistical analysis of the top 3 weighted radiomics features for intra-group classification (a) and inter-group classification (b) in the all-phase model

Fig. 5

Feature visualization of the grey-level dependence matrix (GLDM) of arterial phase MRI with the wavelet filter (a). Feature visualization of the first-order Range of T1WI phase MRI with the Laplacian of Gaussian (LoG; σ = 2 mm) filter (b). Feature visualization of the grey-level size zone matrix (GLSZM) of arterial phase MRI with the wavelet filter (c). Feature visualization of the first-order Maximum of delayed phase MRI without filtering (d)

Discussion

In this study, we aimed to develop a multiphase MRI-based radiomics model to evaluate the risk of microscopic lesions or high-risk nodules in HCC patients with radiologically undetectable lesions. The model showed excellent performance in both intra-group and inter-group classification. These results are clinically significant in that such models can help physicians find potentially malignant lesions at a very early stage in patients with HCC and/or other liver-related diseases.

In the majority of Chinese HCC patients, the disease is secondary to HBV infection [24, 25], and a large proportion of patients are identified at an advanced stage [26]. Cirrhosis increases the risk of SHCC/HCC [27]. The transformation of cirrhotic nodules into HCC is a continuous and complex process [28] that progresses from low-grade dysplastic nodules (LGDNs) to high-grade dysplastic nodules (HGDNs) to HCC [29]. DNs are considered to be precancerous lesions of SHCC/HCC [30,31,32,33,34,35]. DNs usually present with atypical manifestations on gadoxetic acid (Gd-EOB-DTPA)-enhanced MRI [36, 37]. According to a study by Eremites SC et al. [38], LGDNs show only a 4% increase in arterial blood supply, HGDNs an approximately 17–32% increase, and HCC an approximately 94% increase. Therefore, a large proportion of DNs present with uniform T1 and T2 signals on MRI and do not show enhancement in the arterial phase [39], which makes them difficult to distinguish [40, 41]. The diagnosis of SHCC on MR depends on the increase in unmatched arterioles, the decrease in portal venous blood supply, the deposition of iron and lipids, and the changes in the formation of envelopes. In our experience, a precancerous lesion may already exist before a radiologist can recognize an SHCC/HCC lesion on MRI with the naked eye. Few clinical symptoms are present during the development of DNs or SHCC, and this silent progression can be fatal. Therefore, early inspection, detection, intervention, and treatment are particularly important.

To date, there have been several radiomics studies on hepatocellular carcinoma. Li Yang et al. [16] achieved an AUROC of 0.861 with the validation cohort of an MRI-based model incorporating significant clinical radiological factors and a fusion radiomics signature obtained from hepatobiliary phase (HBP) images. Huang et al. [17] reported that radiomics feature models based on CE-MR images had favourable performance in predicting HCC, with mean AUROCs of 0.712, 0.784, 0.771, and 0.774 when constructed from arterial phase, portal venous phase, delayed phase, and hepatobiliary phase features, respectively. Many studies have been based on analyses of lesions that are already visibly changed on imaging [42,43,44,45,46,47]. Radiomics analysis would be more valuable if a lesion could be detected before it can be observed by the naked eye.

In the current study, we included both an internal-control and an external-control cohort. Had we set up only an internal-control cohort, the experimental results could have been biased, because we could not guarantee that the selected normal VOIs were free of information from underlying malignant lesions (although we attempted to avoid this as much as possible). Hence, we also selected a group of normal liver VOIs from patients diagnosed with hepatic haemangiomas or cysts as the external-control cohort to ensure a more stable experiment. The experimental results are consistent with the existence of such a bias: with the testing set, the AUROC for inter-group classification was generally higher than that for intra-group classification. However, this difference was not significant (p > 0.05, DeLong test). This result indicates that despite the possible bias, these models retain predictive efficacy in the corresponding populations. The goal of the intra-group classification analysis is to screen HCC patients early to determine whether new high-risk lesions have developed, while the purpose of the inter-group classification analysis is to determine whether there are risky lesions related to HCC when screening non-HCC populations. We invited two radiologists (Mj.X. and Lq.C.) with five years of experience to perform a reader test with the testing set (Fig. 2 of the supplementary material). In intra-group classification, the AUROCs obtained by the two radiologists were 0.62 and 0.75, respectively. In inter-group classification, the AUROCs obtained by the two radiologists were 0.71 and 0.73, respectively. When lesions are still in the microscopic stage, very early diagnosis is highly challenging for radiologists. Our model can assist radiologists in diagnosing these lesions at an ultra-early stage, which could greatly benefit patients.

Radiomics features related to image transformation are highly important for revealing information on very small lesions. The most common such transformation is the application of filters, as different filter transformations may yield different image information. According to Zhang et al. [48], the AUROC of a model constructed from filter-free radiomics features was 0.728 with the validation cohort. In a study by Zhao et al. [45], the radiomics features extracted from images processed by the wavelet filter were not included; only the original features and the features extracted from images transformed by the LoG filter were included. Among them, most of the features for constructing the optimal model were LoG features, comprising approximately 75%, and the AUROC was 0.771 with the validation cohort. In our study, in the best models for intra-group and inter-group classification, the features extracted from images subjected to LoG filtering and wavelet transformation accounted for approximately 97% and 78% of the features used in model construction, and the AUROCs were 0.93 and 0.97 with the testing set, respectively. The information contained in the original images is not sufficient to explain all the phenomena and meet clinical needs. For further image analysis and research, some edge detection methods are often applied, such as LoG filters and wavelet transforms. The LoG is composed of a Gaussian kernel and a Laplacian kernel; the latter is sensitive to areas with rapidly changing intensities, highlighting specific texture information in the original image and enhancing edges. A Gaussian smoothing filter is usually needed to smooth the image before the Laplacian operation to reduce the susceptibility to noise. Wavelet transformation produces good local characteristics. When the scale of the wavelet function is large, the anti-noise ability is strong, and when the scale of the wavelet function is small, the ability to extract image details is strong. Therefore, a balance between suppressing noise and extracting image edge details can be achieved.

We recognized that some features reflecting tumour heterogeneity and microenvironment [7] were intensity (first-order) and texture (GLCM, GLSZM, etc.) features, consistent with other studies [22, 49,50,51,52]. Depending on the organ being imaged and the type of imaging modality, the first-order statistics may or may not be the same across all applications. In intra-group classification, the top three features used to construct the best model were ‘Large Dependence High Grey Level’, ‘Range’, and ‘Maximum’ (Table 4). We found that the median values of these features in the lesion areas were generally higher than those in the normal areas [367.65 (175.13, 712.37) vs. 247.59 (159.17, 364.83), 32.73 (21.89, 47.89) vs. 16.17 (13.31, 24.88), and 30.61 (21.39, 47.29) vs. 19.83 (13.81, 28.25)]. In inter-group classification, the top three features used to construct the best model were ‘Maximum’, ‘Zone Variance’, and ‘Variance’ (Table 4). The average value of the ‘Maximum’ feature of the lesion areas was greater than that of the normal area [238.23 ± 58.92 vs. 220.63 ± 44.07]. The median of the ‘Variance’ feature value of the lesion areas was greater than that of the normal area [107.86 (38.94, 205.54) vs. 12.99 (5.01, 28.06)], but the ‘Zone Variance’ value was lower than that of the normal area [57.72 (59.56, 193.61) vs. 115.79 (12.76, 119.92)]. ‘Entropy’ and ‘Uniformity’ are two commonly used features computed in medical imaging. In a liver study [53], researchers showed that the ‘Total Entropy’ of the liver of healthy people was higher than that of patients with liver metastases. We calculated ‘Sum Entropy’ instead of ‘Total Entropy’ and found that the results corresponded to the above conclusions [1.39 (1.17, 1.64) vs. 1.44 (1.38, 1.50)]. Griethuysen et al. [18] stated that ‘Zone Entropy’ measures the uncertainty/randomness in the distribution of zone sizes and grey levels. The higher values obtained in the lesion indicate more heterogeneity than in the normal area [3.41 (2.58, 4.05) vs. 2.00 (1.58, 2.58)].
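
The edge-enhancing behaviour of the LoG filter discussed above can be shown with a one-dimensional sketch. The study applied a 3D LoG filter through a library; the 1D kernel, the helper names, and the step-shaped test signal here are simplifying assumptions. The kernel is the sampled second derivative of a Gaussian, so flat regions give a zero response and an intensity step produces a sign change (zero crossing) at the edge.

```python
import math


def log_kernel_1d(sigma, radius=None):
    """Sampled 1D Laplacian-of-Gaussian kernel (second derivative of a Gaussian)."""
    if radius is None:
        radius = int(math.ceil(4 * sigma))
    kernel = []
    for x in range(-radius, radius + 1):
        gauss = math.exp(-x * x / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
        kernel.append((x * x - sigma ** 2) / sigma ** 4 * gauss)
    offset = sum(kernel) / len(kernel)
    return [k - offset for k in kernel]   # enforce zero sum: flat regions map to 0


def filter_valid(signal, kernel):
    """Apply the symmetric kernel wherever it fits entirely inside the signal."""
    r = len(kernel) // 2
    return [sum(signal[i + j - r] * kernel[j] for j in range(len(kernel)))
            for i in range(r, len(signal) - r)]


kernel = log_kernel_1d(sigma=2.0)
signal = [0.0] * 20 + [10.0] * 20          # an intensity step, i.e. an edge
response = filter_valid(signal, kernel)
print(max(response) > 0 > min(response))   # True: sign change across the edge
```

A larger sigma smooths more strongly before differentiation, which trades fine detail for robustness to noise.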

Limitations

The limitations of this study are as follows: first, the contour of the tumour area relied on manual delineation by an experienced radiologist, which required considerable time and energy expenditures. Second, this study relies on powerful registration algorithms, which can directly affect the accuracy of segmentation. For this reason, we asked the experienced radiologists to double-check the registration effect to minimize the impact of registration uncertainty on the experimental results. Third, a small number of samples were included, and this was a single-centre, retrospective study. The results of this study thus reflect only the patients at the centre and are not representative of the general population. Therefore, multicentre, large sample, and prospective studies are needed to further improve the results of this study.

Indeed, our experiment is retrospective in nature. In practice, the VOI cannot be defined on the first MRI scan because no abnormalities are visible to the naked eye. However, the essence of research is to solve practical problems. Future clinical translation of this research could proceed as follows: first, a software algorithm is needed to automatically or semiautomatically identify and segment the liver. Second, a sliding convolution kernel of a specific size, e.g., 5 × 5 or 7 × 7, would extract features from left to right sequentially for the whole liver or a specific liver segment and input these quantitative data into our model to obtain the predicted probability. These probabilities can be visualized with a heatmap, which conveniently shows the areas at high risk. This could help clinicians make decisions and intervene to achieve precision medicine.
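
The sliding-window idea outlined above could be prototyped as in the following sketch. It works on a single 2D slice for simplicity, and `toy_predictor` is a hypothetical placeholder for feature extraction plus the trained classifier; neither it nor the function names correspond to software used in this study.

```python
def sliding_window_risk_map(image, liver_mask, predict_probability, size=5):
    """Return a 2D map with one predicted probability per window centre.

    image, liver_mask: 2D lists of equal shape. predict_probability is a
    callable that takes a size x size patch and returns a value in [0, 1].
    Positions outside the liver mask or too close to the border stay None.
    """
    half = size // 2
    rows, cols = len(image), len(image[0])
    risk = [[None] * cols for _ in range(rows)]
    for r in range(half, rows - half):            # top to bottom
        for c in range(half, cols - half):        # left to right
            if not liver_mask[r][c]:
                continue
            patch = [row[c - half:c + half + 1]
                     for row in image[r - half:r + half + 1]]
            risk[r][c] = predict_probability(patch)
    return risk


def toy_predictor(patch):
    """Placeholder only: maps the patch intensity range onto [0, 1]."""
    flat = [v for row in patch for v in row]
    return min(1.0, (max(flat) - min(flat)) / 100.0)


# Hypothetical 12 x 12 slice: uniform intensity with one brighter voxel.
image = [[50] * 12 for _ in range(12)]
image[3][3] = 120
mask = [[True] * 12 for _ in range(12)]
risk_map = sliding_window_risk_map(image, mask, toy_predictor, size=5)
print(risk_map[3][3], risk_map[9][9], risk_map[0][0])   # 0.7 0.0 None
```

The resulting map can be rendered as a heatmap overlaid on the slice; in a real application the placeholder would be replaced by radiomics feature extraction from each patch followed by the trained model.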

Conclusions

Although new lesions in SHCC patients cannot be observed on MR imaging, a combination of radiomics features and machine learning algorithms can be sensitive to underlying abnormalities that cannot be detected by the naked eye.

In our study, we reported that the optimal model, based on the integration of radiomics features from four phases of MR images, could achieve excellent performance in evaluating microscopic pre-hepatocellular carcinoma lesions.