Introduction

Primary liver cancer (PLC) includes hepatocellular carcinoma (HCC), intrahepatic cholangiocarcinoma (ICC) and other rare type, and ICC accounts for 10–15% of PLC [1]. ICC originates from the intrahepatic secondary bile duct and has three gross pathological patterns: mass-forming (78%), periductal infiltrating (16%) and intraductal papillary (6%) subtypes, and the overall survival of patients with mass-forming ICC is shorter [2, 3]. The incidence and mortality of ICC have been increasing over the past few decades [4, 5]. Partial hepatectomy is an effective treatment [6], however, many patients with advanced ICC are inoperable due to delayed diagnosis [7]. Recently, molecular profiling has revealed subtypes of ICC [8], therefore, molecular targeted therapies are expected to improve the prognosis of ICC patients [9].

Ki67 protein is a nuclear antigen related to cell proliferation and positively correlated with cancer aggressiveness [10]. Some studies suggest that Ki67 is a poor prognostic predictor in patients with ICC [11, 12]. Besides, Ki67 is an attractive therapeutic target for malignant cancers [13]. For instance, Dinaciclib could suppress ICC growth by suppressing Ki67 protein [14]. Zhang et al. [15] also found that knockout of lncRNA CASC15 could suppress ICC progression by inhibiting Ki67 protein expression. Therefore, accurate prediction of Ki67 status in ICC patients is a predictor for treatment efficacy evaluation and outcome prediction. However, it is difficult to determine Ki67 status of ICC lesions by routine imaging and laboratory tests.

Currently, radiomics is defined as a high-throughput extraction of numerous image features from medical images, and independent features are applied to the construction of diagnostic, predictive and prognostic models [16]. Several studies have achieved good predictive efficiency in the Ki67 prediction of several malignant cancers, including HCC [17], breast cancer [18] and lung cancer [19]. However, the development and validation of Ki67 status prediction model for ICC lesions based on radiomics features has not yet been studied.

Serum carbohydrate antigen 19-9 (CA199) levels are often elevated in ICC patients, rather than elevated serum alpha-fetoprotein (AFP) levels like those in HCC patients. And typical mass-forming ICC lesions exhibit several imaging features like intrahepatic duct dilatation, hepatic capsular retraction and gradual and filling enhancement pattern. Therefore, preoperative diagnosis of ICC is not difficult, whereas, to date, there is no model to predict the Ki67 status of ICC lesions preoperatively. In our study, we aimed to develop and validate a combined Ki67 prediction model for mass-forming ICCs incorporating clinicoradiological features and MRI radiomics. Importantly, the combined Ki67 prediction model will be further validated by a time-independent test cohort.

Materials and methods

Patients

This retrospective-prospective study was approved by the Zhongshan Hospital, Fudan University (Ethics approval No. B2021-325R) ethics committees, and patient informed consent was waived. From June 2015 to July 2019, 178 patients with postoperative pathologically confirmed ICC from the Zhongshan Hospital were randomly divided into the training (n = 124, 79 high Ki67 status group and 45 low Ki67 status group) and validation cohorts (n = 54, 33 high Ki67 status group and 21 low Ki67 status group) in a ratio of 7:3. From August 2019 to April 2022, a test cohort of postoperative pathologically confirmed ICC patients (n = 49, 28 high Ki67 status group and 21 low Ki67 status group) from the Zhongshan Hospital was prospective grouped with the same inclusion criteria, and baseline clinicoradiological features of ICC patients in test cohort are shown in Table S1. The main inclusion criteria (Fig. 1): (a) single lesion with longest diameter ≥ 1.0 cm; (b) without previous treatment history of PLC; (c) complete histopathological description of ICC lesion; (d) all patients underwent MRI examination within 1 month before hepatectomy; (e) adequate MR image quality.

Fig. 1
figure 1

Study flowchart of the enrolled patients

Clinical features retrieval

The demographic data, preoperative serum AFP, carcinoembryonic antigen (CEA), CA199, history of hepatitis B virus (HBV) serum markers and HBV-DNA loads were retrospectively retrieved (Table 1). All PLC samples were sampled using 7-point baseline sampling protocol [20]. Histopathological features including lesion number, Edmondson-Steiner grade and Ki67 status were evaluated by two experienced abdominal pathologists. Anti-human Ki67 rabbit monoclonal antibodies (Maixin Biotech Co., Ltd, Fuzhou, China) were used with a dilution of 1:50 in immunohistochemistry, and the Ki67 labeling index (LI) was recorded. We classified ICC lesions into low Ki67 status group (Ki67 LI < 25%) and high Ki67 status group (Ki67 LI ≥ 25%) as previous studies [11, 21].

Table 1 Baseline clinicoradiological features of ICC patients in training and validation cohorts

Gd-DTPA MR acquisition protocol

Contrast-enhanced MRI was performed by intravenous injection of 0.1 mmol/kg Gd-DTPA, followed immediately by a 20 ml saline flush at 2 ml/s. Taking 3.0 T uMR 770 scanner (United Imaging Healthcare, Shanghai, China) as an example, MR sequences involved in this study include axial T2-weighted imaging with fat suppression (T2WI-FS), diffusion-weighted imaging (DWI, with b values of 0, 50, 500 s/mm2), axial pre-contrast quick three-dimensional T1-weighted imaging (quick3d T1WI) with fat suppression and post-contrast dynamic-enhanced quick 3d T1WI at arterial phase (AP, 20–30 s), portal venous phase (PVP, 60–70 s) and delayed phase (DP, 180 s) (Table S2).

Imaging features analysis

Imaging features were assessed independently by 2 blinded and experienced abdominal radiologists (C.W.Z. and X.L. with 10 and 15 years of experience, respectively). In case of any discrepancy, a consensus was generated after discussion. Imaging features including: (a) tumor size; (b) tumor morphology; (c) signal intensity (SI) on T1WI, T2WI-FS and DWI images; (d) target sign (peripheral hyperintense with central isointense/hypointense) [22]; (e) rim enhancement on AP (complete and incomplete rim); (f) enhancement pattern; (g) the liver imaging reporting and data system (LI-RADS) (Version 2018) [23]; (h) intrahepatic duct dilatation; (i) hepatic capsular retraction (retraction of hepatic capsular adjacent to the lesion); (j) visible vessel penetration [24]; (k) peripherally hepatic enhancement (peritumoral enhancement on any phase).

Radiomics analysis

Tumor segmentation

The tumor segmentation was performed in the ITK-SNAP software. Volumes of interests (VOIs) were manually delineated based on 6 MR sequences (DWI with b value of 500 s/mm2, T2WI-FS, pre-T1WI, AP, PVP and DP, respectively) by an abdominal radiologist with 5 years of experience (X.L.Q.) and checked by a senior abdominal radiologist (X.L.). Besides, 30 MR images of ICC lesions were randomly selected and delineated again by X.L.Q. to assess the test–retest reliability. Blinded to segmentations delineated by X.L.Q., these 30 MR images of ICC lesions were delineated again by C.W.Z. to assess the inter-observer variability.

Feature extraction

To reduce heterogeneity among MR images, all images were resampled to an isotropic voxel size (1 × 1 × 1 mm3) using bilinear interpolation, and intensities were normalized with a fixed bin width. Images were then normalized by z-score to obtain a standard normal distribution of image intensity. The extraction of radiomics features was performed by the uAI Research Portal (Version: 20210730), in which PyRadiomics (https://pyradiomics.readthedocs.io/en/v3.0.1/) was embedded. 2600 radiomics features were extracted from each sequence, and these features were classified into first-order statistics, shape-based features, texture features, and high-order features.

Feature selection

Firstly, extracted radiomics features were applied with a z-score \(\left( {y_{i} = \frac{{x_{i} - \overline{x}}}{\sigma \left( x \right)}} \right)\) normalization to eliminate index dimension difference. Secondly, features with intraclass correlation coefficients ≥ 0.75 in both test–retest and inter-observer settings were considered as reproducible radiomics features and were chosen for further analysis. Finally, the correlation analysis, multicollinearity analysis and least absolute shrinkage and selection operator (LASSO) methods were performed to select optimal prediction features (Table S3, Figure S1).

Model construction

Clinical model and imaging model were constructed by corresponding independent clinicoradiological predictors. For models based on single or multiple MR sequences and fusion models, linear discriminant analysis (LDA) was used to find the best linear combination of the above optimal prediction features to maximize the discrimination between patients with high and low Ki67 status. On the basis that LDA has a certain classification ability, taking the LDA results as the input feature of logistic regression (LR) and random forest (RF) to further accurately identify patients with high or low Ki67 status, and the classification threshold is 0.5.

Model evaluation and verification

Receiver operating characteristics curves (ROC) were plotted and the area under curve (AUC), sensitivity, specificity, accuracy, precision and F1-score were calculated to evaluate the performance of models. Delong test was used to compare the predictive efficiency between models, and we applied the Benjamini–Hochberg method to correct the false discovery rate (FDR) [25]. Hosmer–Lemeshow test was performed to evaluate the consistency between actual Ki67 status and predicted Ki67 status, and calibration curve was plotted. Decision curve and clinical impact curve were plotted to verify the clinical practicability of models by quantifying the net benefits at different risk thresholds. The confusion matrixes of the fusion models in three models were plotted. The workflow of the above radiomics analysis is shown in Fig. 2.

Fig. 2
figure 2

Study flowcharts of radiomics analysis

Correlation analysis

Radiomics features extracted from pre-T1WI, PVP and DP sequences were correlated with HBV, arterial rim enhancement on AP, enhancement pattern and Ki67, respectively, and heatmaps were performed. Because the different ranges of features, correlation coefficients were calculated for continuous versus binary variables, continuous variables, and non-continuous variables by using Biserial, Pearson, and Spearman correlation analyses, respectively.

Statistical analysis

Student’s t test was used when variables were normal distribution, and Mann–Whitney U test was used when variables were non-normal distribution for continuous variables and chi-square test was used for qualitative variables to analyze whether there were statistically significant differences. Univariate and multivariate analysis were used for the selection of independent predictor. Statistical analysis was performed with R software (version 4.1.1). All statistical tests were two-sided, and p value lower than 0.05 were considered statistically significant.

Results

Performance of clinicoradiological features

124 and 54 patients were divided into training and validation cohorts, and baseline clinicoradiological features are shown in Table 1. At the univariate analysis, gender, HBV, CEA, arterial rim enhancement on AP and enhancement pattern are significantly related to Ki67 status. The multivariate analysis shows HBV, arterial rim enhancement on AP and enhancement pattern are independent predictors of Ki67 status in ICC patients (Table 2). In the training cohort and validation cohort, the clinical model constructed with HBV and imaging model constructed with arterial rim enhancement on AP and enhancement pattern both exhibit poor predictive efficiency. The clinical + imaging model also shows an unsatisfactory predictive efficiency (AUCtraining = 0.714, AUCvalidation = 0.535) (Table 3, Fig. 3A, B). Example of representative clinicoradiological features of ICC with high Ki67 status are shown in Figure S2.

Table 2 Univariate and multivariate analyses of clinicoradiological features related with Ki67 status in ICC
Table 3 The performance of clinicoradiological features, radiomics features and the combined predictive models for predicting Ki67 status in ICC patients
Fig. 3
figure 3

Comparison of receiver operating characteristics (ROC) curves for Ki67 status prediction in training (A, D), validation (B, E) and test (C) cohorts by logistic regression

Performance of radiomics features using single MR sequence

Totally, 76 robust radiomics features were selected from 6 single MR sequences (Figure S1, Table S3 and S4), and the AUCs of 6 single MR sequence models constructed with robust radiomics features are displayed (Table 3, Fig. 3A, B). Although all single MR sequence models show poor specificity, which prompts single MR sequence model predictive efficiency is unreliable, T1, T1V and T1D models yield stable AUCs between training and validation cohorts, and T1V model shows the most stable predictive efficiency with ΔAUC = 0.009. Therefore, multiple-sequence models were constructed based on above three single-sequence models.

Performance of radiomics features using multiple MR sequence

Two-sequence models have good predictive efficiency in training and validation cohorts. Among them, T1V + T1 and T1V + T1D models show stable predictive efficacy and higher AUCs in validation than training cohort. Thereby, the final three-sequence radiomics model incorporates T1, T1V and T1D models, and it shows a satisfying predictive performance (AUCtraining = 0.860, AUCvalidation = 0.843) (Table 3, Fig. 3D, E). And the final radiomics model performs better than T1V + T1D (FDR p = 0.018) model in the training cohort (Table 4). However, all of these three models show unsatisfying predictive performance in test cohort (AUC = 0.716–0.814) (Table 3 and Fig. 3C).

Table 4 The comparison of models in training, validation, and test cohorts

Performance of fusion models using clinicoradiological and radiomics features

Clinical + radiomics model and imaging + radiomics model achieve similar predictive efficiency in training, validation and test cohorts, but they are better than clinical + imaging model (FDR p < 0.05). The combined Ki67 prediction model incorporating clinical, imaging and radiomics model achieves excellent predictive efficiency in training (AUC = 0.904, 95% CI 0.849–0.960), validation (AUC = 0.870, 95% CI 0.775–0.965) and test (AUC = 0.815, 95% CI 0.688–0.941) cohorts. The combined Ki67 prediction model exhibits better than clinical + imaging model (FDR p < 0.001), clinical + radiomics model (FDR p = 0.010) and imaging + radiomics model (FDR p = 0.018) (Tables 3, 4, Fig. 3C–E).

Evaluation and verification of the combined Ki67 prediction model

The flowchart of the combined Ki67 prediction model is shown in Fig. 4. Calibration curves show the goodness of fit between the predicted Ki67 status by using the combined Ki67 prediction model and actual Ki67 status in the training (p = 0.787) and validation (p = 0.742) cohorts (Fig. 5A, B). Decision curves show that radiomics model, clinical + radiomics, imaging + radiomics model and the combined Ki67 prediction model could obtain net benefit by predicting Ki67 status of all range risk threshold, and the combined Ki67 prediction model exhibits the highest net benefit (Fig. 5C). To further assess the clinical utility of models, clinical impact curves show that the combined Ki67 model has the largest risk threshold range of 0.5–1.0, and the predicted Ki67 status is highly consistent with the actual Ki67 status with risk thresholds ranging between 0.5 and 1.0 (Fig. 5D–I). The heatmap of the correlation between Ki67 status, clinicoradiological features and radiomics features is shown in Fig. 6A and Table S5. The confusion matrixes of the fusion models in the training, validation and test models are shown in Fig. 6B.

Fig. 4
figure 4

The predictive flowchart of the combined Ki67 prediction model

Fig. 5
figure 5

Evaluation and verification of the fusion models. (A, B) Calibration curves of the fusion models in term of agreement between predicted and actual Ki67 status in the training (A) and validation (B) cohort. X-axis represents predicted Ki67 status, Y-axis represents actual Ki67 status, and dashed line represents the ideal prediction. (C) Decision curves of the fusion models. The grey line represents the assumption that all patients with high Ki67 status, and the horizontal black line represents the assumption that no patient with high Ki67 status. (D–I) Clinical impact curves of the fusion models. The dashed line is the actual number of ICC patients with high Ki67 status. The combined Ki67 model has the largest risk threshold range of 0.5–1.0, and the predicted Ki67 status is highly consistent with the actual Ki67 status with risk thresholds ranging between 0.5 and 1.0

Fig. 6
figure 6

(A) The heatmap of the correlation between Ki67 status, clinicoradiological features and radiomics features. (B) The confusion matrixes of the fusion models in the training, validation and test models

Discussion

In this study, we established a multiparametric model for predicting Ki67 status in ICC patients preoperatively. The final combined Ki67 prediction model incorporates clinical, imaging and radiomics features, and it exhibits an excellent predictive efficiency.

As previous studies [11, 21], we classified ICC lesions into low Ki67 status group and high Ki67 status group by 25% in our study. However, in the majority of studies on predicting Ki67 status of HCC preoperatively, the cut-off value of low and high Ki67 status is usually selected as 10–15% [17, 26]. This may be due to the fact that ICC is a much more aggressive cancer than HCC [27, 28]. Tsokos et al. [29] found that well-differentiated ICC had higher Ki67 LI than benign proliferations (22.7% vs. 1.4%, respectively; p < 0.001), and none of the benign biliary lesions had Ki67 LI greater than 10%. Therefore, it is inferred that 10% is more likely to be the cut-off value for differentiating benign biliary lesion and ICC, and 25% are more rational in differentiating low and high Ki67 status in ICC lesions.

The multivariate analysis shows HBV is the only independent clinical predictor of Ki67 status in our study. HBV and cirrhosis are risk factors for ICC, with overall odds ratios of 5.10 and 22.92, respectively [30], and Tovoli et al. [27] also found that cirrhotic patients with ICCs have different clinical presentation and outcomes. However, Peng et al. [31] showed there was no statistical differences between low and high Ki67 status in ICC patients, it may be due to an irrational cut-off value of 10% was selected in his study. The multivariate analysis also shows that arterial rim enhancement on AP and enhancement pattern are two independent imaging predictors of Ki67 status. Min et al. [32] found that arterial peripheral rim enhancement pattern was prognostic factor for increased risk of death, which is consistent with the relationship between arterial rim enhancement on AP and Ki67. Our study also suggests that ICC lesions with enhancement pattern like HCC (wash-in and wash-out) may yield a higher Ki67 LI.

Since neither clinical model nor imaging model nor clinical + imaging model can achieve desirable AUC, so we further analyze the predictive performance of radiomics features using 6 single MR sequences. Obviously, none of them can meet the prediction requirements. But the single MR sequence models including T1, T1V and T1D models show stable predictive efficiency, more importantly, pairwise combination models of these three models yield stable predictive efficacy and higher AUCs in validation than training cohort. Thereby, the final radiomics model constructed with T1, T1V and T1D models may be more reliable. Although the final radiomics model performs better than T1V + T1D models in the training cohort, it performs poorly in test cohort and its specificity is unsatisfying, so a fusion model incorporates clinicoradiological and radiomics features is needed. Clinical + radiomics model, imaging + radiomics and the combined Ki67 prediction model achieve better than clinical + imaging model, indicating that radiomics features are vital to the prediction of Ki67 status in ICC patients. Decision curves show that the combined Ki67 model could predict Ki67 status in all range risk threshold and obtain the highest net benefit. Clinical impact curves show that the combined Ki67 model has the largest risk threshold range of 0.5–1.0 to identify the Ki67 status accurately.

The principle of the LASSO algorithm is to compress coefficients of features by introducing a regularization parameter, and remove some features with zero coefficients, so as to achieve the purpose of feature selection [33]. Rad-score is obtained by matrix multiplication of the features and their coefficients obtained by the LASSO algorithm, usually as an independent prediction signature. However, Rad-score completely relies on the feature selection results of LASSO, and the category information is not directly correlated. LDA is a supervised feature dimensionality reduction method, which takes the category information studied as a priori knowledge [34]. The main purpose of LDA was to maximize the variation between samples of different categories and further fit a combined signature that was more suitable for discriminating high and low Ki67 status of ICCs in our study. Therefore, the combined signature obtained by the LDA method, with maximum inter-class variance and minimum intra-class variance, may could make the samples to be predicted obtain the best separability.

Recently, there have been several studies on the Ki67 prediction in HCC based on CT [26] and MR [17, 35], but there is no study on the Ki67 prediction in ICC based on CT or MR. In our study, a total of 22 radiomics features (T1WI image: 10, PVP image: 9, DP image: 7) are correlated with the Ki67 status (Fig. 6A and Table S5). And T1V_firstorder_Maximum (r = − 0.225, p = 0.002), T1V_glszm_LowGrayLevelZoneEmphasis (r = 0.216, p = 0.004), T1V_ngtdm_Busyness (r = − 0.210, p = 0.005), and T1D_glszm_LargeAreaEmphasis (r = − 0.212, p = 0.005) have strong correlation, which happens to explain why T1V model shows the most sable predictive efficiency among single MR sequence models. Peng et al. [31] predicted Ki67 status in ICC based on ultrasound radiomics features, principal component analysis as an unsupervised feature dimensionality reduction method, was used before LASSO in the feature selection, therefore, the features finally used to predict Ki67 status were weakly interpretable. The study about Ki67 status prediction in HCC conducted by Ye et al. [17] was similar to our study, only two glrlm features (LongRunHighGrayLevelEmphasis, LongRunLowGrayLevelEmphasis) are identical to our study, suggesting that these two features may be independently related to the Ki67 status, regardless of tumor type.

There are several limitations in our study. Firstly, we define “Ki67 LI ≥ 25%” as high Ki67 status in our study according to previous studies, however, the reason why the Ki67 cut-off value of ICC is higher than that of HCC needs further study. Secondly, selection bias is inevitable in retrospective study, and an estimation of the effect of the splitting procedure may be needed in future study. Thirdly, there is no study on the Ki67 prediction in ICC based on MR in the past, thus more studies in this area are needed to compare and verify our study, and the correlation between radiomics features and Ki67 status, complex clinicoradiological features need to be further explained. Finally and importantly, larger cohorts from other centers are needed to be enrolled for prospective validation of our Ki67 prediction model.

In summary, the combined Ki67 model incorporating clinical feature (HBV), imaging features (arterial rim enhancement on AP and enhancement pattern) and radiomics features (on T1, T1V and T1D sequences) is a potential biomarker in Ki67 prediction, and the flowchart of the combined Ki67 prediction model may be a potential tool in Ki67 stratification of ICC patients (Fig. 4).