Development and validation of combined Ki67 status prediction model for intrahepatic cholangiocarcinoma based on clinicoradiological features and MRI radiomics

Purpose Incidence and mortality of intrahepatic cholangiocarcinoma (ICC) have been increasing over the past few decades, and Ki67 is an adverse prognostic predictor and an attractive therapeutic target for ICC patients. Thus, we aim to develop and validate a combined Ki67 prediction model for ICC patients. Materials and methods Preoperative contrast-enhanced MR images were collected from 178 patients with postoperative pathologically confirmed ICC, and randomly divided into training and validation cohorts in a ratio of 7:3 (124:54). A time-independent test cohort of 49 ICC patients was used for validation. Independent clinicoradiological features of Ki67 status were determined by multivariate analysis. Optimal radiomics features were selected by least absolute shrinkage and selection operator logistic regression and linear discriminant analysis was used to construct combined models. The prediction efficacy of combined model was assessed by receiver operating characteristics curve, and verified by its calibration, decision and clinical impact curves. Results HBV (p = 0.022), arterial rim enhancement (p = 0.006) and enhancement pattern (p = 0.012) are independent clinicoradiological features. The radiomics model achieves good prediction efficacy in the training cohort (AUC = 0.860) and validation cohort (AUC = 0.843). The combined Ki67 prediction model incorporates clinicoradiological and radiomics features, and it yields desirable predictive efficiency in test cohort (AUC = 0.815). Decision curves and clinical impact curves further validate that the combined Ki67 prediction model can achieve net benefits in clinical work. Conclusion The combined Ki67 model incorporating HBV, arterial rim enhancement, enhancement pattern and radiomics features is a potential biomarker in Ki67 prediction and stratification. Supplementary Information The online version contains supplementary material available at 10.1007/s11547-023-01597-7.


Introduction
Primary liver cancer (PLC) includes hepatocellular carcinoma (HCC), intrahepatic cholangiocarcinoma (ICC) and other rare type, and ICC accounts for 10-15% of PLC [1]. ICC originates from the intrahepatic secondary bile duct and has three gross pathological patterns: mass-forming (78%), periductal infiltrating (16%) and intraductal papillary (6%) subtypes, and the overall survival of patients with mass-forming ICC is shorter [2,3]. The incidence and mortality of ICC have been increasing over the past few decades [4,5]. Partial hepatectomy is an effective treatment [6], however, many patients with advanced ICC are inoperable due to delayed diagnosis [7]. Recently, molecular profiling has revealed subtypes of ICC [8], therefore, molecular targeted therapies are expected to improve the prognosis of ICC patients [9].
Ki67 protein is a nuclear antigen related to cell proliferation and positively correlated with cancer aggressiveness [10]. Some studies suggest that Ki67 is a poor prognostic predictor in patients with ICC [11,12]. Besides, Ki67 is an attractive therapeutic target for malignant cancers [13]. For instance, Dinaciclib could suppress ICC growth by suppressing Ki67 protein [14]. Zhang et al. [15] also found that knockout of lncRNA CASC15 could suppress ICC progression by inhibiting Ki67 protein expression. Therefore, accurate prediction of Ki67 status in ICC patients is a predictor for treatment efficacy evaluation and outcome prediction. However, it is difficult to determine Ki67 status of ICC lesions by routine imaging and laboratory tests.
Currently, radiomics is defined as a high-throughput extraction of numerous image features from medical images, and independent features are applied to the construction of diagnostic, predictive and prognostic models [16]. Several studies have achieved good predictive efficiency in the Ki67 prediction of several malignant cancers, including HCC [17], breast cancer [18] and lung cancer [19]. However, the development and validation of Ki67 status prediction model for ICC lesions based on radiomics features has not yet been studied.
Serum carbohydrate antigen 19-9 (CA199) levels are often elevated in ICC patients, rather than elevated serum alpha-fetoprotein (AFP) levels like those in HCC patients. And typical mass-forming ICC lesions exhibit several imaging features like intrahepatic duct dilatation, hepatic capsular retraction and gradual and filling enhancement pattern. Therefore, preoperative diagnosis of ICC is not difficult, whereas, to date, there is no model to predict the Ki67 status of ICC lesions preoperatively. In our study, we aimed to develop and validate a combined Ki67 prediction model for mass-forming ICCs incorporating clinicoradiological features and MRI radiomics. Importantly, the combined Ki67 prediction model will be further validated by a time-independent test cohort.

Patients
This retrospective-prospective study was approved by the Zhongshan Hospital, Fudan University (Ethics approval No. B2021-325R) ethics committees, and patient informed consent was waived. From June 2015 to July 2019, 178 patients with postoperative pathologically confirmed ICC from the Zhongshan Hospital were randomly divided into the training (n = 124, 79 high Ki67 status group and 45 low Ki67 status group) and validation cohorts (n = 54, 33 high Ki67 status group and 21 low Ki67 status group) in a ratio of 7:3. From August 2019 to April 2022, a test cohort of postoperative pathologically confirmed ICC patients (n = 49, 28 high Ki67 status group and 21 low Ki67 status group) from the Zhongshan Hospital was prospective grouped with the same inclusion criteria, and baseline clinicoradiological features of ICC patients in test cohort are shown in Table S1. The main inclusion criteria ( Fig. 1): (a) single lesion with longest diameter ≥ 1.0 cm; (b) without previous treatment history of PLC; (c) complete histopathological description of ICC lesion; (d) all patients underwent MRI examination within 1 month before hepatectomy; (e) adequate MR image quality.

Clinical features retrieval
The demographic data, preoperative serum AFP, carcinoembryonic antigen (CEA), CA199, history of hepatitis B virus (HBV) serum markers and HBV-DNA loads were retrospectively retrieved (Table 1). All PLC samples were sampled using 7-point baseline sampling protocol [20]. Histopathological features including lesion number, Edmondson-Steiner grade and Ki67 status were evaluated by two experienced abdominal pathologists. Anti-human Ki67 rabbit monoclonal antibodies (Maixin Biotech Co., Ltd, Fuzhou, China) were used with a dilution of 1:50 in immunohistochemistry, and the Ki67 labeling index (LI) was recorded. We classified ICC lesions into low Ki67 status group (Ki67 LI < 25%) and high Ki67 status group (Ki67 LI ≥ 25%) as previous studies [11,21].

Tumor segmentation
The tumor segmentation was performed in the ITK-SNAP software. Volumes of interests (VOIs) were manually delineated based on 6 MR sequences (DWI with b value of 500 s/ mm 2 , T2WI-FS, pre-T1WI, AP, PVP and DP, respectively) by an abdominal radiologist with 5 years of experience (X.L.Q.) and checked by a senior abdominal radiologist (X.L.). Besides, 30 MR images of ICC lesions were randomly selected and delineated again by X.L.Q. to assess the test-retest reliability. Blinded to segmentations delineated by X.L.Q., these 30 MR images of ICC lesions were delineated again by C.W.Z. to assess the inter-observer variability.

Feature extraction
To reduce heterogeneity among MR images, all images were resampled to an isotropic voxel size (1 × 1 × 1 mm 3 ) using bilinear interpolation, and intensities were normalized with a fixed bin width. Images were then normalized by z-score to obtain a standard normal distribution of image intensity. The extraction of radiomics features was performed by the uAI Research Portal (Version: 20210730), in which PyRadiomics (https:// pyrad iomics. readt hedocs. io/ en/ v3.0. 1/) was embedded. 2600 radiomics features were extracted from each sequence, and these features were classified into firstorder statistics, shape-based features, texture features, and high-order features.

Feature selection
Firstly, extracted radiomics features were applied with a z-score y i = x) normalization to eliminate index dimension difference. Secondly, features with intraclass correlation coefficients ≥ 0.75 in both test-retest and inter-observer settings were considered as reproducible radiomics features and were chosen for further analysis. Finally, the correlation analysis, multicollinearity analysis and least absolute shrinkage and selection operator (LASSO) methods were performed to select optimal prediction features (Table S3, Figure S1).

Model construction
Clinical model and imaging model were constructed by corresponding independent clinicoradiological predictors. For models based on single or multiple MR sequences and fusion models, linear discriminant analysis (LDA) was used to find the best linear combination of the above optimal prediction features to maximize the discrimination between patients with high and low Ki67 status. On the basis that LDA has a certain classification ability, taking the LDA results as the input feature of logistic regression (LR) and random forest (RF) to further accurately identify patients with high or low Ki67 status, and the classification threshold is 0.5.
Unless otherwise stated, data are shown as number of patients with percentage in parentheses a Data are means with standard deviation in parentheses b Data are medians with interquartile ranges in parentheses

Model evaluation and verification
Receiver operating characteristics curves (ROC) were plotted and the area under curve (AUC), sensitivity, specificity, accuracy, precision and F1-score were calculated to evaluate the performance of models. Delong test was used to compare the predictive efficiency between models, and we applied the Benjamini-Hochberg method to correct the false discovery rate (FDR) [25]. Hosmer-Lemeshow test was performed to evaluate the consistency between actual Ki67 status and predicted Ki67 status, and calibration curve was plotted. Decision curve and clinical impact curve were plotted to verify the clinical practicability of models by quantifying the net benefits at different risk thresholds. The confusion matrixes of the fusion models in three models were plotted. The workflow of the above radiomics analysis is shown in Fig. 2.

Correlation analysis
Radiomics features extracted from pre-T1WI, PVP and DP sequences were correlated with HBV, arterial rim enhancement on AP, enhancement pattern and Ki67, respectively, and heatmaps were performed. Because the different ranges of features, correlation coefficients were calculated for continuous versus binary variables, continuous variables, and non-continuous variables by using Biserial, Pearson, and Spearman correlation analyses, respectively.

Statistical analysis
Student's t test was used when variables were normal distribution, and Mann-Whitney U test was used when variables were non-normal distribution for continuous variables and chi-square test was used for qualitative variables to analyze whether there were statistically significant differences. Univariate and multivariate analysis were used for the selection of independent predictor. Statistical analysis was performed with R software (version 4.1.1). All statistical tests were two-sided, and p value lower than 0.05 were considered statistically significant.

Performance of clinicoradiological features
124 and 54 patients were divided into training and validation cohorts, and baseline clinicoradiological features are shown in Table 1 Table 3, Fig. 3A, B). Example of representative clinicoradiological features of ICC with high Ki67 status are shown in Figure S2.

Performance of radiomics features using single MR sequence
Totally, 76 robust radiomics features were selected from 6 single MR sequences ( Figure S1, Table S3 and S4), and the AUCs of 6 single MR sequence models constructed with robust radiomics features are displayed (Table 3, Fig. 3A, B). Although all single MR sequence models show poor specificity, which prompts single MR sequence model predictive efficiency is unreliable, T1, T1V and T1D models yield stable AUCs between training and validation cohorts, and T1V model shows the most stable predictive efficiency with Δ AUC = 0.009. Therefore, multiple-sequence models were constructed based on above three single-sequence models.

Performance of radiomics features using multiple MR sequence
Two-sequence models have good predictive efficiency in training and validation cohorts. Among them, T1V + T1 and T1V + T1D models show stable predictive efficacy and higher AUCs in validation than training cohort. Thereby, the final three-sequence radiomics model incorporates T1, T1V and T1D models, and it shows a satisfying predictive performance (AUC training = 0.860, AUC validation = 0.843) ( Table 3, Fig. 3D, E). And the final radiomics model performs better than T1V + T1D (FDR p = 0.018) model in the training cohort (Table 4). However, all of these three models show unsatisfying predictive performance in test cohort (AUC = 0.716-0.814) ( Table 3 and Fig. 3C).

Evaluation and verification of the combined Ki67 prediction model
The flowchart of the combined Ki67 prediction model is shown in Fig. 4. Calibration curves show the goodness of fit between the predicted Ki67 status by using the combined Ki67 prediction model and actual Ki67 status in the training (p = 0.787) and validation (p = 0.742) cohorts (Fig. 5A,  B). Decision curves show that radiomics model, clinical + radiomics, imaging + radiomics model and the combined Ki67 prediction model could obtain net benefit by predicting Ki67 status of all range risk threshold, and the combined Ki67 prediction model exhibits the highest net benefit (Fig. 5C). To further assess the clinical utility of models, clinical impact curves show that the combined Ki67   Bold values are models with stable and/or desirable predictive performance model has the largest risk threshold range of 0.5-1.0, and the predicted Ki67 status is highly consistent with the actual Ki67 status with risk thresholds ranging between 0.5 and 1.0 ( Fig. 5D-I). The heatmap of the correlation between Ki67 status, clinicoradiological features and radiomics features is shown in Fig. 6A and Table S5. The confusion matrixes of the fusion models in the training, validation and test models are shown in Fig. 6B.

Discussion
In this study, we established a multiparametric model for predicting Ki67 status in ICC patients preoperatively. The final combined Ki67 prediction model incorporates clinical, imaging and radiomics features, and it exhibits an excellent predictive efficiency.
As previous studies [11,21], we classified ICC lesions into low Ki67 status group and high Ki67 status group by 25% in our study. However, in the majority of studies on predicting Ki67 status of HCC preoperatively, the cut-off value of low and high Ki67 status is usually selected as 10-15% [17,26]. This may be due to the fact that ICC is a much more aggressive cancer than HCC [27,28]. Tsokos et al. [29] found that well-differentiated ICC had higher Ki67 LI than benign proliferations (22.7% vs. 1.4%, respectively; p < 0.001), and none of the benign biliary lesions had Ki67 LI greater than 10%. Therefore, it is inferred that 10% is more likely to be the cut-off value for differentiating benign biliary lesion and ICC, and 25% are more rational in differentiating low and high Ki67 status in ICC lesions.
The multivariate analysis shows HBV is the only independent clinical predictor of Ki67 status in our study. HBV and cirrhosis are risk factors for ICC, with overall odds ratios of 5.10 and 22.92, respectively [30], and Tovoli et al.    . 4 The predictive flowchart of the combined Ki67 prediction model [27] also found that cirrhotic patients with ICCs have different clinical presentation and outcomes. However, Peng et al. [31] showed there was no statistical differences between low and high Ki67 status in ICC patients, it may be due to an irrational cut-off value of 10% was selected in his study. The multivariate analysis also shows that arterial rim enhancement on AP and enhancement pattern are two independent imaging predictors of Ki67 status. Min et al. [32] found that arterial peripheral rim enhancement pattern was prognostic factor for increased risk of death, which is consistent with the relationship between arterial rim enhancement on AP and Ki67. Our study also suggests that ICC lesions with enhancement pattern like HCC (wash-in and wash-out) may yield a higher Ki67 LI. Since neither clinical model nor imaging model nor clinical + imaging model can achieve desirable AUC, so we further analyze the predictive performance of radiomics features using 6 single MR sequences. Obviously, none of them can meet the prediction requirements. But the single MR sequence models including T1, T1V and T1D models show stable predictive efficiency, more importantly, pairwise combination models of these three models yield stable predictive efficacy and higher AUCs in validation than training cohort. Thereby, the final radiomics model constructed with T1, T1V and T1D models may be more reliable. Although the final radiomics model performs better than T1V + T1D models in the training cohort, it performs poorly in test cohort and its specificity is unsatisfying, so a fusion model incorporates clinicoradiological and radiomics features is needed. Clinical + radiomics model, imaging + radiomics and the combined Ki67 prediction model achieve better than clinical + imaging model, indicating that radiomics features are vital to the prediction of Ki67 status in ICC patients.
Decision curves show that the combined Ki67 model could predict Ki67 status in all range risk threshold and obtain the highest net benefit. Clinical impact curves show that the combined Ki67 model has the largest risk threshold range of 0.5-1.0 to identify the Ki67 status accurately. The principle of the LASSO algorithm is to compress coefficients of features by introducing a regularization parameter, and remove some features with zero coefficients, so as to achieve the purpose of feature selection [33]. Radscore is obtained by matrix multiplication of the features and their coefficients obtained by the LASSO algorithm, usually as an independent prediction signature. However, Rad-score completely relies on the feature selection results of LASSO, and the category information is not directly correlated. LDA is a supervised feature dimensionality reduction method, which takes the category information studied as a priori knowledge [34]. The main purpose of LDA was to maximize the variation between samples of different categories and further fit a combined signature that was more suitable for discriminating high and low Ki67 status of ICCs in our study. Therefore, the combined signature obtained by the LDA method, with maximum inter-class variance and minimum intra-class variance, may could make the samples to be predicted obtain the best separability.
Recently, there have been several studies on the Ki67 prediction in HCC based on CT [26] and MR [17,35], but there is no study on the Ki67 prediction in ICC based on CT or MR. In our study, a total of 22 radiomics features (T1WI image: 10, PVP image: 9, DP image: 7) are correlated with the Ki67 status ( Fig. 6A and Table S5). And T1V_firstorder_Maximum (r = − 0.225, p = 0.002), T1V_ glszm_LowGrayLevelZoneEmphasis (r = 0.216, p = 0.004), T1V_ngtdm_Busyness (r = − 0.210, p = 0.005), and T1D_ glszm_LargeAreaEmphasis (r = − 0.212, p = 0.005) have strong correlation, which happens to explain why T1V model shows the most sable predictive efficiency among single MR sequence models. Peng et al. [31] predicted Ki67 status in ICC based on ultrasound radiomics features, principal component analysis as an unsupervised feature dimensionality reduction method, was used before LASSO in the feature selection, therefore, the features finally used to predict Ki67 status were weakly interpretable. The study about Ki67 status prediction in HCC conducted by Ye et al. [17] was similar to our study, only two glrlm features (Lon-gRunHighGrayLevelEmphasis, LongRunLowGrayLevelEmphasis) are identical to our study, suggesting that these two features may be independently related to the Ki67 status, regardless of tumor type.
There are several limitations in our study. Firstly, we define "Ki67 LI ≥ 25%" as high Ki67 status in our study according to previous studies, however, the reason why the Ki67 cut-off value of ICC is higher than that of HCC needs further study. Secondly, selection bias is inevitable in retrospective study, and an estimation of the effect of the splitting procedure may be needed in future study. Thirdly, there is no study on the Ki67 prediction in ICC based on MR in the past, thus more studies in this area are needed to compare and verify our study, and the correlation between radiomics features and Ki67 status, complex clinicoradiological features need to be further explained. Finally and importantly, larger cohorts from other centers are needed to be enrolled for prospective validation of our Ki67 prediction model.
In summary, the combined Ki67 model incorporating clinical feature (HBV), imaging features (arterial rim enhancement on AP and enhancement pattern) and radiomics features (on T1, T1V and T1D sequences) is a potential biomarker in Ki67 prediction, and the flowchart of the combined Ki67 prediction model may be a potential tool in Ki67 stratification of ICC patients (Fig. 4).