A multitask deep learning radiomics model for predicting the macrotrabecular-massive subtype and prognosis of hepatocellular carcinoma after hepatic arterial infusion chemotherapy

Background The macrotrabecular-massive (MTM) subtype is a distinct subtype of hepatocellular carcinoma (HCC) that commonly carries a dismal prognosis. This study aimed to develop a multitask deep learning radiomics (MDLR) model for predicting the MTM subtype and the prognosis of HCC patients after hepatic arterial infusion chemotherapy (HAIC).
Methods From June 2018 to March 2020, 158 eligible patients with HCC who underwent surgery were retrospectively enrolled in the MTM-related cohorts, and 752 HCC patients who underwent HAIC were included in the HAIC-related cohorts during the same period. DLR features were extracted from dual-phase (arterial phase and venous phase) contrast-enhanced computed tomography (CECT) of the entire liver region. Then, an MDLR model was used for the simultaneous prediction of the MTM subtype and patient prognosis after HAIC. The MDLR model for prognostic risk stratification incorporated DLR signatures, clinical variables and MTM subtype.
Findings The predictive performance of the DLR model for the MTM subtype was 0.968 in the training cohort (TC), 0.912 in the internal test cohort (ITC) and 0.773 in the external test cohort (ETC). Multivariable analysis identified portal vein tumor thrombus (PVTT) (p = 0.012), HAIC response (p < 0.001), HAIC sessions (p < 0.001) and MTM subtype (p < 0.001) as indicators of poor prognosis. After incorporating DLR signatures, the MDLR model yielded the best performance among all models (AUC, 0.855 in the TC, 0.805 in the ITC and 0.792 in the ETC). With these variables, the MDLR model provided two risk strata for overall survival (OS) in the TC: low risk (5-year OS, 44.9%) and high risk (5-year OS, 4.9%).
Interpretation A tool based on MDLR was developed that takes the MTM subtype into account as an important prognostic factor for HCC patients. MDLR showed outstanding performance for the prognostic risk stratification of HCC patients who underwent HAIC and may help physicians with therapeutic decision making and surveillance strategy selection in clinical practice.
Supplementary Information The online version contains supplementary material available at 10.1007/s11547-023-01719-1.


Introduction
Hepatocellular carcinoma (HCC) is the fourth most common malignant tumour and ranks as the second leading cause of cancer death globally [1]. Unfortunately, more than 70% of patients with HCC already have a high tumour burden at initial diagnosis [2]. Hepatic arterial infusion chemotherapy (HAIC) is a promising option for large HCC because it provides sustained, locally high concentrations of chemotherapy agents in the tumour [3]. Multicycle HAIC makes it easier to obtain a high objective response rate (ORR) in large HCC, which can enable further conversion therapy. Shi Ming et al. showed that HAIC with the FOLFOX regimen (oxaliplatin plus fluorouracil and leucovorin) yielded a better median overall survival (OS, 23.1 months) and ORR (48%) than transarterial chemoembolization (TACE) for large HCC (largest diameter > 7 cm) in a randomized phase III trial [4]. Moreover, immunotherapies and multitargeted tyrosine kinase inhibitors (TKIs), including sorafenib and lenvatinib, have shown outstanding ORR and survival benefit in advanced HCC [5,6].
Xuelei He, Kai Li and Ran Wei have contributed equally to this work.

The macrotrabecular-massive (MTM) subtype, a morphologic HCC variant with angiogenesis, has been reported to have a dismal prognosis [7,8]. Patients with this subtype of HCC should be specifically diagnosed before surgery, but histopathologic examinations remain lacking. A series of studies have identified intratumor necrosis or ischemia as an independent predictor of the MTM subtype, and the MTM subtype could be effectively diagnosed by these features combined with intratumor fat deficiency. Moreover, several studies found that, compared with non-MTM-HCC, MTM-HCC was often larger, more prone to intratumor necrosis and more invasive, and frequently exhibited irregular rim-like arterial phase enhancement (IRE) [9-11]. Although the abovementioned MRI features achieved high accuracy for predicting the MTM subtype in previous studies, potential selection bias resulting from interobserver variation was difficult to avoid. Over the past decade, an increasing number of quantitative and qualitative image analysis methods for the prediction of the MTM subtype have been proposed in oncological practice. For example, radiomics converts images into quantitative data in a high-throughput manner, making it a feasible and precise approach for outcome prediction. However, these analyses require the formulation of predefined criteria and manual or semiautomatic segmentation of the region of interest (i.e., the tumour and margin region) [12,13]. In contrast, deep learning (DL), as a data-driven approach, has been increasingly applied to design and organize features automatically according to their predictive ability rather than human judgement [14,15].
Therefore, further studies are required to support the robustness and accuracy of the DL radiomics (DLR) approach for predicting the MTM subtype and patient prognosis. The aim of our study was to develop and validate a multitask DLR model based on preoperative CT for predicting the MTM subtype and the prognosis of HCC patients who underwent HAIC, using multimodal data that integrate clinical variables, the DLR score and the MTM subtype.

Materials and methods
This retrospective, multi-institutional study protocol obtained approval from the Institutional Review Boards of all participating hospitals and was conducted following the principles of the 1975 Helsinki Declaration. Due to the retrospective nature of this study, the requirement for written informed consent was waived.

Patient enrolment
All HCC patients were diagnosed based on the European Association for the Study of the Liver (EASL) and the American Association for the Study of Liver Diseases (AASLD) guidelines [16,17]. Between June 2018 and March 2020, a total of 159 consecutive patients with large HCC who received surgical resection (SR) and underwent a standard contrast-enhanced computed tomography (CECT) examination within 2 weeks before SR in a tertiary high-volume hospital were reviewed. The histologic examination of tumour specimens was performed by two pathologists (reader 1, L.L., and reader 2, P.W., with 10 years of experience) by serially examining multiple pathologic specimens. The intraclass correlation coefficient (ICC) was calculated as the metric for reproducibility evaluation. Pathologic features with both intra- and interobserver ICCs higher than 0.9 were selected. The MTM subtype was defined as a > 50% macrotrabecular architectural pattern present after haematoxylin-eosin staining.
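The reproducibility check described above can be illustrated with a short sketch. The text does not state which ICC form was used, so the code below assumes the two-way random-effects, absolute-agreement, single-reader form, often written ICC(2,1). The function name and the example ratings are hypothetical and are not taken from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single reader.

    ratings: list of rows, one row per specimen, one column per reader.
    The choice of ICC(2,1) is an assumption; the paper does not name the form.
    """
    n = len(ratings)        # number of specimens
    k = len(ratings[0])     # number of readers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-specimen mean square
    msc = ss_cols / (k - 1)               # between-reader mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical example: two readers scoring the macrotrabecular fraction (%)
# of five specimens; a feature would be kept only if the ICC exceeds 0.9.
example = [[60, 62], [20, 25], [80, 78], [45, 50], [10, 12]]
keep_feature = icc_2_1(example) > 0.9
```

With this form, systematic disagreement between readers lowers the coefficient, which suits a setting where either reader's value may be used.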
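Another cohort consisted of 1367 patients with HCC who received initial HAIC as the first-line therapy between January 2014 and May 2022. Figure 1A demonstrates the enrolment pathways of HCC patients who underwent HAIC. The inclusion criteria were as follows: (a) age 18-75 years; (b) Eastern Cooperative Oncology Group (ECOG) performance status < 2; (c) Child-Pugh class A or B liver function; and (d) management of HAIC with the FOLFOX regimen (FOLFOX-HAIC). The exclusion criteria were as follows: (a) any treatment before HAIC; (b) HCC combined with other malignancies; (c) a maximum tumour diameter ≤ 5 cm; (d) simultaneous treatment of TACE combined with HAIC; and (e) loss to follow-up after > 6 months. The reasons for using HAIC rather than surgery or systemic chemotherapy, the HAIC procedures, and the criteria for protocol discontinuation are shown in supplementary information E1.1-1.3. Moreover, the preoperative CECT scan protocol is described in supplementary information E1.4.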

Follow-up protocol and endpoints
In this study, enrolled patients were censored at the last follow-up (October 30, 2022). After the HAIC protocol was completed, serum alpha-fetoprotein (AFP) measurement and contrast-enhanced CT or MRI were repeated at approximately 3-month intervals in the first year and at 6-month intervals thereafter. The responses to HAIC were assessed by dynamic contrast-enhanced images acquired before and after HAIC. The assessment was performed independently every 4-6 weeks after initial HAIC by two radiologists (reader 1, L.Z.L., and reader 2, J.Z., with 10 years of experience) who were blinded to the HAIC procedure at the time of data collection. According to the modified Response Evaluation Criteria in Solid Tumors (mRECIST), the responses were divided into complete response (CR), partial response (PR), stable disease (SD), and progressive disease (PD) [18]. The primary endpoint was OS, which was calculated from the date of initial treatment to the date of death from any cause or the date of last follow-up. Thirty-four clinical variables were collected for analysis as predictors of HAIC prognosis and are listed in supplementary information E1.5.

Study design
In this study, we used dual-phase (arterial phase and venous phase) CECT data collected from the MTM-related cohort, comprising 159 patients who received SR, and the HAIC-related cohort, comprising 752 patients who received HAIC, to develop and validate a multitask deep learning radiomic nomogram (MDLRN). Figure 1B-E shows the MDLRN pipeline, including the image segmentation of regions of interest (ROIs), feature extraction and selection, signature building, and model construction. The automatic delineation is detailed in supplementary information E1.6. We used the clinical and CECT data from one tertiary high-volume institution as the training cohort (TC, n = 489) and internal test cohort (ITC, n = 122) and the clinical data from 4 medical centres as the external test cohort (ETC, n = 141).

MTM-related score
The first step was to build a histology-related ML scoring model for predicting MTM status. A deep learning model with a 3D MobileNetV1 structure (shown in sTable 1 and sFigure 1) was constructed to extract high-level image features for the prediction of MTM status. Then, 22 radiomics features were selected and used to construct a radiomics model with XGBoost using CART base classifiers [19]. Six clinical factors were also selected to construct the clinical model. Finally, the scores of the three models (clinical, deep learning and radiomics) were combined by weighted summation to build the MTM-related score model (DLR-Cli). The construction of the multitask deep learning (MDL) model, the MTM radiomic model and the DLR-Cli model is described in supplementary methods E1.8-1.9.
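The weighted combination of the three model scores can be sketched as follows. The actual weights are given only in the supplementary methods, so the weights below are placeholders chosen purely for illustration, and the function name is hypothetical.

```python
def fuse_scores(clinical, deep_learning, radiomics, weights=(0.2, 0.5, 0.3)):
    """Weighted sum of three per-patient scores, each assumed to lie in [0, 1].

    The default weights are illustrative placeholders, not the study's values.
    They are normalised so that the fused score stays in [0, 1].
    """
    total = sum(weights)
    w_cli, w_dl, w_rad = (w / total for w in weights)
    return w_cli * clinical + w_dl * deep_learning + w_rad * radiomics


# Hypothetical patient: clinical 0.2, deep learning 0.8, radiomics 0.5
mtm_related_score = fuse_scores(0.2, 0.8, 0.5)
```

Normalising the weights keeps the fused score on the same scale as its inputs, so a single probability-style threshold can still be applied afterwards.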

Prognostic score for survival after HAIC
In the next step, we built and validated an MDLRN that integrated an MTM-related biomarker, the DL score extracted from the tumour and other clinical information for the prediction of OS in the 752 patients in the OS cohorts. The same 3D MobileNetV1 multitask model was used for multitask OS prediction, as described in supplementary information E1.6.

Statistical analysis
Statistical analysis was performed using the survival and rms packages of R software version 3.6.3 (http://www.r-project.org/). Continuous variables were presented as mean ± standard deviation (SD) or median with interquartile range (IQR) and compared using the Kruskal-Wallis test, while categorical variables were presented as frequencies with percentages and compared using the chi-squared test. Univariable and multivariable logistic regression analyses were applied to calculate the hazard ratios (HRs) and corresponding 95% confidence intervals (CIs) of variables and identify independent significant risk factors. The OS curves of different subgroups were compared using the Kaplan-Meier method with the log-rank test, and the AUCs of different models were compared by the DeLong test. The predictive parameters, including accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), were also calculated to assess model performance.
All tests of significance were two-sided, and a p value < 0.05 was interpreted to carry statistical significance.
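For readers who want to reproduce the predictive parameters listed above, the following is a minimal sketch in plain Python rather than R. It computes accuracy, sensitivity, specificity, PPV and NPV from binary labels, and the AUC through its rank-based (Mann-Whitney) interpretation. The function names and example data are illustrative assumptions; the DeLong comparison itself is not implemented here.

```python
def _ratio(numerator, denominator):
    """Safe division that returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV and NPV for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def auc(y_true, scores):
    """AUC as the probability that a positive case outranks a negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    # Ties between a positive and a negative score count as half a win
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores, thresholded at 0.5
labels = [1, 1, 0, 0]
model_scores = [0.9, 0.4, 0.6, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in model_scores]
metrics = confusion_metrics(labels, predictions)
```

The pairwise AUC is quadratic in sample size, which is acceptable for cohorts of a few hundred patients.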

Baseline characteristics
In the MTM-related cohort, 74.2% (118/159) of patients were diagnosed pathologically with the MTM subtype. The baseline characteristics stratified by MTM status are shown in Table 1. Among all variables, age < 65 years and Edmondson-Steiner grade III-IV were more prevalent in the MTM group than in the non-MTM group (p = 0.032 and p < 0.001, respectively). Other variables showed a similar distribution between the two groups. In the HAIC-related cohorts, a total of 752 treatment-naïve patients with HCC (80 females and 672 males; mean age, 54.2 ± 11.8 years) met the inclusion criteria. The clinicopathologic characteristics of the HCC patients who underwent HAIC in the three cohorts are outlined in Table 2. At the final follow-up, the mortality rates were 61.1% (299/489) in the TC, 71.3% (87/122) in the ITC, and 35.5% (50/141) in the ETC. The baseline characteristics of the abovementioned two cohorts are shown in sTable 2.

Hand-crafted radiomic and DL feature analysis
Based on the segmented liver images, a total of 5610 pre-defined radiomic features and 4132 DL features were extracted from each phase of CECT. After feature selection, 10 features in the arterial phase (AP) and 12 in the venous phase (PP) were retained as significant pre-defined radiomic features. XGBoost outperformed the other 3 ML classifiers and was selected to build the radiomics scores. Most of the selected pre-defined radiomic features were gray-level co-occurrence matrix (GLCM) features, which might be related to the heterogeneity of HCC. In addition, all DL features in the AP and PP were used to build DL scores for further analysis. The prognostic performance of the various models and staging systems is compared in Table 3.

MTM-related score
The baseline characteristics of patients with the MTM subtype are listed in sTable 3. In the TC, the deep learning radiomics (DLR) risk score was lower in the MTM group than in the non-MTM group (mean, 0.834 ± 0.097 vs. 0.177 ± 0.089; p < 0.001). Multivariable analysis showed

The development and validation of the MDLRN
Multivariable analysis showed that preoperative parameters, including PVTT (HR, 1.42) and DLR risk score (HR, 0.11), and postoperative parameters, including OR, HAIC sessions and MTM score, were independent risk factors for poor OS (sTable 5). The detailed performance of the MDL model for OS is listed in sTable 6. These independently associated risk factors were used to develop the MDLRN (Fig. 2A, B), described by the formula: HR = 1.38 × PVTT + 0.54 × OR + 0.74 × HAIC sessions + 2.44 × MTM + 0.10 × DLR score. For each tumour grade, a higher total point value indicated a worse OS. The bootstrapped calibration curves plotted with 1-, 3- and 5-year OS were well matched with the idealized 45° line for the MDLRN in the three cohorts (Fig. 2C-H). To add clinical convenience, a user-friendly online application (https://prehaicnomogramforhcc.shinyapps.io/DynNomapp/) was developed. The AUCs of the preoperative MDLRN for predicting the OS of HCC patients who underwent HAIC in the TC, ITC and ETC were 0.80, 0.71 and 0.74, respectively. In addition, the AUCs of the postoperative MDLRN for predicting OS in the TC, ITC and ETC were 0.84, 0.78 and 0.79, respectively. The MDLRN improved the prognostic prediction of HCC patients who underwent HAIC compared with rival models and staging systems (AJCC [American Joint Committee on Cancer], BCLC [Barcelona Clinic Liver Cancer] stage, CLIP [Cancer of the Liver Italian Program] classification, HKLC [Hong Kong Liver Cancer] stage) in the three cohorts (Fig. 3).
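The formula above can be evaluated directly, as in the following sketch. The coefficients are those quoted in the text; the coding of each variable (for example, 1 for presence of PVTT, and how OR and HAIC sessions are encoded) is not specified in this section, so the binary and numeric codings used here are assumptions, as are the dictionary keys and the function name.

```python
# Coefficients as quoted in the formula; variable coding below is assumed.
COEFFICIENTS = {
    "pvtt": 1.38,                # portal vein tumor thrombus (assumed 0/1)
    "objective_response": 0.54,  # OR term of the formula (coding assumed)
    "haic_sessions": 0.74,       # HAIC sessions term (coding assumed)
    "mtm": 2.44,                 # MTM subtype (assumed 0/1)
    "dlr_score": 0.10,           # DLR score (continuous)
}


def mdlrn_score(patient):
    """Weighted sum of the five predictors named in the formula."""
    return sum(weight * patient[name] for name, weight in COEFFICIENTS.items())


# Hypothetical patient record
patient = {
    "pvtt": 1,
    "objective_response": 0,
    "haic_sessions": 1,
    "mtm": 1,
    "dlr_score": 0.5,
}
score = mdlrn_score(patient)
```

Note that this uncentred sum is non-negative for non-negative inputs, whereas the cut-offs reported for risk stratification are negative, so the published nomogram presumably uses a different scaling or coding. The sketch should therefore not be combined with those cut-offs as written.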

Visualization interpretability
The learned feature maps of MobileNetV1 are shown in Fig. 4, with detailed patient information in sTable 7. To better explore the hidden patterns the network learned, heatmaps were divided into prediction groups for death or survival at 1, 2, 3 and > 3 years. According to their imaging features, the examples were divided into MTM and non-MTM subtypes. Overall, the intensity of the feature map in the predicted non-MTM group was lower than that in the predicted MTM group, which seems to reflect the natural pathological characteristics of HCC. Moreover, heatmaps showed that the better survival group had a high intensity, which indicated that the MTM subtype was an important factor for prognostic analysis.

Survival risk stratification
To facilitate the clinical application of the MDLRN, we divided the HCC patients who underwent HAIC into a high-risk group and a low-risk group according to their MDLRN risk scores. We identified the HR cut-off values for the pre- and post-MDLRN (−1.40 and −0.26, respectively) in the TC and verified them in the ITC and ETC. This pragmatic visualization of the risk level could help decide the HAIC strategy for HCC patients. According to the cut-off risk score for the pre-MDLRN, in the TC, the 1-, 3- and 5-year OS rates were 89.0%, 52.9% and 34.3% in the low-risk group, respectively, which were better than the corresponding rates in the high-risk group (37.2%, 5.5% and 5.5%) (p < 0.001) (Fig. 5A). Similarly, the cumulative 1-, 3-, and 5-year OS rates differed significantly between the high-risk and low-risk groups in the other two test cohorts (both, p < 0.001) (Fig. 5B, C). According to the cut-off risk score for the post-MDLRN, in the TC, the 1-, 3- and 5-year OS rates were 91.2%, 55.4% and 25.1% in the low-risk group, respectively, which were better than the rates in the high-risk group (35.7%, 3.8% and 3.8%, respectively) (p < 0.001) (Fig. 5D). Similarly, the cumulative 1-, 3-, and 5-year OS rates differed significantly between the high-risk and low-risk groups in the other two test cohorts (both, p < 0.001) (Fig. 5E, F). In brief, deaths during the follow-up period were more common in high-risk patients than in low-risk patients, and a higher proportion of low-risk patients received potentially curative therapy (liver transplant, repeat liver resection, or ablation) than high-risk patients.
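The stratification and the 1-, 3- and 5-year OS rates reported above follow a standard pattern: dichotomize the risk score at a cut-off, then estimate survival in each group with the Kaplan-Meier product-limit method. The sketch below illustrates that pattern in plain Python. It assumes that a score above the cut-off means high risk, in line with the statement that a higher total point value indicated a worse OS; the function names and the toy data are hypothetical.

```python
def stratify(scores, cutoff):
    """Label each patient 'high' if the score exceeds the cut-off, else 'low'."""
    return ["high" if s > cutoff else "low" for s in scores]


def kaplan_meier(times, events):
    """Product-limit estimate as [(time, survival)] at each distinct death time.

    times: follow-up durations; events: 1 for death, 0 for censoring.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def survival_at(curve, t):
    """Estimated survival probability at time t (step function lookup)."""
    prob = 1.0
    for time, surv in curve:
        if time > t:
            break
        prob = surv
    return prob


# Toy data: follow-up in years, death indicator, and risk scores
follow_up = [1, 2, 3, 4]
died = [1, 0, 1, 1]
groups = stratify([-2.0, -1.0, -1.5, 0.3], cutoff=-1.40)
curve = kaplan_meier(follow_up, died)
three_year_os = survival_at(curve, 3)
```

In practice the curve would be computed separately for the patients labelled "low" and "high", and the two curves compared with the log-rank test named in the statistical analysis section.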

Discussion
According to current guidelines, the standard treatment for advanced HCC is sorafenib. HAIC is now being applied, mostly in Asia. However, large randomized trials are still lacking. In the EACH and other previous studies, the value
In this study, we aimed to identify a certain histologic subtype of HCC, which was defined as the MTM subtype. Notably, the incidence of this histologic subtype increased with increasing tumour diameter in previous studies. In the MTM-related cohort of 159 patients who underwent SR, 74.2% (118/159) of patients were diagnosed pathologically with the MTM subtype. We developed a DLR-based model for predicting the MTM subtype that showed outstanding performance (AUC, 0.98 in the TC, 0.84 in the ITC and 0.72 in the ETC). In addition, our results showed that a high serum AFP level was an independent predictor of the MTM subtype, which was consistent with previous reports [11]. However, no incremental increase in value was observed with the addition of the DLR model to predict the MTM subtype. We also showed that a low baseline DLR score for MTM status was associated with OS (HR, 0.85; p < 0.001) in patients with large HCC who underwent HAIC, indicating its potential clinical application.
This study developed and validated an MDLRN for predicting the OS of patients with large HCC receiving initial HAIC based on CECT data from 752 patients in the OS cohort, and the model could accurately stratify patients with large HCC into two prognostic subgroups with significantly different OS. Three attempts were made in this study: first, we built the MDLRN comprising preoperative and postoperative clinical variables, DLR signatures and MTM subtype for the prediction of OS; second, the entire liver parenchyma was automatically segmented as an ROI using the ResU-Net algorithm for feature extraction; third, we provided an MDLRN-based system as a visualized web tool to recommend suitable patients with large HCCs for HAIC treatment, achieving a good predictive performance (AUC, 0.87 in the ITC; AUC, 0.83 in the ETC).
The DLR analysis highlighted the potentially important roles of tumour burden and distribution in the entire liver parenchyma as well as the tumour microenvironment (TME) in prognostic prediction. Exposure of the targeted tumours to chemotherapy drugs over multiple cycles is closely related to treatment response [25]. Previous studies have suggested that a larger tumour burden and more dispersed distribution both weaken the effect of chemotherapy [4]. Our DLR visualization results were consistent with this hypothesis. The predicted death group had a higher intensity heatmap than the predicted survival group, suggesting the importance of tumour burden and distribution. Moreover, previous studies exploring the mechanisms of conventional chemotherapy resistance have revealed the involvement of TME components and seem to explain the relationship between the status of the TME and the response to chemotherapy [26,27]. In our heatmap, a higher intensity distribution may be consistent with TME component assembly, including the ECM, proteoglycans, immune cells and the hypoxic environment. This hypothesis needs further experimental research on the underlying mechanism.
The MDLRN based on preoperative DLR scores and clinical parameters should be useful for patient stratification before HAIC, allowing clinicians to optimize treatment, such as switching to SR or LT. Once a patient undergoes HAIC, the post-MDLRN, which was built with postoperative clinical parameters, including the number of HAIC sessions, response to HAIC and predicted MTM subtype, has significantly higher predictive performance and can be used to design individualized surveillance and therapeutic strategies. Through patient stratification performed by the MDLRN, an intensive surveillance regimen, and even some aggressive or expensive preventive and adjuvant therapies, including preventive multitargeted tyrosine kinase inhibitors (TKIs) and programmed cell death protein (PD)-1 therapies, can be considered to prolong the OS of high-risk patients [28,29]. On the other hand, low-risk patients may receive less intensive surveillance regimens and more prudent consideration of aggressive or expensive preventive therapies after HAIC to reduce the probability of negative effects and the high cost of these examinations and therapies.
There are some limitations to our study. First, selection bias is unavoidable in observational studies and may affect the real outcomes. Second, we did not perform manual delineation of the tumour area to extract features. Whether the predictive ability of the MDLRN model would significantly improve over that of a model based on the entire tumour ROI remains to be further tested in external cohorts. Third, the therapeutic techniques for HCC are constantly being updated and improved over time, such as the adjustment of HAIC chemotherapy drug regimens and the improvement of HAIC combined with molecular targeted drugs. This will inevitably have a certain degree of impact on outcome prediction. Fourth, clinical information regarding complications during and after HAIC and TKI treatment was not analysed, warranting further investigation. Given these limitations, the MDLRN model requires further validation as an OS stratification tool for HAIC in patients with HCC before being applied in other study settings.
In conclusion, MTM is an important prognostic factor for HCC patients and was taken into consideration when building the multitask DLR method. The model could predict the prognosis of HCC patients who underwent HAIC and showed excellent performance in two test cohorts, demonstrating its robustness and effectiveness. Therefore, this tool may help physicians with therapeutic decision making and surveillance strategy selection in clinical practice.
demonstrates the enrolment pathways of HCC patients who underwent HAIC. The inclusion criteria were as follows: (a) age 18-75 years; (b) Eastern Cooperative Oncology Group (ECOG) performance status < 2; (c) Child-Pugh class A or B liver function; and (d) management of HAIC with the FOLFOX regimen (FOLFOX-HAIC). The exclusion criteria were as follows: (a) any treatment before HAIC; (b) HCC combined with other malignancies; (c) a maximum tumour diameter ≤ 5 cm; (d) simultaneous treatment of TACE combined with HAIC; and

Fig. 1 Flowcharts showing the HCC patient recruitment process and MDLR model construction. A HCC patient recruitment. B Data preprocessing. C MTM model. D OS nomogram. E MDLR model construction.

Fig. 2 Development of the prognostic nomogram for OS. A The pre-nomogram was established using diagnostic factors for patients who had not received HAIC treatment and had preoperative HAIC data. B The post-nomogram was established using multiple factors for patients who had undergone HAIC treatment and had both pre- and post-HAIC data. C-E Calibration curves plotted with 1-, 3- and

Fig. 3 Discriminatory performance of all models and systems in three cohorts. Graphs show time-dependent areas under the receiver operating characteristic (ROC) curve at various time points (top) for established models and staging systems. AJCC = American Joint

Fig. 5 Comparison of survival among different risk level groups based on the two prognostic models. According to the risk scores from the pre-nomogram, the HCC patients were divided into high- and low-risk groups (A-C). Kaplan-Meier (KM) curves for the overall survival (OS) of HCC patients in these two risk level groups in A

Table 1
Patient characteristics according to the MTM subtype. Data are number of patients; data in parentheses are percentage of patients unless otherwise indicated. The data in two groups were compared by using the chi-squared test.

Table 2
Baseline characteristics of patients with large HCC who received HAIC of FOLFOX. Data are number of patients; data in parentheses are percentage of patients unless otherwise indicated. The data in two groups were compared by using the chi-squared test. Non-normally distributed data are represented by median and quartile. A p value < 0.05 suggests statistically significant differences between three cohorts.
HAIC hepatic arterial infusion chemotherapy, FOLFOX oxaliplatin plus fluorouracil and leucovorin, OR objective response, SD standard deviation, BMI body mass index, PS performance status, ECOG Eastern Cooperative Oncology Group, HBV viral hepatitis type B, AFP α-fetoprotein, ALBI albumin-bilirubin, ALB albumin, ALT alanine aminotransferase, AST aspartate aminotransferase, PT prothrombin time, INR international normalized ratio, TBIL total bilirubin, PLT platelet, SBRT stereotactic body radiation therapy, TKI tyrosine kinase inhibitor
comparison of predictive performance among four different models (clinical, radiomics, DLR, and DLR-Cli) in three cohorts and the AUC, SENS, SPEC, PPV, and NPV data of each model are shown in sTable 4. Among all models, the in the EVC, respectively. The results of the DeLong test indicated a significant difference in performance between the clinical model and the DLR-Cli model (p < 0.001 in TC, p < 0.001 in IVC and p < 0.001 in EVC).

Table 3
Prognostic performance of DL-based models compared with staging systems after HAIC of HCC. Numbers in parentheses are the 95% confidence interval. All p values were obtained from analyses comparing the AUC of various models by using the DeLong test.
AJCC American Joint Committee on Cancer, AUC area under the receiver operating characteristic curve, BCLC Barcelona Clinic Liver Cancer, CLIP Cancer of the Liver Italian Program, HKLC Hong Kong Liver Cancer
*p value versus preoperative nomogram
†p value versus postoperative nomogram