Introduction

Hepatocellular carcinoma (HCC) is the fourth most common malignant tumour and ranks as the second leading cause of cancer death globally [1]. Unfortunately, > 70% of patients with HCC often have a high tumour burden when they receive the initial diagnosis [2]. Hepatic arterial infusion chemotherapy (HAIC) is a promising option for large HCC that provides sustained local high concentrations of chemotherapy agents in the tumour [3]. It easier to obtain a high objective response rate (ORR) for large HCC with multicycle HAIC, which can enable further conversion therapy. Shi Ming et al. showed that HAIC with the FOLFOX regimen (oxaliplatin plus fluorouracil and leucovorin) yielded a better median overall survival (OS, 23.1 months) and ORR (48%) than transarterial chemoembolization (TACE) for large HCC (largest diameter > 7 cm) in a randomized phase III trial [4]. Moreover, immunotherapies and and multitargeted tyrosine kinase inhibitors (TKIs) including sorafenib and lenvatinib have present outstanding ORR and survival benefit for advanced HCC [5, 6].

The macrotrabecular-massive (MTM) subtype, as an amorphologic HCC variant with angiogenesis, has been reported to have a dismal prognosis in previous reports [7, 8]. Patients with this subtype of HCC should be specifically diagnosed before surgery, but histopathologic examinations remain lacking. A series of studies have identified intratumor necrosis or ischemia as an independent predictor of the MTM subtype. And MTM subtype could be effectively diagnosed by these features combined with intratumor fat deficiency. Moreover, compared with non-MTM-HCC, several research found that MTM-HCC was often larger with more prone to intratumor necrosis and frequently exhibit irregular rim-like arterial phase enhancement (IRE) with a stronger invasion ability [9,10,11]. Although the abovementioned MRI features could achieve high accuracy for predicting the MTM subtype in previous studies, potential selection bias resulting from interobserver variation was difficult to avoid. Over the past decade, an increasing number of quantitative and qualitative image analysis methods for the prediction of the MTM subtype have been proposed in oncological practice. For example, radiomics converts images into quantitative data in a high-throughput manner, making it a feasible and precise approach for outcome prediction. However, these analyses require the formulation of predefined criteria and manual or semiautomatic segmentation of the region of interest (i.e., the tumour and margin region) [12, 13]. However, deep learning (DL), as a data-driven approach, has been increasingly applied towards automatic design and organization based on the predictive ability of specific features instead of human performance [14, 15].

Therefore, further studies are required to support the robustness and accuracy of the DL radiomics (DLR) approach for predicting the MTM subtype and patient prognosis. The aim of our study was to develop and validate a multitask DLR-based model based on preoperative CT for predicting the MTM subtype and prognosis of HCC patients who underwent HAIC based on multimodal data integrating clinical variables, DLR score and MTM subtype.

Materials and methods

This retrospective, multi-institutional study protocol obtained approval from the Institutional Review Board of all participating hospitals and was conducted following the principles of the 1975 Helsinki Declaration. Due to the retrospective nature of this study, the requirement for written informed consent was waived.

Patient enrolment

All HCC patients were diagnosed based on the European Association for the Study of Liver (EASL) and the American Association for the Study of Liver Disease (AASLD) guidelines [16, 17]. Between June 2018 and March 2020, a total of 159 consecutive patients with large HCC who received surgical resection (SR) were reviewed and underwent a standard contrast-enhanced computed tomography (CECT) examination within 2 weeks before SR in a tertiary high-volume hospital. The histologic examination of tumour specimens was performed by two pathologists (reader 1, L.L., and reader 2, P.W., with 10 years of experience) by serially examining multiple pathologic specimens. The intraclass correlation coefficient (ICC) was calculated as the metric for reproducibility evaluation. Pathologic features with both intra- and interobserver ICCs higher than 0.9 were selected. The MTM subtype was defined as a > 50% macrotrabecular architectural pattern present after haematoxylin–eosin staining.

Another cohort consisted of 1367 patients with HCC who received initial HAIC as the first-line therapy between January 2014 and May 2022. Figure 1A demonstrates the enrolment pathways of HCC patients who underwent HAIC. The inclusion criteria were as follows: (a) age 18–75 years; (b) Eastern Cooperative Oncology Group (ECOG) performance status < 2; (c) Child‒Pugh class A or B liver function; and (d) management of HAIC with the FOLFOX regimen (FOLFOX-HAIC). The exclusion criteria were as follows: (a) any treatment before HAIC; (b) HCC combined with other malignancies; (c) a maximum tumour diameter ≤ 5 cm; (d) simultaneous treatment of TACE combined with HAIC; and (e) loss to follow-up after > 6 months. The reasons for using HAIC rather than surgery or systematic chemotherapy, the HAIC procedures, and criteria for protocol discontinuation are shown in supplementary information E1.1–1.3. Moreover, the preoperative CECT scan protocol is described in supplementary information E1.4.

Fig. 1
figure 1

Flowcharts show HCC patient recruitment process and MDLR model construction. A HCC patient recruitment. B Data preprocess. C MTM model. D OS nomogram; E MDLR model construction. MTM = macrotrabecular-massive HCC = hepatocellular carcinoma (HCC), MDLR = multitask deep learning radiomics OS = overall surviva

Follow-up protocol and endpoints

In this study, enrolled patients were censored at the last follow-up (October 30, 2022). After a thorough HAIC protocol was completed, the serum alpha-fetoprotein (AFP) levels and contrast-enhanced CT or MRI were repeated in 3–6-month intervals, at approximately 3-month intervals in the first year and 6-month intervals thereafter. The responses to HAIC were assessed by dynamic contrast-enhanced images acquired before and after HAIC. The assessment was performed independently every 4–6 weeks after initial HAIC by two radiologists (reader 1, L.Z.L., and reader 2, J.Z., with 10 years of experience) who were blinded to the HAIC procedure at the time of data collection. According to the modified Response Evaluation Criteria in Solid Tumor (mRECIST), the responses were divided into complete response (CR), partial response (PR), stable disease (SD), and progressive disease (PD) [18]. The primary endpoint was OS, which was calculated from the date of initial treatment to the date of death from any cause or date of last follow-up. Thirty-four clinical variables were collected for analysis as predictors of HAIC prognosis and are listed in supplementary information E 1.5.

Study design

In this study, we used dual-phase (arterial phase and venous phase) CECT data collected from MTM related cohort comprising 159 patients who received SR and HAIC related cohort comprising 752 patients who received HAIC to develop and validate a multitask deep learning radiomic nomogram (MDLRN). Figure 1B–E shows the MDLRM pipeline, including the image segmentation of regions of interest (ROIs), feature extraction and selection, signature building, and model construction. The detailed automatic delineation was listed in supplementary information E 1.6. We used the clinical and CECT data from one tertiary high-volume institution as the training cohort (TC, n = 459) and internal testing cohort (ITC, n = 122) and the clinical data from 4 medical centres as the external test cohort (ETC, n = 141).

MTM-related score

The first step was a histologic-related ML scoring model for prediction of MTM status. A 3D MobileNetV1 Structure (shown in sTable 1 and sFigure 1) Deep Learning Model for prediction of MTM status was constructed to extract the High-Level Image features. Then 22 radiomics features were selected and constructed the model by XGBoost with CART base-classification [19]. There were also 6 clinical factors selected to construct the clinical model. At last, the score of three models (Clinical, Deep Learning and Radiomics, DLR-Cli) were added with weight to build a MTM-related score model. The detailed information of multi-task deep learning (MDL) model construction, MTM radiomic model construction and procedure of DLR-Cli model construction were described in supplementary methods E1.8–1.9.

Prognostic score for survival after HAIC

In the next step, we built and validated a MDLRN integrated a MTM related biomarker, DL score extracted from tumor and other clinical information for prediction of OS in the 752 patients in OS cohorts. The same 3D MobileNetV1 multi-task model was used for multi-task OS predictions, which was described in supplementary information E1.6. The predicted MTM-score, DL-score, pre- and post-operative clinical variables were then integrated into a new Cox-PH model to obtain a precise estimation of the survival time of an individual patient received HAIC.

Statistical analysis

Statistical analysis was performed using the survival and rms packages of R software version 3.6.3 (http://www.r-project.org/). Continuous variables were presented as mean ± standard deviation (SD) or median with interquartile range (IQR) and compared using the Kruskal–Wallis test, while categorical variables were presented as frequencies with percentages and compared using the chi-squared test. Univariable and multivariable logistic regression analyses were applied to calculate the hazard ratios (HRs) and corresponding 95% confidence intervals (CIs) of variables and identify independent significant risk factors. The OS curves of different subgroups were compared using the Kaplan–Meier method with the log-rank test, and the AUCs of different models were compared by the DeLong test. The predictive parameters, including accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), were also calculated to assess model performance.

All tests of significance were two-sided, and a p value < 0.05 was interpreted to carry statistical significance.

Results

Baseline characteristics

In MTM related cohort, 74.2% (118/159) of patients were diagnosed pathologically with the MTM subtype. The baseline characteristics stratified by MTM status are shown in Table 1. Among all variables, age < 65 years and Edmondson-Steiner grade III–IV were found to be more prevalent in the MTM group than in the non-MTM group (p = 0.032, < 0.001). Other variables showed a similar distribution between the two groups. In HAIC related cohorts, a total of 752 treatment-naïve patients with HCC (80 females and 672 males; mean age, 54.2 ± 11.8 years) met the inclusion criteria. The clinicopathologic characteristics of the HCC patients who underwent HAIC in the three cohorts are outlined in Table 2. At the final follow-up, the mortality rates were 61.1% (299/489) in the TC, 71.3% (87/122) in the VC, and 35.5% (50/141) in the EVC. The baseline characteristics of the abovementioned two cohorts are shown in sTable 2.

Table 1 Patient characteristics according to the MTM subtype
Table 2 Baseline characteristics of patients with large HCC who received HAIC of FOLFOX

Hand-crafted radiomic and DL feature analysis

Based on the segmented liver images, a total of 5610 pre-defined radiomic features and 4132 DL features were extracted from each phase of CECT. After feature selection, 10 in AP and 12 in PP were selected as significant pre-defined radiomic features. Among all ML classifiers, XGBoost outperformed other 3 classifiers and was selected to build radiomics scores. Most of the selected pre-defined radiomic features were GLCM features, which might be related to the heterogeneity of HCC. Besides, all DL features in AP and PP were chosen to build DL scores for further analysis. Prognostic performance comparison between various of models and staging system was shown in Table 3.

Table 3 Prognostic Performance of DL-based models compared with staging systems after HAIC of HCC

MTM-related score

The baseline characteristics of patients with MTM subtype were listed in sTable 3. In the TC, the deep learning radiomics (DLR) risk score was lower in the MTM group than in the non-MTM group (mean, 0.834 ± 0.097 vs. 0.177 ± 0.089; p < 0.001). Multivariable analysis showed that an AFP level > 400 ng/ml and the DLR risk score were independent indicators for the MTM subtype. The comparison of predictive performance among four different models (clinical, radiomics, DLR, and DLR-Cli) in three cohorts and the AUC, SENS, SPEC, PPV, and NPV data of each model are shown in sTable 4. Among all models, the DLR-Cli model showed optimal discrimination, achieving AUCs of 0.967 in the TC, 0.912 in the IVC and 0.773 in the EVC, respectively. The results of the DeLong test indicated a significant difference in performance between the clinical model and the DLR-Cli model (p < 0.001 in TC, p < 0.001 in IVC and p < 0.001 in EVC).

The development and validation of the MDLRN

Multivariate analysis showed that preoperative parameters, including PVTT (HR, 1.42) and DLR risk score (HR, 0.11), and postoperative parameters, including OR, HAIC sessions and MTM score, were independent risk factors for poor OS (sTable 5). The detailed performance of MDL for OS listed in sTable 6. These independently associated risk factors were used to develop the MDLRN (Fig. 2A, B), described by the formula: HR = 1.38 × PVTT + 0.54 × OR + 0.74 × HAIC sessions + 2.44 × MTM + 0.10 × DLR score. For each tumour grade, a higher total point value indicated a worse OS. The bootstrapped calibration curves plotted with 1-, 3- and 5-year OS were well matched with the idealized 45° line for the MDLRN in the three cohorts (Fig. 2C–H). To add clinical convenience, a user-friendly online application (https://prehaicnomogramforhcc.shinyapps.io/DynNomapp/) was developed.

Fig. 2
figure 2

Development of prognostic nomogram for OS. A The pre-nomogram was established using diagnostic factors for patients who had not received HAIC treatment and had preoperative HAIC data. B The post-nomogram was established using multiple factors for patients who had undergone HAIC treatment and had both pre- and post -HAIC data. CE calibration curves plotted with 1-, 3- and 5-year overall survival (OS) were well matched with the idealized 45° line for the pre-nomogram in training cohort, internal testing cohort and external testing cohort. FH calibration curves plotted with 1-, 3- and 5-year OS were well matched with the idealized 45°line for the post-nomogram in training cohort, internal testing cohort and external testing cohort

The AUCs of the preoperative MDLRN for predicting the OS of HCC patients who underwent HAIC in the TC, IVC and EVC were 0.80, 0.71 and 0.74, respectively. In addition, the AUCs of the postoperative MDLRN for predicting OS in the TC, IVC and EVC were 0.84, 0.78 and 0.79, respectively. In this study, we found that the MDLRN improved the prognostic prediction of HCC patients who underwent HAIC compared with rival models and staging systems (AJCC [American Joint Committee on Cancer], BCLC [Barcelona Clinic Liver Cancer] stage, CLIP [Cancer of the Liver Italian Program] classification, HKLC [Hong Kong Liver Cancer] stage) in the three cohorts (Fig. 3).

Fig. 3
figure 3

Discriminatory performance of all models and systems in thee cohorts. Graphs show time-dependent areas under the receiver operating characteristic (ROC) curve at various time points (top) for established models and staging systems. AJCC = American Joint Committee on Cancer, BCLC = Barcelona Clinic Liver Cancer, CLIP = Cancer of the Liver Italian Program, HKLC = Hong Kong Liver Cancer

Visualization interpretability

The learned feature maps of MobileNetV1 are shown in Fig. 4 and detailed patient information in sTable 7. To better explore the hidden patterns the network learned, heatmaps were divided into prediction groups for 1/2/3/ > 3-year death/survival. According to their imaging features, the examples were divided into MTM and non-MTM subtypes. Overall, the whole intensity of the feature map in the predicted non-MTM group was lower than that in the predicted MTM group, which seems to indicate the natural pathological characteristics of HCC. Moreover, heatmaps showed that the better survival group had a high intensity, which indicated that the MTM subtype was an important factor for prognostic analysis.

Fig. 4
figure 4

The learned feature maps of MobileNetV1 for 1-, 2-, 3-, > 3 years OS. The 1-year OS non-MTM image come from a patient with 56 years old, Portal Vein Tumor Thrombus (PVTT), Stable Disease (SD) and 6 HAIC session. The 1-year OS MTM image come from a patient with 40 years old, Portal Vein Tumor Thrombus (PVTT), Stable Disease (SD) and 2 HAIC session. The 2-years OS non-MTM image come from a patient with 60 years old, Portal Vein Tumor Thrombus (PVTT), Stable Disease (SD) and 5 HAIC session. The 2-years OS MTM image come from a patient with 63 years old, No Portal Vein Tumor Thrombus (PVTT), Stable Disease (SD) and 6 HAIC session. The 3-years OS non-MTM image come from a patient with 47 years old, Portal Vein Tumor Thrombus (PVTT), Partial Response (PR) and 3 HAIC session. The 3-years OS MTM image come from a patient with 58 years old, No Portal Vein Tumor Thrombus (PVTT), Progressive Disease (PD) and 2 HAIC session. The > 3-years OS non-MTM image come from a patient with 26 years old, Portal Vein Tumor Thrombus (PVTT), Partial Response (PR) and 6 HAIC session. The > 3-years OS MTM image come from a patient with 58 years old, No Portal Vein Tumor Thrombus (PVTT), Partial Response (PR) and 8 HAIC session

Survival risk stratification

To facilitate the clinical application of the MDLRN, we divided the HCC patients who underwent HAIC into two risk groups, including a high-risk group and a low-risk group, according to MDLRN risk scores. We identified the HR cut-off values for the pre- and post-MDLRN (−1.40 and −0.26) in the TC and verified them in the ITC and ETC, respectively. This pragmatic visualization of the risk level could help decide the HAIC strategy for HCC patients. According to the cut-off risk scores for the pre-MDLRN, in the TC, the 1-, 3- and 5-year OS were 89.0%, 52.9% and 34.3% in the low-risk group, respectively, which were better than the corresponding rates in the high-risk groups (37.2%, 5.5% and 5.5%) (p < 0.001) (Fig. 5A). Similarly, the cumulative 1-, 3-, and 5-year OS rates among the high-risk and low-risk groups were also significantly different in the other two test cohorts (both, p < 0.001) (Fig. 5B, C). According to the cut-off risk scores for the post-MDLRN, in the TC, the 1-, 3- and 5-year OS were 91.2%, 55.4% and 25.1% in the low-risk group, respectively, which were better than the rates in the high-risk group (35.7%, 3.8% and 3.8%, respectively) (p < 0.001) (Fig. 5D). Similarly, the cumulative 1-, 3-, and 5-year OS rates among the high-risk and low-risk groups were also significantly different in the other two test cohorts (both, p < 0.001) (Fig. 5E, F). In brief, more deaths were more commonly found during the follow-up period in high-risk patients than in low-risk patients; a higher proportion of low-risk patients received potentially curative therapy (liver transplant, repeat liver resection, or ablation) than high-risk patients.

Fig. 5
figure 5

Comparing the survival among different risk level groups based on the two prognostic models. According to the risk scores from the pre-nomogram, the HCC patients were divided into high-, and low-risk groups AC. Kaplan–Meier (KM) curves for the overall survival (OS) of HCC patients in these two risk level groups in A the training cohort, B the internal test cohort, C the first external test cohort. KM analysis of the risk scores for OS among the high-,and low-risk groups based on the post-nomogram (post-HAICN) in D the training cohort, E the internal test cohort, F the external test cohort

Discussion

According to current guidelines, the standard treatment for advanced HCC is sorafenib. HAIC is now being applied, mostly in Asia. However, large randomized trials are still lacking. In the EACH and other previous studies, the value of systemic chemotherapy with the FOLFOX regimen for the treatment of advanced HCC was confirmed [20,21,22]. The continuous infusion of chemotherapeutic drugs by hepatic intra-arterial therapy contributed to the adequate local drug concentration for targeted tumours. Previous studies have reported an encouraging safety profile and antitumour activity when treating locally advanced HCC with FOLFOX–HAIC [23, 24]. However, high-level tumour heterogeneity at the histologic, genomic, and molecular levels results in a certain range of individualized differences in the prognosis of HCC patients who undergo HAIC. Therefore, we developed and validated an MDLRN integrating clinical variables, DLR score and histologic features to provide indicators for selecting HAIC and therapeutic strategies in large HCC.

The MDLRN was constructed by a multiple-task deep learning model, clinical factors and radiomics features. The multiple-task deep learning model was constructed by MobileNetV1 feature extraction and 5 classification tasks (MTM prediction, 1-, 2-, 3- and greater than 3-years OS. Multitask deep learning has taken the pathological features of the MTM subtype into consideration. Thus, the same extraction module was used in the MTM scoring task and OS scoring tasks, further demonstrating the relationship between MTM and OS. Moreover, the postoperative pathological features could be predicted by the preoperative model, which can improve the performance of the preoperative model.

In this study, we aimed to identify a certain histologic subtype of HCC, which was defined as the MTM subtype. Notably, the incidence of this histologic subtype increased with increasing tumour diameter in previous studies. In MTM related cohort with 159 patients who underwent SR, 74.2% (118/159) of patients were diagnosed pathologically with the MTM subtype. We developed a DLR-based model for predicting the MTM subtype that showed outstanding performance (AUC, 0.98 in the TC, 0.84 in the IVC and 0.72 in the EVC). In addition, our results showed that a high serum AFP level was an independent predictor of the MTM subtype [11], which was consistent with previous reports. However, no incremental increase in value was observed with the addition of the DLR model to predict the MTM subtype. We also showed that a low baseline DLR score for MTM status was associated with OS (HR, 0.85; p < 0.001) in patients with large HCC who underwent HAIC, indicating its potential clinical application.

This study developed and validated an MDLRN for predicting the OS of patients with large HCC receiving initial HAIC based on CECT data from 752 patients in OS cohort, and the model could accurately stratify patients with large HCC into two prognostic subgroups with significantly different OS. In this study, three attempts were made, as follows: first, we built the MDLRN comprising preoperative and postoperative clinical variables, DLR signatures and MTM subtype for the prediction of OS; second, the entire liver parenchyma was automatically segmented as an ROI using the ResU-Net algorithm for feature extraction; third, we provided an MDLRN-based system as a visualized web tool to recommend suitable patients with large HCCs for HAIC treatment, achieving a good predictive performance (AUC, 0.87 in IVC; AUC, 0.83 in the EVC).

The DLR analysis highlighted the potential important roles of tumour burden and distribution in the entire liver parenchyma as well as the tumour microenvironment (TME) in prognostic prediction. Exposure of the targeted tumours to chemotherapy drugs over multiple cycles is closely related to treatment response [25]. Previous studies have suggested that a larger tumour burden and more dispersed distribution both weaken the effect of chemotherapy [4]. Similarly, our DLR visualization results were consistent with the abovementioned hypothesis. The predicted death group had a higher intensity heatmap than the predicted survival group, suggesting the importance of tumour burden and distribution. Moreover, previous studies exploring the mechanisms of conventional chemotherapy resistance have revealed the involvement of TME components and seem to explain the relationship between the status of the TME and the response to chemotherapy [26, 27]. In our heatmap, a higher intensity distribution may be consistent with TME component assembly, including for the ECM, proteoglycans, immune cells and hypoxic environment. This hypothesis needs further experimental research on the underlying mechanism.

The MDLRN based on preoperative DLR scores and clinical parameters should be useful for patient stratification before HAIC, allowing clinicians to optimize treatment, such as switches to SR and LT. Once the patients undergoes HAIC, the post-MDLRN, which was built with postoperative clinical parameters, including number of sessions of HAIC, response to HAIC and predicted MTM subtype, has significantly higher predictive performance and can be used to design individualized surveillance and therapeutic strategies. Through patient stratification performed by the MDLRN, an intensive surveillance regimen, and even some aggressive or expensive preventive and adjuvant therapies, including preventive multitargeted tyrosine kinase inhibitors [TKIs] and programmed cell death protein (PD)-1 therapies, can be considered to prolong the OS of high-risk patients [28, 29]. On the other hand, low-risk patients may receive less intensive surveillance regimens and more prudent consideration of aggressive or expensive preventive therapies after HAIC to reduce the probability of negative effects and the high cost of these examinations and therapies.

There are some limitations to our study. First, selection bias is unavoidable in observational studies and may affect the real outcomes. Second, we did not perform manual delineation of the tumour area to extract features. Whether the predictive ability of the MDLRN model would significantly improve over that of a model based on the entire tumour ROI remains to be further tested in external cohorts. Third, as time progresses, the therapeutic techniques for HCC are constantly being updated and improved, such as the adjustment of HAIC chemotherapy drug regimens and the improvement of HAIC combined with molecular targeted drugs. This will have a certain degree of impact on outcome prediction and is inevitable Fourth, clinical information regarding complications during and after HAIC and TKI treatment were not analysed, warranting further investigation. Given these limitations, the MDLRN model requires further validation as an OS stratification tool for HAIC in patients with HCC before being applied in other study settings.

In conclusion, MTM is an important prognosis factor for HCC patients which was taken into consideration for building the multitask DLR method. The model could predict the prognosis of HCC patients who underwent HAIC and showed excellent performance in two test cohorts, demonstrating its robustness and effectiveness. Therefore, this tool may help physicians with therapeutic decision making and surveillance strategy selection in clinical practice.