Introduction

Liver cancer is the third most lethal malignancy globally, and hepatocellular carcinoma (HCC) accounts for approximately 90% of primary liver cancer cases [1, 2]. Individualized prognostication is critical for informing optimal patient care in HCC. To this end, several staging systems have been developed based on tumor burden (e.g., size, number, vascular invasion, and extrahepatic metastasis), liver function, and patient performance status [3,4,5,6,7]. Among them, the most widely used is the Barcelona Clinic Liver Cancer (BCLC) system, which allows prognostication and subsequent treatment allocation [3]. Nonetheless, survival is highly heterogeneous within each risk subgroup [8, 9], which highlights both the unsatisfactory performance of current systems in profiling the comprehensive landscape of tumor aggressiveness and the unmet need for refined HCC prognostication.

Hepatectomy is the backbone of curative-intent treatment in early-stage HCC, offering a 5-year survival of 35–70% [2]. Nevertheless, the surgical indications for HCC remain controversial. In line with the BCLC system, Western practice guidelines restrict liver resection to patients with very early- to early-stage HCC [10, 11]. However, growing evidence has shown more favorable outcomes for liver resection than for transarterial chemoembolization and systemic therapies in selected patients with intermediate- to advanced-stage HCC [12,13,14,15,16], and this evidence has been incorporated into major Asian guidelines [7, 17,18,19]. Even so, data remain scarce regarding the optimal surgical candidates among patients with intermediate- to advanced-stage HCC.

Overall survival (OS) has been widely accepted as the most important endpoint in oncology and HCC research and is not subject to investigator bias. Encouraging results have been reported on the utility of gadoxetate disodium–enhanced magnetic resonance imaging (EOB-MRI) features for noninvasive prediction of worse OS in HCC patients after treatment [20,21,22,23]. Nevertheless, these studies were hampered by small sample sizes (e.g., 120–376 patients) and a lack of external validation. Additionally, there are limited data comparing EOB-MRI with extracellular contrast agent-enhanced MRI (ECA-MRI) in HCC prognostication, although the latter is more widely available, more cost-effective, and more time-efficient, with superior arterial phase quality [24]. Furthermore, to our knowledge, few attempts have been made to assess the capacity of MRI-based prognostic tools to inform surgical recommendations for patients with intermediate- to advanced-stage HCC.

Therefore, we aimed to develop and externally validate a risk score based on preoperative clinical-radiological data to predict OS in HCC patients after hepatectomy. Additionally, we sought to investigate whether this score could be used to stratify prognosis and identify patients with intermediate- to advanced-stage HCC who would potentially achieve favorable prognosis after surgery.

Materials and methods

This retrospective dual-institutional study was approved by the institutional review boards of West China Hospital, Sichuan University (Chengdu, Sichuan, China) and Henan Provincial People’s Hospital (Zhengzhou, Henan, China). The requirements for informed consent were waived.

Patients

From July 2015 to November 2020, consecutive patients who underwent preoperative EOB-MRI within 2 months before resection for HCC at institution 1 were retrospectively enrolled and constituted the training cohort (termed the “EOB-MRI cohort”). The inclusion criteria were (a) surgically proven HCC; (b) R0 resection (defined as the complete macroscopic removal of tumor with a negative microscopic margin); (c) absence of a previous history of HCC treatment; and (d) absence of any co-malignancy other than HCC. The exclusion criteria were (a) distant metastasis at preoperative work-ups; (b) ruptured HCC; (c) incomplete clinical or pathological data; (d) inadequate MR images for analysis; and (e) absence of follow-up information.

If the developed risk score incorporated hepatobiliary phase (HBP) imaging features, internal validation would be performed in the EOB-MRI cohort; otherwise, internal validation would be performed in an independent ECA-MRI cohort to test the model’s generalizability in broader populations. Specifically, from July 2010 to December 2021, consecutive patients who underwent preoperative ECA-MRI within 2 months before resection for HCC at institution 1 and met the same eligibility criteria were enrolled and constituted the “ECA-MRI cohort.” The ECA-MRI-based internal validation cohort was created using propensity score matching as detailed in the “Statistical analysis” section. The choice between the two MRI contrast agents was based on the clinicians’ recommendations and patients’ preferences.

From April 2014 to March 2019, consecutive patients who underwent EOB-MRI within 2 months prior to resection for HCC at institution 2 following the same eligibility criteria were retrospectively enrolled and formed the external validation cohort.

In all study cohorts, data including clinical information, laboratory indexes (i.e., aspartate aminotransferase, alanine aminotransferase, total bilirubin, albumin, platelet count, prothrombin time [PT], the international normalized ratio [INR], alkaline phosphatase, gamma-glutamyl transferase, and alpha-fetoprotein [AFP]) within 1 month prior to surgery, and histopathological results were recorded. Baseline laboratory data with ≤ 5% missingness were imputed by medians, with 0.4% (2/520) of missing values for AFP, 0.2% (1/520) for PT, and 0.5% (2/420) for INR. Cirrhosis was diagnosed according to the recommendations in the Clinical Practice Guidelines [25]. The albumin-bilirubin (ALBI) score was calculated using previously described approaches [26].
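As an illustration of the ALBI calculation mentioned above, the following minimal sketch assumes the commonly published ALBI formula (total bilirubin in µmol/L, albumin in g/L) and its usual grade cutoffs. The function names are hypothetical, and the source does not state which implementation was actually used.

```python
import math


def albi_score(bilirubin_umol_l: float, albumin_g_l: float) -> float:
    """ALBI linear predictor (assumed formula; bilirubin in umol/L, albumin in g/L)."""
    if bilirubin_umol_l <= 0:
        raise ValueError("bilirubin must be positive")
    return 0.66 * math.log10(bilirubin_umol_l) - 0.085 * albumin_g_l


def albi_grade(score: float) -> int:
    """Map an ALBI score to grade 1-3 using the assumed cutoffs -2.60 and -1.39."""
    if score <= -2.60:
        return 1
    if score <= -1.39:
        return 2
    return 3


if __name__ == "__main__":
    # Illustrative values only, not study data.
    s = albi_score(bilirubin_umol_l=10.0, albumin_g_l=40.0)
    print(round(s, 2), albi_grade(s))
```

A lower (more negative) score corresponds to better liver function, so grade 1 is the most favorable category.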

MRI technique

MRI was performed with six 3.0-T systems and two 1.5-T systems. Liver MRI sequences included: T2-weighted imaging, diffusion-weighted imaging with apparent diffusion coefficient maps, T1-weighted in- and opposed-phase imaging, and T1-weighted dynamic imaging consisting of precontrast phase, late arterial phase, portal venous phase, delayed phase (ECA-MRI) or transitional phase (EOB-MRI), and HBP (EOB-MRI) images. Details on MRI protocols are provided in Supplementary Material 1 and Table S1.

Image analysis

All deidentified MR images were transmitted to institution 1 and reviewed independently by two fellowship-trained abdominal radiologists (readers 1 and 2, with 7 and 10 years of experience in liver MRI, respectively) who were informed that all enrolled patients had HCC, but were unaware of the remaining clinicopathological and follow-up information. Any discrepancy in imaging interpretation was resolved by a senior abdominal radiologist (reader 3, with over 20 years of experience in liver MRI).

On a per-patient basis, the following features were evaluated: (a) imaging features related to tumor burden [10, 11, 27]; (b) presence or absence of major, ancillary, LR-TIV and LR-M features as defined by Liver Imaging Reporting and Data System (LI-RADS) version 2018 [27]; (c) presence or absence of other imaging features related to tumor biology or patient outcomes [28,29,30,31]; and (d) presence or absence of imaging features related to underlying liver diseases [32, 33] (Table 1). For multiple tumors, radiologic features of the largest tumor were recorded for analysis.

Table 1 All evaluated MRI features and the definitions

Follow-up

OS was defined as the time interval from hepatectomy to death from any cause, and patients who were alive were censored at the date of the last follow-up. Follow-up ended on June 15, 2022, for institution 1 and July 30, 2021, for institution 2.

Statistical analysis

Categorical variables were compared using the chi-square test or Fisher's exact test, while continuous variables were compared using the Student’s t test or Mann–Whitney U test, as appropriate. Interobserver agreement was assessed by computing Cohen’s κ statistics for binary features, weighted κ statistics for categorical features, and the intraclass correlation coefficient for continuous variables.
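To make the agreement statistic concrete, the sketch below computes unweighted Cohen’s κ for two readers’ binary ratings from first principles. It is only an illustration under our own assumptions (the function name and the example ratings are hypothetical) and is not the code used in the study, which relied on R and SPSS.

```python
from collections import Counter


def cohens_kappa(reader1, reader2):
    """Unweighted Cohen's kappa for two equally long sequences of labels."""
    if len(reader1) != len(reader2) or len(reader1) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(reader1)
    # Observed proportion of cases on which both readers agree.
    observed = sum(a == b for a, b in zip(reader1, reader2)) / n
    # Agreement expected by chance from each reader's marginal frequencies.
    c1, c2 = Counter(reader1), Counter(reader2)
    expected = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used one identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical ratings (1 = feature present, 0 = absent), not study data.
    r1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
    r2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
    print(round(cohens_kappa(r1, r2), 2))
```

In this example the readers agree on 8 of 10 cases while chance agreement is 0.5, giving κ = 0.6.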

Propensity score matching

Propensity score matching was performed to minimize the effects of potential confounders and selection bias between the training and internal validation cohorts [34]. The propensity score was estimated by logistic regression, with sex, age, liver cirrhosis, BCLC stage, and mortality as independent variables and the type of study cohort (training vs. internal validation) as the dependent variable. Enrolled patients were matched using 1:1 optimal pair matching. The standardized mean difference was calculated to assess the covariate balance between the two matched cohorts, with a target value of < 0.15. For the matched data, categorical variables were compared using the McNemar test, whilst continuous variables were compared using the paired t test or Wilcoxon signed-rank test, as appropriate.
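The balance check described above can be sketched as follows. This is a minimal illustration that assumes the conventional definition of the standardized mean difference (absolute difference in means or proportions divided by the pooled standard deviation); the function names and example values are hypothetical, and only the 0.15 threshold comes from the text.

```python
import math
from statistics import mean, variance

BALANCE_THRESHOLD = 0.15  # target stated in the text


def smd_continuous(group_a, group_b):
    """Standardized mean difference for a continuous covariate."""
    pooled_sd = math.sqrt((variance(group_a) + variance(group_b)) / 2)
    diff = abs(mean(group_a) - mean(group_b))
    return 0.0 if pooled_sd == 0 else diff / pooled_sd


def smd_binary(group_a, group_b):
    """Standardized mean difference for a 0/1 covariate."""
    p_a, p_b = mean(group_a), mean(group_b)
    pooled_sd = math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    diff = abs(p_a - p_b)
    return 0.0 if pooled_sd == 0 else diff / pooled_sd


if __name__ == "__main__":
    # Hypothetical ages for two tiny matched groups, not study data.
    ages_a, ages_b = [50, 52, 54], [51, 53, 55]
    smd = smd_continuous(ages_a, ages_b)
    print(smd, "balanced" if smd < BALANCE_THRESHOLD else "imbalanced")
```

Unlike a p value, the standardized mean difference does not depend on sample size, which is why it is commonly preferred for judging balance after matching.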

Development and validation of the risk score

Univariable Cox regression analysis was performed to identify potential predictors in the training cohort, whilst adjusting for patients’ sex and age. To develop an easy-to-apply score, continuous variables were transformed into binary variables as per normal ranges of laboratory indexes or clinical relevance. Spearman’s correlation coefficient was calculated to investigate the collinearity between variables; when collinearity was encountered, variables with the largest hazard ratios in the univariable analysis were selected for further analysis.

Variables with p < 0.01 at univariable analysis were entered into the multivariable Cox regression model; the final model was selected using the backward stepwise approach with the Akaike Information Criterion and five-fold cross-validation. A risk score was constructed based on the final multivariable Cox regression model. The variable with the highest β-coefficient was assigned 20 points, and the remaining β-coefficients were scaled proportionally and rounded to the nearest integer. Harrell’s concordance index (C-index) was used to measure the discrimination of the risk score [35], and a calibration curve was drawn to assess model calibration [36].
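The point-assignment rule described above can be sketched as below. This is a hedged illustration only: the β-coefficients and predictor names are invented placeholders (the actual coefficients are reported in Table 3), the function names are hypothetical, and proportional scaling with half-up rounding is our reading of the description.

```python
import math


def betas_to_points(betas, max_points=20):
    """Scale Cox beta-coefficients so the largest maps to max_points, then round."""
    largest = max(betas.values())
    if largest <= 0:
        raise ValueError("at least one coefficient must be positive")
    # floor(x + 0.5) rounds halves up; Python's round() would round halves to even.
    return {
        name: int(math.floor(beta / largest * max_points + 0.5))
        for name, beta in betas.items()
    }


def risk_score(points, present):
    """Sum the points of the binary predictors that are present in a patient."""
    return sum(points[name] for name in present)


if __name__ == "__main__":
    # Placeholder coefficients for three hypothetical binary predictors.
    example_betas = {"predictor_a": 1.2, "predictor_b": 0.9, "predictor_c": 0.6}
    pts = betas_to_points(example_betas)
    print(pts)
    print(risk_score(pts, ["predictor_a", "predictor_c"]))
```

Because every predictor is binary, a patient’s score is simply the sum of the points for the features that are present, which is what makes the score easy to apply at the bedside.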

Survival analysis

OS was calculated by the Kaplan-Meier method and compared by the log-rank test, with a false discovery rate-adjusted approach applied [37]. To classify patients into high- and low-risk survival groups, the optimal cutoff value of the risk score was determined by X-tile software (version 3.6.1). Subgroup analyses were performed based on six available clinical-radiological-pathological variables known to affect HCC prognosis, including tumor size, tumor-in-vein, microvascular invasion (MVI), tumor differentiation, liver cirrhosis, and the ALBI grade. The prognostic value of the risk score was also assessed in subgroups of patients undergoing resection within (stage 0-A) and beyond (stage B-C) BCLC criteria. To ensure an adequate number of patients for each subgroup, the training and internal validation cohorts were used for subgroup analyses.
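For readers less familiar with the Kaplan-Meier method named above, the product-limit estimator can be written in a few lines. The sketch below is illustrative only; the function name and the follow-up data are hypothetical, and the study itself used R and SPSS rather than this code.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  : follow-up durations (e.g., months from hepatectomy)
    events : 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        n_at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up data, not study data.
    print(kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0]))
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a drop in the curve, which is how the estimator uses incomplete follow-up without bias under non-informative censoring.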

Statistical analyses were performed using R software (version 3.5.1; The R Foundation for Statistical Computing) and SPSS Statistics (version 26.0; IBM). A two-tailed p < 0.05 indicated a statistically significant difference.

Results

Patient characteristics

At institution 1, a total of 210 patients (mean age ± standard deviation, 52.1 ± 11.6 years; 173 men) in the “EOB-MRI cohort” constituted the training cohort. Given that no HBP imaging features were included in the final prognostic model, we matched 210 patients (mean age ± standard deviation, 53.5 ± 11.0 years; 178 men) in the “ECA-MRI cohort” for internal validation, with an adequate balance of all matching variables (Table 2 and S2). At institution 2, a total of 100 patients (mean age ± standard deviation, 56.2 ± 10.0 years; 81 men) were included and constituted the external validation cohort (Fig. 1).

Table 2 Baseline patient characteristics and MRI features
Fig. 1 Flowchart of patient selection. EOB-MRI, gadoxetate disodium-enhanced magnetic resonance imaging; ECA-MRI, extracellular contrast agent-enhanced magnetic resonance imaging; HCC, hepatocellular carcinoma; MR, magnetic resonance

Patients from the training cohort were younger than those from the external validation cohort (mean age, 52.1 vs. 56.2 years; p = 0.002), with less frequent cirrhosis (49.5 vs. 70.0%; p = 0.001) and larger tumors (median size, 4.1 vs. 3.4 cm; p = 0.022). The median follow-up period was 48.1 (interquartile range [IQR]: 24.5–66.7), 51.3 (IQR: 35.4–72.8), and 35.5 (IQR: 33.3–49.2) months for the training, internal validation and external validation cohorts, respectively. Baseline patient characteristics and MRI features are summarized in Table 2.

Development and validation of the OSASH score for predicting OS

In the training cohort, 11 variables were significantly associated with worse OS at univariable Cox regression analysis, and four of them (i.e., incomplete tumor “capsule”, mosaic architecture, tumor multiplicity, and serum AFP  > 400 ng/mL) were included in the final model at multivariable analysis (Table 3). By incorporating these variables, a simplified risk score termed the “OSASH score” (named by incorporating the key letters of overall survival after surgery for HCC) was developed (Fig. 2).

Table 3 Univariable and multivariable Cox regression analysis of prognostic factors for overall survival in the training cohort
Fig. 2 Graphical illustration of the OSASH score. AFP, alpha-fetoprotein; DP, delayed phase; HBP, hepatobiliary phase; HCC, hepatocellular carcinoma; PVP, portal venous phase

The C-index of the OSASH score for predicting OS was 0.85 (95% confidence interval [CI]: 0.78–0.91), 0.81 (95% CI: 0.75–0.88), and 0.62 (95% CI: 0.49–0.75) in the training, internal validation, and external validation cohorts, respectively. The calibration plots showed overall good agreement between the score-predicted risk of death and the observed incidence of death in all cohorts (Figure S1). Interobserver agreement for the OSASH score is presented in Table S3.
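To clarify what the C-index values above measure, the following minimal sketch computes Harrell’s C-index as the proportion of comparable patient pairs in which the patient who died earlier had the higher risk score. The function name and data are hypothetical, pairs with tied event times are ignored for simplicity, and this is not the study’s own implementation.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            # Comparable pair: patient i died strictly before patient j's last time.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical follow-up times, death indicators, and scores.
    print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [30, 20, 25, 10]))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which gives context for the lower value observed in the external validation cohort.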

Survival risk stratification based on the OSASH score

Using 32 as the cutoff point for the OSASH score derived from the training cohort (Table 4; Fig. 3A), patients in the internal validation cohort were divided into prognostically distinct low- and high-risk groups (5-year OS rates, 88.5% vs. 40.5%; p < 0.001) (Table 4; Fig. 3B). Similar results were obtained in the external validation cohort (5-year OS rates, 74.3% vs. 60.0%; p = 0.039) (Table 4; Fig. 3C).

Table 4 Median OS, 3- and 5-year OS rates, and hazard ratios for OSASH score risk subclasses in all cohorts
Fig. 3 Kaplan-Meier curves demonstrating differences in OS between the OSASH-low and OSASH-high risk patients with HCC in the (A) training, (B) internal validation, and (C) external validation cohorts. HCC, hepatocellular carcinoma; OS, overall survival

The OSASH score predicted OS in six subgroups

In each of the six subgroups described above, OSASH-low-risk patients had significantly longer OS than OSASH-high-risk patients in both the training (Table 5; Figure S2) and internal validation (Table 5; Figure S3) cohorts (all p < 0.05).

Table 5 5-year OS rates and hazard ratios for OSASH score risk subclasses in six subgroups in the training and internal validation cohorts

Prognostic impact of the OSASH score in patients across different BCLC stages

Survival risk stratification based on the BCLC algorithm

In the training cohort, comparable outcomes were obtained for patients with BCLC stage 0 and A HCC (5-year OS rates, 100.0% vs. 89.6%; p = 0.178) and for patients with BCLC stage B and C HCC (5-year OS rates, 52.7% vs. 49.4%; p = 0.185) (Table S4; Fig. 4A). In the internal validation cohort, patients with BCLC stage 0 and A HCC had similar outcomes (5-year OS rates, 96.3% vs. 92.7%; p = 0.444), whereas patients with BCLC stage B HCC had significantly longer OS than those with BCLC stage C HCC (5-year OS rates, 85.5% vs. 31.2%; p < 0.001) (Table S4; Fig. 4B).

Fig. 4 Kaplan-Meier curves according to (A, B) BCLC stage and (C, D) the OSASH score combined with BCLC stage subgroups in the training and internal validation cohorts. BCLC, Barcelona Clinic Liver Cancer

Incremental prognostic value of the OSASH score to the BCLC algorithm

To further assess the incremental value of the OSASH score to the BCLC algorithm, subgroup analyses were performed in patients with BCLC stage 0-A and B-C HCC. In both the training and internal validation cohorts, OSASH-low-risk patients achieved significantly longer OS than OSASH-high-risk patients, both among patients with BCLC stage 0-A HCC (internal validation cohort: 5-year OS rates, 94.6% vs. 77.8%; p = 0.034) and among those with BCLC stage B-C HCC (internal validation cohort: 5-year OS rates, 74.7% vs. 28.7%; p < 0.001) (Table S4; Fig. 4C and D).

Furthermore, in the training cohort, patients with BCLC stage B-C HCC and OSASH-low risk had significantly longer OS than those with BCLC stage 0-A HCC and OSASH-high risk (5-year OS rates, 67.5% vs. 40.0%; p = 0.023) (Fig. 4C). In the internal validation cohort, patients with BCLC stage B-C HCC and OSASH-low risk achieved outcomes comparable to those of patients with BCLC stage 0-A HCC and OSASH-high risk (5-year OS rates, 74.7% vs. 77.8%; p = 0.964) (Fig. 4D and S4).

Discussion

In this large dual-institutional cohort study, by integrating three MRI features and serum AFP, we developed and validated a preoperative risk score for the prediction of OS in HCC patients undergoing hepatectomy. The score displayed similarly good prognostic performance in the training and internal validation cohorts but reduced performance in the external validation cohort. It was capable of stratifying patients into two prognostically distinct risk strata across all study cohorts and all six subgroups. This is clinically relevant because it may allow the identification of a small proportion of patients at high risk of death, for whom more intensive surveillance could be considered and adjuvant therapies might provide a survival benefit. Furthermore, among patients with BCLC stage B and C HCC without extrahepatic metastasis, the score identified a subgroup of low-risk patients who achieved favorable prognoses after resection, suggesting its potential value in complementing traditional staging systems for OS prediction.

In the present study, 38.5% (200/520) of surgical patients had intermediate- to advanced-stage HCC. Although this represents a marked deviation from the current BCLC recommendations [3], our study population reflects real-world clinical practice in high-volume tertiary care centers in China, where a proportion of patients undergo hepatectomy for intermediate- to advanced-stage HCC according to multidisciplinary team recommendations and patients’ preferences [7]. However, patients with intermediate- to advanced-stage HCC harbor substantial tumor heterogeneity; hence, preoperative patient selection is critical to identify the optimal surgical candidates. In our study, patients with BCLC stage B-C HCC and OSASH-low risk might be potential surgical candidates because they showed 5-year OS rates approaching 70%, similar to those of patients with BCLC stage 0-A HCC and OSASH-high risk. However, future prospective multi-institutional studies are required to test the reliability and reproducibility of our findings.

To date, EOB-MRI has been more commonly used in published prognostic research, mainly because it additionally provides HBP imaging features for analysis, some of which (e.g., HBP peritumoral hypointensity) have been closely linked to HCC prognosis [20, 23, 38]. Therefore, in the current study, to comprehensively explore the prognostic value of all available imaging features, the risk score was initially developed in an EOB-MRI cohort (the training cohort). However, our results showed that no HBP imaging features were independently associated with OS, which motivated us to test the reproducibility of our findings in a propensity score-matched ECA-MRI cohort. The score showed discriminatory power in the ECA-MRI cohort comparable to that in the EOB-MRI cohort. These preliminary observations suggest that the risk score is applicable to both EOB-MRI and ECA-MRI to help predict OS for HCC patients undergoing surgical resection.

The OSASH score was constructed with four variables profiling the tumor burden (tumor multiplicity) and biology (incomplete tumor “capsule,” mosaic architecture, and AFP). The mechanisms underlying these clinical-radiological alterations remain under investigation. The presence of an incomplete tumor “capsule” often indicates infiltrative tumor growth and poorer survival [39]. Previous studies have identified incomplete tumor “capsule” as an imaging marker for predicting MVI [40], postoperative extrahepatic metastasis [41], and high BRAF and RAF1 expression in HCC [42]; the latter could accelerate tumor proliferation and differentiation and promote tumor invasion and metastasis. Mosaic architecture refers to the presence of randomly distributed internal nodules or components, usually with different imaging features in terms of enhancement, intensity, shape, and size [27]. Histopathologically, it corresponds to the appearance of different foci of clonal expansion at various stages of hepatocarcinogenesis, some of which may comprise fat metamorphosis, necrosis, blood products, cystic degeneration, and fibrous septa [43]. Therefore, mosaic architecture is regarded as an imaging marker of tumor heterogeneity at the histological level, which is itself a critical prognostic element. Serum AFP > 400 ng/mL also denoted a worse OS in our study, in line with previous reports [44, 45]. AFP can promote tumor growth partly by inhibiting apoptosis; it has also been associated with the upregulation of vascular endothelial growth factor signaling, thereby promoting tumor angiogenesis and metastasis [46].

This study had several limitations. Firstly, there might have been unavoidable selection bias owing to the retrospective design. In particular, our patients were enrolled over a time span of 11 years, which could be a potential source of bias given the evolution of MRI and surgical techniques. In addition, there was a difference of about 1 year in the last follow-up date between the two institutions, which might have affected our results. Thus, further prospective multicenter studies are warranted to validate the presented findings. Secondly, the OSASH score did not achieve good prognostic performance in the external validation cohort, which might have been due to the relatively small external sample size and the substantial heterogeneity in study population, MRI parameters, and surgical techniques between the two institutions. Therefore, future large-scale studies are needed to verify the generalizability of the OSASH score in different populations. Thirdly, owing to the smaller sample size of the external validation cohort, it was not possible to conduct further survival analyses to examine whether OSASH-low-risk patients had better outcomes than OSASH-high-risk patients in different subgroups, as was done in the internal validation cohort. Hence, the predictive ability of the OSASH score in various subgroups requires further external validation in a larger cohort.

In conclusion, by incorporating three preoperative MRI features and serum AFP, we developed and validated the OSASH score for the prediction of postsurgical OS in HCC patients, which identified a subgroup of low-risk patients with BCLC stage B and C HCC who achieved favorable prognosis after resection. Future multicenter prospective studies with rigorous design are needed to validate our findings.