Introduction

Hepatocellular carcinoma (HCC) is the most common form of primary liver cancer, and over 80% of cases arise in patients with cirrhosis [1,2,3,4]. Surgical liver resection remains the mainstay of curative therapy for patients with very early- or early-stage tumors (Barcelona Clinic Liver Cancer [BCLC] stage 0/A) and preserved liver function [5, 6]. However, recurrence after resection adversely affects long-term prognosis, with reported 5-year recurrence rates as high as 60–70% [7, 8].

Two types of HCC recurrence have been proposed: early recurrence (within 2 years after resection) and late recurrence (more than 2 years after resection). Early recurrence has been associated with the aggressiveness of the primary tumor and is likely due to intrahepatic metastasis. Conversely, late recurrence is more frequently regarded as de novo multicentric tumor development, which may be more closely related to the severity of the underlying liver disease [9, 10]. Identifying patients at increased risk of late recurrence can help tailor personalized treatment and follow-up strategies and consequently improve survival. In this regard, cirrhosis has been reported as the predominant risk factor for late recurrence [9, 11], and a recent study found that spleen stiffness measured by transient elastography, which reflects the degree of portal hypertension, was the only independent predictor of late recurrence [12]. However, spleen stiffness is not routinely measured during the preoperative workup of HCC. Furthermore, the added value of spleen measurements in patients with established cirrhosis remains unclear.

Similar to spleen stiffness, an increase in spleen volume has been positively associated with the severity of chronic liver disease (CLD) [13, 14] and is thus a potential predictive marker of late recurrence. Moreover, with the aid of artificial intelligence (AI) techniques, spleen volume is simpler to measure than spleen stiffness. In addition, contrast-enhanced MRI has shown promising prognostic utility in HCC in recent years, as it permits evaluation of the morphology, hemodynamics, metabolism, and function of the liver and tumor via multiparametric imaging sequences [15]. MRI thus allows accurate measurement of volumetric indices while simultaneously providing critical information on tumor aggressiveness.

We therefore aimed to explore risk factors for late HCC recurrence, particularly spleen volume measured with AI techniques on preoperative MRI, and to develop a risk score for predicting late recurrence and enabling individualized risk stratification.

Materials and methods

Study population and data acquisition

The protocol of this retrospective study conforms to the ethical guidelines of the 1975 Declaration of Helsinki and was approved by the institutional review board. The study adhered to the TRIPOD guideline for developing and validating a prognostic model [16].

From January 2011 to May 2020, consecutive patients with HCC who underwent curative-intent resection and a preoperative MRI scan at a tertiary academic hospital were screened. Inclusion criteria were as follows: (a) successful follow-up for at least 2 years (i.e., the patient did not die, experience early recurrence, or become lost to follow-up within 2 years); (b) pathologically confirmed HCC; and (c) established cirrhosis determined based on liver biopsy or on clinical history and typical imaging features. Exclusion criteria were as follows: (a) any prior procedure on the liver and/or spleen (e.g., transjugular intrahepatic portosystemic shunt, splenectomy, or partial splenic embolization); (b) portal vein invasion and/or extrahepatic spread; (c) any co-malignancy other than HCC; (d) ruptured tumor; (e) any adjuvant treatment before recurrence; (f) an interval between MRI and resection exceeding 1 month; and (g) inadequate MRI image quality (e.g., severe artifacts).

Baseline data were extracted from the electronic medical records, including patient demographics, clinical characteristics, laboratory results, and surgical information. For laboratory results, we collected the most recent values recorded before surgery. The pathological parameters included tumor number, maximum size, tumor differentiation, surgical margin width, microvascular invasion, and histological grade of liver fibrosis evaluated using either the Ishak or the Scheuer grading system [17, 18].

MRI examination and qualitative imaging analysis

All MRI examinations were performed on one of four 3.0-T systems or a 1.5-T system. Details of the MRI acquisition protocols are provided in the Supplementary materials.

Two abdominal radiologists (with 7 and 10 years of experience in liver MRI) who were blinded to all clinical, pathological, and follow-up data independently performed the imaging analyses. The following items were evaluated for each patient: (a) location, size, and number of tumors; (b) Liver Imaging Reporting and Data System (LI-RADS v.2018) major and ancillary features (except for those related to growth and ultrasound visibility); (c) other imaging features that reflect tumor aggressiveness (e.g., margin, growth subtypes, internal artery, bilobar involvement). Discrepancies were resolved by conducting a consensus review session with a third senior radiologist who had over 20 years of experience in liver MRI.

Measurement of liver and spleen volumetric indices

All MRI images were transferred to a post-processing workstation dedicated to 3D volumetric analysis. An independent radiologist who was not involved in the imaging analyses performed the volumetric analysis with commercially available, fully automated volumetric software (SenseCare, Shanghai, China). Specifically, portal venous phase images were automatically segmented, and the liver (including each lobe) and spleen volumes (cm³) were calculated by summing the segmented areas on consecutive slices and multiplying by the slice thickness (Supplementary Fig. 1). The calculation took approximately 3 min per case. The radiologist reviewed all segmentations generated by the software and manually corrected segmentation errors. The following volumetric indices were recorded: total liver volume, left/right/caudate lobe volume, spleen volume, and tumor volume.
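The slice-summation rule described above can be sketched as follows. This is a minimal illustration, not the SenseCare implementation (which is not described here). It assumes a binary segmentation mask per axial slice, a known in-plane pixel spacing, and contiguous slices with no inter-slice gap, so that slice spacing equals slice thickness. The function names are hypothetical.

```python
from typing import Sequence, Tuple

# A 2D binary mask for one axial slice: 1 = pixel inside the organ, 0 = outside.
Mask2D = Sequence[Sequence[int]]


def slice_area_mm2(mask: Mask2D, pixel_spacing_mm: Tuple[float, float]) -> float:
    """Segmented cross-sectional area on one slice, in mm^2."""
    n_pixels = sum(1 for row in mask for value in row if value)
    return n_pixels * pixel_spacing_mm[0] * pixel_spacing_mm[1]


def organ_volume_cm3(
    masks: Sequence[Mask2D],
    pixel_spacing_mm: Tuple[float, float],
    slice_thickness_mm: float,
) -> float:
    """Sum the areas over consecutive slices, multiply by slice thickness,
    and convert mm^3 to cm^3 (1 cm^3 = 1000 mm^3).

    Assumes contiguous slices (no gap), i.e. slice spacing == slice thickness."""
    total_area_mm2 = sum(slice_area_mm2(m, pixel_spacing_mm) for m in masks)
    return total_area_mm2 * slice_thickness_mm / 1000.0
```

For example, three slices each containing 100 segmented pixels of 1 × 1 mm at a 5-mm slice thickness give 3 × 100 mm² × 5 mm = 1500 mm³, i.e., 1.5 cm³.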

Follow-up and endpoint definition

As per protocol, follow-up visits were scheduled for all patients at 1 month after surgery, every 3 months for the first 2 years, and every 6 months thereafter, or as clinically required, supplemented with telephone interviews every 6 months. Patients were followed until death, loss to follow-up, or the end of the study (May 1, 2022), and data were censored at the end of follow-up.

The primary endpoint was late HCC recurrence, defined as radiological or pathological identification of local disease or distant metastasis at least 24 months after surgery. Secondary endpoints included recurrence-free survival (RFS) and overall survival (OS).
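Because all included patients were recurrence-free at 24 months, one common way to obtain the cumulative incidence of late recurrence at 3, 4, and 5 years is 1 minus the Kaplan–Meier estimate computed on the censored follow-up data; the section does not state which estimator produced the reported figures. A minimal sketch of the product-limit estimator is shown below. The input format (follow-up in months from surgery; event = 1 for late recurrence, 0 for censoring) and the function names are assumptions for illustration. Patients censored at a given time are kept in the risk set at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Product-limit estimate S(t) at each distinct event time.

    times: follow-up in months from surgery; events: 1 = late recurrence, 0 = censored.
    Returns a list of (time, survival) steps."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_censored = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1] == 1:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events:
            # Patients censored at t still count as at risk at t.
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, surv))
        n_at_risk -= n_events + n_censored
    return curve


def cumulative_incidence_at(curve, t):
    """1 - S(t), using the last step at or before t (S = 1 before the first event)."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return 1.0 - s
```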

Statistical analysis

Quantitative variables were reported as mean and standard deviation (SD) or median and interquartile range (IQR). Categorical variables were reported as frequencies and percentages. Comparisons between groups were conducted using the t-test, Mann–Whitney test, chi-squared test, or Fisher’s exact test, as appropriate. Correlations between volumetric indices and laboratory and pathologic parameters were assessed with Pearson or Spearman correlation coefficients. Kaplan–Meier curves with log-rank tests were used for survival analysis, and Cox proportional hazards models were used to estimate multivariable-adjusted hazard ratios (HRs) and 95% confidence intervals (95%CIs). The non-linear association between spleen volume and the risk of late recurrence was assessed using restricted cubic splines fitted in a Cox regression model.
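To illustrate how restricted cubic splines let a Cox model capture a non-linear effect, the sketch below builds the spline basis for one covariate using the truncated-power parameterization described by Harrell: the fitted curve is cubic between knots and constrained to be linear beyond the outermost knots. Dividing the non-linear terms by the squared knot range is a common normalization that keeps them on a scale comparable to the covariate. The number and placement of knots used in this study are not reported, so the knot values in the usage example are hypothetical; the resulting columns would be entered into the Cox model in place of the raw spleen volume.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for a single value x.

    Returns [x, f_1(x), ..., f_{K-2}(x)] for K knots. The implied curve is
    linear below the first knot and above the last knot."""
    k = sorted(knots)
    if len(k) < 3:
        raise ValueError("need at least 3 knots")
    k_last, k_penult = k[-1], k[-2]
    scale = (k_last - k[0]) ** 2  # normalization by the squared knot range

    def pos3(u):
        # Truncated cubic (u)_+^3.
        return max(u, 0.0) ** 3

    terms = [x]
    for kj in k[:-2]:
        f = (
            pos3(x - kj)
            - pos3(x - k_penult) * (k_last - kj) / (k_last - k_penult)
            + pos3(x - k_last) * (k_penult - kj) / (k_last - k_penult)
        )
        terms.append(f / scale)
    return terms
```

For example, `rcs_basis(300.0, [150.0, 250.0, 400.0, 650.0])` returns three columns (the linear term plus two non-linear terms), so with four knots a Cox model would estimate three coefficients for spleen volume.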

A risk prediction score was developed for late recurrence. Variables with a p value < 0.1 in the univariable Cox analysis were entered into the multivariable analysis. Established composite scores (e.g., Child–Pugh, MELD, FIB-4) and their individual components were not entered simultaneously to avoid multicollinearity. A backward stepwise procedure based on the Akaike information criterion was used to simplify the model and identify the best subset of independent predictors. The final score was built by fitting a Cox model with 5-year recurrence as the endpoint, and the estimated coefficient of each predictor was used as its relative weight to compute the linear predictor.
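The sketch below shows how a Cox-based score turns predictor values into a linear predictor (LP, the sum of each coefficient times its covariate) and, given a baseline survival at the horizon, into an absolute risk of 1 − S0(t)^exp(LP). The APRI formula is the standard one: AST divided by its upper limit of normal, divided by the platelet count in 10⁹/L, times 100. The coefficient names and values are hypothetical placeholders, because the fitted coefficients and baseline survival of the published score are not given in this section; in that score, spleen volume enters non-linearly (through spline terms) and tumor number through category indicators.

```python
import math


def apri(ast_iu_l: float, ast_uln_iu_l: float, platelets_1e9_l: float) -> float:
    """AST-to-platelet ratio index: (AST / AST upper limit of normal) / platelets x 100."""
    return (ast_iu_l / ast_uln_iu_l) / platelets_1e9_l * 100.0


def linear_predictor(covariates: dict, coefficients: dict) -> float:
    """Cox linear predictor: sum of beta_i * x_i (a Cox model has no intercept)."""
    return sum(beta * covariates[name] for name, beta in coefficients.items())


def absolute_risk(lp: float, baseline_survival: float) -> float:
    """Risk by the horizon: 1 - S0(t) ** exp(LP), where S0(t) is the baseline
    survival at the horizon for a patient whose linear predictor equals 0."""
    return 1.0 - baseline_survival ** math.exp(lp)


# Hypothetical coefficients for illustration only; NOT the published score.
HYPOTHETICAL_COEFS = {
    "apri": 0.3,
    "spleen_term_1": 0.002,  # spline term(s) for spleen volume
    "spleen_term_2": 0.01,
    "tumors_2_3": 0.7,       # indicator: 2-3 tumors vs. 1
    "tumors_gt3": 2.3,       # indicator: > 3 tumors vs. 1
}
```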

We assessed the predictive accuracy of the score in terms of discrimination and calibration. Discrimination was assessed with the time-dependent area under the receiver operating characteristic curve (td-AUC). Calibration was assessed statistically by computing the Brier score and graphically with calibration plots. In addition, the score was internally validated using bootstrap resampling (1000 replicates) to estimate the optimism in its performance. To define two categories with distinct risks of late recurrence, the 80th percentile of the final score's linear predictor was used as the cutoff. Furthermore, the final score was applied to predict different patterns of late recurrence (definitions are provided in the Supplementary materials).
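As an illustration of the cumulative/dynamic td-AUC at a horizon t, the sketch below counts, over all case–control pairs, how often the case has the higher score (ties count one half). Cases are patients with late recurrence by t, and controls are patients still event-free beyond t. This is a simplified estimator that drops patients censored before t; inverse-probability-of-censoring-weighted estimators (for example, Uno's) reweight rather than drop them, and the section does not state which estimator was used. Function and argument names are hypothetical.

```python
def td_auc_naive(times, events, scores, horizon):
    """Cumulative/dynamic AUC at `horizon` without censoring weights.

    Cases: late recurrence at or before the horizon.
    Controls: event-free and still under follow-up after the horizon.
    Patients censored before the horizon are dropped (a simplification)."""
    cases = [s for t, e, s in zip(times, events, scores) if t <= horizon and e == 1]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at the horizon")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))
```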

Two sets of sensitivity analyses were performed to examine the robustness of our results. First, Fine and Gray’s subdistribution hazards regression model was applied to evaluate the possible influence of competing events on the endpoint, with death treated as a competing event for late recurrence. Second, to evaluate whether the discriminative ability of the score could be improved by parameters obtained after surgery (e.g., surgery-related and pathological indices), we added variables significantly associated with the endpoint (clinically or statistically) to the score one at a time.
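Fine–Gray regression models the subdistribution hazard, whose nonparametric counterpart is the cumulative incidence function (CIF). A minimal Aalen–Johansen sketch of the CIF for late recurrence with death as a competing event is shown below; it is not the Fine–Gray regression itself. The status codes (0 = censored, 1 = late recurrence, 2 = death without recurrence) and the function name are assumptions.

```python
def cumulative_incidence(times, status, cause=1):
    """Aalen-Johansen cumulative incidence for `cause` in the presence of
    competing events. status: 0 = censored, 1 = late recurrence,
    2 = death without recurrence. Returns (time, CIF) at each event time."""
    data = sorted(zip(times, status))
    n_at_risk = len(data)
    surv = 1.0  # overall event-free survival just before the current time
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_cause = d_any = removed = 0
        while i < len(data) and data[i][0] == t:
            st = data[i][1]
            if st == cause:
                d_cause += 1
            if st != 0:
                d_any += 1
            removed += 1
            i += 1
        if d_any:
            # Probability of reaching t event-free times the cause-specific hazard at t.
            cif += surv * d_cause / n_at_risk
            surv *= 1.0 - d_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= removed
    return curve
```

In this toy input, `cumulative_incidence([30, 36, 40, 48], [1, 2, 1, 0])` gives 0.5 at 40 months, whereas treating death as censoring and taking 1 − Kaplan–Meier would give 0.625, illustrating how competing deaths can inflate naive recurrence estimates.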

Missing values were assumed to be missing at random and were imputed using multiple imputation by chained equations. A two-tailed p value < 0.05 was considered statistically significant. All statistical analyses were performed with R software (version 4.2.2).
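After multiple imputation, model estimates from the m completed data sets are commonly combined with Rubin's rules; the section does not state the pooling method, so this is an assumption. The sketch pools a log hazard ratio: the pooled estimate is the mean across imputations, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. For brevity, the 95% CI uses a normal approximation rather than a t-distribution with Barnard–Rubin degrees of freedom. Function names are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates (e.g. log HRs) and their squared SEs."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need >= 2 imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total


def pooled_hr_ci(estimates, variances, z=1.96):
    """Pooled HR with an approximate 95% CI, computed on the log scale."""
    q_bar, total = pool_rubin(estimates, variances)
    se = math.sqrt(total)
    return math.exp(q_bar), math.exp(q_bar - z * se), math.exp(q_bar + z * se)
```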

Results

Baseline characteristics

As of May 2020, 301 eligible patients who were alive and free of early recurrence 2 years after curative-intent resection were included (Fig. 1); the mean age was 52.9 years (SD 10.6), and 261 (86.7%) were male. The etiology of CLD was mostly hepatitis B virus infection (94.7%). Two hundred fourteen (71.1%) patients were categorized as BCLC stage A, and most had a single resectable tumor. The median interval between MRI examination and surgery was 8 days (IQR 3–8). Baseline characteristics of the study population are summarized in Table 1.

Fig. 1

Flow diagram of the study population. HCC hepatocellular carcinoma, TIPS transjugular intrahepatic portosystemic shunt, PSE partial splenic embolization

Table 1 Demographics and baseline characteristics of the study population

During a median follow-up of 48.8 (IQR 36.6–65.8) months, 84 (27.9%) patients in the entire cohort developed late recurrence. The cumulative incidence of late recurrence at 3, 4, and 5 years was 11.0%, 19.2%, and 29.1%, respectively. Patients with and without late recurrence had similar age, sex distribution, and etiology of CLD, while patients with the outcome had more severe liver disease and worse liver function (Table 1).

Predictive value of spleen volume

Among the volumetric indices, only spleen volume differed significantly between patients with (median 321.4 cm³, IQR 217.2–480.8 cm³) and without late recurrence (median 268.9 cm³, IQR 200.6–376.4 cm³; p = 0.006) (Fig. 2a). In addition, spleen volume showed weak-to-moderate correlations with laboratory and pathological parameters reflecting the severity of liver fibrosis and cirrhosis (r < 0.6 for all), whereas the other volumetric indices showed no correlation with these parameters (r < 0.3 for all) (Fig. 2b).

Fig. 2

Analysis of spleen volume. a Distribution of spleen volume between the non-recurrence and recurrence groups. b Correlation between volumetric indices and laboratory/pathological parameters reflecting the severity of liver fibrosis/cirrhosis. Liver-related volumetric indices were calculated by excluding tumor volume. Black cross in each cell indicates a correlation p value > 0.05

In univariable Cox regression analysis, variables associated with late recurrence (p < 0.1) were mostly related to the severity of underlying liver disease, except for tumor number and bilobar involvement of the liver (Table 2). For imaging features reflecting tumor aggressiveness, nonrim arterial phase hyperenhancement, nonenhancing capsule, and targetoid transitional or hepatobiliary phase appearance were the only significant predictors in univariable analysis (Table 3). In multivariable Cox analysis, only APRI score (HR 1.38, 95%CI 1.13–1.68, p = 0.001), tumor number (HR with 2–3 vs. 1: 2.02, 95%CI 1.15–3.65, p = 0.015; HR with > 3 vs. 1: 10.05, 95%CI 2.37–42.58, p < 0.001), and spleen volume (HR 1.01, 95%CI 1.00–1.01, p = 0.008) were associated with late recurrence, while no imaging features remained significant (Table 2).

Table 2 Univariable and multivariable Cox regression models for risk factors associated with late HCC recurrence
Table 3 Association between imaging features and late HCC recurrence in univariable Cox analysis and Fine-Gray competing risk analysis

A restricted cubic spline was then used to model the association between spleen volume and the risk of late recurrence (Fig. 3a). The risk of late recurrence was relatively low and stable in patients with a baseline spleen volume below 370 cm³ but increased progressively with spleen volume above 370 cm³. Using 370 cm³ as the cutoff value, baseline spleen volume identified a subgroup of patients with a substantially higher risk of late recurrence (HR 2.02, 95%CI 1.31–3.12, p = 0.002) (Fig. 3b). Furthermore, higher baseline spleen volume was also associated with substantially worse RFS and OS after resection (Supplementary Fig. 2).

Fig. 3

Predictive value of spleen volume. a Non-linear association between spleen volume and the risk of late recurrence. b Cumulative rate of late recurrence in patients with high spleen volume (> 370 cm³) versus low spleen volume (≤ 370 cm³)

In additional subgroup analyses, the effect of spleen volume on late recurrence was more pronounced in patients with lower BCLC stage, a maximum tumor size > 5 cm, and solitary and well-differentiated tumors, although the differences between subgroups were not significant (p for interaction > 0.05 for all comparisons) (Supplementary Fig. 3).

Development of a spleen volume-based prediction score

Using 5-year recurrence as the endpoint, a spleen-based risk score was developed by fitting a Cox regression model with APRI score, spleen volume (modeled non-linearly), and tumor number as components.

The risk score showed a td-AUC of 0.700 at 3 years, 0.701 at 4 years, and 0.751 at 5 years; after optimism correction in the internal validation, these values were 0.693, 0.700, and 0.748, indicating minimal overfitting (Fig. 4a). The calibration plot indicated good agreement between predicted and observed outcomes, and the Brier score confirmed that the score was well calibrated (Brier score at 3, 4, and 5 years: 9.7, 15.0, and 17.2; optimism-corrected Brier score: 9.9, 15.3, and 17.7) (Fig. 4b). A nomogram was then generated to facilitate clinical use (Fig. 4c).
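The section does not detail the internal-validation procedure. A common form of bootstrap optimism correction refits the model on each resample, measures how much better it performs on that resample than on the original data, and subtracts the average of this optimism from the apparent performance. A minimal generic sketch follows; `fit` and `performance` are hypothetical user-supplied callables (for example, a Cox fit and a 5-year td-AUC), and the whole modeling procedure, including stepwise selection, would need to be repeated inside `fit` for the correction to capture all sources of optimism.

```python
import random
from statistics import mean


def optimism_corrected(data, fit, performance, n_boot=1000, seed=0):
    """Return (apparent, optimism-corrected) performance.

    fit(data) -> model; performance(model, data) -> float where higher is better
    (negate lower-is-better metrics such as the Brier score)."""
    rng = random.Random(seed)
    apparent = performance(fit(data), data)
    optimism = []
    for _ in range(n_boot):
        # Resample patients with replacement and refit on the bootstrap sample.
        sample = [rng.choice(data) for _ in range(len(data))]
        model = fit(sample)
        # Optimism = performance on the bootstrap sample minus on the original data.
        optimism.append(performance(model, sample) - performance(model, data))
    return apparent, apparent - mean(optimism)
```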

Fig. 4

Development and validation of a risk prediction score. a Time-dependent area under the curve (td-AUC) of the risk score in predicting 3-, 4-, and 5-year recurrence risk. b Calibration plot of the risk score. c Nomogram for predicting late recurrence risk by incorporating APRI score, spleen volume, and tumor number as predictors. d Cumulative rate of late recurrence stratified according to the risk groups defined by the risk score (patients with a score > 0.42 were defined as high-risk)

To identify a high-risk group, the final score was split at its 80th percentile (0.42). Using this cutoff, the score stratified the entire cohort into two risk groups, with the high-risk group showing a markedly higher risk of late recurrence than the low-risk group (3-, 4-, and 5-year recurrence risk: 23.9%, 39.7%, and 64.7% vs. 8.2%, 14.6%, and 20.5%; p < 0.001 for each comparison) (Fig. 4d).

Pattern of late recurrence

Among those who developed late HCC recurrence, 26 patients had intrahepatic local recurrence (ILR), 65 had intrahepatic distant recurrence (IDR), and 13 had extrahepatic metastasis (EM) (Supplementary Table 1). The risk score showed higher discriminative ability for predicting IDR (td-AUC = 0.790) than for ILR (0.704) and EM (0.753), and the high-risk group had a higher incidence of IDR (44.3% vs. 15.8%) and EM (11.5% vs. 2.5%) than the low-risk group. In addition, density curves showed that IDR and EM in the high-risk group occurred earlier than those in the low-risk group, suggesting that recurrence in the high-risk group clustered earlier in follow-up (Supplementary Fig. 4).

Sensitivity analysis

In the first sensitivity analysis, which treated death (after 2 years) as a competing event, predictors of late recurrence in the univariable and multivariable Fine and Gray competing risk analyses remained similar to those in the Cox regression analysis (Table 3 and Supplementary Table 2). In the second analysis, no marked improvement in predictive accuracy was observed after incorporating postoperative (i.e., surgical or pathological) parameters (Supplementary Table 3).

Discussion

Late recurrence of HCC after curative-intent resection is widely regarded as de novo multicentric tumor development that is mostly related to the severity of the underlying liver disease. The present study found that preoperative spleen volume, which can be quickly and accurately measured using MRI and AI techniques, was independently associated with an increased risk of late recurrence and decreased survival in patients with cirrhosis. A risk score based on the APRI score, spleen volume, and tumor number was developed and internally validated, demonstrating good risk prediction and stratification ability.

Unlike early HCC recurrence, whose prevalence and risk factors are well recognized, data on late recurrence are sparse. A prospective study reported that 27 of 87 (31.0%) patients with HCC and CLD developed late recurrence after liver resection [12]. Another large multicenter study of an HBV-predominant cohort reported an incidence of 41.3% during a median follow-up of 78.0 months [19]. The late recurrence rate was comparatively lower in our cohort (27.9%), which might be attributable to the higher proportion of early-stage HCC (91.4% of patients were at BCLC stage 0/A, and no patients with BCLC stage C were included). Of note, because cirrhosis is a well-established risk factor, this study included only patients with confirmed cirrhosis so as to evaluate the incremental value of volumetric indices and MRI imaging features for predicting late recurrence.

The role of liver- and spleen-related indices in predicting key events (e.g., decompensation and HCC occurrence) during the natural history of CLD has been extensively studied. Previous studies found that liver stiffness was predictive of HCC recurrence after resection [20,21,22]. More recently, however, spleen stiffness rather than liver stiffness was suggested as the only independent predictor of postoperative late recurrence [12]. In addition to stiffness, morphological changes of the spleen have also been associated with the severity of liver fibrosis [13, 14, 23], as fibrosis progression results in a reduction in liver volume and an increase in spleen volume. Moreover, compared with diameters measured on cross-sectional or craniocaudal images, three-dimensional spleen volume reflects the complex morphology of the spleen more accurately and has thereby shown superior predictive ability for decompensation [24].

We therefore hypothesized that spleen volume could serve as a quantitative parameter reflecting the severity of cirrhosis and predicting the risk of late HCC recurrence. We found that preoperative spleen volume was independently associated with late recurrence, and a cutoff value of 370 cm³ was determined to identify patients at increased risk of late recurrence; this cutoff is slightly higher than the suggested population-based reference (322 cm³) [25]. Yoo et al. [26] proposed a cutoff of 532 mL to predict HCC occurrence in patients with compensated CLD, which is substantially higher than our cutoff. This difference may be attributable to the heterogeneity between HCC occurrence and late recurrence, as the pathogenesis of the latter involves multiple factors besides cirrhosis. Specifically, tumor number was identified as another independent predictor of late recurrence, as has also been observed previously [9, 27, 28]. Interestingly, tumor number was the only tumor-related parameter associated with late recurrence, whereas other imaging features or pathological indices reflecting tumor aggressiveness were not associated with the outcome. These results further support the view that late HCC recurrence represents a distinct pattern of hepatocarcinogenesis compared with early recurrence and is more likely driven by the severity of the underlying liver disease.

CT and MRI are the recommended modalities for HCC diagnosis and postoperative surveillance [5]. Compared with CT, MRI has the advantage of providing information on tumor aggressiveness via multiparametric sequences [29, 30], allowing simultaneous evaluation of the risk of early and late recurrence. Furthermore, the main obstacles to the widespread use of spleen volume have been the time-consuming segmentation process and interobserver variation. With advances in AI techniques, spleen segmentation and volume measurement can now be accomplished in a fully automated, time-efficient, and reproducible manner, making integration into routine clinical workflows feasible.

Given the heterogeneous risk of late HCC recurrence, we proposed a prognostic score to estimate the risk of late recurrence and inform individualized clinical decision-making. Specifically, patients categorized as high-risk may benefit from more regular and intensive surveillance (e.g., shorter follow-up intervals and a preference for MRI over CT) [19]. Moreover, intrahepatic recurrence (both local and distant) occurred earlier in the high-risk group, and most patients with extrahepatic metastasis had simultaneous or previous intrahepatic recurrence, underscoring the need for a risk score-guided surveillance strategy.

Our study has several limitations. First, selection and indication bias are inherent to the retrospective design. Second, owing to the retrospective design, we were unable to obtain information on HBsAg seroclearance in patients with HBV infection or sustained virological response in patients with HCV infection, both of which have been suggested to decrease recurrence risk, although the results remain controversial [31, 32]. Third, spleen volume was not directly compared with liver and spleen stiffness in our study. Fourth, the different scanning protocols used might have increased data variability and reduced the reliability of the results. Lastly, the relatively small number of included patients limits the generalizability of our findings, and multicenter prospective studies are warranted to validate and refine our results.

In conclusion, preoperative spleen volume measured on MRI with AI techniques is a promising parameter for predicting late recurrence risk after curative-intent resection in patients with HCC and established cirrhosis. A risk score based on spleen volume was developed, providing an opportunity for individualized risk stratification and tailored postoperative surveillance strategies.