Introduction

Primary liver cancer is the fourth leading cause of cancer-related death globally [1]. Hepatocellular carcinoma (HCC) accounts for more than 90% of liver cancer cases [2]. Because the symptoms of early-stage HCC are often insidious, some patients have already developed huge HCC (diameter ≥ 10 cm) at the time of diagnosis [3]. Patients with huge HCC are considered to have a poor prognosis because R0 resection is difficult [4]: it requires adequate margins and therefore a large extent of resection. Tumor shrinkage through neoadjuvant systemic therapy or preoperative locoregional treatment is one approach that may improve patient outcomes after hepatectomy for huge HCC [5, 6]. Associating liver partition and portal vein ligation for staged hepatectomy (ALPPS) has also been shown to be effective for the treatment of huge HCC [7]. However, the high prevalence of post-hepatectomy liver failure (PHLF), induced by the large resection extent and an insufficient future liver remnant (FLR), limits the efficacy and safety of hepatectomy for huge HCC.

The concept of liver function reserve was initially established to maintain adequate postoperative liver function [8]. The International Study Group of Liver Surgery (ISGLS) developed normative PHLF criteria in 2011 [9], and reducing the probability of PHLF has since been one of the aims of preoperative evaluation.

Serological testing is one of the basic, noninvasive methods for assessing pre-hepatectomy liver function. Based on serological tests and clinical symptoms, Pugh et al. summarized the findings of Child and Turcotte and reported the Child–Pugh classification in 1973 [10]. According to the Child–Pugh classification, hepatectomy is considered relatively safe for patients in grade A, and the classification is of some clinical use. However, patients with Child–Pugh grade A may still differ greatly in liver function, and some experience PHLF. Some studies have also concentrated on the pathological severity of liver cirrhosis in patients with Child–Pugh grade A [11]. In comparison, the albumin-bilirubin (ALBI) grade requires only two serological indices [12]; it is precise and dynamic, has been validated in several recent studies [13, 14], and has been shown to predict prognosis in patients with compensated liver function more effectively than the Child–Pugh classification [15]. As therapeutic modalities diversify and become more precise, the Child–Pugh classification and ALBI grade are more frequently used for both surgical and nonsurgical treatment of liver cancer [16, 17]. So far, no evidence has established the predictive value of the Child–Pugh classification and ALBI grade in patients with huge HCC, who have a high probability of PHLF. In the present study, we compared the Child–Pugh classification and ALBI grade in predicting PHLF grade B–C, and established a nomogram based on these two models in 343 patients with huge HCC who underwent radical surgery. The nomogram was further validated in independent internal and external cohorts.

Methods

Study population

Clinical information was retrospectively collected for consecutive patients who underwent hepatectomy for HCC at Zhongshan Hospital, Fudan University and Fujian Medical University Cancer Hospital from January 2016 to December 2021. Patients were screened according to the following inclusion and exclusion criteria. The inclusion criteria were: (I) huge HCC (maximum tumor diameter ≥ 10 cm) confirmed by preoperative contrast-enhanced magnetic resonance imaging (MRI) or computed tomography (CT) (Supplementary Fig. 1A and B); (II) radical resection, with no tumor cells detected microscopically in the resection margins of the specimen and no evidence of residual disease on contrast-enhanced MRI or CT re-examination around 1 month after hepatectomy [18] (Supplementary Fig. 1C and D); and (III) complete records of preoperative and postoperative laboratory parameters and follow-up data. The exclusion criteria were: (I) previous liver surgery; (II) evidence of preoperative systemic or locoregional treatment; (III) evidence of macrovascular invasion or extrahepatic metastasis before surgery; or (IV) non-R0 resection. In total, 514 eligible patients who underwent hepatectomy at Zhongshan Hospital, Fudan University (Xuhui District, Shanghai) between 2016 and 2021 were included and divided into a training cohort and an internal validation cohort at a ratio of 2:1. In addition, 97 eligible patients who underwent hepatectomy at Fujian Medical University Cancer Hospital (Fuzhou, Fujian Province) during the same period were included as an external validation cohort.
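
The 2:1 division of the 514 Zhongshan Hospital patients can be sketched as follows. The text does not state how patients were allocated, so the seeded random shuffle, the function name `split_two_to_one`, and the integer patient IDs are assumptions made purely for illustration.

```python
import random

def split_two_to_one(patient_ids, seed=0):
    """Randomly split IDs into training and internal validation sets (about 2:1)."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)   # seeded shuffle so the split is reproducible
    n_train = round(len(ids) * 2 / 3)  # two thirds of the patients go to training
    return ids[:n_train], ids[n_train:]

# Hypothetical IDs 0..513 stand in for the 514 eligible patients.
train_ids, validation_ids = split_two_to_one(range(514))
print(len(train_ids), len(validation_ids))  # 343 171
```

With 514 patients, rounding two thirds gives the 343 and 171 patients reported for the training and internal validation cohorts.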

The study was conducted in accordance with the ethical standards of the Declaration of Helsinki and was approved by the Ethics Committees of Zhongshan Hospital and Fujian Medical University Cancer Hospital. All patients provided written informed consent before surgery.

Definitions

HCC was diagnosed on contrast-enhanced MRI or CT and confirmed by pathology, following the Guidelines for the Diagnosis and Treatment of Hepatocellular Carcinoma (2019 Edition) [18]. PHLF was defined according to the International Study Group of Liver Surgery (ISGLS) criteria [9] and was diagnosed when the bilirubin level and international normalized ratio (INR) were simultaneously elevated on or after postoperative day 5. Patients requiring no change in clinical management were classified as PHLF grade A. Patients requiring noninvasive intervention, including daily diuretics, albumin, and fresh frozen plasma, were classified as PHLF grade B, and those requiring invasive treatment, including hemodialysis, extracorporeal liver support, and liver transplantation, were classified as PHLF grade C. PHLF grade B–C was therefore considered severe PHLF and was the main focus of this study. The Child–Pugh classification was evaluated according to the version reported by Pugh et al. in 1973, which summarized the findings of Child and Turcotte [10]. Preoperative ALBI scores were calculated as (0.66 × log10 bilirubin) + (−0.085 × albumin), where bilirubin was measured in μmol/L and albumin in g/L [12]. The grading brackets were as follows: ALBI score ≤ −2.60 (mALBI grade 1), > −2.60 to ≤ −2.27 (mALBI grade 2a), > −2.27 to ≤ −1.39 (mALBI grade 2b), and > −1.39 (mALBI grade 3) [19, 20]. Hepatectomy was performed as previously described [21, 22]. Briefly, liver transection was performed with the clamp-crush technique or an ultrasonic dissector under the intermittent Pringle maneuver, i.e., cycles of 20 min of inflow occlusion followed by 5 min of reperfusion. Sites of hemorrhage or bile leakage were carefully ligated or sutured. Hepatectomy was classified as major resection (more than three Couinaud segments) or minor resection (fewer than three Couinaud segments). All preoperative variables were based on the most recent serologic test before surgery.
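
The ALBI formula and mALBI brackets above translate directly into a few lines of code. The following is a minimal sketch; the function names and the example laboratory values are hypothetical and are not taken from the study data.

```python
import math

def albi_score(bilirubin_umol_l, albumin_g_l):
    """ALBI score from bilirubin (umol/L) and albumin (g/L), as defined in the text."""
    return 0.66 * math.log10(bilirubin_umol_l) - 0.085 * albumin_g_l

def malbi_grade(score):
    """Map an ALBI score to the mALBI grade using the brackets given in the text."""
    if score <= -2.60:
        return "1"
    if score <= -2.27:
        return "2a"
    if score <= -1.39:
        return "2b"
    return "3"

# Hypothetical patients, for illustration only.
for bilirubin, albumin in [(15, 40), (30, 32)]:
    score = albi_score(bilirubin, albumin)
    print(f"bilirubin={bilirubin} albumin={albumin} ALBI={score:.3f} mALBI={malbi_grade(score)}")
```

For these illustrative inputs the scores are about −2.624 (mALBI grade 1) and −1.745 (mALBI grade 2b).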

Follow-up

Laboratory parameters, including liver, kidney, and coagulation function tests, were collected on postoperative days (PODs) 1, 3, 5, and 7, or more frequently as appropriate. Invasive and noninvasive treatments or operations during the postoperative hospital stay were also recorded. PHLF was diagnosed by two experienced surgeons in the treatment team. Patients were routinely examined around 1 month after hepatectomy for alpha-fetoprotein (AFP), des-gamma-carboxy prothrombin (DCP, also known as PIVKA-II), and cancer antigen 19-9 (CA19-9), as well as by contrast-enhanced MRI or CT. Thereafter, patients were asked to return for examination every 3 months. Patients also received follow-up phone calls each year.

Statistics

Statistical analyses were performed using SPSS (v23.0; IBM, Armonk, NY, USA) and R software (v3.6.3; R Project for Statistical Computing). Categorical variables were presented as counts and percentages, and continuous variables were summarized as medians (range) or means (standard deviation), as appropriate. Pearson’s χ2 test, Fisher’s exact test, or the Mann–Whitney U test was used as appropriate, and logistic regression models were fitted by maximum likelihood estimation (MLE). To determine the predictors of PHLF grade B–C, multivariate regression analysis was performed using variables with p values of < 0.05 in univariate analysis, and a two-tailed p value of < 0.05 was considered statistically significant. The nomogram and calibration curves were plotted with the rms package, receiver operating characteristic (ROC) curves with the pROC package, and decision curves with the rmda package.
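
To make the maximum likelihood step concrete, the sketch below fits a logistic regression with a single binary predictor by Newton-Raphson iteration. It is not the SPSS or R procedure used in the study; the function name `fit_logistic` and the data (10 patients per group) are hypothetical and chosen only so that the result can be checked by hand.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson (maximum likelihood)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to b0 and b1
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the Fisher information matrix (negative Hessian)
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: 2/10 events without the risk factor, 6/10 events with it.
x = [0] * 10 + [1] * 10
y = [1] * 2 + [0] * 8 + [1] * 6 + [0] * 4
b0, b1 = fit_logistic(x, y)
print(round(math.exp(b1), 3))  # odds ratio for the risk factor
```

For a single binary predictor the fitted odds ratio equals the sample odds ratio, here (6/4)/(2/8) = 6, which gives a simple check on the fit.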

Results

Patient characteristics

The characteristics of patients in the training cohort (n = 343), internal validation cohort (n = 171), and external validation cohort (n = 97) are shown in Table 1. No differences were found in baseline demographic variables, laboratory tests, or major intraoperative events between the training cohort and the internal validation cohort. There were 16 (4.7%) Child–Pugh class B patients in the training cohort and 5 (2.9%) in the internal validation cohort. In the training cohort, 233 (67.9%), 75 (21.9%), and 35 (10.2%) patients were classified as mALBI grade 1, 2a, and 2b, respectively; in the internal validation cohort, the corresponding numbers were 120 (70.2%), 41 (24.0%), and 10 (5.8%). The incidence of PHLF in the training cohort was 15.2%, which was comparable to that in the internal validation cohort (12.9%, p = 0.485).
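
The comparison of PHLF incidence between the two cohorts can be illustrated with a Pearson χ2 test on a 2 × 2 table. The event counts used here (52 of 343 and 22 of 171) are an assumption: they are back-calculated from the rounded percentages 15.2% and 12.9% rather than read from Table 1, and the function name is illustrative.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 degree of freedom
    return chi2, p

# Assumed counts: PHLF yes/no in the training and internal validation cohorts.
chi2, p = chi_square_2x2(52, 343 - 52, 22, 171 - 22)
print(f"chi2={chi2:.3f}, p={p:.3f}")
```

Under these assumed counts the test gives a p value of about 0.485, in line with the value reported above.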

Table 1 Preoperative information, intraoperative events and PHLF of patients in the training cohort (n = 343), internal validation cohort (n = 171), and external validation cohort (n = 97)

Establishment of the nomogram for PHLF grade B–C

The results of the univariate and multivariate analyses are shown in Table 2. International normalized ratio (INR), cirrhosis, intraoperative blood loss, Child–Pugh classification, and mALBI grade were independent predictors of PHLF grade B–C (Table 2). These independent risk factors were used to establish the nomogram for predicting severe PHLF (Fig. 1). Receiver operating characteristic (ROC) analysis, decision curve analysis (DCA), and calibration curve analysis (CCA) were used to assess the predictive accuracy of the nomogram. The area under the ROC curve was 0.863 (p < 0.001, 95% CI, 0.812–0.914) for the nomogram, 0.753 (p < 0.001, 95% CI, 0.676–0.829) for ALBI scores, and 0.718 (p < 0.001, 95% CI, 0.631–0.806) for Child–Pugh scores (Fig. 2A). Moreover, DCA suggested better predictive value for the nomogram than for either ALBI scores or Child–Pugh scores (Fig. 2B).
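
The area under the ROC curve reported above can be read as the probability that a randomly chosen patient with PHLF grade B–C has a higher score than a randomly chosen patient without it. The sketch below computes that quantity directly; the scores and outcomes are invented for illustration, and the study itself used the pROC package in R.

```python
def roc_auc(scores, labels):
    """AUC as the share of case/non-case pairs ranked correctly (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if case > control else 0.5 if case == control else 0.0
        for case in cases
        for control in controls
    )
    return wins / (len(cases) * len(controls))

# Hypothetical nomogram scores and outcomes (1 = PHLF grade B-C).
scores = [150, 142, 120, 138, 95, 110, 80, 131]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels))  # 0.875
```

In this toy example, 14 of the 16 case and non-case pairs are ordered correctly, so the AUC is 0.875.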

Table 2 Univariable and multivariable analyses for PHLF grade B–C in the training cohort

Fig. 1 Nomogram for the prediction of PHLF grade B–C in patients with huge HCC developed in the training cohort. mALBI, modified albumin-bilirubin; INR, international normalized ratio; PHLF, post-hepatectomy liver failure

Fig. 2 ROC curves for the nomogram in the training cohort (A), internal validation cohort (C), and external validation cohort (E). Decision curves for the nomogram in the training cohort (B), internal validation cohort (D), and external validation cohort (F). ROC, receiver operating characteristic; AUC, area under the curve; ALBI, albumin-bilirubin

Validation of the nomogram in internal and external independent cohorts

In the internal validation cohort, the area under the ROC curve was 0.823 (p < 0.001, 95% CI, 0.737–0.909) for the nomogram, 0.689 (p = 0.004, 95% CI, 0.572–0.805) for ALBI scores, and 0.691 (p = 0.004, 95% CI, 0.555–0.827) for Child–Pugh scores (Fig. 2C). In the external validation cohort, the corresponding values were 0.740 (p = 0.001, 95% CI, 0.624–0.856), 0.639 (p = 0.044, 95% CI, 0.514–0.765), and 0.619 (p = 0.085, 95% CI, 0.482–0.857) (Fig. 2E). Consistent with the training cohort, the DCA curves showed a higher net benefit for the nomogram than for ALBI scores or Child–Pugh scores in both the internal and external validation cohorts (Fig. 2D, F). Calibration curves showed consistency between predictions and observations in the training cohort and the validation cohorts (Fig. 3).
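
The net benefit plotted in a decision curve is, at each threshold probability pt, the true-positive rate per patient minus the false-positive rate per patient weighted by pt / (1 − pt). The sketch below evaluates it at a single threshold; the predicted probabilities and outcomes are hypothetical, and the study's curves were produced with the rmda package in R.

```python
def net_benefit(probabilities, labels, threshold):
    """Net benefit of treating patients whose predicted risk is at or above the threshold."""
    n = len(labels)
    true_pos = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)

# Hypothetical predicted risks of PHLF grade B-C and observed outcomes.
probabilities = [0.05, 0.10, 0.15, 0.30, 0.45, 0.60, 0.70, 0.80]
labels = [0, 0, 0, 0, 1, 0, 1, 1]

model_nb = net_benefit(probabilities, labels, threshold=0.2)
treat_all_nb = net_benefit([1.0] * len(labels), labels, threshold=0.2)
print(round(model_nb, 4), round(treat_all_nb, 4))
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.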

Fig. 3 Calibration curves for the nomogram in the training cohort, internal validation cohort, and external validation cohort

Risk stratification of the nomogram

The nomogram score corresponding to the maximum Youden index in the training cohort was taken as the cut-off value. Patients with a nomogram score higher than 137.02 were classified as high risk for PHLF grade B–C, and those with a lower score as low risk. On this basis, we summarized the incidence of PHLF grade B–C by risk group in the training cohort and the validation cohort (internal and external validation cohorts combined) (Table 3). Patients in the high-risk group had a significantly higher frequency of PHLF grade B–C than those in the low-risk group, in both the training cohort and the validation cohort (p < 0.001 for both).
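
Choosing a cut-off by the Youden index means scanning candidate thresholds and keeping the one that maximizes sensitivity + specificity − 1. The sketch below does this on invented scores; only the 137.02 cut-off in `risk_group` comes from the text, while the function names and data are hypothetical.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    Patients with a score strictly above the cutoff are flagged as high risk.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if s > cutoff and y == 1)
        true_neg = sum(1 for s, y in zip(scores, labels) if s <= cutoff and y == 0)
        j = true_pos / n_pos + true_neg / n_neg - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

def risk_group(nomogram_score, cutoff=137.02):
    """Assign the risk group using the cut-off reported for the training cohort."""
    return "high" if nomogram_score > cutoff else "low"

# Hypothetical nomogram scores and outcomes (1 = PHLF grade B-C).
scores = [150, 142, 120, 138, 95, 110, 80, 131]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(youden_cutoff(scores, labels))      # (131, 0.75)
print(risk_group(140), risk_group(120))   # high low
```

In the toy data, a cut-off of 131 gives a sensitivity of 0.75 and a specificity of 1.0, hence a Youden index of 0.75.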

Table 3 Incidence of PHLF grade B–C in the high-risk and low-risk groups in the training cohort and validation cohorts

Discussion

Hepatectomy remains the best treatment option for patients with HCC [23]. With advances in preoperative evaluation, intraoperative techniques, and postoperative management, the surgical indications for huge HCC have also expanded [24]. In particular, huge HCC may lead to a higher frequency of PHLF because of the large extent of surgical resection. However, studies on huge HCC remain limited. Xiang et al. reported a 29.0% probability of severe PHLF in 186 patients with huge HCC [25]. Guo et al. demonstrated that 13.8% of patients experienced PHLF of any grade after major hepatectomy in a total of 745 patients [26]. In the present study, based on the experience of two centers from 2016 to 2021, we reported a double PHLF grade B–C rate after hepatectomy in a large cohort of 611 patients with huge HCC.

PHLF grade B leads to deviation from regular clinical management and requires noninvasive treatment, while grade C requires invasive treatment [9]. Therefore, we sought independent risk factors that could predict severe PHLF. The Child–Pugh classification is widely acknowledged as a predictor of outcomes after liver surgery or transplantation [27,28,29]. However, the Child–Pugh classification is not precise enough, and its assessment of ascites and hepatic encephalopathy is subjective. In recent years, many studies have looked further into predictors of hepatectomy outcomes in cohorts of Child–Pugh A patients [30, 31]. ALBI, together with other indicators, has been considered to improve the evaluation of liver function [32]. The ALBI grade also showed good predictive ability for PHLF in HCC patients with different BCLC stages in a previous report [33]. However, the ALBI grade also leaves much to be desired, as a large proportion of patients are classified as grade 2 even though they differ significantly in actual liver function. A more detailed version of the ALBI grade, the mALBI grade, was therefore established by Kudo et al. [19]. Our results showed similarly limited performance of Child–Pugh scores and ALBI scores in predicting PHLF grade B–C, possibly because of the influence of other preoperative and intraoperative characteristics, many of which may influence the development of HCC or the prognosis of patients after hepatectomy [34, 35]. Therefore, we included other baseline indicators and major intraoperative events in the univariate and multivariate analyses, which identified mALBI grade, Child–Pugh grade, intraoperative blood loss, cirrhosis, and INR as independent risk factors. A nomogram was established and validated in two independent cohorts for accurate prediction of PHLF grade B–C. Simple and noninvasive, the model demonstrated its accuracy in predicting PHLF grade B–C in both the training and internal validation cohorts, and outperformed both Child–Pugh scores and ALBI scores used separately. In an external validation cohort with nonidentical baseline patient characteristics, the model also demonstrated adequate precision in predicting the occurrence of severe PHLF. Considering these results, our nomogram could potentially be generalized to other medical centers.

The nomogram also stratified candidates for hepatectomy for huge HCC into different risk groups for severe PHLF. In the overall cohort, 67.7% of the patients in the high-risk group developed severe PHLF, suggesting that stricter surgical indications and more careful perioperative management are warranted for this group. For these patients, it might be safer to choose nonsurgical treatment [36], neoadjuvant systemic therapy [37], or transarterial chemoembolization (TACE) followed by hepatectomy [38]. In comparison, patients in the low-risk group had a 13.3% probability of developing PHLF grade B–C and were thus considered more appropriate candidates for hepatectomy. Although the accuracy of this model still needs improvement, the stratification into risk groups could inform treatment choices for patients and surgeons.

The main limitation of this study is its retrospective nature, with potential selection bias. Validation in more centers and larger cohorts is needed in the future. In addition, more than 70% of the patients in our cohort had HBV infection, so whether the model applies equally to huge HCC due to other etiologies, including other infections and steatohepatitis, should be explored in further studies.

Conclusions

This noninvasive nomogram, based on the mALBI grade and Child–Pugh grade, predicted PHLF grade B–C in patients with huge HCC more accurately than ALBI scores or Child–Pugh scores alone. By predicting PHLF grade B–C, it could help refine the indications for hepatectomy and patient stratification for perioperative management.