Introduction

Hepatocellular carcinoma (HCC) is one of the leading causes of cancer-related death in patients with liver cirrhosis, with more than new 700,000 cases diagnosed yearly worldwide [1, 2]. Over the past few decades, it has become clear that the natural history of HCC strongly depends on anatomical stage, underlying liver function and overall patients’ physical status: this has led to the development of several prognostic algorithms with intent to optimize treatment [37].

The Barcelona Clinic Liver Cancer (BCLC) stage includes prognostic variables such as tumor stage, performance status, and Child–Turcotte–Pugh (CTP) class [8]. Prospective validation of the BCLC staging system has demonstrated reliable prognostic subdivision of HCC [9, 10]. Due to its association with treatment allocation, the BCLC algorithm has received formal endorsement by organizations such the European Association for the Study of the Liver (EASL) and the American Association for the Study of Liver Diseases (AASLD) [1113]. However, there is marked heterogeneity in the reported 3-year survival in BCLC-B stage disease of 10–40 %. Therefore, formulating appropriate treatment strategies for the individual patient is difficult within this nebulous BCLC-B staging system.

According to the BCLC staging system, transarterial chemoembolization (TACE) is recommended as first-line treatment for patients with IHCC or BCLC-B. Two randomized controlled trials have shown an approximate 50 % reduction in mortality in patients treated with TACE compared to controls [14, 15]. A significant OS benefit from TACE has been further consolidated by two separate meta-analyses [16], which however re-defined the magnitude of benefit of TACE due to patient and procedural heterogeneity, resulting in some of the pooled studies not meeting their primary survival endpoints [17].

Issues such as the relative efficacy of TACE and the risk of adverse events among this group of patients results in the use of sorafenib, trial therapies or best supportive care [18, 19]. Alternatively, clinicians who do not adhere to BCLC guidelines offer other treatments such as resection or transarterial radioembolization (TARE) if IHCC patients meet local criteria [20, 21]. Therefore, despite the presence of consensus guidelines, there is variation in treatment in patients with BCLC-B disease. There is an urgent need for improved prognostication and subsequent stratification of management for patients with IHCC.

Bolondi et al. [22] created a prognostic score to further subdivide patients with IHCC in an effort to improve treatment allocation among this complex group. The Intermediate Stage Score (ISS) consists of five stages and includes CTP classification, ECOG performance status, portal vein thrombus and specific size criteria (Table 1). On the basis of the score, the authors recommended that patients can be offered first-line options such as TACE while patients with advanced stage (Quasi-C) should receive sorafenib [22]. There have been mixed outcomes in demonstrating the efficacy of this score. Two studies have demonstrated an association between ISS and OS among patients treated with bland transarterial embolization (TAE) and TACE (N = 580, 466) [23, 24]. However, in a separate European study, the score did not achieve prognostic significance (N = 254) [25]. Our intent was to validate the prognostic ability of the ISS in patients with intermediate-stage HCC (BCLC-B) by using propensity score analysis in diverse Eastern and Western populations treated with either surgical resection (LR) or TACE.

Table 1 BCLC-B sub-classification by Bolondi et al

Materials and methods

Patient population

All centers in this study were involved in prospective collection of data from patients with a diagnosis of HCC made according to radiological or histological criteria, between 2001 and 2015. Patients were recruited from Hammersmith Hospital, London, St Mary’s Hospital, Seoul, University of Novara, and, Dokkyo Medical University, Dokkyo and Kinki University, Osaka). Informed consent was obtained from all patients recruited in this study in accordance with the Declaration of Helsinki and Good Clinical Practice (GCP) guidelines. Ethical approval for this study was obtained from the East London Research Ethics Committee.

Clinical variables were retrieved include patient demographics, complete blood count, albumin, aspartate and alanine aminotransferases (AST, ALT), alkaline phosphatase (ALP) alpha-fetoprotein (AFP), the international normalized ratio (INR) value and underlying etiology of liver disease was also identified. Patients with IHCC (BCLC-B) were categorized into five groups as per the criteria described by Bolondi et al. [22] (Table 1). Liver functional reserve was estimated using the CTP classification.

Tumor staging was described as the number of focal hepatic lesions and maximum diameter detected during contrast enhancement phase on computerized tomography. The Milan criteria and up-to-seven criteria (Up-to-7) were used to categorize size for calculating the ISS. The Milan criteria is defined as a single lesion <5 cm, up to three lesions <3 cm, the absence of gross vascular invasion or nodal or distant metastases [26]. Within the Up-to-7 criteria, seven is the sum of the size (centimeters) and the number of tumors for any given HCC [27].

Statistical analysis

Continuous variables were presented as a median and range, and associations were tested using Mann–Whitney U or Student’s t test as appropriate. Categorical variables with absolute or relative frequencies were tabulated and or Fisher’s exact test, where appropriate. The OS rates for various ISS levels in all patients were analyzed using Kaplan–Meier method, and log-rank test was used to compare survival time. Univariate analyses of prognostic variables were completed with the Cox proportional hazards model. All statistical analyses were completed using two-sided test, and statistical significance was achieved where p < 0.05.

The date of HCC diagnosis till the date of death, loss to follow-up or study censoring (1st January 2016) was used to calculate overall survival. All patients were monitored with routine follow-up till the dates of death, loss to follow-up or study censoring.

Propensity score adjustment (PS) is a statistical method to reduce the effect of residual confounding in two groups [28]. In this study, PS was used to reduce the effect of residual confounding in the cohort by adjusting for confounding variables that are not accounted for within ISS classification, such as age, gender, hepatitis status and INR that impact treatment options. Cox regression analysis was used to determine the effect of ISS adjusted for PS quartiles in TACE and LR treatment groups. An interaction test was performed to determine the statistical significance of ISS in TACE and LR groups. Statistical analyses were performed using R version 3.1.2 (ww.r-project.org) and SAS 9.4 (SAS Institute Inc. Cary, North Carolina).

Results

Patient characteristics

Our study population consisted of 611 BCLC-B patients diagnosed with HCC across five centers (Table 2). The majority of patients underwent TACE (73.4 %) as first anticancer treatment, while 27.6 % were offered liver resection. Patients undergoing liver resection were younger (p < 0.001), while a higher proportion of patients undergoing TACE were Hepatitis B positive (p = 0.01). Five patients treated with TACE had portal vein thrombosis (PVT) and were classified as ‘Quasi C’ or ISS 5. There was a significant difference in the CTP classification between LR and TACE, with a higher proportion of patients with CTP > A6 receiving TACE (p < 0.01). The median OS (OS) of the overall population was 37 months (95 % confidence interval (CI) 33.0–39.3 months). The 1- and 3-year survival rates were 84.1, and 21.9 %, respectively. There was no significant difference between the median OS between TACE and LR subgroups (34.8 vs. 40 months, p = 0.09).

Table 2 Patient demographic at initial HCC diagnosis

ISS characteristics and OS

In univariate analyses of the cohort, male gender, positive hepatitis B status and INR were variables that were significant for increased mortality and were not within the ISS prognostic score (Table 3). There was a difference in the ISS categories between TACE and LR groups, with a higher proportion of patients with ISS 2 or greater treated with TACE and those with an ISS of 2 or less treated with LR (p < 0.0001). There were no significant differences in baseline characteristics between ISS groups (Table 4). Due to the small number of patients with ISS 4 and 5, these were analyzed together to improve statistical validity. Significant differences in OS were observed between the different ISS groups ranging from 51 (ISS 1) to 16 months (ISS 4 and 5; p < 0.001), (Table 3; Fig. 1).

Table 3 Univariate analysis of factors that predict overall survival in patients with intermediate hepatocellular carcinoma (IHCC) treated with TACE or LR
Table 4 Sub-classification of BCLC-B with intermediate stage score (ISS) and corresponding characteristics
Fig. 1
figure 1

Cumulative mortality stratified by intermediate stage score (ISS) for all patients with intermediate hepatocellular carcinoma

ISS retains prognostic utility in propensity score adjustment analysis

When considering the prognostic utility of the ISS according to treatment received, ISS was significant in TACE (p = 0.0003) and LR (p = 0.008). ISS retained its prognostic ability following PS adjustment. In the PS-adjusted model, among patients undergoing LR, ISS of 4 and 5 implied poor prognosis compared to ISS 1 [hazard ratio (HR) 2.13 (95 % CI 0.64, 7.02)], such that ISS was a prognostic score among patients treated with LR [Likelihood ratio test (LRT) p = 0.007]. This comparison between ISS 4 and 5 to ISS 1 was evident for patients treated with TACE [HR 3.59 (95 % CI 2.07, 7.57)], (LRT p < 0.001, Table 5). On assessing the prognostic value of ISS on either treatment, there was no evidence of a difference in ISS subgroups between LR and TACE groups (p = 0.23).

Table 5 Propensity score-adjusted Cox proportional hazards model of ISS on overall survival within TACE and LR population and overall likelihood ratio test (LRT), and interaction test to determine effect of ISS between treatments

Discussion

This is the first large, multi-center study to validate the prognostic ability of the ISS in patients with BCLC-B stage disease, independent of treatment received. Bolondi and colleagues divided BCLC-B stage disease into sub-classifications based on trial results and expert opinion in an effort to reduce heterogeneity in survival in this otherwise disparate patient group. While their method has been validated in a number of papers, this the largest study incorporating both Eastern and Western populations that adheres to the BCLC-B classification. As such this is the first study to explore the use of LR within the BCLC-B classification, albeit in small numbers. PS has been used to reduce confounders between LR and TACE groups, adding to the robust nature of the results obtained.

A plethora of prognostic scores have recently been introduced aiming to improve treatment selection in patients with BCLC-B stage disease [2931]. These scores such as the Hepatoma Arterial Embolization Prognostic score (HAP score) and Selection for Transarterial chemoembolization Treatment (STATE) score have derived prognostic variables within a cohort and subsequently validated the scores within an external population [30, 31]. The recently proposed ART and HAP scores have attracted significant attention recently particularly as prognostic markers in patients receiving TACE. The HAP score consists of two measures of tumor burden (AFP and size of largest tumor) and two measures of liver function (albumin and bilirubin) [30]. However, the original study included patients with BCLC-A, B and C disease, as well as concerns regarding the independent prognostic ability of bilirubin, may impact on the overall utility of this score. The ART score while useful in determining retreatment with TACE does not contribute to prognostic sub-classification within BCLC-B. Recently Ogasawara and colleagues derived the CHIP score as a means to delineate survival heterogeneity in BCLC-B stage tumors [32]. However, in their paper when compared to the ISS, their novel score showed no real difference in prognostic ability.

The variables included in the ISS are similar to previously identified scores including markers of liver function such albumin, bilirubin, and tumor burden. The main difference with the ISS is that it incorporates three measures of tumor burden; up-to-7 criteria, size of the largest tumor and number of tumors. We report considerable variation in OS from 15.6 to 51 months in our population suggesting that the variables used by Bolondi et al. are useful in delineating prognosis further within this patient group.

A key strength of this study is that we used patient datasets derived from different academic institutions in both Europe and Asia. While TACE is the recommended treatment for BCLC-B patients according to American and European guidelines, in Asian centers, it is not uncommon to propose surgical management [33, 34]. We have shown that ISS retains its prognostic ability in LR or TACE in BCLC-B stage disease prior to and following PS-adjusted analysis. Resection of liver lesions beyond the Milan criteria in BCLC-B population has been shown to improve OS compared to TACE treatment [35], and though beyond the remit of this study, these results suggest that surgical intervention may be a useful treatment modality in a carefully selected population group, and does warrant further investigation in a larger population group within a prospective study design. ISS appears a useful prognostic tool within each treatment category, and there is no evidence of a difference in the effects of ISS subgroups between treatment groups.

However, the inclusion of ‘Quasi C sub-classification’ (ISS 5) and patients with portal vein thrombosis involves a subgroup recognized to possess a poorer prognosis with variable treatment options [36]. While we have demonstrated the prognostic accuracy of the ISS, we have not validated the treatment allocation aspect of the score as proposed by Bolondi et al., an aspect that has not been corroborated in any study. In this context, reflection is required on the use of liver transplant for patients with BCLC-B disease given the poorer overall prognosis of this patient group compared with BCLC-A in the context of global organ shortages. We suggest, therefore, that the role of the ISS is in prognostication rather than as treatment allocation per se.

This is a significant time for the management of HCC as new therapies emerge on the horizon. Useful prognostic tools that improve patient selection are crucial in order to ensure that safe, appropriate and effective therapies are administered in a timely manner. It is evident from this large multi-centered study that the ISS offers a useful tool for clinicians to stratify treatment options, such as TACE and LR, in the BCLC-B population.