Introduction

Hepatocellular carcinoma (HCC) is the third most common cause of cancer mortality worldwide [1], and a considerable number of patients continue to be diagnosed with advanced disease. Recently, sorafenib has been shown to improve the survival of patients with advanced-stage HCC [2]. Its effectiveness is attributed to its unique antiproliferative and antiangiogenic mechanism [3–9].

Although the extent of tumor reduction observed with sorafenib therapy has been unsatisfactory, previous trials found that sorafenib significantly improved overall survival (OS) [2, 10]. Indeed, it has become well known that improvement in OS without tumor shrinkage is a unique characteristic of this drug. As reduction in tumor vascularity appears to be a direct effect of sorafenib, it is reasonable to speculate that the longer OS obtained with sorafenib can be attributed to its antiangiogenic mechanism, in addition to its antiproliferative effect on cancer cells. As increased tumor viability is typically accompanied by an increase in arterial vascularity, evaluation of arterial enhancement on imaging is critical in predicting OS. However, the response evaluation criteria in solid tumors 1.1 (RECIST1.1), the first set of criteria developed for assessment of response to treatment by HCC patients, focus on assessment of tumor size and neglect changes in vascularity status.

In recognition that the vascularity of a lesion is important in evaluating response to HCC treatment, the modified-RECIST (mRECIST) requires assessment of tumor vascularity, which reflects the extent of tumor necrosis [11, 12]. However, use of the mRECIST still poses a difficulty in measuring irregularly shaped tumors, because it calls for unidirectional measurement of tumor size for overall evaluation of tumor burden. Therefore, neither the mRECIST nor the RECIST1.1 may provide a completely adequate evaluation of tumor response in HCC patients.

To overcome the disadvantages of using the conventional criteria, we designed the response evaluation criteria in cancer of the liver (RECICL), a new evaluation system based on evaluation of change in tumor vascularity together with 2-directional assessment of tumor size. Due to the inclusion of these criteria, we hypothesized that use of the RECICL provides for more accurate evaluation of response to sorafenib therapy as assessed by OS than the RECIST1.1 or mRECIST. By testing this hypothesis, we attempted to fulfill 2 research aims in the present study. First, we endeavored to determine the means by which the therapeutic response of HCC patients, especially those presenting with hypervascular lesions and/or with lesions of irregular shape, should be estimated in the context of accurate prediction of OS. Second, we attempted to clarify the significance of and identify any problems with the use of the RECICL by retrospective comparison of its use with that of the RECIST1.1 and mRECIST criteria for evaluation of response among the same cohort of HCC patients.

Materials and methods

Patients

Between May 2009 and August 2011, 289 patients with advanced HCC were treated with sorafenib at Kinki University Hospital or Osaka Red Cross Hospital. From among these patients, 156 patients who had undergone continuous administration of sorafenib for more than 1 month and met the inclusion criteria were selected for study enrollment. The response of all patients to sorafenib had been examined at least once using contrast-enhanced computed tomography (CE-CT) and/or dynamic magnetic resonance imaging (MRI) (Fig. 1). Patients’ characteristics are summarized in Table 1. Our institution did not require institutional approval or informed consent for review of patient records and images in this retrospective study. We posted the research content in outpatient areas and on a website, and we gave patients the right to refuse participation in the study.

Fig. 1

Flow chart of patient selection process. After exclusion of patients who met the exclusion criteria or did not meet the inclusion criteria, 156 patients remained for analysis

Table 1 Characteristics of hepatocellular carcinoma patients treated with sorafenib

The inclusion criteria for this study were (1) diagnosis of HCC based on histological examination or radiologic findings showing early enhancement, followed by late wash-out on CE-CT or dynamic MRI, in conjunction with HCC refractory to radiofrequency ablation (RFA) and transarterial chemoembolization based on the indication of sorafenib; (2) performance status of 0 or 1; and (3) Child-Pugh class A or B liver cirrhosis. The exclusion criteria were (1) concomitant antineoplastic treatment; (2) transarterial chemoembolization or RFA performed less than 3 months before initiation of sorafenib; (3) lack of response evaluation using CE-CT or dynamic MRI during follow-up period; or (4) both the presence of extrahepatic lesions and the absence of intrahepatic lesions.

Initial and follow-up assessment

Liver function and tumor stage were evaluated using the Child-Pugh, Barcelona Clinic Liver Cancer, and Cancer of the Liver Italian Program classifications. Two independent radiologists evaluated tumor size and vascularity every 4–6 weeks during and after treatment using CE-CT and gadolinium ethoxybenzyl diethylenetriamine pentaacetic acid (Gd-EOB-DTPA)-MRI images. In this study, we retrospectively determined the best response during the sorafenib treatment and adopted it as the overall response. The responses of all patients were evaluated using the RECIST1.1, mRECIST, and RECICL criteria by evaluators who were not blinded to the patients’ diagnoses. The target lesions of each case were defined by 2 physicians by review of CE-CT and/or dynamic MRI images obtained before treatment. OS was measured from initial treatment until death; for patients who were alive at the end of the observation period, OS was measured from initial treatment until the final hospital visit.

Response evaluation using the RECIST1.1, mRECIST, and RECICL

The differences among the RECIST1.1, mRECIST, and RECICL are summarized in Supplementary Table 1. Briefly, both the RECIST1.1 and mRECIST call for unidirectional measurement of tumors, but the RECIST1.1 does not require evaluation of tumor viability while the mRECIST requires evaluation of only those areas of the tumor showing arterial enhancement on CE-CT or dynamic MRI. In contrast, the RECICL requires 2-directional measurement of tumors showing arterial enhancement. Representative images of the cases evaluated by the RECIST1.1, mRECIST, and RECICL are shown in Supplementary Figure 1. As can be observed, use of the RECIST1.1 called for unidirectional measurement of both enhanced and necrotic lesions, which showed no change before and after treatment (Supplementary Figures 1A and 1B). On the other hand, use of the mRECIST and RECICL required evaluation of tumor enhancement, which revealed a response according to the mRECIST and RECICL criteria (Supplementary Figures 1C and 1D for mRECIST and Supplementary Figures 1E and 1F for RECICL). Unlike the mRECIST, which does not require evaluation of lesions that do not show enhancement, the RECICL considers tumors not showing enhancement to be viable if they increase in size after initiation of therapy, as demonstrated in Supplementary Figure 2.
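The practical difference between unidirectional and 2-directional measurement can be illustrated with a short sketch. It assumes, for illustration only, that unidirectional size is the sum of the longest diameters of the target lesions and that 2-directional size is the sum of the products of two perpendicular diameters; the lesion dimensions and the function names are hypothetical and are not taken from the study data.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change in percent; negative values mean shrinkage."""
    return (after - before) / before * 100.0


def unidirectional_size(lesions):
    """Sum of the longest diameter of each lesion (mm). Assumed definition."""
    return sum(max(a, b) for a, b in lesions)


def bidirectional_size(lesions):
    """Sum of the products of two perpendicular diameters (mm^2). Assumed definition."""
    return sum(a * b for a, b in lesions)


# Hypothetical irregular lesion that shrinks mainly along its short axis.
baseline = [(50.0, 40.0)]   # (long axis, perpendicular axis) in mm
follow_up = [(45.0, 20.0)]

uni = percent_change(unidirectional_size(baseline), unidirectional_size(follow_up))
bi = percent_change(bidirectional_size(baseline), bidirectional_size(follow_up))

print(f"unidirectional change: {uni:.1f} %")  # -10.0 %
print(f"2-directional change:  {bi:.1f} %")   # -55.0 %
```

In this made-up case the longest diameter barely changes while the measured area more than halves, which is the kind of discrepancy that irregularly shaped tumors can produce.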

Definition of terms

Complete response (CR) was defined as disappearance of all lesions by the RECIST1.1, as disappearance of any arterial enhancement within all target lesions by the mRECIST, and as either a 100 % tumor necrotizing effect or a 100 % reduction in tumor size accompanied by disappearance of all contrast enhancement at any phase by the RECICL. Partial response (PR) was defined as a 30 % or greater decrease in tumor size, as determined from the sum of the diameters of the target lesions measured unidirectionally, by both the RECIST1.1 and mRECIST, and as a 50 % or greater tumor necrotizing effect or reduction in tumor size, as determined by 2-directional measurement, by the RECICL. Progressive disease (PD) was defined as a 20 % or greater increase in tumor size, as determined from the sum of the maximal dimensions of the target lesions, by both the RECIST1.1 and mRECIST, and as either a 25 % or greater increase in tumor size or the appearance of 1 or more new lesions by the RECICL. The RECIST1.1, mRECIST, and RECICL all defined stable disease (SD) as the absence of either PR or PD; objective response (OR) as the sum of all cases showing CR and PR; objective response rate (ORR) as the percentage of OR among all cases; and disease control rate (DCR) as the percentage of cases showing CR, PR, or SD.
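The size thresholds above can be summarized in a small sketch. This is a simplification under stated assumptions, not the criteria themselves: it takes a single percentage change in the measured viable tumor size as input (unidirectional for the RECIST1.1 and mRECIST, 2-directional for the RECICL), treats a 100 % decrease as CR, and ignores new lesions, enhancement phase, and necrotizing effect. The names `Thresholds`, `THRESHOLDS`, and `classify` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    pr_decrease: float  # minimum % decrease that counts as PR
    pd_increase: float  # minimum % increase that counts as PD


# Size thresholds as stated in the definitions above.
THRESHOLDS = {
    "RECIST1.1": Thresholds(pr_decrease=30.0, pd_increase=20.0),
    "mRECIST": Thresholds(pr_decrease=30.0, pd_increase=20.0),
    "RECICL": Thresholds(pr_decrease=50.0, pd_increase=25.0),
}


def classify(criteria: str, change_pct: float) -> str:
    """Map a % change in measured tumor size to CR, PR, SD, or PD.

    Simplified: new lesions and necrosis are not modeled.
    """
    t = THRESHOLDS[criteria]
    if change_pct <= -100.0:
        return "CR"
    if change_pct <= -t.pr_decrease:
        return "PR"
    if change_pct >= t.pd_increase:
        return "PD"
    return "SD"


for name in THRESHOLDS:
    print(name, classify(name, -35.0), classify(name, 22.0))
# RECIST1.1 PR PD
# mRECIST PR PD
# RECICL SD SD
```

The same numerical change can therefore fall into different categories, although in practice the three systems also measure different quantities, so the input values themselves would differ.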

Statistical analysis

Univariate survival curves were estimated using the Kaplan–Meier method, comparison of survival rates among groups was conducted using the log-rank test, and comparison of categorical variables was performed using the chi-square test. The level of significance was set at p < 0.05. All analyses were performed using SAS statistical software version 8.2 (SAS Institute, Cary, NC, USA) or the SPSS Medical Pack for Windows version 10.0 (SPSS, Inc., Chicago, IL, USA).
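For readers unfamiliar with the Kaplan–Meier method, the following is a minimal sketch of the product-limit estimator and of how a median OS is read from the resulting curve. It is not the SAS or SPSS procedure used in this study, it does not implement the log-rank test, and the follow-up times and event indicators are invented for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.

    times: follow-up in months; events: 1 = death, 0 = censored
    (for example, alive at the final hospital visit).
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)      # everyone leaving the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]     # deaths and censored cases both leave
    return curve


def median_survival(curve):
    """First time at which the survival estimate drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None                   # median not reached


# Hypothetical cohort of 8 patients.
times = [5, 8, 8, 12, 14, 20, 27, 30]
events = [1, 1, 0, 1, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
print(median_survival(curve))  # 14
```

Censored patients contribute to the risk set until their last visit and then drop out without lowering the curve, which is how the definition of OS given above enters the estimate.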

Results

Evaluation of response by the RECIST1.1, mRECIST, and RECICL

Of the 156 patients who had been successfully treated with sorafenib therapy for more than 30 days, the number of patients showing CR, PR, SD, and PD and the ORR and DCR as estimated by use of each system were, respectively, as follows: 3, 12, 71, and 70 cases and 9.6 % and 55.1 % according to the RECIST1.1; 6, 30, 55, and 65 cases and 23.1 % and 58.3 % according to the mRECIST; and 6, 29, 53, and 68 cases and 22.4 % and 56.4 % according to the RECICL (Tables 2, 3). Although no statistically significant difference was observed among the DCR estimated by the 3 systems, 20 patients (approximately 14 %) classified as SD by the RECIST1.1 were classified as OR by the mRECIST and RECICL.
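The ORR and DCR reported above follow directly from the case counts and the definitions given under "Definition of terms". The short check below recomputes them; the dictionary layout and the function name `rates` are illustrative choices.

```python
# Case counts per response category, as reported in the text above.
counts = {
    "RECIST1.1": {"CR": 3, "PR": 12, "SD": 71, "PD": 70},
    "mRECIST": {"CR": 6, "PR": 30, "SD": 55, "PD": 65},
    "RECICL": {"CR": 6, "PR": 29, "SD": 53, "PD": 68},
}


def rates(c):
    """Return (ORR, DCR) in percent, rounded to one decimal place."""
    total = sum(c.values())
    orr = 100.0 * (c["CR"] + c["PR"]) / total
    dcr = 100.0 * (c["CR"] + c["PR"] + c["SD"]) / total
    return round(orr, 1), round(dcr, 1)


for name, c in counts.items():
    print(name, sum(c.values()), rates(c))
# RECIST1.1 156 (9.6, 55.1)
# mRECIST 156 (23.1, 58.3)
# RECICL 156 (22.4, 56.4)
```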

Table 2 Classification of response to sorafenib by the RECIST1.1, mRECIST, and RECICL
Table 3 Comparisons of the response classification between RECIST1.1 and RECICL (A), and between mRECIST and RECICL (B)

Comparison of Kaplan–Meier curves for OS as estimated by the RECIST1.1, mRECIST, and RECICL

Figure 2 shows the Kaplan–Meier curves for OS as estimated using the 3 systems (Fig. 2a as estimated by the RECIST1.1, Fig. 2b by the mRECIST, and Fig. 2c by the RECICL). The median OS of the patients classified as OR, SD, and PD, respectively, by the 3 systems was 19.9 months [95 % confidence interval (CI) 12.5–21.3 months], 19.2 months (95 % CI 15.1–23.3 months), and 14.3 months (95 % CI 9.7–18.8 months) by the RECIST1.1; 27.2 months (95 % CI 15.2–39.2 months), 16.8 months (95 % CI 13.8–19.7 months), and 14.3 months (95 % CI 10.5–18.0 months) by the mRECIST; and 27.2 months (95 % CI 9.6–44.8 months), 19.2 months (95 % CI 17.1–21.3 months), and 14.3 months (95 % CI 10.1–18.4 months) by the RECICL. As shown in Fig. 2a, b, neither the RECIST1.1 nor the mRECIST allowed for stratification of OS, although classification of response by the mRECIST was more strongly associated with OS than that by the RECIST1.1 (p = 0.0575 and p = 0.073 by log-rank test, respectively). On the other hand, classification of response by the RECICL was significantly associated with OS, with the patients showing OR having the longest survival and those showing PD the shortest (p = 0.0033 by log-rank test; Fig. 2c; Table 4). Regarding the treatment response determined by the RECICL, OS was significantly longer in the OR group than in the PD group (p = 0.002). However, the differences in OS between the SD and OR groups and between the SD and PD groups did not reach significance, although there was a trend toward longer OS in the better response groups (p = 0.093 and p = 0.069, respectively).

Fig. 2

Kaplan–Meier curves of overall survival based on response to treatment as estimated by the RECIST1.1, mRECIST, and RECICL. Kaplan–Meier curves of the overall survival of the 156 patients based on response to sorafenib therapy as estimated by the RECIST1.1 (a), mRECIST (b), and RECICL (c). The median OS of the patients classified as OR, SD, and PD, respectively, was 19.9 months, 19.2 months, and 14.3 months by the RECIST1.1 (p = 0.073 by log-rank test); 27.2 months, 16.8 months, and 14.3 months by the mRECIST (p = 0.0575); and 27.2 months, 19.2 months, and 14.3 months by the RECICL (p = 0.0033)

Table 4 Univariate and multivariate analyses of the contribution of clinical background and tumor response assessed by the three criteria to overall survival

Inconsistency among classification by the RECIST1.1, mRECIST, and RECICL

Figure 3 shows the differences in response classification obtained using the RECIST1.1, mRECIST, and RECICL. As can be observed, a subset of the patients classified as either PD or SD by the RECIST1.1 were classified as either CR or PR (i.e., as OR) by both the mRECIST and RECICL, leading 28 of 156 patients to be classified differently by the RECIST1.1 compared to the mRECIST and RECICL. Specifically, of the 141 patients classified as either PD or SD by the RECIST1.1, 21 of the patients classified as PD and 20 classified as SD were classified as OR by the mRECIST and RECICL (Fig. 3). This finding suggested the possibility that patients classified as OR by the mRECIST and/or RECICL, even those classified as SD or PD by the RECIST1.1, showed better prognosis than those classified as non-OR. To examine this possibility, Kaplan–Meier survival analysis was performed on the cases classified as SD or PD by the RECIST1.1 to compare their classification by the mRECIST and RECICL. Among the 141 patients classified as PD or SD by the RECIST1.1, the number of cases of OR, SD, and PD and the ORR and DCR were estimated at 17 cases, 55 cases, and 69 cases and 12.1 % and 51.1 %, respectively, by the mRECIST and 15 cases, 56 cases, and 70 cases and 10.1 % and 50.3 %, respectively, by the RECICL.

Fig. 3

Percentage change in tumor size of cases classified differently by the RECIST1.1, mRECIST, and RECICL. Percentage change in tumor size of 28 cases that were categorized differently by the RECIST1.1, mRECIST, and RECICL. The percentage change was calculated using the formula (tumor size post treatment − tumor size pretreatment)/tumor size pretreatment × 100 for estimation by the RECIST1.1 and mRECIST and the formula (tumor area post treatment − tumor area pretreatment)/tumor area pretreatment × 100 for estimation by the RECICL. The lower part of the panel denotes the range of objective response (OR), the middle part of the panel the range of stable disease (SD), and the upper part the range of progressive disease (PD)

Figure 4 shows the Kaplan–Meier curve for OS of these 141 patients as estimated by the mRECIST and RECICL. As can be observed, the median OS of patients classified as OR, SD, and PD was 27.2 months (95 % CI 11.7–42.7 months), 16.8 months (95 % CI 13.8–19.7 months), and 14.3 months (95 % CI 10.5–18.0 months), respectively, as estimated by the mRECIST and 27.2 months (95 % CI 11.9–42.5 months), 19.2 months (95 % CI 17.1–21.3 months), and 14.3 months (95 % CI 10.1–18.4 months), respectively, as estimated by the RECICL. Whereas classification of response by the mRECIST failed to allow for stratification of each type of response for OS (p = 0.1124; Fig. 4a), classification of response by RECICL was found to be significantly associated with OS, indicating that it allows for precise prediction of prognosis (p = 0.0066; Fig. 4b).

Fig. 4

Kaplan–Meier curves of overall survival of patients classified as SD or PD by the RECIST1.1, stratified by response according to the mRECIST and RECICL. Kaplan–Meier curves of the 141 patients classified as SD or PD by the RECIST1.1, with response classified by the mRECIST (a) and RECICL (b). The median OS of patients classified as OR, SD, and PD was 27.2 months, 16.8 months, and 14.3 months, respectively, as estimated by the mRECIST (p = 0.1124 by log-rank test) and 27.2 months, 19.2 months, and 14.3 months, respectively, as estimated by the RECICL (p = 0.0066)

Discussion

For management of cancer chemotherapy, it is critical to have reliable tools to guide treatment planning in clinical practice. For this purpose, OS should be considered a critical endpoint, although tumor response assessed by imaging has sometimes been used as a surrogate endpoint. When we compared the validity of the criteria in predicting OS in advanced HCC patients treated with sorafenib, we found that the RECICL predicted the prognosis of these patients more precisely than the RECIST1.1 or mRECIST.

In Western countries, World Health Organization criteria and the RECIST1.1 are commonly used for evaluation of treatment for liver cancer [13]. While their use has proven valuable in assessing response to conventional cytotoxic chemotherapy, there has been concern regarding their applicability to patients treated with recently developed molecularly targeted agents, such as sorafenib, which appear to have a “dormant” effect in that they initially appear to yield little response but ultimately lead to improvement in overall time to progression and OS [2, 10]. Sorafenib in particular has been a breakthrough agent in the treatment of advanced HCC, as demonstrated by the significant improvement in OS, despite the reporting of an ORR of only 2 % with its use [2, 10]. This observation of a survival benefit despite a low ORR has prompted use of imaging techniques, namely CE-CT and MRI, as an alternative method of assessing treatment response [14, 15]. While both the mRECIST and RECICL incorporate vascularity as a factor in response assessment, the RECICL also calls for 2-directional measurement of tumor size and defines tumors that increase in size to be viable even if they do not show early enhancement upon imaging. The major advantage of the mRECIST and RECICL is that they call for evaluation of the contrast-enhancing portion of the tumor rather than evaluation of the entire tumor (Supplementary Figure 1) and consider tumor necrosis a sign of response. As a result of these differences, the ORR estimated using the mRECIST or RECICL was approximately 2.5 times higher than that estimated using the RECIST1.1. Interestingly, the most significant association between tumor response and OS was found using the RECICL (Fig. 2c), although classification by the mRECIST was also more strongly associated with OS than classification by the RECIST1.1 (Fig. 2a, b). Therefore, it is reasonable to speculate that evaluation of tumor viability improves assessment of the antitumor activity of sorafenib by the mRECIST and RECICL but not by the RECIST1.1.

Another difference between the mRECIST and RECICL is that, while classification of response by the former is based on unidirectional measurement, that by the latter is based on 2-directional measurement. Moreover, the mRECIST regards only the hypervascular area of the tumor as viable and thus estimates tumor viability only during the arterial phase, whereas the RECICL estimates tumor viability at all phases. Supplementary Figures 2 and 3 show examples of how use of the mRECIST and RECICL can lead to different classification of the same cases. In the case shown in Supplementary Figure 3, marked reduction of the enhancing tumor volume was observed. Although the response was classified as SD by the mRECIST, it was classified as PR by the RECICL, as assessed by 2-directional measurement of size. Another advantage of the RECICL is that it calls for evaluation of non-enhanced areas of the target lesion, which are often found to have increased on post-therapeutic imaging. Indeed, some lesions that appear hypovascular on CE-CT are found to have increased in size and should, thus, be regarded as viable (Supplementary Figure 2). Therefore, some patients classified as PR by the mRECIST would be classified as PD by the RECICL, suggesting that the RECICL allows for more accurate categorization of response than the mRECIST for assessment of OS. Indeed, use of the RECICL was found to allow for successful discrimination of patients with tumor progression among the patients who had been classified as SD by the mRECIST.

There are three limitations regarding assessment by the RECICL. First, assessment by the RECICL focuses only on measurable intrahepatic lesions and does not evaluate portal vein thrombi or extrahepatic lesions. Second, a hypovascular HCC, such as sarcomatoid HCC, is difficult to assess by the RECICL because changes in vascularity cannot be determined. Response of such lesions should be evaluated using other criteria. Third, the retrospective nature of the study might have led to bias in selection of the patients. To address these limitations and independently validate the results of this study, we are currently designing an investigation of the accuracy of the RECICL in the prediction of OS in a prospective multicenter patient cohort with a larger sample size.

In this comparison of the validity of the RECIST1.1, mRECIST, and RECICL, use of the RECICL was found to allow for more precise identification of patients with better prognosis than the RECIST1.1 or mRECIST. This finding leads us to conclude that, among the three systems compared, the RECICL provides the most precise prognostic information at an early stage after treatment. Although further studies are required to confirm the superiority of the RECICL in HCC with portal vein thrombi and extrahepatic lesions, the results of this study are of significance from a clinical viewpoint, especially in the selection of therapy. On the basis of the data presented herein, we propose that the RECICL become the standard system used in the evaluation of response to chemotherapy, including molecularly targeted therapies, by HCC patients.