Introduction

Thomas Starzl designed liver transplantation (LT) to treat unresectable primary and secondary hepatobiliary tumors [1, 2]. The first 'successful' LT was performed on July 23, 1967, in a child presenting with a large hepatocellular cancer (HCC) in the context of biliary atresia. The child died after 400 days, during which time she underwent many reinterventions to treat both thoracic and abdominal tumor recurrences. Due to the lack of selection criteria, the concept of LT as the primary treatment of hepatobiliary malignancies was rapidly challenged because of the prohibitively high incidence of tumor recurrence [2, 3]. The 'oncological pendulum' reversed in the nineties. The indication for LT moved from large multifocal lesions to a more limited tumor burden. A tumor load restricted to ≤ three tumors having a diameter ≤ 3 cm (Paris criteria) or one tumor ≤ 5 cm (Milan criteria, MC) resulted in 5-year disease-free survival (DFS) rates of 70–80% [4, 5]. The MC became the international gold standard to select HCC patients for LT [6,7,8]. However, after some years of stabilized practice, it became clear that the MC were too strict, denying access for many patients to potentially curative therapy. Many Western teams worked at a cautious extension of the inclusion criteria. Conversely, many Eastern ones adopted a much more aggressive attitude fostered by the explosive development of living-donor-liver transplantation (LDLT) [9]. The search for 'the ideal' score was launched to give as many patients as possible access to a potentially curative oncological procedure without compromising outcomes. However, the co-existence of multiple scoring systems explains the heterogeneous treatment of HCC, leading to difficulties when interpreting short- and long-term outcomes, and access to LT varies widely among countries, continents, and allocation organizations.

This paper aims to systematically review the different HCC-LT selection systems developed, with the intent to investigate their impact in terms of access to LT without compromising overall survival and oncological results. Using the available data, a meta-analysis was also done to investigate the post-transplant recurrence rates reported using the MC vs. the expanded selection criteria.

Materials and methods

Search sources and study design

A systematic review of the published literature on the different HCC-LT selection systems was undertaken. The search strategy followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [10].

The specific research question formulated in the present study included the following PICO components:

Patient: patient with a confirmed HCC undergoing a LT;

Intervention: LT adopting an expanded HCC-LT selection system;

Comparison: LT adopting a standard selection approach (typically, the MC);

Outcome: patient death and/or tumor recurrence.

A search of the PubMed and Cochrane Central Register of Controlled Trials Databases was conducted using the following terms: ("liver transplant*"[Title/Abstract] OR "living donor liver transplant*"[Title/Abstract]) OR “living donor” AND ("criteria"[Title/Abstract] OR "score"[Title/Abstract] OR "model"[Title/Abstract]) AND ("HCC"[Title/Abstract] OR "hepatocellular carcinoma"[Title/Abstract] OR "hepatocellular cancer"[Title/Abstract]) AND ("1993/01/01"[PDAT]: "2021/03/14"[PDAT]).

The search period ran from "1993/01/01" to "2021/03/14". The systematic review considered only English-language studies that included human patients. The start of the search period corresponded to the first publication of an HCC-LT selection system by the Bismuth group [4].

Published reports were excluded if they (a) reported data on animal models; (b) lacked sufficient clinical details; or (c) contained non-primary source data (e.g., review articles, non-clinical studies, letters to the editor, expert opinions, and conference summaries). For studies originating from the same center, possible overlap of clinical cases was examined, and the most informative study was considered eligible for inclusion.

Data extraction and definitions

Following a full-text review of the eligible studies, two independent authors (MF and JL) performed the data extraction and crosschecked all outcomes. During article selection and data extraction, discrepancies were resolved by consensus with a third reviewer (QL). Collected data included: first author of the publication, reference number, center, year of publication, type of selection system (based on morphological, biological, radiological, or pathological aspects), number of cases, number of patients within the new selection system, number of cases within MC, number of patients exceeding MC, additional number and percentage increase of LT cases compared with the MC, 5-year overall and disease-free survival rates in new criteria-IN, MC-OUT/new criteria-IN, and new criteria-OUT cases, and, finally, percentage of living donor LT.

As already reported, we stratified the identified selection systems into four groups according to the characteristics of the variables composing the scores. In detail: (a) “morphological” systems were based only on radiology-derived tumor variables (i.e., number and dimensions); (b) “biological” systems also included biological markers derived from blood tests; (c) “radiological” systems also included variables derived from the response to locoregional therapy or from radiology-related tumor activity (e.g., PET avidity); and (d) “histological” scores also included parameters derived from pre-LT biopsies.

Quality assessment

Selected studies were systematically reviewed with the intent to identify potential sources of bias. The papers' quality was assessed using the Risk of Bias In Non-randomized Studies of Interventions (ROBINS-I) tool [11].

Statistical analysis

The meta-analysis was performed using OpenMetaAnalyst. Statistical heterogeneity was evaluated with the Higgins I² statistic and graded as low (0–25%), moderate (26–50%), or high (≥ 51%). A fixed-effects model was used in the case of low-to-moderate heterogeneity (0–50%), and a random-effects model when heterogeneity was high. Odds ratios (OR) with 95% confidence intervals (95% CI) were reported. A P value < 0.05 was considered statistically significant.
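The model-selection rule described above can be illustrated with a minimal Python sketch. It is not the OpenMetaAnalyst implementation: it assumes inverse-variance weighting of log odds ratios, Cochran's Q to derive I², and a DerSimonian-Laird estimate of the between-study variance for the random-effects case. The function names, the 0.5 continuity correction, and the example counts are hypothetical and serve only as illustration.

```python
import math


def log_odds_ratio(events_ref, total_ref, events_new, total_new):
    """Return (log OR, variance) for new vs. reference criteria in one study."""
    a, b = events_new, total_new - events_new  # new criteria: events / non-events
    c, d = events_ref, total_ref - events_ref  # reference (e.g., MC): events / non-events
    if 0 in (a, b, c, d):
        # Assumption of this sketch: 0.5 continuity correction for empty cells.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pooled_odds_ratio(studies):
    """Pool per-study odds ratios; the model is chosen from I² as in the text."""
    effects = [log_odds_ratio(*study) for study in studies]
    y = [effect for effect, _ in effects]
    v = [variance for _, variance in effects]

    # Fixed-effect (inverse-variance) estimate, also needed for Cochran's Q.
    w = [1 / variance for variance in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

    # Cochran's Q and Higgins I² (in percent, truncated at zero).
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0

    model = "fixed"
    if i2 > 50:
        # High heterogeneity: DerSimonian-Laird random-effects weights.
        model = "random"
        scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / scale)
        w = [1 / (variance + tau2) for variance in v]

    pooled = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    se = math.sqrt(1 / sum(w))
    return {
        "model": model,
        "i2": i2,
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se),
        "ci_high": math.exp(pooled + 1.96 * se),
    }


# Hypothetical counts per study:
# (recurrences MC, patients MC, recurrences new criteria, patients new criteria)
example_studies = [(10, 100, 12, 130), (8, 90, 9, 110), (15, 160, 20, 200)]
result = pooled_odds_ratio(example_studies)
```

The returned dictionary reports which model was applied, so the same call covers both branches of the rule; real analyses would use the per-study counts extracted from the included papers.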

Results

Search results and study characteristics

The PRISMA flow diagram schematically depicts the article selection process (Fig. 1). Among the 2898 articles screened, 59 studies reporting HCC-LT selection systems were identified [4, 5, 7, 8, 12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66].

Fig. 1 PRISMA flow diagram showing the article selection process

The variables adopted for constructing the selection systems and selecting HCC patients for LT were as follows: 15 (25.4%) were exclusively based on morphological tumor characteristics; 34 (57.6%) on biological characteristics either alone or in combination with morphological features, eight (13.6%) on radiological features, and two (3.4%), on pathological characteristics only. More detailed information about the different variables used to construct a new selection system is displayed in Table 1 [4, 5, 7, 12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66].

Table 1 HCC and LT Scores based on the different combinations of tumor morphology, biology, radiology, and pathology

As for the period of publication, only two studies (3.4%) were published before 2000 [4, 5], 21 (35.6%) during the decade 2000–2009, and 36 (61.0%) during the period 2010–2021. Interestingly, all but one of the studies based only on morphological tumor characteristics were published before 2010 [23]. The geographical distribution of the articles was as follows: Asia 30 (50.8%), Europe 17 (28.8%), and North America 12 (20.4%). In 22 (37.3%) papers, HCC-LT selection systems were developed in the field of LDLT. In 47 (79.7%) studies, the MC status was reported, thereby allowing comparison with the respective proposed new selection system. In one further (1.7%) report, the MC status could be estimated from the data reported.

Qualitative assessment of the included studies

Results from the qualitative assessment of the included studies are shown in Fig. 2. Overall, 9 (15.3%) studies presented an unclear risk of bias due to the absence of data from a comparative group; in 5 (8.5%) studies, data comparing the outcome of the proposed new selection system with a comparative one were incompletely reported, leading to a potentially high risk of bias.

Fig. 2 ROBINS-I qualitative assessment of the included studies

Review of the eligible studies: the 'tower of Babel' of the selection systems

Data concerning the results observed in the analyzed selection systems are displayed in Table 2 [4, 5, 7, 8, 12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66].

Table 2 HCC and LT: overall and disease-free survival rates—results of the different scores

In the 48 (81.4%) studies in which sufficient information was available about the MC status, a total of 20,409 cases were reported; 14,453 of them met the new criteria, and 11,189 were MC-IN.

Overall, a total of 3353 new criteria-IN/MC-OUT cases were reported, leading to a 16% increase in transplanted HCC patients. Apart from two reports [19, 58], all proposed expanded selection systems aimed to widen the inclusion criteria. This intent led to an increase in transplanted patients ranging from 2 to 62% compared with the MC (Table 2 and Fig. 3).

Fig. 3 Percentage of supplementary liver transplantations compared to the Milan criteria when using new expanded criteria

Despite the increased number of transplants, the results were only moderately compromised. Interestingly, if the tumor load was within the respective new criteria, 5-year patient survival rates were always above 50% (range: 62–90%) (Table 2 and Fig. 4). When adhering to the new criteria, excellent 5-year DFS rates were also obtained. Conversely, DFS consistently dropped below 50% when the tumor load exceeded the new selection system (Table 2 and Fig. 5).

Fig. 4 5-year overall survival rates in the different reported HCC criteria

Fig. 5 5-year disease-free survival rates in patients within the Milan criteria, outside the Milan criteria but within the new expanded criteria, or exceeding the new criteria

Meta-analysis for the post-transplant recurrence

Only seventeen papers reported the post-transplant recurrence data required to perform a meta-analysis comparing the MC vs. the expanded criteria [13, 14, 16,17,18, 20, 23, 28, 30, 32, 39, 42, 46, 58, 60, 65, 66]. No heterogeneity was observed among these papers (I² = 0, P = 0.857). A total of 1834 patients meeting the MC (205 recurrences, 11.2%) were compared with 2360 patients meeting the different proposed expanded selection systems (268 recurrences, 11.4%). No statistically significant difference was observed between the two groups (OR = 1.006, 95% CI = 0.827–1.224; P = 0.951), although the expanded criteria allowed 28.7% more cases to be transplanted (Fig. 6).
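The aggregate figures quoted above can be checked with a few lines of Python. This is only an arithmetic sketch on the reported totals: the crude odds ratio below pools all patients into a single 2×2 table and ignores study-level weighting, so it is expected to differ slightly from the pooled OR of 1.006. Reading the 28.7% figure as the ratio between the two group sizes is an assumption that happens to match the reported numbers; the function and variable names are hypothetical.

```python
def crude_odds_ratio(events_ref, total_ref, events_new, total_new):
    """Unweighted odds ratio (new vs. reference) from aggregate counts."""
    odds_ref = events_ref / (total_ref - events_ref)
    odds_new = events_new / (total_new - events_new)
    return odds_new / odds_ref


# Totals reported in the text.
mc_recurrences, mc_total = 205, 1834        # patients meeting the MC
exp_recurrences, exp_total = 268, 2360      # patients meeting expanded criteria

mc_rate = 100 * mc_recurrences / mc_total            # recurrence rate, MC (%)
exp_rate = 100 * exp_recurrences / exp_total         # recurrence rate, expanded (%)
extra_cases = 100 * (exp_total - mc_total) / mc_total  # additional cases (%)
crude_or = crude_odds_ratio(mc_recurrences, mc_total, exp_recurrences, exp_total)

print(f"MC recurrence rate: {mc_rate:.1f}%")
print(f"Expanded-criteria recurrence rate: {exp_rate:.1f}%")
print(f"Additional transplantable cases: {extra_cases:.1f}%")
print(f"Crude (unweighted) OR: {crude_or:.3f}")
```

The crude estimate stays close to 1, which is in line with the absence of a difference in recurrence between the two groups.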

Fig. 6 Forest plot and meta-analysis on the post-transplant recurrence: Milan criteria vs. enlarged selection criteria

Discussion

The data observed in the present systematic review confirm that a careful extension of the inclusion criteria may allow many patients to access a potentially curative LT without seriously compromising the outcome.

The first HCC-LT selection system was ‘officially’ born in 1996 when Mazzaferro proposed the MC, achieving a 4-year DFS rate of 92% [5]. Despite the low number of patients reported (n = 48), the retrospective design of the study, and the absence of a control group, the MC still govern the access of patients to transplant waiting lists more than two decades later.

The MC represent a very efficacious system for selecting HCC patients waiting for LT thanks to their super-selective ability. This is probably the main reason why the MC remain the most valuable benchmark in the setting of LT oncology, even in the presence of a large number of studies considering other, more sophisticated parameters. However, the strength of the MC is also their weakness: their super-selection excludes too many potentially transplantable patients from a curative strategy.

In 2001, the University of California San Francisco (UCSF) group was the first to challenge the MC. Similar survival rates were obtained using their new criteria, the critical difference being that 20% more patients were able to access a curative LT [7]. Up to now, 59 different HCC scoring systems have been proposed in the setting of HCC and LT [4, 5, 7, 8, 12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66].

All the criteria “extending” the MC can be grouped under the “Metroticket” definition again introduced by the Milan group: the further the trip (namely, the larger the tumor burden), the more expensive the ticket (namely, the higher the post-LT recurrence rate) [8].

Initially, the extension of inclusion criteria for LT was exclusively based on morphological criteria, namely tumor number and diameter [4, 5, 7, 8, 12,13,14,15,16,17,18,19,20,21,22]. In 2007, the Kyoto group [23] demonstrated for the first time that the morphology-alone selection approach was superseded by two fundamental principles of modern oncology, namely the necessity to (a) combine tumor morphology and biology and (b) evaluate the response to neo-adjuvant therapies to address tumor aggressiveness and behavior [23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66]. The Kyoto group showed that a successful LT could be achieved in patients harboring up to ten tumors on the condition that the tumor marker Protein Induced by Vitamin K Absence-II (PIVKA-II) was < 400 mAU/mL [23].

Other Asian groups elaborated on this concept during the same period by introducing alpha-fetoprotein (AFP) levels in their selection systems [24,25,26]. Several Japanese and South Korean centers raised the sensitivity of AFP and PIVKA-II by using these markers in combination [24, 38, 42,43,44, 56]. Centers from Western countries also progressively introduced AFP to select HCC patients, with cut-off levels ranging from 100 to 2,500 ng/mL [30, 31, 39, 46,47,48,49, 51, 53,54,55]. Later, inflammatory markers such as the neutrophil-to-lymphocyte (NLR) and platelet-to-lymphocyte (PLR) ratios were added for further refinement [33,34,35, 41, 45, 47, 63]. Recently, the radiological response has also been introduced as a useful parameter in selecting HCC cases. For example, progressive disease after treatment, as defined by the mRECIST criteria, has been adopted in several studies to predict the risk of a poor post-transplant clinical course [59, 61]. Tracer uptake by the HCC on PET-CT scanning has also been added as a prognostic factor in some selection systems [58, 61, 62, 64].

The use of radiological response as a selective tool is the direct consequence of the everyday use of locoregional therapies before transplant, both in the settings of bridging and downstaging [67]. Thanks to the direct effect of these treatments, the selection process has further moved from static to dynamic tumor evaluation. An AFP slope < 15 ng/mL/month [29, 59, 63] and any morphological response on imaging according to the modified Response Evaluation Criteria in Solid Tumors (mRECIST) are favorable prognostic factors [59, 63].

It is interesting to note that almost all the proposed expanded HCC-LT selection systems permit the transplantation of more patients without seriously compromising their long-term outcome. This evidence is also confirmed by the meta-analysis, in which very similar recurrence rates were observed when comparing the MC vs. the new criteria, although 28.7% more transplantable cases were reported using these enlarged systems.

It is of particular interest to note that the DFS rates of patients exceeding the MC but meeting the new selection systems were similar to those obtained in MC-IN patients. The selection process driven by the new criteria identified a sub-group of MC-OUT patients benefitting from LT. Conversely, if the new selection systems were exceeded (new criteria-OUT patients), 5-year DFS was always below 50%, a number corresponding to an oncologically futile transplant procedure [68, 69].

It is difficult to identify the best selection system among the proposed ones. The experience gathered during the last three decades in both deceased and living donor LT, in both Western and Eastern centers, indicates that the development of a universally acceptable selection system is within reach. The “ideal” HCC-LT score should incorporate tumor characteristics that are scientifically reliable, pre-operatively available, easy to use, dynamic, and both morphological and biological.

To further improve the selection process, four different matters need to be explored further. The first relates to the pre-transplant diagnosis of microvascular tumor invasion and poor tumor grading. Due to intra-tumor heterogeneity, tumor aggressiveness is challenging to capture with a biopsy [70]. PIVKA-II, a surrogate marker of vascular invasion, should be systematically implemented in clinical use in Western countries [71]. It is to be expected that radiomics will help to solve this shortcoming in the near future [72].

The second matter relates to the impact of LDLT in the treatment of HCC patients waiting for LT. LDLT not only represents a unique opportunity to increase the allograft pool (necessary to cope with the rising number of HCC patients) but, most of all, allows exploration of the effect of expanding the HCC inclusion criteria without harming non-tumor patients on the waiting list [73]. The role of LDLT in treating HCC patients will become increasingly important because dropout risk is virtually eliminated [74]. Also important in this (ethical) context is the fact that recent technical developments have turned LDLT from a “high risk, high return” into a “low risk, high return” procedure [75]. These considerations imply that LDLT represents fertile soil to explore further the role of transplantation in the cure of HCC patients. The time has come for the Western world to take up this challenge.

The third matter relates to integrating the concept of transplant benefit in HCC patient selection. Transplant survival benefit corresponds to the number of years gained by LT minus the number of years offered by alternative treatments, both measured from the time of LT. Intention-to-treat transplant survival benefit adheres to the same concept, considering the gain in life expectancy, but from waiting list registration, thereby taking into consideration any possible therapy from the time of HCC diagnosis [76]. Selection systems based on the concept of benefit should improve the selection process of HCC patients by identifying patients deserving LT and avoiding futile transplants in patients presenting with too advanced or too early tumor burdens.

Finally, any selection system should also consider the immunosuppression load of the HCC liver recipient. Immunosuppression cannot be disregarded in the context of LT for HCC, as it is the most relevant pro-oncogenic factor [77]. This consideration is especially critical when expanding the inclusion criteria, which, by definition, implies a larger tumor burden and a potentially higher risk of recurrence, and when dealing with remaining tumor tissue at the examination of the total hepatectomy specimen [78]. The development of more extensive inclusion criteria should be accompanied by strategies that aim to minimize the immunosuppressive load.

The present study has some limitations. As already underlined, some of the selected papers revealed an unclear or high risk of bias. This limitation is the consequence of the retrospective and non-randomized nature of all studies exploring the role of HCC-LT selection systems. Another limitation relates to the poor homogeneity of the different proposed selection systems, with only a minimal number of studies reporting their external validation. The substantial lack of data in the articles strongly limited our meta-analysis: only 17 of the 59 included articles clearly stated the recurrence data required. More homogeneous and more detailed studies are therefore required to conduct such an investigation on larger numbers.

Conclusions

The development of a widely accepted “comprehensive” HCC-LT selection system is a necessity. To reach this goal, the development of new diagnostic technologies, more comprehensive implementation of living-donor-liver transplantation, and integration of the concept of benefit into the therapeutic scheme of HCC patients will be necessary. All these elements are essential to bring order to the chaos of selection systems and, more importantly, to offer the best possible treatment to the highest possible number of HCC liver patients. Hopefully, the tower of Babel of scores will disappear in the near future.