Key points

  1. To highlight the effectiveness of machine learning algorithms in predicting therapeutic outcomes for hepatocellular carcinoma after various treatment modalities.

  2. To illustrate the advantages and disadvantages of each machine learning algorithm.

  3. To familiarize readers with the challenges of selecting a machine learning algorithm when creating a model.

Introduction

Hepatocellular carcinoma (HCC) is an aggressive tumor that remains the second most frequent cause of cancer death worldwide [1,2,3]. Depending on patient status, several guidelines [4,5,6,7] recommend various treatment strategies for HCC. Due to the aggressive biological behavior of HCC, recurrence is not uncommon. Therefore, it is essential to predict therapeutic outcomes prior to treatment so that physicians can design a personalized therapeutic strategy for each patient. The conventional process of model establishment is to select appropriate predictors, use them in statistical analysis, and ultimately derive a multivariate predictive model [8,9,10,11,12]. However, predictive models developed with traditional statistical methods, such as the logistic regression (LR) model and the Cox proportional hazards model, are of limited reliability because the included factors are overly simple and carry a low level of evidence. Machine learning (ML) is a powerful tool for generating high-level medical features or combining quantitative radiomic parameters with efficient algorithms [13,14,15,16]. ML algorithms simulate human learning to detect hidden patterns within HCC therapeutic data more clearly than traditional statistical methods. With this in mind, ML algorithms have been used in many studies to predict the therapeutic outcomes of HCC patients. Thus, in this review, the advantages and disadvantages of each ML algorithm are clarified, and the relevant literature on the prediction of therapeutic outcomes after various treatment modalities for HCC is described.

Advantages and disadvantages of ML algorithms

ML algorithms have several advantages over traditional statistical methods. First, traditional statistical methods can only model variables that are linearly related to the outcome [12], whereas ML algorithms can process nonlinear data. Second, ML algorithms can learn from existing data to find novel patterns among variables and generate predictions [17,18,19,20]. Third, ML models can include more variables [21, 22] because variable inclusion does not depend solely on traditional statistical selection [23,24,25]. Last, ML methods can process big data at high speed.

Although ML algorithms are increasingly used, their disadvantages should be kept in mind. First, current ML methods are still not readily available for clinical practice, and there is no standardized design for ML models. Second, limited generalization capability remains a common issue in clinical practice. The detailed advantages and disadvantages of ML algorithms are listed in Table 1.

Table 1 Advantages and disadvantages of ML algorithms

ML models in the prediction of therapeutic outcomes for HCC

With the development of ML algorithms, a growing number of studies have developed prognostic predictive models for HCC using the ML method. Therefore, understanding how ML works is essential. In this section, various ML models are introduced.

Neural networks

Neural networks are a classic ML method that simulates the neural networks of the human brain. The most widely used neural networks are artificial neural networks (ANNs) and deep neural networks (DNNs). The ANN [26] is one of the earliest neural network models and can be divided into three components: an input layer, a hidden layer, and an output layer. An ANN model can take the form of a perceptron (without a hidden layer) or a multilayer perceptron (MLP, with one or more hidden layers) (Fig. 1). However, ANNs cannot directly process medical imaging. With the development of deep learning, DNNs have become widely used in establishing models [27]. Convolutional neural networks (CNNs) [28, 29] are among the most common DNNs and can automatically identify and segment medical images. Another type of DNN is the recurrent neural network (RNN). However, RNNs are of limited use in HCC prognostic studies because the RNN algorithm cannot readily process data spanning long time intervals.

Fig. 1

Schematic diagram of an artificial neural network (ANN). a Schematic diagram of a perceptron, the simplest ANN model, which includes only an input layer and an output layer. In the perceptron, the input feature parameters are converted directly to the output results through the weights between the input and output. b Schematic diagram of a 3-layer ANN (also called a multilayer perceptron). The first layer is the input layer, corresponding to the input feature parameters (X); the middle layer is the hidden layer, which uses a composite function to abstract the input features so that the input can be more easily separated linearly; the last layer is the output layer, where the number of categories to be classified determines the number of neurons, and its output value (Y) is the predictive value of the ANN
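To make the structure in Fig. 1 concrete, the following is a minimal, hedged sketch of an MLP classifier on tabular data using scikit-learn; the data, feature count, and hidden-layer size are illustrative placeholders and are not taken from any of the cited studies.

```python
# Minimal sketch of a multilayer perceptron (MLP) on tabular clinical-style data.
# All features and outcomes are synthetic placeholders for illustration only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 10))   # 500 patients, 10 hypothetical clinical features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)  # synthetic outcome

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Input layer -> one hidden layer (16 neurons) -> output layer, as in Fig. 1b
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
model.fit(X_train, y_train)
print("test AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```

In practice, the number of hidden layers and neurons is a hyperparameter that is typically tuned, for example by cross-validation.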

Support vector machines

A support vector machine (SVM) [30] is a two-category model that aims to find the optimal separating hyperplane with the largest distance to the support vectors of each class (Fig. 2). Because of the hyperplane concept, the SVM is often used for parameter selection, in which parameters are chosen according to their correlation with the outcome. However, SVMs are mainly applied in studies with small sample sizes, because large datasets still yield a very large number of support vectors, which may increase the complexity and training time of SVM algorithms.

Fig. 2

Schematic diagram of a support vector machine (SVM). A decision hyperplane with a maximal margin separates the samples into two classes; the support vectors are the sample points lying on the margin hyperplanes, which give the largest classification margin under the constraints. The final SVM model depends only on the support vectors
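As a complement to Fig. 2, the following hedged sketch fits a linear-kernel SVM on synthetic two-class data with scikit-learn and reports the number of support vectors, which are the only samples the fitted model depends on; all data are illustrative placeholders.

```python
# Minimal sketch of a support vector machine (SVM) classifier on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=5, n_informative=3, random_state=0)

clf = SVC(kernel="linear", C=1.0)   # maximal-margin separating hyperplane
clf.fit(X, y)

print("support vectors per class:", clf.n_support_)       # the model is defined by these points
print("decision value of first sample:", clf.decision_function(X[:1]))
```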

Decision tree and random forest

The decision tree (DT) [31] is easy to understand; it adopts a series of yes-or-no questions and comprises a root node, internal (parent) nodes, and leaf (terminal) nodes. Unfortunately, as the complexity of the DT model increases, its predictive value inevitably decreases. The random forest (RF) [32] is an ensemble learning approach that combines multiple unique DTs and is designed to increase predictive performance (Fig. 3). In the training procedure, bootstrap sampling is used to construct each tree from randomized samples and features drawn from the original dataset, and the final result of the RF is the aggregated prediction of the individual trees. Among most ML algorithms, RFs achieve the highest predictive performance [33]. Ishwaran et al. [34] designed the random survival forest (RSF), an extension of the RF to right-censored survival data. However, due to the complexity of the RF, training the model requires more time than other ML algorithms.

Fig. 3

Schematic diagram of random forests (RFs). An RF is an ensemble learning approach that consists of multiple decision trees. In the process of training the RF model, each decision tree is trained independently, and the training samples of each tree are drawn from the original dataset by randomized (bootstrap) sampling. The features used in each decision tree are also obtained by random sampling. The joint prediction of multiple decision trees improves the accuracy of the RF model
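The following minimal sketch, again with scikit-learn and synthetic data, illustrates the RF ingredients described above: bootstrap sampling of cases, random feature subsets at each split, and aggregation of tree predictions. Feature importances are also printed, since they underlie the variable selection discussed later in this review.

```python
# Minimal sketch of a random forest (RF) classifier on synthetic placeholder data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

rf = RandomForestClassifier(
    n_estimators=200,      # number of trees in the forest
    max_features="sqrt",   # random feature subset considered at each split
    bootstrap=True,        # bootstrap sampling of training cases for each tree
    random_state=0,
)
print("cross-validated accuracy:", cross_val_score(rf, X, y, cv=5).mean())

rf.fit(X, y)
print("indices of the 5 most important features:",
      np.argsort(rf.feature_importances_)[::-1][:5])
```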

Bayesian networks

Bayesian networks (BNs) [35] differ from most ML algorithms. A BN is an extension of Bayes’ theorem that represents the dependencies among variables via a directed acyclic graph (Fig. 4). Therefore, these algorithms can present information visually. BNs have been applied to analyze predictors of survival in postsurgical HCC patients through conditional probability tables (CPTs) [36]. However, the relationships among the variables in a BN model are not always clear, which can lead to low accuracy.

Fig. 4

Schematic diagram of a Bayesian network (BN). A BN is a directed acyclic graph that consists of nodes, edges and conditional probabilities. The nodes represent random variables, and the directed edges between the nodes represent conditional dependencies (from a parent node to its child nodes). The interdependence between the nodes is expressed with conditional probabilities, and the classification result is the class with the highest conditional probability
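The following short sketch illustrates, with a hypothetical two-node network (Risk → Recurrence) and made-up probabilities, how a conditional probability table and Bayes’ theorem combine to yield a classification; it is not drawn from the cited BN studies.

```python
# Minimal illustration of inference in a two-node Bayesian network (Risk -> Recurrence)
# using a conditional probability table (CPT). All probabilities are hypothetical.

p_risk = {"high": 0.3, "low": 0.7}             # prior P(Risk)
p_rec_given_risk = {"high": 0.6, "low": 0.2}   # CPT: P(Recurrence = yes | Risk)

# Marginal probability of recurrence: sum over the parent's states
p_rec = sum(p_risk[r] * p_rec_given_risk[r] for r in p_risk)

# Posterior P(Risk = high | Recurrence = yes) via Bayes' theorem
posterior_high = p_risk["high"] * p_rec_given_risk["high"] / p_rec

print(f"P(Recurrence = yes) = {p_rec:.2f}")                     # 0.32
print(f"P(Risk = high | Recurrence = yes) = {posterior_high:.2f}")  # 0.56
```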

Methods

A search of PubMed was conducted for prognostic predictive models for HCC published from January 1995 to May 2020. The following search algorithm was used: “hepatocellular carcinoma” and “model” and “predict” or “prognostic/prognosis” and “machine learning” or “neural network” or “support vector machine” or “decision tree” or “random forest” or “Bayesian network”. Initially, a total of 291 relevant research articles were retrieved, and the literature selection process is shown in Fig. 5. Ultimately, 29 articles were included in the final analysis.

Fig. 5

Flowchart of the search strategy and selection of studies for inclusion
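For readers who wish to reproduce such a query programmatically, the sketch below uses Biopython’s Entrez module to run an equivalent PubMed search. The review itself describes a manual search, and the exact boolean grouping below is one interpretation of the search terms listed above.

```python
# Hedged sketch: running a PubMed query similar to the one described above via
# Biopython's Entrez E-utilities wrapper. Requires network access.
from Bio import Entrez

Entrez.email = "your.name@example.org"  # placeholder; NCBI requires a contact address

query = ('"hepatocellular carcinoma" AND model AND (predict OR prognostic OR prognosis) '
         'AND ("machine learning" OR "neural network" OR "support vector machine" '
         'OR "decision tree" OR "random forest" OR "Bayesian network")')

handle = Entrez.esearch(db="pubmed", term=query, datetype="pdat",
                        mindate="1995/01/01", maxdate="2020/05/31", retmax=500)
record = Entrez.read(handle)
handle.close()

print("records found:", record["Count"])
print("first PMIDs:", record["IdList"][:5])
```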

Prediction of therapeutic outcomes by various treatment modalities

There are various treatment options for HCC. Surgical resection, ablative therapy, and liver transplantation (LT) are potentially curative treatments, whereas transarterial chemoembolization (TACE) and sorafenib are palliative treatments [37,38,39,40]. Because of the poor prognosis of HCC patients, it is essential to create a suitable predictive model for estimating therapeutic outcomes prior to treatment. In this section, current applications of ML algorithms are reviewed for the various treatment modalities in HCC patients.

Surgical resection

Partial hepatectomy remains the mainstay of curative treatment for early-stage HCC. Intrahepatic recurrence of HCC after surgical resection is the major cause of death, with an incidence of approximately 70% at 5 years [41, 42]. Therefore, an accurate prediction of prognosis prior to resection is crucial.

In previous studies, the authors developed predictive ANN models [43,44,45,46,47] for therapeutic outcomes after surgical resection, and the ANN model was shown to be superior to the LR model and the Cox proportional hazards regression model. Unfortunately, the ANN model cannot be used to select variables, which may decrease predictive accuracy when potentially clinically meaningful variables are overlooked. Similar to the ANN, the BN model [19, 36] also cannot be used to select variables. The predictive variables in the BN model are based on the clinician’s experience and knowledge, and the relationship between variables and outcome is not always clear; therefore, the performance of the BN model is often unconvincing. Unlike the ANN and BN, the RF and SVM can be used both to select variables and to develop models [48,49,50,51,52,53,54]. Wang et al. [52] used the RF algorithm to select 30 radiomic features from 3144 MR texture features and developed a predictive RSF model for 5-year survival of HCC following surgical resection, with an area under the curve (AUC) of 0.980. In addition, Liao et al. developed an RF model [53] based on 46 features from whole slide images (WSIs), and the results showed accuracy comparable to the TNM staging system in predicting the prognosis of HCC patients after surgical resection. However, it should be noted that the sample size in clinical studies of HCC is usually small, and the SVM model is theoretically more suitable than other models in this setting. Xu et al. used an immunohistochemistry (IHC)-based SVM algorithm [48] to predict recurrence in 336 HCC patients after surgical resection. The SVM model ultimately selected 8 of 49 features and had an accuracy of 82.1%. In comparison to the abovementioned ML algorithms, the CNN algorithm is particularly convenient for establishing predictive models because it can be used not only to segment images but also to select parameters and to develop models [23, 55]. Wang et al. [23] used the CNN algorithm to automatically extract high-level temporal and spatial features from multiphase CT imaging, which showed high efficacy, with an AUC of 0.825 for predicting the early recurrence of HCC. Nevertheless, the automatic mode based on the CNN algorithm requires high computational power and thus has limited use. The relevant papers are listed in Table 2.

Table 2 Characteristics of ML-based predictive model of HCC patients after hepatectomy
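As a rough illustration of the RF-based feature selection workflow described above (shrinking thousands of radiomic features to a small informative subset before model fitting), the following sketch uses synthetic data and a plain classification RF; the cited study instead used a random survival forest on right-censored survival data, which is not reproduced here.

```python
# Hedged sketch: rank a large synthetic radiomic feature set by RF importance,
# keep the top 30 features, and refit a model on the reduced set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))   # 200 patients, 1000 synthetic "radiomic" features
y = (X[:, :5].sum(axis=1) + rng.normal(size=200) > 0).astype(int)  # synthetic binary outcome

ranker = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
top30 = np.argsort(ranker.feature_importances_)[::-1][:30]   # indices of the 30 most important features
X_selected = X[:, top30]

final_model = RandomForestClassifier(n_estimators=500, random_state=0)
# Note: in practice, feature selection should be nested inside cross-validation
# to avoid information leakage; selecting on the full dataset inflates the estimate.
print("AUC (5-fold CV on selected features):",
      cross_val_score(final_model, X_selected, y, cv=5, scoring="roc_auc").mean())
```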

LT

LT is regarded as an effective therapy in HCC patients who are within the Milan criteria, with a recurrence rate of 10–15% [38, 56, 57]. Once post-LT HCC recurrence occurs, the prognosis is poor. Therefore, it is necessary to accurately identify HCC patients who will benefit from LT, thereby optimizing donor-recipient matching.

To our knowledge, ML-based analyses predicting therapeutic outcomes for HCC after LT are rather limited. Marsh et al. [58, 59] developed an ANN model using seven clinical factors to predict the recurrence risk in HCC patients after LT, and the results showed a discriminatory power of 70%. However, when this model was combined with other variables, such as genotyping for microsatellite mutations/deletions (TM-GTP), the predictive performance increased from 70% to 85%. Rodriguez-Luna et al. [60] externally validated the ANN/TM-GTP model, and the discriminatory power was 89.5%; however, the external validation cohort comprised only 19 patients, so the predictive performance was less convincing. A multicenter study conducted by Nam et al. showed more convincing results [24] because they developed a DNN model with a relatively large sample size, with 563 patients in the training cohort and 214 in the validation cohort. Nevertheless, a predictive model should be based not only on the characteristics of recipients but also on those of donors. Therefore, precise recipient-donor matching is crucial for developing a predictive model. Zhang et al. [61] established an MLP model that included 14 characteristics of donors as well as recipients. The results showed that the c-statistics of the specific MLPs at 1, 2, and 5 years were 0.909, 0.888, and 0.845, respectively. However, the main weakness of this MLP model is the lack of external validation, and the generalization of this model needs to be further confirmed. The relevant papers are listed in Table 3.

Table 3 Characteristics of ML-based predictive model of HCC patients after transplantation

Local ablation

Image-guided percutaneous ablation encompasses a great variety of techniques, including radiofrequency ablation (RFA), microwave ablation (MWA), ethanol injection, and cryoablation [62,63,64,65]. As RFA is the most frequently used ablation modality for HCC [66, 67], it is the main topic addressed in this section. Although RFA has shown good feasibility in local tumor control for HCC, complete ablation is somewhat idealistic, and the relapse rate ranges from 49% to 63% [68]. When HCC recurs after ablation, the proliferation and invasive ability of the tumor are markedly increased.

Very few studies have used ML models to predict therapeutic outcomes in HCC patients in the setting of RFA. In a small analysis of 83 HCC patients [69], an SVM model was used to analyze the relationship between clinical features and early post-RFA recurrence, and the results showed that the model had an AUC of 0.69. However, the predictive performance of the SVM model may decrease when a large number of variables are input. Therefore, it is essential to select variables prior to establishing the SVM model. Conversely, although the ANN algorithm cannot be used to select variables, the key advantage of the ANN model is that it can process data with large numbers of variables and samples. In the study by Wu et al. [70], a total of 15 variables were input into two ANN models to predict 1-year disease-free survival (DFS) and 2-year DFS, and the performance of both models was excellent, with AUCs of 0.964 and 0.974, respectively. Unfortunately, both the SVM and ANN models are immature because they lack external validation. The relevant papers are listed in Table 4.

Table 4 Characteristics of ML-based predictive model after RFA

TACE

Most HCC patients are typically diagnosed at intermediate or advanced stages when curative treatments cannot be applied [71, 72]. According to the Barcelona Clinic Liver Cancer (BCLC) staging system [73] and several treatment guidelines [6, 7], TACE is the gold standard for patients with intermediate-stage HCC. Since not all HCC patients can benefit from TACE [74, 75], a predictive model providing therapeutic outcome estimation prior to the procedure is urgently needed for clinical decision making.

Previous studies have constructed a variety of ML models with clinical and radiological variables to predict the therapeutic outcome of HCC patients after TACE [25, 76,77,78,79]. Mähringer-Kunz et al. [76] used traditional imaging features, such as tumor size and tumor number, together with other clinical variables to create an ANN model for predicting 1-year survival after TACE; the predictive performance of this model was 0.77 in the training cohort and 0.83 in the validation cohort. However, imaging features are not always visible to the naked eye, and some subtle imaging features may be overlooked. Radiomics is an emerging discipline that can extract imaging features invisible to the naked eye, such as statistical, shape and texture features, from medical images. Abajian et al. [77] established an RF model and used semiautomatic 3D tumor segmentation software to extract several statistical and shape features from MR imaging. Their results demonstrated that the most valuable predictor of treatment response following TACE was relative tumor signal intensity on pre-TACE MR images, and the highest predictive accuracy of the RF model reached 78%. However, the quantitative imaging features in Abajian’s study were too simple and could not provide adequate information to predict the therapeutic outcome. Liu et al. [78] developed an SVM model with complex radiomic features. These radiomic features were first extracted by manual segmentation from static B-mode images and included 181 statistical features, 13 tumor shape features, and 740 texture features. After extraction, the meaningful radiomic features were selected by the gradient boosted regression trees (GBRT) algorithm [78], and the SVM model was then established, with an AUC of 0.81 in the internal validation cohort. Indeed, radiomic features can be extracted not only with manual or semiautomatic segmentation tools but also automatically with CNN algorithms [25, 78, 79]. Morshid et al. [79] used a CNN-based segmentation protocol to extract a large number of shape and texture features from portal venous phase CT images. Based on these imaging features, an RF model was established, and the results showed that the RF model could accurately distinguish TACE-refractory patients, with an AUC of 0.7331 [79]. In addition to extracting imaging features, the CNN algorithm can also be used to establish the predictive model itself [25, 78]. Peng et al. [25] used the CNN algorithm to automatically extract imaging features of HCC from CT images and established a predictive model of tumor response after TACE. Their study showed that the CNN models could predict complete response (CR), partial response (PR), stable disease (SD) and progressive disease (PD) of HCC lesions with AUCs of 0.97, 0.96, 0.95, and 0.96, respectively. Similarly, Liu et al. [78] developed a CNN model to extract imaging features from dynamic contrast-enhanced ultrasound (CEUS) images and predicted the objective response of 130 HCC patients after TACE with an AUC of 0.93. The relevant papers are listed in Table 5.

Table 5 Characteristics of ML-based predictive model of HCC patients after TACE
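The two-stage pipeline described for the SVM model above (GBRT-based feature ranking followed by SVM training) can be sketched as follows; all features and labels are synthetic placeholders rather than the B-mode ultrasound radiomics of the cited study.

```python
# Hedged sketch: rank synthetic radiomic features with gradient boosted trees,
# then train an SVM on the top-ranked features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 900))   # 150 patients, 900 synthetic radiomic features
y = (X[:, :4].sum(axis=1) + rng.normal(size=150) > 0).astype(int)  # synthetic response label

# Stage 1: GBRT-style importance ranking, keeping the 20 highest-ranked features
gbrt = GradientBoostingClassifier(random_state=0).fit(X, y)
top_features = np.argsort(gbrt.feature_importances_)[::-1][:20]

# Stage 2: SVM on the selected features
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
# As with the RF example, feature selection should be nested inside cross-validation
# in a real study; this flat version is only for illustration.
print("AUC (5-fold CV):",
      cross_val_score(svm, X[:, top_features], y, cv=5, scoring="roc_auc").mean())
```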

Sorafenib

Sorafenib is the standard treatment for advanced-stage HCC. The median overall survival (OS) of sorafenib-treated HCC patients was 10.7 months and 6.5 months in two representative randomized controlled trials [80, 81]. Because of the high cost and modest efficacy, a reliable predictive tool is necessary to help clinicians adjust the daily management of sorafenib for such patients.

ML methods are not routinely used to predict therapeutic outcomes of sorafenib treatment for HCC. Choi et al. [22] collected clinical and radiological data from 480 sorafenib-treated patients and used variable importance scores from the RF algorithm to select the final parameters. They found that the established model had better predictive performance for time to progression (TTP) and OS than the Child–Pugh and Model for End-Stage Liver Disease (MELD) scores (0.746 vs 0.686 and 0.545 for TTP; 0.875 vs 0.777 and 0.682 for OS). However, this study lacked independent external validation. The relevant papers are listed in Table 6.

Table 6 Characteristics of ML-based predictive model of HCC patients after treatment of sorafenib

Future perspectives in ML for the prognostic study of HCC

Currently, ML models for predicting the therapeutic outcome of HCC are usually based on multivariate predictors, such as demographic, clinical, radiological, pathologic and genetic parameters. Selecting the final predictors is a considerable challenge for traditional statistical models because traditional statistical methods may lose important information. ML models can include more variables, making them a promising alternative to traditional statistical models. In addition, ML algorithms can extract and select radiomic features that are invisible to the naked eye, and these novel variables may provide greater predictive value than simple radiological parameters (tumor size, tumor number, etc.).

The most important challenges in the ML approach are selecting the appropriate algorithm to create the predictive model and externally validating the model. On the one hand, certain types of ML models are favored for specific types of data, such as CNNs for imaging data and SVMs for small sample sizes; the ML model should therefore be selected through thoughtful study design. On the other hand, given the need for future clinical applicability, appropriate external validation should be used to confirm generalization ability. Because there is no commonly accepted design for ML predictive models in the prognostic study of HCC, the current models may not be the best ones available.

Conclusion

ML algorithms can automatically extract imaging features and identify optimal subsets of features from large datasets, particularly when combined with radiomics analysis. Relative to traditional statistical models, ML models demonstrate improved predictive performance in the prognostic study of HCC. Regrettably, most existing ML predictive models lack external validation, which is an obstacle to their use as personalized predictive tools for HCC patients. Although most current ML algorithms are preliminary, this promising method is expected to gain wide acceptance in clinical practice in the future.