Introduction

Hepatocellular carcinoma (HCC) is characterized as a disease that spreads throughout the liver due to repeated intrahepatic recurrence of localized lesions, resulting in death due to liver failure. HCC typically originates from underlying liver disease and the major cause is hepatitis B or C virus infection with or without cirrhosis [1].

Alcohol abuse and cigarette smoking are also common etiological factors, while metabolic diseases including obesity and diabetes, as well as nonalcoholic fatty liver disease, amplify the risk of HCC [2]. Various treatment methods exist for HCC, and it is necessary to predict the survival period and survival rate following each treatment method. Hepatic resection is the best treatment option for a potentially curative outcome, but fewer than one-third of HCC cases are eligible for resection at the time of diagnosis [3]. In addition, the high rate of recurrence despite curative resection presents a major challenge in HCC management [4]. Most postoperative recurrences occur in the remnant liver as intrahepatic recurrence [5], and identifying reliable predictors is essential for patient risk evaluation, treatment decision support, and improvement of long-term survival. HCC can be diagnosed by biopsy or by noninvasive imaging in high-risk groups with chronic hepatitis or cirrhosis. If the imaging diagnosis is inconclusive or shows atypical features, biopsy is recommended. However, biopsy is difficult in patients with ascites, a high risk of bleeding, or HCC in a challenging location, and in such cases imaging diagnosis is preferred [6].

Developing a predictive model for the survival period and survival rate requires multi-center data that are large enough to represent the population and that include well-curated features for analyzing HCC and the survival period. To overcome the limitations of single-hospital data, we used the HCC multi-center data sets of the Korea Central Cancer Registry, National Cancer Center, and Ministry of Health and Welfare, together with appropriate machine learning algorithms. Such an artificial intelligence-based predictive model could support the development of personalized treatment methods that consider liver function and HCC status, and of data-based treatment that builds on clinicians' insights.

Various machine learning algorithms were used for survival rate prediction: voting ensembles, Logistic Regression, K-nearest neighbors, Decision Tree Classifier, Support Vector Machine, Random Forest, Extreme gradient boosting trees (XG Boost), Light GBM, and Natural Gradient Boosting (NG Boost).

The aim of this study was to predict the survival rate of Korean patients with hepatocellular carcinoma using machine learning on multi-center data, as a foundation for developing a new predictive artificial intelligence model according to treatment method.

Methods

Patients

A total of 10,742 patients diagnosed with liver cancer, as registered by the Korean Liver Cancer Study Group and the Ministry of Health & Welfare, Korea Central Cancer Registry, from 2008 to 2015, were evaluated; 101 patients had diagnoses of liver cancer other than HCC and were excluded (Fig. 1). Cases were divided into Group I, diagnosed as HCC before treatment, and Group II, diagnosed according to the HCC diagnostic criteria outlined in the Korean Liver Cancer Association guidelines [6]. HCC is diagnosed if the histological and immunological findings after biopsy are positive or, in high-risk patients, if the imaging findings are consistent with HCC: a lesion of 1 cm or larger with hyperenhancement in the arterial phase and washout in the portal venous or delayed phase on multi-phase CT and MRI using a specific contrast agent. The authors divided the patients according to diagnostic modality: 2,920 patients were diagnosed with HCC histologically, by either needle or surgical biopsy (Group I), and 5,562 patients were diagnosed with HCC radiologically (Group II) (Fig. 1). Baseline demographic data have been published previously [7].

Fig. 1 Flow diagram of the patient population

The study design was approved by the Institutional Review Board of Pusan National University Hospital (No. 2009-025-095) and was conducted in accordance with the Declaration of Helsinki.

Feature selection

Using predictive algorithms based on machine learning, the data on HCC patients collected at the Korea Central Cancer Registry were used to determine the appropriate treatment (Table 1) and survival period for HCC patients across a range of liver function. The authors attempted to identify features that are effective in predicting survival rate and to interpret those features in keeping with the purpose of this study.

Table 1 Treatment Methods. Group I, patients who were diagnosed histologically; Group II, patients who were diagnosed radiologically. a: radiofrequency ablation, alcohol injection, other local ablation, b: transarterial chemoembolization with gelfoam, beads, chemolipiodolization, chemoinfusion, radioembolization

First, the data were pre-processed as previously described. The collected data comprised a total of 117 features, including imaging features and BCLC stage (Table 2). Height and weight, liver function tests, liver cirrhosis status, radiologic TNM findings, and histopathological TNM findings (Table 3) were also included. We used 51 features when biopsy was performed (Group I) and 62 features when biopsy was not performed (Group II). For prediction of the survival rate according to treatment method, 57 features were used for TACE and 48 for surgical resection. Features with a correlation above 0.9 in absolute value were also excluded, and the remaining features were finally grouped according to the treatment method for which they were used.
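The correlation-based exclusion mentioned above can be illustrated with a minimal sketch. The source does not say which member of a highly correlated pair was discarded, so the greedy "keep the first, drop later ones" rule here is an assumption, as are the function names and the toy columns.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / (sx * sy)

def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if |r| <= threshold against every feature kept so far.

    Assumption: the first feature of a correlated pair is retained.
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy columns, not taken from the registry data.
toy = {
    "tumor_size_cm": [1.0, 2.0, 3.0, 4.0, 5.0],
    "tumor_size_mm": [10.0, 20.0, 30.0, 40.0, 50.0],  # same information, rescaled
    "albumin": [4.1, 3.2, 4.4, 3.0, 3.9],
}
selected = drop_correlated(toy)
```

In this toy case the rescaled duplicate column is removed and the two remaining columns are kept.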

Table 2 BCLC Staging. Group I, patients who were diagnosed histologically; Group II, patients who were diagnosed radiologically
Table 3 TNM Staging. Group I, patients who were diagnosed histologically; Group II, patients who were diagnosed radiologically
Table 4 Target point and data processing of machine learning
Table 5 Train set and test set of group I

Data splits for machine learning processing

Because the given data were slightly imbalanced, we used stratified sampling to split them into two disjoint sets, train and test, at a ratio of 8:2. We performed 5 different predictions (Fig. 2), which is analogous to fivefold cross-validation but applied to the test set.
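A minimal sketch of one way to obtain five stratified 8:2 splits is given below. It assumes that each of the five predictions uses a different fifth of the data as its test set; the source does not publish its splitting code, and the function names, seed, and toy labels are hypothetical.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds=5, seed=0):
    """Return n_folds disjoint index lists with roughly equal class ratios."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold receives its share.
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds

def train_test_rounds(labels, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    folds = stratified_folds(labels, n_folds, seed)
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test

# Hypothetical labels: 1 = survival above the cutoff, 0 = otherwise.
labels = [1] * 70 + [0] * 30
rounds = list(train_test_rounds(labels))
```

With five folds, every round trains on 80% of the samples and tests on the remaining 20%, and the class ratio of the full set is preserved in each test fold.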

Fig. 2 Schematic of the 5 different prediction procedures, equivalent to 5 folds

The averages of accuracy, precision, sensitivity, and F1 score were then obtained from the 5 predictions.
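As an illustration of this averaging step, the sketch below computes the four metrics for each test fold from true and predicted labels and then takes their arithmetic mean. The helper names and the toy predictions are hypothetical, and treating label 1 as the positive class is an assumption.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, sensitivity (recall) and F1 for one test fold."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": 2 * precision * sensitivity / denom if denom else 0.0,
    }

def average_metrics(per_fold):
    """Arithmetic mean of each metric over the folds."""
    return {k: sum(m[k] for m in per_fold) / len(per_fold) for k in per_fold[0]}

# Two hypothetical folds of (true labels, predicted labels).
fold_results = [
    binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]),
    binary_metrics([1, 1, 0, 0], [1, 1, 0, 0]),
]
mean_scores = average_metrics(fold_results)
```

Note that the mean of per-fold F1 scores is generally not identical to the F1 score computed from the mean precision and mean sensitivity.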

Machine learning method

In this study, various machine learning algorithms were used for survival rate prediction according to mortality, survival time, and treatment method. The algorithms were voting ensembles [8,9,10,11], Logistic Regression (LR) [12], K-nearest neighbors (KNN) [13, 14], Decision Tree (DT) Classifier [15,16,17], Support Vector Machine (SVM) [18,19,20,21], Random Forest (RF) [22, 23], Extreme gradient boosting trees (XG Boost) [24], Light GBM [25, 26], and Natural Gradient Boosting (NG Boost) [27, 28]. Their prediction results are compared in Tables 6, 8, and 10.
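Of the algorithms listed, the voting ensemble is the simplest to illustrate. The sketch below shows hard (majority) voting over the label predictions of several base models. The source does not state whether hard or soft voting was used, so this choice, the tie-breaking rule, and the toy predictions are all assumptions.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists into one voted label per sample."""
    voted = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        # Break ties by taking the smallest label so the result is deterministic.
        voted.append(min(label for label, c in counts.items() if c == top))
    return voted

# Hypothetical predictions of three base classifiers for four patients.
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
ensemble = majority_vote([model_a, model_b, model_c])
```

Each voted label is the class chosen by most base models for that patient.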

Table 6 Train set and test set of group II before down sampling
Table 7 Prediction of mortality rate of Group I according to Machine learning methods
Table 8 Prediction of mortality rate of Group II before down sampling according to Machine learning methods
Table 9 Train set and test set of group II after down sampling
Table 10 Prediction of mortality rate of group II after down sampling according to machine learning methods

Results

The target was overall survival time, dichotomized at 60 months (≤ 60 m, > 60 m) (Table 4). After preprocessing of the given data, Group I comprised 514 samples, of which 148 (28.8%) had an overall survival time of less than 60 months and 366 (71.2%) had an overall survival time greater than 60 months. Group II comprised 757 samples, of which 504 (66.6%) had an overall survival time of less than 60 months and 253 (33.4%) had an overall survival time greater than 60 months.
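The dichotomized target and the reported class shares can be reproduced with a short sketch. The function names and the toy survival times are hypothetical; the class counts are the ones reported above.

```python
def dichotomize(survival_months, cutoff=60):
    """Label 1 if overall survival exceeds the cutoff, else 0."""
    return [1 if m > cutoff else 0 for m in survival_months]

def class_shares(counts):
    """Map each class to its percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

# Class counts as reported in the text.
group_1 = class_shares({"<= 60 m": 148, "> 60 m": 366})
group_2 = class_shares({"<= 60 m": 504, "> 60 m": 253})

# Hypothetical survival times in months, to show the labelling rule.
toy_labels = dichotomize([12, 60, 61, 95])
```

The computed shares match the percentages reported for both groups.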

1. Prediction of mortality rate according to the presence or absence of biopsy

(1) In case of biopsy (Group I)

When biopsy was performed (Group I), the surviving and deceased samples were relatively evenly distributed (Table 5), so neither down sampling nor up sampling was performed. Among the methods used for prediction, the XG Boost method obtained the best result: its precision, recall, and ROC values were not significantly lower than its accuracy, although all indicators, including accuracy, were 70% (Table 6). Pathology portal invasion, method surgery, image M, pathology T, and needle biopsy, among others, were the most important factors for prediction (Fig. 3).

(2) When biopsy is not performed (Group II)

Fig. 3 Feature Importance F1 by SHAP values of Group I with surgical resection according to 5 folds shown in Fig. 2, respectively

When biopsy was not performed, the surviving and deceased samples were unevenly distributed (Tables 7, 8), and down sampling was used to obtain the prediction results (Tables 9, 10).
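Down sampling of this kind can be sketched as random removal of majority-class samples until the classes are balanced. The source does not state the target ratio or whether only the train set was down sampled, so the 1:1 ratio, the function name, the seed, and the toy class counts below are assumptions.

```python
import random
from collections import defaultdict

def down_sample(labels, seed=0):
    """Return sorted indices with every class reduced to the minority-class count."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    n_min = min(len(indices) for indices in by_class.values())
    rng = random.Random(seed)
    keep = []
    for indices in by_class.values():
        # Random subset without replacement; the minority class is kept whole.
        keep.extend(rng.sample(indices, n_min))
    return sorted(keep)

# Hypothetical imbalanced labels: 90 of one class, 30 of the other.
labels = [0] * 90 + [1] * 30
kept_idx = down_sample(labels)
balanced = [labels[i] for i in kept_idx]
```

After down sampling, both classes contribute the same number of samples, at the cost of discarding part of the majority class.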

Here too, the XG Boost method obtained relatively the best classification results among the methods used, and its precision, recall, and ROC values were not significantly lower than its accuracy.

In the importance analysis using the gain criterion, image portal invasion, image T, image tumor size, and BCLC stage, among others, were the most important factors for prediction.

With the NG Boost method, accuracy was 83%, precision 84%, sensitivity 95%, and F1 score 89% for survival time of more than 60 months in Group I with surgical resection. For survival time of less than 60 months in Group II with TACE, accuracy was 79%, precision 82%, sensitivity 87%, and F1 score 84%. Feature importance by the gain criterion indicated that pathology portal invasion, method surgery, image M, pathology T, and needle biopsy were important factors for prediction in the biopsy group (Group I).
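As a consistency check on these figures, the F1 score is the harmonic mean of precision P and sensitivity R. The worked calculation below uses only the percentages reported above; because the reported values are averages over five test folds, agreement is expected only approximately.

```latex
F_1 = \frac{2PR}{P + R}

% Group I, surgical resection (P = 0.84, R = 0.95):
F_1 = \frac{2 \times 0.84 \times 0.95}{0.84 + 0.95} = \frac{1.596}{1.79} \approx 0.89

% Group II, TACE (P = 0.82, R = 0.87):
F_1 = \frac{2 \times 0.82 \times 0.87}{0.82 + 0.87} = \frac{1.4268}{1.69} \approx 0.84
```

Both values agree with the reported F1 scores of 89% and 84%.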

2. Prediction of survival time rate according to treatment methods

To analyze the survival rate according to the treatment method, the analysis target was overall survival (Table 4), with the survival period divided into less than and more than 60 months. The treatment methods in the collected data were grouped into five classes (Class 1: Surgical resection, Class 2: Liver transplantation, Class 3: Local ablation therapy, Class 4: Transarterial chemoembolization (TACE), Class 5: Others). Among the treatment methods, the prediction for liver transplantation and local ablation therapy was inaccurate. The problem often lies in too little data and in the treatment method being determined by clinical experience. However, when only surgical resection and TACE were predicted, a model with high accuracy and precision was obtained (Table 4).

The treatment methods with significant results were TACE and surgical resection. For these two treatment methods, survival rate analysis (Tables 11, 12) was performed together with the corresponding feature importance analysis (Figs. 3, 4).

Table 11 Prediction results for survival rate of group I with surgical resection: the results were obtained by the average of fivefold type testing
Table 12 Prediction results for survival rate of Group II with TACE: the results were obtained by the average of fivefold type testing
Fig. 4 Feature Importance F1 by SHAP values of Group II with TACE according to 5 folds shown in Fig. 2, respectively

Discussion

The value of multi-center data will depend on the degree of standardization of the collected data. In addition, it must include a sufficient number to represent the population. Using a machine learning-based prediction algorithm, the authors analyzed the appropriate treatment for HCC and the properties that influence the survival period accordingly. Through this work, the authors intended to develop an algorithm that can propose the optimal personalized treatment for each individual according to liver function and HCC condition. Recently, computer-based diagnosis and prognostic prediction by machine-learning algorithms and deep-learning systems have been widely used and more individualized prediction based on a combination of variables is provided by nomogram models [29].

By developing machine-learning algorithms and deep-learning systems to predict the prognosis of HCC patients, it is possible to offer individualized recurrence surveillance and adjuvant therapy. Data collection in a standardized form is the priority of national big data management. To overcome the limitations of national multi-center data, collection in a nationally standardized format and a uniform method of handling missing data are essential.

In this study, among the various treatment methods, the prediction of survival rate for liver transplantation and local ablation therapy was inaccurate. The problem often lies in too little data and in the treatment method being determined by clinical experience, and thus differing in each case.

In addition to tumor extent, hepatic reserve plays a major role in selecting the treatment method. Before treatment selection, laboratory tests and imaging were performed to evaluate liver function and tumor extent, and great effort was made to combine these factors in order to choose the most suitable treatment modality and predict the prognosis [30].

Generally, late recurrence (more than 2 years) after liver resection for HCC is regarded as a multi-centric tumor or a de novo cancer. Therefore, surveillance for recurrence 2 years after surgery should be targeted to the liver [31]. In this study, a machine learning (ML) model was used to evaluate the relationship of preoperative findings and treatment modalities with treatment results expressed as overall survival.

ML consists of input and output and, unlike earlier explicitly programmed models, an ML program learns from examples and processes massive amounts of data. Accuracy improves with training, and therefore more data provide better predictions [32].

Korean Primary Liver Cancer Registry data provided by the Korean Liver Cancer Association will be used as input for training an ML model and predicting the prognosis of HCC according to preoperative findings and the treatment performed. Therefore, for the establishment of a national cohort, the standardization of data and the accuracy of collection must be ensured.

By applying ML to the medical field, growing volumes of data that exceed the capacity of the human brain can be processed in an efficient, time-saving manner. By supplementing records and increasing training sources, the ML model will become an important tool for selecting the appropriate treatment modality for HCC patients in consideration of patient factors, tumor extent, and predicted prognosis. In the future, more accurate predictions should become possible through the development of new data sets and differentiated training sources as data accumulate. Information on the patient's living environment, economic ability, and social status is also required, and it is recommended that regional and geopolitical location be included as variables.

At present, the limitations of developing AI using big data are reliability and missing data. Collecting data retrospectively from various institutions has the disadvantage that the data are not uniform and the intervals between observations are inconsistent. It is necessary to simplify and unify the clinical research form of the Korean Society for Liver Cancer. Basic sociological factors should also be included as variables, after which national cohort results can be obtained. It is essential to collect data regularly based on a given template. By using big data collected from multiple centers nationwide, it will be possible to develop a predictive program that provides the basis for treatment response, including factors leading to recurrence after initial treatment. The establishment of a large data cohort of HCC in Korea, which plays a leading role in the epidemiology, diagnosis and treatment of HCC, will greatly advance the development of HCC treatment worldwide. HCC data owned by Pusan National University Hospital will be used to avoid the data suitability limitations of the multi-center data, with the aim of implementing a machine learning-based predictive model for the HCC survival rate, survival period, or optimal treatment method.

Conclusion

Using the statistical tools obtained in a previous study, an ML program based on a deep neural network, with the Cox proportional hazard model incorporated at each layer, was analyzed. By developing machine-learning algorithms and deep-learning systems to predict the prognosis of HCC patients, it is possible to propose the optimal personalized treatment for each individual according to liver function and HCC status. To overcome the limitations of multi-center data, collection in a standardized form is the priority of national multi-center data management.