Introduction

Liver cancer is the sixth most common cancer worldwide and the fourth leading cause of cancer-related death. Hepatocellular carcinoma (HCC) is the most common form of liver cancer, accounting for approximately 90% of cases [1]. The incidence and mortality rates of HCC are highest in East Asia and Africa, and HCC is expected to become the third leading cause of cancer-related death by 2030 [2]. The etiology of HCC varies by region: hepatitis B virus is the leading cause of HCC in most of Asia (except Japan), South America, and Africa; hepatitis C virus is the major cause in Western Europe, North America, and Japan; and alcohol consumption is the main cause in Central and Eastern Europe [3].

The pathogenesis of HCC is a complex, multistage process in which the interaction of multiple factors initiates the malignant transformation of hepatocytes and the development of HCC. Studies have shown that autophagy-related pathways are involved in the development of HCC [4, 5]. Autophagy is the process by which damaged, degenerated, or senescent proteins and organelles are delivered to lysosomes for digestion and degradation. Autophagy plays a double-edged role in tumors. Under normal physiological conditions, it helps maintain cellular homeostasis by preventing the accumulation of damaged proteins and organelles, thereby inhibiting carcinogenesis. Once a tumor has formed, however, autophagy supplies nutrients to cancer cells and promotes tumor growth [6].

The diagnosis of HCC currently relies on two main methods: imaging and histopathology. High-risk individuals with suspected liver nodules or abnormal serum alpha-fetoprotein (AFP) levels require further testing. On contrast-enhanced CT or MRI, HCC lesions are brighter than the surrounding liver in the arterial phase and less bright than the surrounding parenchyma in the venous and delayed phases [7]. This pattern has a sensitivity of 89% and a specificity of 96% for HCC diagnosis [8]. However, some tumors have atypical imaging features, and biopsy is recommended in such cases; the sensitivity of biopsy is approximately 70% [9]. In recent years, studies have also shown the value of liquid biopsy in the diagnosis of HCC, and researchers have sought new biomarkers for the early detection and diagnosis of HCC to improve patient prognosis [10].

For patients with early-stage HCC, surgical treatment, including resection, transplantation, and local ablation, is generally recommended. The preferred treatment for intermediate-stage HCC is transarterial chemoembolization (TACE), and systemic therapy is recommended as the first-line treatment for advanced HCC [11]. Nevertheless, the prognosis of patients with HCC remains poor. Prognosis is mainly assessed with staging systems such as the Barcelona Clinic Liver Cancer (BCLC) and TNM systems. However, the prognostic accuracy of these staging systems is unsatisfactory, and tools that can accurately assess the prognosis of HCC are still lacking. Therefore, more accurate prognostic prediction is crucial for the clinical management of patients with HCC.

Given the marked regional differences in the etiology of HCC and the fact that East Asia has the highest incidence of HCC, we focused on the Asian population. By analyzing the transcriptome and clinical data of Asian patients with HCC in The Cancer Genome Atlas (TCGA) database, we constructed a risk score prognostic model for this population based on autophagy-related genes (ARGs). Our work may provide new ideas for prognostic assessment and for the discovery of new therapeutic targets and biomarkers of HCC in the Asian population.

The workflow of this study is shown in Fig. 1.

Fig. 1

Diagram of the study workflow

Data acquisition and analysis methods

Data acquisition

In May 2022, transcriptome sequencing data and clinical information for 161 Asian patients with HCC were downloaded from the TCGA-LIHC project on the GDC Data Portal (https://portal.gdc.cancer.gov/). The transcriptome data comprised six normal samples and 160 tumor samples, and the clinical information is detailed in Table 1. Human ARGs were downloaded from the Human Autophagy Database (HADB; http://www.autophagy.lu/clustering/index.html; last updated: 2021-12-3), yielding a total of 206 ARGs.

Table 1 Clinical features of 161 Asian patients with HCC

Differential expression and enrichment analyses

Finding differential ARGs

After the ARG list was obtained, ARG expression values were extracted from the transcriptome data, and genes with zero expression in all normal and tumor samples were removed. Differential expression analysis was then performed with the Wilcoxon rank-sum test in R. Differential ARGs were defined by FDR ≤ 0.05 (FDR, false discovery rate) and |logFC| ≥ 1, where \(\mathrm{logFC} = \log_2\left(\frac{\text{mean expression in tumor samples}}{\text{mean expression in normal samples}}\right)\).
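The filtering step above can be sketched in Python as follows. This is a minimal illustration, not the authors' R script: it assumes the expression table is a dict mapping each gene to a list of per-sample values, uses the normal-approximation Wilcoxon rank-sum test with tie and continuity corrections (the form R's `wilcox.test` falls back to when ties are present; its exact small-sample p values are not reproduced), and assumes the FDR is computed with the Benjamini–Hochberg procedure, which the text does not specify. All function names are hypothetical.

```python
import math


def rank_with_ties(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_rank_sum_p(x, y):
    """Two-sided rank-sum p value (normal approximation, tie and continuity corrections)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = rank_with_ties(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # Mann-Whitney statistic for x
    tie_sizes = {}
    for r in ranks:  # tied values share one average rank
        tie_sizes[r] = tie_sizes.get(r, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return 1.0
    diff = u - n1 * n2 / 2
    correction = 0.5 if diff != 0 else 0.0
    z = (abs(diff) - correction) / sigma
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values, returned in input order."""
    m = len(pvals)
    adjusted = [0.0] * m
    running_min = 1.0
    by_p_desc = sorted(range(m), key=lambda idx: pvals[idx], reverse=True)
    for pos, i in enumerate(by_p_desc):
        rank = m - pos  # ascending rank of pvals[i]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differential_genes(expr, normal_idx, tumor_idx, fdr_cut=0.05, lfc_cut=1.0):
    """Return {gene: (logFC, FDR)} for genes with FDR <= fdr_cut and |logFC| >= lfc_cut."""
    # Drop genes whose expression is zero in every sample.
    genes = [g for g, v in expr.items() if any(val != 0 for val in v)]
    pvals, lfcs = [], []
    for g in genes:
        normal = [expr[g][i] for i in normal_idx]
        tumor = [expr[g][i] for i in tumor_idx]
        mean_n, mean_t = sum(normal) / len(normal), sum(tumor) / len(tumor)
        if mean_n == 0:
            lfcs.append(math.inf)
        elif mean_t == 0:
            lfcs.append(-math.inf)
        else:
            lfcs.append(math.log2(mean_t / mean_n))  # log2(tumor mean / normal mean)
        pvals.append(wilcoxon_rank_sum_p(tumor, normal))
    fdrs = benjamini_hochberg(pvals)
    return {g: (lfc, q) for g, lfc, q in zip(genes, lfcs, fdrs)
            if q <= fdr_cut and abs(lfc) >= lfc_cut}
```

With six normal and ten tumor columns, `differential_genes(expr, range(6), range(6, 16))` mirrors the 6 + 160 layout on a toy scale.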

Enrichment analysis

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of the differential ARGs were performed with the enrichplot package in R on the basis of the GO and KEGG databases. The GO terms and KEGG pathways annotated to each differential gene were retrieved from the annotation database org.Hs.eg.db, and enrichment was then tested with Fisher's exact test. GO terms and KEGG pathways with p ≤ 0.05 and q ≤ 0.05 (q values are multiple-testing-corrected p values) were considered significantly enriched. The GO enrichment results were classified into biological process (BP), cellular component (CC), and molecular function (MF).
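For over-representation analysis, the one-sided Fisher's exact test reduces to an upper-tail hypergeometric probability. The sketch below assumes that formulation and a user-chosen background gene universe; the background actually used and the internals of the R packages are not stated in the text, and the counts in the usage line are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided Fisher's exact (hypergeometric) p value for over-representation.

    k: differential genes annotated to the term
    n: differential genes in total
    K: background genes annotated to the term
    N: background genes in total
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer ratio, converted to float once


# Hypothetical counts: 5 of 58 differential ARGs fall in a term that
# contains 20 of 20,000 background genes.
p = enrichment_p(k=5, n=58, K=20, N=20000)
```

The p values of all tested terms would then be corrected for multiple testing to obtain q values.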

Prognostic model construction and validation

Finding prognosis-related autophagy genes

Prognosis-related ARGs were identified with the survival package in R by combining the expression of the differential ARGs with the survival time and survival status from the TCGA clinical data. Univariate Cox regression analysis was performed to test the association between the expression of each ARG and survival, and ARGs with p ≤ 0.05 were retained.

Construction and validation of the ARG-based prognostic model

The ARG-based prognostic model was constructed with the survival package in R. First, a multivariate Cox model was fitted with the coxph function, using survival time and survival status as the outcome. The model was then refined with the step function, which performs stepwise variable selection (by the Akaike information criterion by default), to remove redundant, highly correlated genes. The summary function was used to obtain the parameters of the optimized model, including the coefficients (coef), hazard ratios (HRs), 95% confidence intervals (CIs) of the HRs, and p values. The predict function was then used to calculate each patient's risk score, \(\text{risk score} = \sum_{i=1}^{n}\mathrm{coef}_i \times \mathrm{Exp}_i\), where \(\mathrm{coef}_i\) is the coefficient and \(\mathrm{Exp}_i\) the expression of the \(i\)th gene in the model, as well as the median risk score. Patients were divided into high- and low-risk groups according to the median risk score.

Survival differences between the high- and low-risk groups were tested with the log-rank test implemented in the survdiff function of the survival package. Kaplan–Meier (K–M) survival curves were plotted with the ggsurvplot function (survminer package) to compare survival between the two groups. Time-dependent receiver operating characteristic (ROC) curves were plotted with the survivalROC package, and the accuracy of the prognostic model was assessed by the area under the ROC curve (AUC). Risk curves were plotted with the plot function and the pheatmap package, and univariate and multivariate Cox regression analyses were performed to evaluate whether the risk score is an independent prognostic factor.
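The Kaplan–Meier estimator and the two-group log-rank test (the default test of `survdiff`) can be sketched as follows. This is an illustrative pure-Python version assuming right-censored data and exactly two group labels, not the survival package's implementation; the function names are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time: [(t, S(t)), ...]."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        n_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, ei in zip(times, events) if ei and ti == t)
        surv *= 1 - deaths / n_risk  # product-limit update
        curve.append((t, surv))
    return curve


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square with 1 df, p value)."""
    g1 = sorted(set(groups))[0]  # observed-minus-expected is tracked for this group
    o_minus_e = var = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if groups[i] == g1)
        dead = [i for i in at_risk if events[i] and times[i] == t]
        d = len(dead)
        d1 = sum(1 for i in dead if groups[i] == g1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of d1
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chisq = o_minus_e ** 2 / var
    return chisq, math.erfc(math.sqrt(chisq / 2))  # upper tail of chi-square(1)
```

Calling `logrank_test(times, events, groups)` with `groups` holding "high"/"low" labels from the median split gives the comparison summarized in the K–M plots.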

Clinical correlation analysis

Each clinical feature from the TCGA clinical data was dichotomized as follows: age, ≤ 65 vs > 65 years; sex, male vs female; pathological grade, G1–2 vs G3–4; pathological stage, stages I–II vs stages III–IV; T stage, T1–2 vs T3–4; M stage, M0 vs M1 (distant metastasis); and N stage, N0 vs N1 (lymph node metastasis). The expression of each ARG in the model and the risk score were compared between the two groups of each clinical feature and visualized with the beeswarm package in R. A p value < 0.05 was taken to indicate that the ARG or risk score was associated with the clinical feature.
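The two-level grouping above can be expressed as a small mapping function. The input formats (TCGA-style strings such as "Stage IIIA", "T2a", "M0") and the function name are assumptions for illustration; values outside the listed categories, such as "NX" or missing entries, are returned as None rather than forced into a group.

```python
import re


def dichotomize(age, sex, grade, stage, t_stage, m_stage, n_stage):
    """Map raw clinical values to two-level groups; None marks values outside both groups."""

    def pick(value, low_set, high_set, low_label, high_label):
        if value in low_set:
            return low_label
        if value in high_set:
            return high_label
        return None

    stage_base = re.sub(r"[ABC]$", "", stage or "")  # "Stage IIIA" -> "Stage III"
    t_base = (t_stage or "")[:2]                      # "T2a" -> "T2"
    return {
        "age": None if age is None else ("<=65" if age <= 65 else ">65"),
        "sex": sex,  # already two groups
        "grade": pick(grade, {"G1", "G2"}, {"G3", "G4"}, "G1-2", "G3-4"),
        "stage": pick(stage_base, {"Stage I", "Stage II"},
                      {"Stage III", "Stage IV"}, "I-II", "III-IV"),
        "T": pick(t_base, {"T1", "T2"}, {"T3", "T4"}, "T1-2", "T3-4"),
        "M": pick(m_stage, {"M0"}, {"M1"}, "M0", "M1"),
        "N": pick(n_stage, {"N0"}, {"N1"}, "N0", "N1"),
    }
```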

Statistical methods

All analyses were performed in R version 4.1.0. Differential expression analysis was conducted with the Wilcoxon test, and enrichment analysis of the differential ARGs was performed with Fisher's exact test. Cox regression analysis was used to construct the prognostic model, and K–M survival curves with the log-rank test were used to assess survival differences between the high- and low-risk groups. In addition, univariate and multivariate Cox regression analyses and ROC curves were used to assess the predictive ability of the model.
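To show what an AUC at a fixed follow-up time measures, the sketch below uses a simplified definition: patients who died by the horizon are cases, patients followed beyond it are controls, patients censored before it are dropped, and the AUC is the probability that a case has a higher risk score than a control (ties count one half). This is an assumption-laden simplification, not the survivalROC method, which estimates time-dependent sensitivity and specificity with censoring-aware estimators, so its values will generally differ. The function name is hypothetical.

```python
def auc_at_horizon(times, events, scores, horizon):
    """Naive AUC at `horizon`: cases died by then, controls survived past it."""
    cases = [s for t, e, s in zip(times, events, scores) if e and t <= horizon]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    # Patients censored at or before the horizon are in neither list.
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    wins = 0.0
    for c in cases:
        for k in controls:
            wins += 1.0 if c > k else 0.5 if c == k else 0.0
    return wins / (len(cases) * len(controls))
```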

Results

Differential ARGs

A total of 58 differential ARGs were identified with the Wilcoxon test (Fig. 2A–B). The box plot in Fig. 2C shows the expression of these differential ARGs in tumor and normal samples.

Fig. 2

Expression of differential ARGs in normal and tumor samples. A Heatmap of the expression of the 58 differential ARGs. The horizontal axis shows sample names (blue, normal samples; red, tumor samples), and the vertical axis shows the names of the differential ARGs. Green indicates low expression, black intermediate expression, and red high expression. B Volcano plot of the ARGs. The horizontal axis is logFC, and the vertical axis is −log10(FDR). A larger absolute logFC indicates a larger expression difference between normal and tumor samples. Green dots represent genes downregulated in tumor samples, red dots represent upregulated genes, and black dots represent genes that are not differentially expressed. C Box plot of the 58 differential ARGs. The horizontal axis shows the differential ARGs, and the vertical axis shows gene expression. Green represents the normal group, and red represents the tumor group

Enrichment analysis

GO and KEGG enrichment analyses were performed on the 58 differential ARGs. Figure 3A–B shows the top 30 GO terms, classified into BP, CC, and MF. The differential ARGs were mainly enriched in GO terms such as macroautophagy and regulation of autophagy, and in KEGG pathways such as autophagy–animal and apoptosis (Fig. 3C) [12,13,14].

Fig. 3

GO and KEGG enrichment results of the differential ARGs. A Bar chart of the GO enrichment analysis. The horizontal axis is the number of genes enriched in each GO term, and the vertical axis is the GO term, grouped into three categories: BP, CC, and MF. Bar length represents the number of genes, and color represents the degree of enrichment. The differential ARGs were enriched in macroautophagy, regulation of autophagy, and other functions. B Bubble plot of the GO analysis. The horizontal axis is the z score, \(z = \frac{\text{up} - \text{down}}{\sqrt{\text{count}}}\), where up and down are the numbers of upregulated and downregulated genes in a GO term and count is the total number of genes enriched in that term. A z score > 0 indicates that more upregulated than downregulated genes are enriched in a GO term, whereas a z score < 0 indicates the opposite. The vertical axis represents −log(adjusted p value). C Circular plot of the KEGG enrichment analysis. The inner circle represents the z score: redder colors indicate that more upregulated genes are enriched in a pathway, and bluer colors indicate that more downregulated genes are enriched. The middle circle shows the number of up- and downregulated genes in each pathway, and the outer circle shows the KEGG IDs. The differential ARGs were highly enriched in autophagy–animal and apoptosis
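The bubble-plot z score, assuming the GOplot-style definition (number of upregulated genes minus number of downregulated genes, divided by the square root of the total number of genes in the term), can be computed as below. Classifying a gene as up or down by the sign of its logFC is an assumption, and the names are hypothetical.

```python
import math


def go_zscore(term_genes, logfc):
    """GOplot-style z score: (up - down) / sqrt(count) for the genes of one term."""
    up = sum(1 for g in term_genes if logfc[g] > 0)
    down = sum(1 for g in term_genes if logfc[g] < 0)
    return (up - down) / math.sqrt(len(term_genes))
```

Because the numerator can be negative, this definition yields the signed values that the red/blue coloring relies on.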

Prognostic model construction and validation

Prognostic model construction

A total of 29 prognosis-related ARGs with p < 0.001 were identified by univariate Cox analysis (Fig. 4). Of these, 13 ARGs were retained by multivariate Cox regression analysis: G protein subunit alpha i3 (GNAI3); FKBP prolyl isomerase 1A (FKBP1A); baculoviral IAP repeat-containing 5 (BIRC5); SH3 domain-containing GRB2 like, endophilin B1 (SH3GLB1); hypoxia inducible factor 1 subunit alpha (HIF1A); Ras homolog, mTORC1 binding (RHEB); eukaryotic translation initiation factor 2 subunit alpha (EIF2S1); RAB1A, member RAS oncogene family (RAB1A); 5-aminoimidazole-4-carboxamide ribonucleotide formyltransferase/IMP cyclohydrolase (ATIC); NPC intracellular cholesterol transporter 1 (NPC1); protein kinase C delta (PRKCD); autophagy-related 4B cysteine peptidase (ATG4B); and CLN3 lysosomal/endosomal transmembrane protein, battenin (CLN3) (Table 2). The risk score model for predicting prognosis was constructed from these ARGs: risk score = 1.663674611 × ExpGNAI3 − 0.6607235 × ExpFKBP1A + 0.801978919 × ExpBIRC5 − 1.24003891 × ExpSH3GLB1 − 0.558780456 × ExpHIF1A + 0.743003202 × ExpRHEB + 0.994004247 × ExpEIF2S1 + 1.300632 × ExpRAB1A + 0.869938532 × ExpATIC − 0.730288207 × ExpNPC1 + 0.708888933 × ExpPRKCD − 1.092151453 × ExpATG4B + 10.04890334 × ExpCLN3, where ExpX denotes the expression of gene X. The risk score of each patient was calculated with the model, and patients were divided into high- and low-risk groups according to the median risk score (Fig. 5).
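Applying the reported model amounts to a weighted sum of expression values followed by a median split. The sketch below hard-codes the 13 coefficients listed above; it assumes expression is on the same scale as the TCGA data used to fit the model, and it assigns patients whose score equals the median to the low-risk group, a tie rule the text does not specify. The patient identifiers and function names are hypothetical.

```python
from statistics import median

# Coefficients of the 13-ARG model as reported in the text.
COEF = {
    "GNAI3": 1.663674611, "FKBP1A": -0.6607235, "BIRC5": 0.801978919,
    "SH3GLB1": -1.24003891, "HIF1A": -0.558780456, "RHEB": 0.743003202,
    "EIF2S1": 0.994004247, "RAB1A": 1.300632, "ATIC": 0.869938532,
    "NPC1": -0.730288207, "PRKCD": 0.708888933, "ATG4B": -1.092151453,
    "CLN3": 10.04890334,
}


def risk_score(expr):
    """Sum of coef_i * Exp_i over the 13 model genes."""
    return sum(coef * expr[gene] for gene, coef in COEF.items())


def assign_groups(patients):
    """Split patients at the median score; scores above the median are high risk."""
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    cut = median(scores.values())
    groups = {pid: ("high" if s > cut else "low") for pid, s in scores.items()}
    return groups, cut
```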

Fig. 4

Forest plot of the 29 prognosis-related ARGs

Table 2 Thirteen ARGs involved in the construction of the risk score prognostic model

Fig. 5

Risk score curve of 161 Asian patients with HCC. The risk scores increase from left to right, with the median value at the dotted line. The green line on the left represents low-risk patients, whereas the red line on the right indicates high-risk patients

Reliability analysis of the risk score prognostic model

K–M survival curves were plotted to compare survival between the high- and low-risk groups. The survival rate was higher in the low-risk group (60.6%, 95% CI: 48.5%–75.6%) than in the high-risk group (35.9%, 95% CI: 25.01%–51.40%) (Fig. 6). Mortality increased progressively with increasing risk score (Fig. 7A), indicating a poor prognosis in the high-risk group. Figure 7B shows a heatmap of the expression of the 13 model ARGs in the high- and low-risk groups; the expression of these 13 ARGs was higher in the high-risk group than in the low-risk group. In both univariate and multivariate Cox analyses, the p value of the risk score was less than 0.05, indicating that the risk score calculated by this model is an independent prognostic factor for patients with HCC (Fig. 8A–B). Multi-indicator ROC curves were used to compare the predictive ability of the risk score (AUC = 0.877) with that of other clinical traits: age (AUC = 0.456), sex (AUC = 0.491), pathological grade (AUC = 0.434), pathological stage (AUC = 0.449), T stage (AUC = 0.822), M stage (AUC = 0.517), and N stage (AUC = 0.517). The risk score had the highest AUC of all indicators, suggesting that it predicted prognosis better than the other clinical traits (Fig. 9).

Fig. 6

K–M survival curves of 161 Asian patients with HCC at different risk levels. The red curve represents the high-risk group, the blue curve represents the low-risk group, and the shading represents the 95% confidence interval. Survival was significantly better in the low-risk group than in the high-risk group (p < 0.05)

Fig. 7

Relationship of the risk score with survival time and gene expression. A Survival time. Red dots represent death, and green dots represent survival. With the increase in risk score, the survival time decreases, and the mortality rate gradually increases. B Heatmap of the relationship between the risk score and the expression of the 13 ARGs involved in the construction of the prognostic model. The expression of the 13 ARGs was higher in the high-risk group than in the low-risk group

Fig. 8

Forest plots of the Cox regression analysis of independent prognostic factors. A Univariate Cox analysis. B Multivariate Cox analysis. The p values of T stage, M stage, and the risk score were less than 0.05 in both the univariate and multivariate analyses, indicating that all three can be regarded as independent prognostic factors for HCC

Fig. 9

Multi-indicator ROC curves. The AUC of the risk score was higher than that of the other clinical features

Clinical correlation analysis

The expression of the 13 model ARGs and the risk score were compared across the groups of each clinical trait. BIRC5 was significantly associated with T stage, pathological grade, and pathological stage (Fig. 10A–C); HIF1A was associated with pathological grade (Fig. 10D); and GNAI3, NPC1, and the risk score were associated with T stage and pathological stage (Fig. 10E–J).

Fig. 10

Clinical correlation analysis. The expression levels of BIRC5, GNAI3, and NPC1 and the risk scores were lower in the T1–2 group than in the T3–4 group, and lower in the stage I–II group than in the stage III–IV group. The expression levels of BIRC5 and HIF1A were higher in the G1–2 group than in the G3–4 group. All p values were less than 0.01

Discussion

Autophagy is an evolutionarily ancient mechanism by which lysosomes degrade excess or potentially dangerous cellular contents, such as damaged, degenerated, or senescent proteins and organelles [6]. Although autophagy mainly plays an adaptive role in protecting organisms from various pathological changes, it can also be detrimental in specific contexts.

A growing body of evidence shows that autophagy is closely associated with the development of HCC. The physical interaction of the autophagy adaptor p62 with the Nrf2 inhibitor Keap1 reprograms the metabolic and stress response pathways of proliferating HCC cells [4]. HIF-1α-induced YTHDF1 expression is associated with HCC progression through the promotion of the translation of the autophagy-related genes ATG2A and ATG14 in an m6A-dependent manner [5]. However, the relationship between autophagy and the prognosis of HCC remains unclear.

In this study, by analyzing the clinical and transcriptome data of Asian patients with HCC from the TCGA database together with human ARGs, we identified 13 significant prognosis-related ARGs (GNAI3, FKBP1A, BIRC5, SH3GLB1, HIF1A, RHEB, EIF2S1, RAB1A, ATIC, NPC1, PRKCD, ATG4B, and CLN3) through multivariate Cox regression analysis and constructed a risk score prognostic model based on these 13 genes. K–M survival analysis showed that the survival rate of the low-risk group was significantly higher than that of the high-risk group (p < 0.001). Univariate and multivariate Cox analyses revealed that the risk score could be used as an independent prognostic factor for HCC. In the ROC analysis, the AUC of the risk score (AUC = 0.877) was higher than that of the other clinical traits, indicating that the risk score predicted prognosis better than these traits.

A few of the 13 ARGs in our risk score model have been reported to be related to HCC. GNAI3 is downregulated in HCC relative to noncancerous liver tissue, and Transwell assays have shown that GNAI3 inhibits HCC cell migration and invasion [15]. Starvation can promote the expression of EIF2S1 and phosphorylated EIF2S1 (P-EIF2S1) in HCC, and increased EIF2S1 expression may affect the invasive and metastatic ability of HCC [16]. ATIC inhibits the activation of AMP-activated protein kinase (AMPK), thereby activating MTOR-S6K1-S6 signaling and supporting the growth and motility of HCC cells [17].

Although most of the remaining ARGs have not yet been reported to be associated with HCC, some studies suggest links to other cancers. FKBP1A can be regulated by SNHG15, which is closely related to the development of prostate cancer [18]. HIF1A can act as a tumor suppressor in pancreatic cancer cells by preventing the expression of PPP1R1B and the subsequent degradation of the p53 protein [19]. ATG4B expression is markedly elevated in human epidermal growth factor receptor 2 (HER2)-positive breast cancer and in colorectal cancer [20, 21]. BIRC5 has been characterized in several solid and hematologic tumors [22]. Allelic loss of SH3GLB1 may play a crucial role in the premalignant stage [23]. RAB1A overexpression promotes cancer cell migration, invasion, and metastasis by activating JAK1/STAT6 signaling [24].

The four ARGs RHEB, NPC1, PRKCD, and CLN3 have not been reported in previous studies of HCC or other cancers. RHEB is associated with GTP binding and GTPase activity (GeneCards: https://www.genecards.org/cgi-bin/carddisp.pl?gene=RHEB&keywords=rheb), and CLN3 is associated with unfolded protein binding (GeneCards: https://www.genecards.org/cgi-bin/carddisp.pl?gene=CLN3&keywords=CLN3). Loss of NPC1 may play a specific role in neuronal death [25], and PRKCD can inhibit macroautophagy/autophagy by phosphorylating AKT [26].

However, the mechanism by which these 13 ARGs might affect HCC prognosis through autophagy-related functions has not been validated in previous studies. Our results suggest that these genes may influence the prognosis of HCC through autophagy. A better understanding of the functions of these ARGs and their roles in autophagy may facilitate the development of HCC therapies.

This study has some limitations. First, the data were obtained from public databases, the sample size was small, and some clinical data were missing. Second, this was a retrospective study subject to various biases, and large-scale, prospective, multicenter clinical studies are needed to verify the predictive ability of the model. Third, although the model performed well in survival analysis and ROC curve validation, validation in external datasets, such as those from the Gene Expression Omnibus (GEO), is still needed. Finally, the roles of the 13 ARGs in the pathogenesis and prognosis of HCC remain unclear and must be explored and validated through in vitro and in vivo experiments.

Conclusion

We constructed and validated a prognostic risk score model for HCC in an Asian population. This model, based on 13 ARGs (GNAI3, FKBP1A, BIRC5, SH3GLB1, HIF1A, RHEB, EIF2S1, RAB1A, ATIC, NPC1, PRKCD, ATG4B, and CLN3), can predict the prognosis of HCC, and these 13 genes have the potential to become new targets for HCC treatment. In the current era of precision medicine, these findings may provide new perspectives for the treatment and clinical management of patients with HCC.