Background

Liver cancer is the fourth leading cause of cancer-related deaths and the sixth most common malignancy worldwide, with more than 900,000 new cases and an estimated 800,000 deaths in 2020 [1]. Hepatocellular carcinoma (HCC), the most common type of primary liver cancer, is mainly caused by viral hepatitis, alcohol abuse, moldy food consumption, and genetic factors [2]. The prognosis of HCC depends mainly on early diagnosis, which allows effective treatment to be started in time. However, a growing number of patients are already at an advanced stage when first diagnosed, which poses great challenges for the treatment of HCC [3]. The high heterogeneity and aggressiveness of HCC lead to a poor prognosis, with a 5-year survival rate of only 14% [4]. Therefore, it is imperative to develop novel strategies to prolong survival time and provide guidance for the individualized treatment of patients with HCC.

Epithelial–mesenchymal transition (EMT) is a multi-step cell biological process that drives the reversible dedifferentiation of epithelial cells into a mesenchymal-like or mesenchymal phenotype and is involved in embryogenesis, wound healing, and tumorigenesis [5, 6]. The tumor microenvironment (TME), which is related to tumor cell reprogramming, promotes tumor progression and drug resistance through EMT-driven malignant transformation [7, 8]. Previous studies have shown that EMT is strongly involved in the occurrence and progression of HCC [9, 10]. Currently, increasing attention is being paid to gene signatures that use public database information to forecast cancer prognosis and explore the underlying mechanisms. For example, Zhao et al. constructed a cell cycle-related gene signature to predict the prognosis of gastric cancer and further explore its cell cycle mechanisms [11]. Moreover, Lin et al. showed that a prognostic signature related to inflammatory responses influences immune status and survival prognosis [12]. However, there have been limited reports on signature models of EMT-related genes that can predict the prognosis of patients with HCC.

In this study, we obtained the RNA-sequencing profiles and patient information of 370 HCC samples from the TCGA dataset and EMT-related genes from the Molecular Signatures database. Subsequently, based on EMT-related differentially expressed genes (DEGs), we constructed a novel prognostic signature using the TCGA cohort for training and the ICGC cohort for validation. We divided patients into high-risk and low-risk groups by risk score to examine the relationship between the prognostic model, clinicopathological features, and the immune microenvironment. Moreover, an enrichment analysis was performed to investigate the molecular mechanisms. Additionally, we examined the connection between the prognostic genes and drug sensitivity in the NCI-60 database. Finally, we verified the mRNA expression of these prognostic genes in HCC cell lines.

Methods

Data collection

The RNA-sequencing profiles and patient information of 370 HCC samples were downloaded from the TCGA database (https://portal.gdc.cancer.gov/repository); these samples were used as the training cohort. The validation cohort comprised 231 patients with HCC, whose data and corresponding clinical information were extracted from the ICGC database (https://dcc.icgc.org/). Moreover, EMT-related genes were obtained from the Molecular Signatures database (http://www.gsea-msigdb.org/gsea/; systematic name: M5930) and are listed in Supplementary Table 1. The use of these public data complied with the access policies and publication guidelines of the respective databases.

Construction and validation of an EMT-related gene signature

The DEGs of HCC were identified between tumor and corresponding non-tumor tissues of the TCGA cohort using the R package "limma", with thresholds of |log2FC| > 2 and FDR < 0.05. A univariate Cox regression analysis was performed on the DEGs. Patients from the TCGA cohort were assigned to the training group, whereas those from the ICGC database were assigned to the validation group in a 1:1 manner. LASSO Cox regression, implemented with the R package "glmnet" and tenfold cross-validation, was used to establish the EMT-related gene model while minimizing the risk of overfitting [13, 14]. In the regression analysis, the independent variables were the expression levels of the candidate genes, and the dependent variables were overall survival (OS) and survival status of patients with HCC in the TCGA cohort. A multivariate Cox analysis was used to select candidate genes and establish an EMT-related risk score [15]; the formula was as follows: risk score = Σi (Coefi × Expi), with Coefi and Expi representing the risk coefficient and expression level of each gene i in the signature, respectively. Based on the median risk score, patients were divided into high-risk and low-risk groups. The R package "survminer" was used to compare OS between the high-risk and low-risk groups by survival analysis. The R package "timeROC" was used to evaluate the prognostic value of the signature model. Moreover, univariate and multivariate Cox analyses, performed with the R package "survival", were used to determine whether the signature was an independent prognostic factor. Principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) were employed to investigate the distribution of the groups using the “Rtsne” and “ggplot2” R packages.
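The survival comparison between risk groups rests on the Kaplan–Meier estimator. The following is a minimal plain-Python sketch of that estimator, not the "survminer" workflow used in the study; the function name `kaplan_meier` and the follow-up values are hypothetical and serve only as an illustration.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times  : follow-up time of each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival_probability) at each time with a death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical follow-up data (years) for two risk groups; not study data.
high_risk = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
low_risk = kaplan_meier([2, 4, 5, 6], [0, 1, 0, 0])
```

In practice each group's curve would be computed from the patients assigned to it by the median risk score, and the two curves would be compared with a log-rank test, which is omitted here.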

Functional enrichment analysis

To explore the functional differences between the risk groups, we identified DEGs between the low-risk and high-risk groups using the R package “limma”. The thresholds were as follows: Mann–Whitney test, |log2FC| ≥ 1, and p value < 0.05. The R package “clusterProfiler” was used to perform gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses.
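The thresholding step can be illustrated with a short sketch. It assumes that a p value has already been computed for each gene by a separate test and that expression values are positive and on a linear scale; the gene names, values, and function names below are hypothetical.

```python
import math
from statistics import mean


def log2_fold_change(high_values, low_values):
    """log2 ratio of mean expression, high-risk group over low-risk group."""
    return math.log2(mean(high_values) / mean(low_values))


def select_degs(records, min_abs_log2fc=1.0, max_p=0.05):
    """Keep genes with |log2FC| >= min_abs_log2fc and p value < max_p."""
    return [
        r["gene"]
        for r in records
        if abs(r["log2fc"]) >= min_abs_log2fc and r["p_value"] < max_p
    ]


# Hypothetical per-gene results; p values are assumed to come from a prior test.
records = [
    {"gene": "GENE_A", "log2fc": log2_fold_change([8.0, 8.0], [2.0, 2.0]), "p_value": 0.001},
    {"gene": "GENE_B", "log2fc": 0.4, "p_value": 0.001},   # fold change too small
    {"gene": "GENE_C", "log2fc": -1.0, "p_value": 0.03},   # down-regulated, passes
    {"gene": "GENE_D", "log2fc": 2.5, "p_value": 0.20},    # not significant
]
degs = select_degs(records)
```

Both conditions must hold, so a large fold change with a non-significant p value is excluded, as is a significant but small change.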

Tumor microenvironment analysis

The infiltration of immune and stromal cells can be estimated from the immune and stromal scores of different tumors. The relationships between the risk score and the immune and stromal scores were assessed using Spearman’s correlation. A two-way analysis of variance (ANOVA) was used to explore the association between the risk score and immune subtypes. Spearman’s correlation was also applied to assess the correlation between the stem cell-like features of TCGA tumor transcriptomes and the risk score.
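Spearman's correlation is Pearson's correlation applied to ranks, with tied values sharing their average rank. A minimal sketch under that definition follows; the function names and the example scores are hypothetical and are not taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical risk scores and immune scores for four patients.
risk_scores = [0.2, 0.5, 0.9, 1.4]
immune_scores = [10.0, 40.0, 20.0, 80.0]
rho = spearman(risk_scores, immune_scores)
```

Because only ranks enter the calculation, the coefficient is insensitive to the scale of either score and to monotonic transformations of them.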

Drug sensitivity analysis

The NCI-60 CellMiner database (https://discover.nci.nih.gov/cellminer) comprises 60 cancer cell lines from 9 tumor types. The relationship between the expression of the EMT-related prognostic genes and drug sensitivity was assessed using Pearson’s correlation analysis.

Cell culture

Human HCC cell lines (Huh-7 and HepG2) and a human hepatic epithelial cell line (LO2) were provided by the central laboratory of the First Hospital Affiliated to Anhui Medical University, which had purchased them from Procell Life Science (Wuhan, China). The cells were cultured in high-glucose DMEM containing 10% fetal bovine serum (HyClone; VivaCell, Shanghai, China) and incubated at 37 °C in a 5% CO2 incubator.

Quantitative real-time polymerase chain reaction (qRT-PCR)

Total RNA was extracted from cells using the TRIzol reagent (Takara). To quantify prognostic gene levels, RNA was reverse-transcribed into cDNA using the PrimeScript™ kit (Takara). Prognostic gene expression levels were measured using the SYBR Green qPCR Mix (Takara). The relative expression of prognostic genes was determined using the 2^−ΔΔCt method. Primer sequences are listed in Supplementary Table 2.
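The relative-expression calculation can be written out as a short worked sketch. The text does not name the reference (housekeeping) gene, so the reference Ct values below are assumptions, and all Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the 2^(-ddCt) method.

    ct_*_sample  : Ct values measured in the cell line of interest
    ct_*_control : Ct values measured in the control cell line
    The reference gene is whichever housekeeping gene was used for normalization.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target gene crosses the threshold 3 cycles
# earlier in the HCC line than in the control line, relative to the reference.
fold_change = relative_expression(
    ct_target_sample=22.0, ct_ref_sample=18.0,
    ct_target_control=25.0, ct_ref_control=18.0,
)
```

A lower Ct means more template, so a negative ΔΔCt corresponds to a fold change greater than 1 (up-regulation relative to the control line).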

Statistical analysis

The Wilcoxon test was applied to distinguish DEGs between HCC tumors and normal tissues. Differences in proportions were assessed using the Chi-squared test. The Mann–Whitney test was used to compare ssGSEA scores of the immune microenvironment between the high-risk and low-risk groups. The R packages “venn”, “pheatmap”, “graph”, “ggplot2”, “ggpubr”, “corrplot”, and “survminer” were used to generate the figures using the R software version 3.8. A one-way ANOVA was used to compare expression between the human HCC cell lines and human hepatic epithelial cells. A two-tailed p value < 0.05 was considered statistically significant.

Results

We investigated the TCGA-LIHC cohort (365 patients with HCC) and the ICGC cohort (231 patients with HCC).

Identification and construction of EMT-related prognostic signature in the TCGA cohort

A total of 52 genes were differentially expressed between HCC tissues and normal tissues (Fig. 1a). The differential expression of 81 EMT-related genes was identified in tumor and normal tissues. Using a univariate Cox analysis, we found that 29 of the 52 genes were associated with OS (Fig. 1b). These 29 candidate genes were considered prognostic factors (Fig. 1c, d). The relationships between the candidate genes are shown in Fig. 1e. A prognostic signature was then built from these genes using LASSO regression.

Fig. 1

Establishment of the EMT-related prognostic signature in the TCGA cohort. a The forest plots showing the association between 52 prognostic genes expression and OS. b Venn diagram to distinguish DEGs between HCC and adjacent normal tissues. c Heatmap of the 29 overlapping genes expression. d Univariate Cox regression analysis of 29 overlapping genes associated with OS. e The correlation network of prognostic genes signature. f LASSO coefficient profiles of 29 prognostic genes of HCC. g LASSO regression with tenfold cross-validation found ten prognostic genes using the minimum λ

A LASSO regression analysis was used to build the prognostic model from the candidate genes while addressing collinearity. We identified a signature of 10 EMT-related genes based on the optimal value of λ (Fig. 1f, g). The risk score was as follows: risk score = 0.0344*(BDNF expression) + 0.0985*(COPA expression) + (− 0.0011)*(GADD45B expression) + 0.0215*(GPX7 expression) + 0.0122*(ITGB5 expression) + 0.0295*(LOX expression) + 0.0211*(MATN3 expression) + 0.1091*(MCM7 expression) + 0.1965*(MMP1 expression) + 0.0705*(SPP1 expression). Based on the median risk score, patients with HCC in the TCGA cohort were divided into high-risk and low-risk groups (Fig. 2a). Furthermore, we observed that the risk group was related to tumor grade and TNM stage (Table 1). Using PCA and t-SNE, we demonstrated that patients in the low- and high-risk groups were distributed in different directions (Fig. 2b). According to the scatter plot, the survival time of patients with HCC in the high-risk group was significantly shorter than that in the low-risk group, which was consistent with the Kaplan–Meier curve (p < 0.001). In addition, time-dependent ROC curves were used to assess how well the prognostic model predicted overall survival; the areas under the curve (AUC) for 1-, 2-, and 3-year OS were 0.767, 0.694, and 0.680, respectively, in the TCGA cohort (Fig. 2c). Moreover, we investigated the expression of each prognostic gene, which suggested that most genes were more highly expressed in HCC tissues than in normal tissues, whereas the opposite was true for GADD45B (Supplementary Fig. 1A–J). In addition, the survival analysis showed that low expression of BDNF, GPX7, LOX, MATN3, MCM7, MMP1, and SPP1 was associated with longer OS (Supplementary Fig. 2a–j).
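The risk score and the median split can be sketched in a few lines. The coefficients are those reported in the formula; the patient identifiers and expression values are hypothetical, and assigning a score exactly equal to the median to the low-risk group is an assumption, since the text does not specify how ties are handled.

```python
from statistics import median

# Coefficients of the 10-gene signature as reported in the text.
COEFFICIENTS = {
    "BDNF": 0.0344,
    "COPA": 0.0985,
    "GADD45B": -0.0011,
    "GPX7": 0.0215,
    "ITGB5": 0.0122,
    "LOX": 0.0295,
    "MATN3": 0.0211,
    "MCM7": 0.1091,
    "MMP1": 0.1965,
    "SPP1": 0.0705,
}


def risk_score(expression):
    """Weighted sum of expression values; raises KeyError if a gene is missing."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def stratify(scores):
    """Label a patient 'high' if the score exceeds the cohort median, else 'low'.

    Treating scores equal to the median as 'low' is an assumption.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression profiles (illustrative values only, not study data).
patients = {
    "P1": {gene: 1.0 for gene in COEFFICIENTS},
    "P2": {gene: 2.0 for gene in COEFFICIENTS},
    "P3": {gene: 3.0 for gene in COEFFICIENTS},
    "P4": {gene: 4.0 for gene in COEFFICIENTS},
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = stratify(scores)
```

Nine of the ten coefficients are positive, so higher expression of those genes raises the score; only GADD45B carries a small negative weight.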

Fig. 2

Evaluation and validation of 10-gene signature in TCGA cohort and ICGC cohort. a Analysis of risk score value and distribution, OS status, and heatmap of 10-gene signature model in TCGA cohort. b The PCA plot and t-SNE analysis of risk score in TCGA cohort. c Kaplan–Meier curves and AUC time-dependent ROC curves for OS in TCGA cohort. d Analysis of risk score value and distribution, OS status, and heatmap of 10-gene signature model in ICGC cohort. e The PCA plot and t-SNE analysis of risk score in ICGC cohort. f Kaplan–Meier curves and AUC time-dependent ROC curves for OS in ICGC cohort. g, h Screening of OS-related pathological feature by multivariate Cox regression in TCGA and ICGC cohort

Table 1 Baseline characteristics of the HCC patients in different risk groups

Validation of the EMT-related gene signature in the ICGC cohort

The same risk score formula was applied to the ICGC cohort to validate the applicability of the prognostic signature derived from the TCGA cohort. The ICGC cohort was classified into low-risk and high-risk groups according to the median risk score (Fig. 2d). The risk group distribution, PCA, t-SNE, survival status, and Kaplan–Meier curve were analyzed in the same way, and the results in the ICGC cohort were consistent with those in the TCGA cohort (Fig. 2e). Moreover, the AUC of the EMT-related signature was 0.677, 0.652, and 0.68 for 1-, 2-, and 3-year OS, respectively (Fig. 2f). Taken together, our results demonstrate that the EMT-related prognostic signature could stratify the prognosis of patients with HCC.

Independent factor of the EMT-related prognostic signature

In the multivariate Cox analysis, the risk score remained an independent predictor of OS in both the TCGA cohort (HR = 1.742, 95% CI = 0.918–3.306, p < 0.05) and the ICGC cohort (HR = 3.241, 95% CI = 2.114–4.969, p < 0.001) (Fig. 2g, h). Hence, when the risk score is combined with clinicopathological characteristics, the EMT-related gene signature can better represent the prognosis of patients with HCC.

Clinicopathological characteristics and prognostic signature risk score

To further assess the value of the EMT-related prognostic signature, the relationship between clinicopathological characteristics and the risk score was investigated. In the TCGA cohort, the risk score was associated with tumor stage and tumor grade (p < 0.001) but not with age or gender (p > 0.05) (Fig. 3a–d); the higher the tumor stage and grade, the higher the risk score. Similarly, in the ICGC cohort, patients with stage III–IV tumors had a higher risk score than patients with stage I–II tumors (p < 0.001) (Fig. 3e–g). However, HCC tumor grade data were not available for the ICGC cohort.

Fig. 3

Relationship between risk score and clinicopathologic characteristics. TCGA cohort: a Age. b Gender. c Tumor grade. d Tumor stage. ICGC cohort: e Age. f Gender. g Tumor stage

Analysis of immune status and immune microenvironment

We performed an ssGSEA analysis to compare immune cell subpopulations, immune functions, and pathways between the high-risk and low-risk groups. The results showed that immune cells including aDCs (activated dendritic cells), macrophages, Th1 cells, Th2 cells, and Tregs were markedly enriched in the high-risk group of the TCGA cohort (p < 0.001) (Fig. 4a, b). Regarding immune function, the scores of APC co-stimulation, CCR (chemokine receptor), checkpoint, HLA, MHC class I, and parainflammation were higher in the low-risk group than in the high-risk group, whereas the type II IFN response showed the opposite trend (Fig. 4c, d). There were no statistically significant differences in cytolytic activity or type I IFN response between the groups. The relationship between other markers related to the immune system and the risk score was consistent in the TCGA database, except for iDCs (immature dendritic cells) and Th1 cells.

Fig. 4

Evaluation of immune status, tumor microenvironment, and immune checkpoints for the EMT-related prognostic signature. a, b The scores of 16 immune cells and 13 immune-related functions were detected by ssGSEA analysis based on risk groups in TCGA cohort and ICGC cohort. c, d The scores of 16 immune cells and 13 immune-related functions were detected by ssGSEA analysis based on risk groups in TCGA cohort and ICGC cohort. e Risk score of different immune infiltration subtypes. f The correlation between risk score and RNAss, DNAss, Stromal Score, and Immune Score. g Expression of immune checkpoint genes in high- and low-risk groups. *p < 0.05; **p < 0.01; ***p < 0.001

Furthermore, we explored the association between immune infiltrates and the risk score, aiming to reveal the specific role of the EMT-related gene signature in the immune microenvironment. Six types of immune infiltrates have been described in human tumors, namely C1 (wound healing), C2 (IFN-γ dominant), C3 (inflammatory), C4 (lymphocyte depleted), C5 (immunologically quiet), and C6 (TGF-β dominant) [16].

Because the C5 and C6 subtypes were absent in patients with HCC, we analyzed the remaining subtypes. As shown in Fig. 4e, immune-infiltrating subtypes, especially the C1 and C2 subtypes, were strongly associated with high risk scores in the TCGA database, indicating that the expression of the EMT-related gene signature is associated with immune infiltration in patients with HCC.

Tumor stemness is mainly measured by the RNA stemness score (RNAss) and the DNA stemness score (DNAss), which reflect the dedifferentiation characteristics of cancers. Immune and stromal cells are the primary non-tumor components and have been proposed to be valuable for tumor diagnosis and prognosis evaluation. Immune and stromal scores were calculated to estimate cell infiltration by analyzing the specific gene expression features of immune cells and stromal cells. We found that the risk score was positively correlated with RNAss and the immune score (p < 0.05), but not with DNAss (p = 0.83) or the stromal score (p = 0.84) (Fig. 4f).

The expression of immune checkpoints plays a crucial role in the prognosis and treatment of cancer. We first explored the association between the risk groups and immune checkpoints (Fig. 4g). We then found that the expression of immune checkpoints, including PD-L1, CTLA4, CXCR2, and TLR8, was significantly increased in the high-risk group in the TCGA cohort (Supplementary Fig. 3A–D). In addition, the risk score was positively correlated with the expression of these immune checkpoints, suggesting that a high risk score may reflect a greater capacity for immune evasion (Supplementary Fig. 3E–H).

Analysis of biological function and pathway

To explore the underlying functions and mechanisms of the prognostic signature, we conducted GO and KEGG analyses based on the high-risk and low-risk groups. The top terms ranked by enrichment score are presented in the bar plot and bubble plot. The GO analysis suggested that leukocyte migration, phagocytosis, and humoral immune response were extensively enriched among biological processes. The main enriched cellular components were the immunoglobulin complex, collagen-containing extracellular matrix, and external side of the plasma membrane. Antigen binding, cell adhesion molecule binding, and immunoglobulin receptor binding were the top three enriched molecular functions (Fig. 5a, b). In addition, the KEGG analysis indicated that carbon metabolism, glycolysis/gluconeogenesis, and biosynthesis of amino acids were the top three enriched pathways, suggesting that the prognostic signature was closely related to metabolism (Fig. 5c, d).

Fig. 5

Gene enrichment analysis for high-risk and low-risk groups. a KEGG pathway by barplot. b KEGG pathway by bubble plot. c Gene Ontology by barplot. d Gene Ontology by bubble plot. Verification of the mRNA expression of EMT-related prognostic genes in HCC cell lines by qRT-PCR. e BDNF. f COPA. g GADD45B. h GPX7. i ITGB5. j LOX. k MATN3. l MCM7. m MMP1. n SPP1. *p < 0.05

Relationship between prognostic signature and drug sensitivity

We used the NCI-60 cell line panel to investigate the relationship between prognostic gene expression and drug sensitivity. The results suggest that GPX7, MATN3, GADD45B, ITGB5, BDNF, MMP1, and LOX are associated with sensitivity to chemotherapeutic drugs. On the one hand, increased expression of GPX7 and LOX was associated with resistance of cancer cells to fluphenazine, arsenic trioxide, nelarabine, erlotinib, and lenvatinib (Supplementary Fig. 4A). On the other hand, increased expression of GADD45B, ITGB5, MMP1, and BDNF was correlated with decreased resistance of cancer cells to mithramycin, trametinib, ARRY-162, dabrafenib, selumetinib, vemurafenib, nilotinib, cobimetinib, cyclophosphamide, oxaliplatin, and tamoxifen (Supplementary Fig. 4B). Notably, increased expression of MATN3 was negatively correlated with resistance of cancer cells to eribulin mesylate, vinblastine, pipobroman, paclitaxel, and erlotinib.

Experimental verification of the prognostic signature expression in HCC cell line

To verify the expression of the EMT-related prognostic genes (BDNF, COPA, GADD45B, GPX7, ITGB5, LOX, MATN3, MCM7, MMP1, and SPP1) in HCC cell lines, we performed qRT-PCR to measure their mRNA expression. All prognostic genes except GADD45B were up-regulated in the Huh-7 and HepG2 cell lines compared with the LO2 cell line (p < 0.05) (Fig. 5e–n). These results were in accordance with the mRNA expression profiles of these genes in the TCGA cohort.

Discussion

Liver cancer is a major public health problem. Because of the strong invasive and metastatic capacity of HCC, most patients are at an advanced stage when diagnosed, resulting in a poor prognosis and a 5-year survival rate of < 15% [4]. Recently, with the rapid development of genomics, high-throughput sequencing, and other technologies, identifying genes with prognostic potential has become an urgent task. Dai et al. built an immune-related gene signature that could predict survival and immunotherapeutic effects [17]. Deng et al. constructed a prognostic signature of HCC subtypes based on a ferroptosis phenotype-related clinical-molecular analysis; this model can stratify patients for clinical decision-making [18]. EMT is an initiating factor for tumor invasion and metastasis. When EMT occurs in tumor cells, adhesion between tumor cells is weakened, cell polarity is lost, and the cytoskeleton changes, all of which are conducive to metastasis [19]. Studies have shown that the poor prognosis of patients with lung, prostate, and bladder cancers may be attributed to EMT [20,21,22]. Previous studies have reported single prognostic markers associated with EMT in patients with HCC [23, 24]. However, EMT plays a pivotal role in tumor invasion and metastasis not only as a differentiation marker but also through changes in cell morphology and function, which involve alterations in many genes. The prediction of prognosis in patients with HCC based on EMT-related gene signatures has not yet been well described. Therefore, our study established an EMT-related gene signature by calculating a risk score, which may not only predict the prognosis of patients with HCC more accurately but also provide guidance for individualized treatment.

The prognostic model proposed in this study comprised ten EMT-related genes (BDNF, COPA, GADD45B, GPX7, ITGB5, LOX, MATN3, MCM7, MMP1, and SPP1), which are associated with different stages of tumor progression. BDNF is a widely studied biomarker in the nervous system and has been strongly associated with depression and Alzheimer's disease [25]. Coatomer subunit α (COPA), a protein-recoding editing target, was shown to be differentially edited in HCC, as first evidenced by Song et al. [26]. A further study found that under-editing of COPA facilitates HCC progression via the PI3K/AKT/mTOR signaling pathway [27]. GADD45B belongs to the growth arrest and DNA damage-inducible 45 gene family and is related to oncogenic stress, cell cycle arrest, and apoptosis [28]. However, the role of GADD45B in cancer remains unclear. The suppression of GADD45B represses cell invasion and migration in cholangiocarcinoma by regulating EMT [29]. Evidence has indicated that up-regulated expression of GADD45B is correlated with an advanced stage and decreased OS in patients with colorectal cancer [30], and it may be an independent prognostic factor for patients with papillary thyroid carcinoma [31]. Our results demonstrated that the expression of GADD45B was down-regulated. This may be attributed to the inhibition of GADD45B autophagy and promotion of apoptosis [32]. GPX7 has been verified to be up-regulated in HCC tissues, and its high expression was associated with grade III–IV [33]. Abnormal expression of GPX7 is related to pathological conditions; for example, the deletion of GPX7 increases the differentiation of preadipocytes and the risk of cancer [34]. ITGB5 not only regulates the biological behavior of cancer through the tumor microenvironment but also plays an important role in stemness and chemotherapy resistance in cancer cells [35, 36].
LOX is highly expressed in patients with gastric cancer, which indicates poor prognosis, and it may promote the progression of cancer cells through ECM–receptor interaction and the TGF-β, Wnt, JAK–STAT, and mTOR signaling pathways [37, 38]. MATN3, also known as EDM [22], is primarily responsible for homeostasis in vivo and carcinogenesis in various tumors [39]. MCM7 is a postoperative prognostic factor in patients with HCC, and MCM7-positive patients are sensitive to sorafenib treatment [40]. MMP1 is produced by tumor cells and stromal cells, and it is well known that the over-expression of MMP1 promotes the migration and invasion of HCC cells [41]. In addition, the role of SPP1 in HCC tissues and cells has been widely explored, and the results suggest that it may be a potential target for prognosis and treatment [42, 43]. Seven genes of the prognostic model (COPA, GPX7, ITGB5, MATN3, MCM7, MMP1, and SPP1) have been validated to be related to tumor progression, whereas the remaining three genes, BDNF, GADD45B, and LOX, need to be further analyzed. Moreover, we investigated the mRNA expression of the EMT-related prognostic genes in HCC cells using qRT-PCR. The novel prognostic model based on the EMT-related gene signature could become a better biomarker if the mechanisms of these genes in patients with HCC are extensively investigated.

Previous studies have shown that the immune microenvironment plays an important role in tumorigenesis [44,45,46]. In this study, we found that a high risk score was markedly associated with immune cell infiltration and function, specifically those of Tregs, Th cells, and macrophages, compared with a low risk score. A high risk score was also related to APC co-stimulation, CCR, and type II IFN response. An increase in tumor-associated macrophages and Treg cells has been correlated with poor prognosis in patients with HCC, as evidenced by Zhou et al. [47, 48]. Moreover, Shankaran et al. reported that the type II IFN response, together with lymphocytes, acts as an effective extrinsic tumor-suppressor system that prevents the development of spontaneous epithelial cancer and selects for tumor cells with reduced immunogenicity [49]. Immunotherapy, especially with immune checkpoint inhibitors, has greatly changed the treatment of patients with HCC [50]. We found that the expression of the majority of immune checkpoint genes was up-regulated in the high-risk group compared with the low-risk group and positively correlated with the risk score. Currently, inhibitors targeting programmed cell death protein 1 (PD-1), its ligand PD-L1, and cytotoxic T-lymphocyte-associated protein 4 (CTLA-4) are approved for the treatment of cancer patients. In particular, the activation of helper CD4+ T cells and the initiation of the immune response mainly depend on CTLA-4 [51, 52]. In addition, we investigated the relationship between immune infiltration subtype and risk score. The results suggested that a high risk score was related to subtypes C1 and C2, whereas a low risk score was related to subtypes C3 and C4, indicating that subtypes C1 and C2 had a worse prognosis than subtypes C3 and C4 in patients with HCC. Therefore, this study supports the value of this signature for characterizing the immune microenvironment of HCC.

This study has some limitations. First, to our knowledge, the clinical information contained in both the TCGA and ICGC databases is limited. Second, because this study was retrospective, the reliability of this model needs to be confirmed by prospective cohort studies in the future. Third, although the prognostic model is a reliable way to judge the prognosis of HCC, the potential mechanisms of these ten EMT-related genes and HCC need to be further verified by in vivo and in vitro experiments.

Conclusion

We developed a novel signature of EMT-related prognostic genes that predicted the prognosis of patients with HCC. The signature was significantly associated with OS and clinicopathological characteristics in the training and validation cohorts. Furthermore, we found that the risk score generated by the model was related to the tumor microenvironment, functional enrichment, and drug sensitivity. However, further prospective, multicenter cohort studies are warranted to verify whether this signature contributes to the individualized treatment of patients with HCC.