Primary liver cancer, including hepatocellular carcinoma (HCC) (80% of cases), was the fourth leading cause of cancer-related death worldwide in 2018 [1]. The liver closely cross-talks with the gut and performs necessary functions related to the metabolism of nutrients, immunity, and biotransformation of bacterial metabolites, and this communication is called the gut-liver axis [2]. A complete gut-liver axis relies on an intact intestinal barrier, healthy gut microbiota, and normal liver function. The gut microbiota is known as the most important microecological system in symbiosis with the human body; moreover, several clinical characteristics (age, body mass index, dietary habits, and physical exercise) were found to differentially affect the gut microbiome of healthy Chinese people [3]. Liver diseases, such as nonalcoholic fatty liver disease (NAFLD) and liver cirrhosis, are often associated with altered gut microbiota, and it has been suggested that gut bacterial products contribute to liver carcinogenesis [4,5,6,7]. Recent studies in animal models or in patients with a background of NAFLD have indicated that some specific gut microbes promote HCC tumorigenesis through the gut-liver axis [8,9,10]. Our previous study has already suggested gut microbes to be non-invasive biomarkers for early and advanced HCC in Northwest, Central, and East China [11]. However, whether gut microbial characteristics from a large cohort of HCC patients can be used to evaluate clinical prognosis has not yet been reported, and how gut microbiota influence the transcriptome profiles of HCC remains unclear.

To explore the connections of host-microbe in HBV-related HCC, we acquired paired data on liver transcriptome and gut microbiome from patients in East China to assess whether elevated or depleted transcripts of host liver are associated with specific gut microbes (Additional file 1: Fig. S1). While it is known that a lower tumor burden always implies a better clinical outcome, the associations between gut microbiota and tumor burden in HCC patients are less well-understood. Therefore, by referring to Stage A4 of the BCLC staging classification, we divided 113 patients into small HCC and non-small HCC groups according to tumor burden [12]. More specifically, in this study, small HCC is defined as “one single tumor nodule with a maximum diameter not more than 3 cm; or no more than three tumor nodules with the sum of the maximum diameters not being more than 3 cm”. In addition, data on 5-year survival and 2-year disease-free survival were acquired to construct a random forest model and support vector machine model to further indicate the predictive effect of gut microbiota on tumor progression and recurrence/metastasis. By the simultaneous integrated analysis of the liver tumor transcriptome, gut microbiome, and clinical characteristics, we gained better insights into the nature of the gut-metabolite-liver axis in HBV-related HCC patients.


Participant information

This research was designed following the prospective specimen collection and retrospective blinded evaluation principle (PRoBE) [13]. In our previous study, a total of 281 qualified stool samples from participants (131 healthy controls and 150 HCC patients) from First Affiliated Hospital, School of Medicine, Zhejiang University were prospectively collected from November 2013 to July 2014 and subjected to 16S rRNA MiSeq sequencing; meanwhile, peripheral blood samples were acquired for routine examinations of blood, liver biochemistry, and tumor markers [11]. In September 2019, we revisited these 281 participants, and after excluding missing information and liver cancer R1 &R2 resection, 100 healthy controls and 113 HCC patients were finally enrolled. Among the HCC patients, paired tumor and adjacent non-tumor liver tissues from 32 patients (Additional file 2: Table S1: zfh001-zfh034) were finally collected based on stringent criteria. The exclusion criteria were as follows: (a) more than 7 days from stool collection to liver cancer resection; (b) the use of drugs that affect liver metabolism and function before resection, such as adenosyl methionine, ursodeoxycholic acid, polyene phosphatidylcholine, glutathione, fructose diphosphate, vitamins, and Chinese materia medica; (c) for multiple HCC tumors, a variety of types of tissue differentiation in pathological diagnosis; and (d) unqualified tumor or non-tumor specimens. The detailed criteria are listed in Additional file 3: Supplementary Methods.

Overall survival (OS) was defined as the interval between surgery and death. Disease-free survival (DFS) was defined as the interval between liver resection and the first recurrence/metastasis. OS and DFS were collected from a follow-up telephone interview or electronic medical records that were reexamined by two independent follow-up staff unrelated to this study.

16S rRNA MiSeq sequencing and operational taxonomy unit (OTU) clustering

High-throughput V3-V5 16S rRNA sequencing was performed on an Illumina MiSeq platform. The raw data of Illumina reads from 281 fecal samples were uploaded to the ENA-EMBL-EBI database under PRJEB8708 [11]. Fecal sample collection, DNA extraction, stool moisture measurement, 16S rRNA MiSeq sequencing, and OTU annotations are shown in Additional file 2: Table S2 and Additional file 3: Supplementary Methods.

Bacterial diversity and taxonomic analysis

For alpha diversity, the Shannon, Simpson, and Invsimpson indexes were calculated by the R program package “vegan” [14]. The data are listed in Additional file 2: Table S3. Nonmetric multidimensional scaling (NMDS) was conducted by “vegan” to display bacterial beta diversity using Manhattan’s method [14]. Linear discriminant analysis effect size (LEfSe) was applied to evaluate the differentially abundant taxa [15].

RNA extraction, RNA-seq, and data analysis

Frozen liver tissues were used for RNA extraction, and DNA libraries were sequenced by Illumina HiSeq 2500 (pair-end 150-nucleotide read length). For the paired tumor and adjacent non-tumor liver tissues from the 32 patients, an average of 47.92 M and 45.53 M high-quality reads were generated by RNA-seq (Additional file 1: Fig. S2A). Clean data were deposited in the NCBI Gene Expression Omnibus (GSE138485/PRJNA576155). HISAT2 was used to align the sequencing reads to the human reference sequence (UCSC/hg38.p12) [16]. The featureCounts function was performed for each gene count from trimmed reads against the GENCODE (release 30) transcript models [17]. For the data from the paired liver tissue samples of the 32 patients, gene expression levels were quantitated by edgeR (Additional file 1: Fig. S2B) [18]. Differential genes were defined as its adjusted p value < 0.05 and |log2FC| > 0.8.

For single paired tumor and adjacent non-tumor liver tissue samples from each patient, a total of 32 datasets of transcriptome profiles were calculated by GFOLD [19]. Differentially expressed genes were defined as |gfold| > 0.8 and |log2FC| > 0.8. When calculating the correlation between differential gene expression and OTU abundance, the |log2FC| values of these genes that did not satisfy the above conditions were forcibly defined as zero. The details are provided in Additional file 3: Supplementary Methods.

Correlation between OTUs and differentially expressed genes

The Pearson correlation coefficient was calculated to measure the connections between OTU abundance and differential gene expression level for each OTU-gene pair across all 32 patients. To reduce the computational load and avoid contingency, OTUs that were present in < 10% of the 32 patients were excluded, leaving 310 OTUs that constituted 95.85% of the initial abundance (Additional file 2: Table S4). A total of 1,820,940 OTU-gene pairs were calculated. The significance of each OTU-gene pair was determined based on adjusted P value < 0.05. Independent Student’s t test was applied to evaluate the difference in log2FC values calculated by GFOLD between the small HCC and non-small HCC groups. The detailed scripts of the correlation calculations are provided in Additional file 3: Supplementary Methods.

Microbial biomarker identification and prediction model construction for clinical prognosis

Based on all 113 HCC patients, random forest and support vector machine with fivefold cross-validation were executed to build prediction models of 5-year survival and 2-year disease-free survival, as implemented in the python package “scikit-learn” (version 0.21.3). The detailed scripts for OTU-marker identification and model construction are described in Additional file 3: Supplementary Methods.

Statistical analysis

Continuous variables following a normal distribution were compared by Student’s t test; otherwise, the Wilcoxon rank sum test was used. Fisher’s exact test was used to compare categorical variables in a 2 × 2 table. All statistical tests were two-sided. The P values were adjusted by the Benjamini-Hochberg correction. Variables associated with survival rate were identified by the Cox proportional hazards regression model. Statistical analyses were performed using Python and SPSS V23.0 for Mac (SPSS Inc., USA).


Summary of clinical characteristics

All 113 HCC patients with HBV infection and 100 healthy volunteers were Han Chinese individuals from East China and practiced comparable dietary habits (mixed diet) to exclude dietary differences [20]. The clinicopathological characteristics (Fig. 1a, Additional file 2: Table S1, S5, S6) of these groups [healthy controls versus HCC group, small HCC subgroup (n = 36) versus non-small HCC (n = 77) subgroup, non-cirrhotic HCC subgroup (n = 22) versus cirrhotic HCC subgroup (n = 91)] were generally matched, including age, sex, BMI, tumor differentiation, and Child-Pugh score, suggesting that there were no established confounding factors affecting group discrimination before sample collection. Serum alpha-fetoprotein (AFP) levels were significantly higher in patients with HCC than in healthy controls, but the AFP levels could not distinguish tumor size or cirrhosis in HCC patients.

Fig. 1
figure 1

Clinicopathological features and gut microbial diversity of all patients. a Clinicopathological features and clinical outcomes of all 113 HCC patients. The green dotted line represents 5-year survival; the purple dotted line represents 2-year disease-free survival. b Shannon-Wiener curves between numbers of fecal samples and estimated richness. Compared with small HCC, fecal microbial diversities, as estimated by the Shannon index (c), Simpson index (d), and Invsimpson index (e), were significantly increased in patients with non-small HCC (p = 0.048, 0.027, and 0.027, respectively; *p < 0.05, Wilcoxon rank sum test). f A Venn diagram illustrates that 541 of the total richness of 1002 OTUs were shared among three groups, while 576 out of 877 OTUs were shared between the small HCC and non-small HCC subgroups. g Beta diversity was evaluated using NMDS by Bray-Curtis. Boxplot “boxes” indicate the first, second, and third quartiles of the data

Increased gut microbial diversity in non-small HCC

Following taxonomic assignment, a total of 7,934,068 qualified sequences (median = 31,679) and 1296 OTUs were obtained (Additional file 2: Table S2 and S3). Rarefaction analysis revealed that the estimated richness of OTUs almost reached saturation in healthy controls (n = 100) and HCC group (n = 113) (Fig. 1b). Compared with their own counterparts, the fecal microbial diversities of species in each sample were significantly increased in non-small HCC (Fig. 1c–e); however, there was no difference between healthy volunteers and HCC patients. A Venn diagram illustrated that 541 the total richness of 1002 OTUs (the abundances of 294 OTUs among all participants were all zero) were shared among the three groups, while 576 out of 877 OTUs were shared between the small HCC and non-small HCC subgroups (Fig. 1f). Notably, 75 out of 1002 OTUs were unique to the non-small HCC group. For beta diversity, NMDS plots evaluated by Bray-Curtis distances revealed a symmetrical distribution of gut microbiota among all pairs of samples (Fig. 1g, Additional file 1: Fig. S3).

Phylogenetic profiles of fecal microbial communities in non-small HCC patients

The three predominant bacterial phyla in each group were Bacteroidetes, Firmicutes, and Proteobacteria, altogether constituting up to 90% of the OTUs on average (Fig. 2a, b). The average compositions of the bacterial community (top 10) at the genus level are shown in Fig. 2c and Fig. 2d. To compare the differences in the fecal microbial communities between each group, Wilcoxon rank sum test was performed at the phylum and genus levels (Additional file 2: Table S7 and S8). Compared with healthy controls, Bacteroides, Lachnospiracea incertae sedis, and Clostridium XIVa were enriched in the HCC group (all p < 0.05, Fig. 2e). When further analyzing stratified additional clinical characteristics in the HCC group, Bacteroides (p = 0.0152), Lachnospiracea incertae sedis (p = 0.0249), Clostridium XIVa (p = 0.0168), and Parabacteroides (p = 0.006) were significantly enriched in non-small HCC subgroup compared with small HCC subgroup (Fig. 2f, g); however, these three genera showed no difference in patients with cirrhotic HCC, portal hypertension, or low-albumin levels (Additional file 1: Fig. S4).

Fig. 2
figure 2

Phylogenetic profiles of gut microbes among healthy controls and all patients. Compositions of bacterial community at the phylum level between healthy controls and HCC patients (a), and patients from subgroups of HCC (b). Compositions of bacterial community (top 10) at the genus level between healthy controls and HCC patients (c), and patients from subgroups of HCC (d). e The differential microbial community at the genus level in HCC patients versus healthy controls. f, g The differential microbial community at the genus level in patients with small HCC versus non-small HCC. Levels of significance: *p < 0.05 (Wilcoxon rank sum test). h The distributions of Bacteroides, Lachnospiracea incertae sedis, and Clostridium XIVa normalized by a Z-score among healthy controls and patients with small HCC and non-small HCC. i Sankey analysis of healthy controls and patients with small HCC and non-small HCC. Error bars are presented as the SD

To discover high-dimensional biomarkers, LEfSe was applied to identify predominant bacterial taxa associated with different clinical characteristics. Bacteroides, Lachnospiracea incertae sedis, and Clostridium XIVa were significantly overrepresented [log10(LDA score) > 3] in the feces of patients in non-small HCC subgroup (Additional file 1: Fig. S5). The relative abundances of these three genera were further subjected to clustering analysis, which indicated that these three genera were abundant in non-small HCC subgroup (Fig. 2h). Sankey diagrams showed the major proportion of taxa (phylum and genus) among healthy controls and small HCC and non-small HCC patients. In participants with a high tumor burden, the proportion of these three genera gradually increased and became predominant, accompanied by the changes of other bacteria (Fig. 2i).

These data suggested that Bacteroides, Lachnospiracea incertae sedis, and Clostridium XIVa were sufficient to differentiate the microbiota of patients with small HCC from that of patients with non-small HCC. The interrelationship between 16S OTU genus clusters (Bacteroides, Lachnospiracea incertae sedis, Clostridium XIVa) and taxonomic compositions (NCBI Taxonomy ID) is listed in Additional file 2: Table S9.

Global overview of liver tumor transcriptome in HCC patients

Since HCC patients demonstrated significant associations of microbiota with clinical characteristics (tumor burden), we hypothesized that changes in the liver tumorigenesis transcriptome may be correlated with gut microbiota. Among 113 patients, paired tumor and adjacent non-tumor liver tissues from 32 HCC patients were finally collected based on stringent criteria for NGS RNA-seq. Of these 32 patients, 23 had non-small HCC. According to our definition, we identified a total of 8101 differentially expressed genes among 32 paired tumor and adjacent non-tumor liver tissues by edgeR (Fig. 3a). Furthermore, due to tumor heterogeneity and no biological duplication for a single case, we adopted GFOLD to further filter the candidate genes, in which a non-tumor specimen was used to normalize the expression of tumor specimen for each patient. Furthermore, the genes whose log2FC values calculated by GFOLD were zero in > 90% of patients were excluded, leaving 5874 genes for further investigation of their correlations with gut microbes (310 OTUs). To evaluate the predictive values of the candidate genes for OS or DFS, GEPIA was used as an independent diagnostic tool to compensate for the bias caused by the small sample sizes used in this study [21].

Fig. 3
figure 3

The associations between host liver gene expression and gut microbes in patients with HCC. a Differential expression of the microbe-associated genes from 32 paired tumor and adjacent non-tumor liver tissue samples. b Pearson correlation coefficients of 31 OTU-gene pairs, and P values evaluated by Student’s t test for comparing the difference in log2FC values calculated by GFOLD between small HCC and non-small HCC subgroups. Scatter diagrams and Cox hazards models of two typical OTU-gene pairs, OTU_0134-CD6 (c) and OTU_0002-MAPK10 (d). The x-axis of the scatter diagram indicates the OTU abundance. The y-axis indicates log2FC of gene expression calculated by GFOLD. Each point represents a patient (red: non-small HCC; blue: small HCC), and some points coincided at the origin. Immunohistochemical staining showed that CD6 (e) and MAPK10 (f) were highly expressed in tumor tissues of patients with small HCC versus non-small HCC (× 200)

Host liver transcriptome profiles influenced by gut microbiota via tumor immune microenvironment

Based on the above-described transcriptome and fecal microbiota data, Pearson’s correlation-based analysis was performed to discover microbe-associated genes and to test whether host transcriptional profiles in HCC patients could be partially influenced by gut microbiota (Additional file 2: Table S4). A total of 53 genes and 11 OTUs were identified, forming 56 OTU-gene pairs (Additional file 2: Table S10). The small and non-small HCC subgroups were clearly distinguished by the log2FC values of these 53 differential genes (all p < 0.05). Based on preliminary gene annotation and pathway analysis, 29 differentially expressed genes were identified as negatively correlated with 6 OTUs (members of Bacteroides, Lachnospiracea incertae sedis, and Clostridium XIVa), forming 31 OTU-gene pairs (median r = − 0.73, Fig. 3b, Additional file 1: Fig. S6). Figure 3 c and d show two typical OTU-gene pairs (OTU_0134-CD6 and OTU_0002-MAPK10). In 32 patients, the increased abundance of OTU_0134 was accompanied by a decrease in CD6 expression in HCC tissues, and it was clear that this type of downregulation was particularly pronounced in patients with non-small HCC. Small HCC usually implies a better clinical outcome, and Cox hazards models based on GEPIA indicated that CD6 and MAPK10 were tumor suppressors associated with a good clinical prognosis, which was consistent with the results of our study. Different expression levels at the protein level between the two groups also showed the same results as transcriptomics (Fig. 3e, f). To further validate the patterns of these 29 genes under different tumor burdens, edgeR was used to calculate their own values of log2FC and adjusted P values in 9 small HCC and 23 non-small HCC patients, respectively (Additional file 1: Fig. S7). The results showed that these 29 genes satisfied the definition of differential genes in the non-small HCC subgroup but not the small HCC subgroup.

Figure 4a describes in detail the major functional annotations of these 29 genes and the differential expression levels of each patient. According to the GEPIA dataset, 100% of the differentially expressed genes (29/29) had a low hazard ratio in HCC tissues, and 69% of the differentially expressed genes (20/29) predicted good clinical prognosis (Additional file 1: Fig. S8). Based on the GEPIA dataset, we also compared the performance of these 29 genes in cholangial carcinoma, pancreatic adenocarcinoma, colon adenocarcinoma, and rectum adenocarcinoma. Of note, their performance was consistent between HCC and cholangial carcinoma, but these genes were not associated with the prognoses of pancreatic, colon, and rectum adenocarcinoma, which indicated that the functions of these microbe-associated genes were specific in liver carcinoma. Pathway analysis revealed that these 29 genes converged on immune-related pathways (T cell receptor signaling, positive regulation of T cell proliferation, natural killer cell activation and NOD-like receptor signaling, response to molecule of bacterial origin, etc.), which are components of tumor immune microenvironment (Fig. 4b, c). GPBAR1, a member of the G protein-coupled bile acid receptor family, functions in the tumorigenesis of hepatocytes and mesenchymal immune cells [22]. CXCR6 is considered to be an important marker of NKT cells [9]. Pearson analysis was also performed to explore the correlations between these 29 genes and acute liver inflammatory factors (ALT and AST), and the results indicated that high ALT and AST levels likely do not drive an inflammatory transcriptome profile in this case (mean R2 = 0.045 and 0.023, Additional file 1: Fig. S9).

Fig. 4
figure 4

Functional annotations and pathway analysis of the microbe-associated genes. a Integrated analysis of key microbe-associated genes: functionalannotation and log2FC value of gene from each patient (n = 32, left panel), DFS (based on GEPIA, middle panel), and OS (based on GEPIA, right panel). The red line denotes the average values. Heatmaps represent the log10 (hazard ratio) value of each gene in each kind of tumor. The dark square box denotes that the clinical prognosis was statistically significant. b The GO enrichment analysis based on Metascape. c Pathway analysis based on DAVID 6.8. d Pairwise gene expression correlation analysis using the Pearson method shows interdependent relationships and gene co-expression according to GEPIA. LIHC, Hepatocellular carcinoma; CHOL, Cholangial carcinoma; PAAD, Pancreatic adenocarcinoma; COAD, Colon adenocarcinoma; READ, Rectum adenocarcinoma

To elucidate the localization and roles of these genes in tumor immune microenvironment, we evaluated their expressions in an independent single-cell map database (based on SMART-seq2) of HCC [23]. Twenty-two genes were highly expressed in CD4+ T cells, 23 genes were highly expressed in CD8+ T cells, 23 genes were highly expressed in NK cells, 18 genes were highly expressed in macrophages, and 17 genes were highly expressed in B cells (Additional file 1: Fig. S10). Interestingly, these genes also had an interdependent relationship according to GEPIA, suggesting that gut microbiota may influence the transcriptome of HCC through a common factor (Fig. 4d).

Serum bile acids: important communication mediators between gut microbiota and host transcriptome of HCC

For intra-individual correlation analyses, we focused on associations between clinical characteristics (such as all values or abnormal values of BMI, AST, and triglyceride) and gut microbiota (75 OTUs matched to Bacteroides, Lachnospiracea incertae sedis, and Clostridium XIVa). Pearson’s correlation-based analysis revealed four bile acid-associated OTUs, while no positive and meaningful result was found from the other clinical characteristics (Fig. 5a, Additional file 2: Table S11 and S12). Among these four OTUs, OTU_0002, OTU_0033, and OTU_0134 were found to be gut microbes that were associated with the abovementioned transcriptome changes in tumor immune microenvironment. Using the Virtual Metabolic Human database, we identified the metabolite distributions of Bacteroides, Lachnospiracea incertae sedis, and Clostridium XIVa, which suggested that most of them metabolized cholic acid, chenodeoxycholic acid, 3-dehydrocholic acid, 7-dehydrochenodeoxycholic acid, taurocholic acid, etc. (Additional file 2: Table S13) [24, 25].

Fig. 5
figure 5

Associations among liver transcriptome, serum bile acid, and gut microbiota. a Scatter diagrams of OTU abundance (x-axis, square root, and arcsin transformation) and serum bile acids (y-axis). The red horizontal line denotes 12 μmol/L of serum bile acid (n = 52). b Heatmap representing the Pearson correlation coefficients of 100 OTU-gene pairs (upper panel); functional annotation and log2FC of genes related to bile acid metabolism for each patient (lower panel). The red line denotes the average values of log2FC. c Immunohistochemical staining showed that ABCC4 was highly expressed in tumor and adjacent non-tumor liver tissues of patient with high level of bile acid (× 200)

Further, based on 32 pairs of liver tissue samples, 25 genes related to these four OTUs were also identified (Fig. 5b), and these genes affected liver bile acid metabolism (extra- and intra-cellular signaling, transcriptional factor, and bile acid synthesis and efflux). Immunohistochemistry confirmed that ATP binding cassette subfamily C member 4 (ABCC4) proteins were highly expressed in both tumor and adjacent non-tumor liver tissues of patients with high levels of bile acid (Fig. 5c). These results suggest that bile acids may be important mediators for the communication between gut microbiota and host transcriptome of HCC.

Identification of microbial-based markers for clinical prognosis

Among 113 patients, high tumor burden (non-small HCC) and higher bile acid levels (≥ 16 μmol/L, n = 41) indicated worse clinical outcomes (Fig. 6a, b). We assessed the power of gut microbiota to predict clinical prognosis by assessing the following OTUs: OTU_0002, OTU_0033, and OTU_0134 (associated with both tumor immune microenvironment and bile acid metabolism), OTU_0794 (associated with tumor immune microenvironment), and OTU_0030 (associated with bile acid metabolism). Random forest and support vector machine classifier models with five repeats of 5-fold cross-validation were constructed to predict clinical prognosis of the 113 patients (Fig. 6c). These OTUs had good classification performance for discriminating 5-year survival (81% ± 5% and 70% ± 6%, respectively, in RF and SVM). In addition, the AUC values reached 68% ± 7% (RF) and 70% ± 12% (SVM) for 2-year disease-free survival. The abundances and corresponding bacterial genera of these 6 OTU markers in each patient are listed in Additional file 2: Table S14 and S15. The data indicated that these OTU markers related to tumor immune microenvironment and bile acid metabolism in HCC have the potential to predict clinical prognosis.

Fig. 6
figure 6

Identification of microbial-based markers for clinical prognosis by machine learning models. Kaplan-Meier curves for 5-year survival and 2-year disease-free survival based on tumor burden (a) and serum bile acids (b). c ROC curves for classifiers designed to predict clinical prognosis (left panel: RF; right panel: SVM). Plots showed the true positive rate (y-axis) versus the false positive rate (x-axis). AUC scores (including 95% confidence interval) of ROC curves with fivefold cross-validations are listed on the right for classification accuracy


Although our study and many other studies in humans have reported that the gut community is highly differential and variable between any two individuals, the transcriptome of HCC appears to be a strong correlate of this variation in the composition of gut microbiome [26, 27]. In this study, we provided evidence that the tumor burden of HCC is associated with the presence of specific gut microbes, distinguished by the enrichment of Bacteroides, Lachnospiracea incertae sedis, and Clostridium XIVa in patients with non-small HCC [28]. We observed that patients with these three genera mounted a weaker host liver anti-tumor inflammatory response, which was confirmed by an independent database of single-cell sequencing and pathway enrichment analysis. A wide range of immune cells, such as T cells, NK cells, and monocytes, are well-known linchpins in the process of hepatocarcinogenesis [23]. Tregs, especially CD4+FOXP3+ regulatory T cells, in mouse models of colitis and allergic diarrhea can be induced by Clostridium XIVa from human microbiota [29]. In our study, inter-individual differences in gut microbiota correlated with corresponding changes in microbe-associated gene expressions in tumor immune microenvironment, which is consistent with the findings of previous reports. For example, a low level of CCL21 expressed in LAMP3+ DCs hinders CCR7+ cells, which facilitates tumor growth and indicates an unfavorable prognosis for HCC patients [30]. CD160+ NK cells had high expression of the transcription factor EOMES, which is expressed at low levels in patients with non-small HCC [31]. PIK3CD, MAP4K1, and MAPK10, which are key members of the LPS-stimulated MAPK pathway, are highly expressed in patients with high tumor burden. Previous reports in animals are consistent with our findings, in which Clostridium XIVa influences bile acid-controlled NKT cell accumulation. Bacteroides activate GPBAR1 and uncoupling protein 2 (UCP2) signaling to improve liver metabolism, indicating that metabolites may be important mediators for communication between gut microbiota and host transcriptome of HCC [9, 32, 33].

Cholic acid, deoxycholic acid, chenodeoxycholic acid, and lithocholic acid, as metabolites of Bacteroides, Lachnospiracea incertae sedis, and Clostridium XIVa in the gut, have been shown to promote liver carcinogenesis [24, 34,35,36,37]. Our data also suggested that compared to lower bile acid levels, higher bile acid levels (≥ 16 μmol/L) indicated worse clinical outcomes among patients with enrichment of Bacteroides, Lachnospiracea incertae sedis, and Clostridium XIVa (OTU2, OTU30, OTU33, and OTU134). As reported, sterol carrier protein 2 (SCP2) and fibroblast growth factor 19 (FGF19) regulated bile acid secretion, yet phospholipid transfer protein (PLTP) increased fecal bile acid excretion in mice [38,39,40]. These gut microbes and genes may be important targets for maintaining the homeostasis of bile acid metabolism in the gut-liver axis. As serum bile acids are a kind of mixture, it is essential to differentiate the chemical components for further analysis, but this is difficult to perform in a retrospective study.

In recent years, principal component analysis and k-means have been used to analyze multi-omics data. The strongest microbe-gene correlation coefficients were approximately 0.7 to 0.8 in our study, and the significant associations of many other microbe-gene pairs would not survive correction for multiple hypothesis testing (FDR) when all genes and OTUs were simultaneously analyzed. If the primary goal is to determine as many associations as possible among multiple variables, dimensionality reduction is a key workflow. Nevertheless, it must be noted that our data were evaluated against a stringent screening criterion to highlight the most significant microbes and genes, which facilitated the construction and improved the accuracy of the subsequent prognostic model. However, this did not mean that there were only a few dozen of liver tumor transcripts influenced by gut microbiota.

In our study, we identified six important OTUs that were thought to be strongly associated with bile acid metabolism and tumor burden due to changes in tumor immune microenvironment. The RF and SVM models based on these OTU markers had good prediction potential for clinical outcomes of HCC patients. Importantly, this suggests that under the premise of strict control of the treatment plan (R0 resection, but excluding acute and chronic complications), gut microbiota is a vital influencing factor for clinical prognosis of HCC patients. The predictive efficacy of the model is much better for 5-year survival than 2-year disease-free survival, suggesting that the tumorigenesis caused by gut microbiota has a long-lasting and processive association with an individual’s dietary habits, lifestyle, and even mental stress.

The strengths of our study include the large cohort of patients with clinical outcomes, whose clinical characteristics, liver tumor transcriptome, and gut microbiota data were obtained and comprehensively analyzed at the level of host gene expression, metabolites, and gut microbes. Moreover, due to the difficulty in obtaining recurrent/metastatic tumor samples and the routine applications of antibiotics after liver cancer resection, we could associate gut microbiota and host transcriptome with clinical characteristics at only one timepoint under stringent criteria.


This study revealed that changes in tumor immune microenvironment correlated with gut microbiota via serum bile acids are possibly important factors associated with tumor burden and adverse clinical outcomes in HBV-related HCC. Gut microbes can be used as biomarkers of clinical features and outcomes, and microbe-associated transcripts of host tumors can partly explain how gut microbiota promotes HCC pathogenesis. Although our study has some limitations, it provides new insights into exploring the connections of gut microbiome-transcriptome for human biomarker discovery.