Background

Hepatocellular carcinoma (HCC) is the sixth most prevalent cancer and the second most lethal tumour worldwide; its incidence continues to rise [1]. Prognosis and treatment options for HCC are poor and mainly dependent on tumour stage [2] and histological grade, which reflects tumour biology [3]. High-grade HCC can develop from low-grade HCC in a course of several months in 75% of patients [4]. Notably, cancer tissues with different histological grades can be frequently found in individual HCC nodules [5], a phenomenon that provides a ‘nodule-in-nodule appearance’. It may be assumed by these histological findings that there is a stepwise development from low- to high-grade HCC based on molecular mechanisms that are still unknown. Likewise, a stepwise increase in chromosomal instability and aneuploidy occurs during progression from well differentiated to moderately or low differentiated HCC [6]. Chromosomal instability predicts drug resistance and poor prognosis in multiple cancer types, but the causes of chromosomal instability in HCC and other tumour types are still poorly understood [7, 8]. By gene expression profiling, HCC may be divided into two main groups: one with upregulation of genes responsible for cell proliferation and anti-apoptosis and poor outcome, and the other with the opposite findings [9, 10].

Although a number of studies have been performed using individual analytical methods, a combined approach for the investigation of HCC at histological, genomic and gene expression levels with a bioinformatics search for transcriptional regulators and master regulators of signal transduction pathways is rare. Therefore, the aim of our study was to gain a closer insight into the network of histological alterations, chromosomal aberrations and gene expression profiles and apply bioinformatics tools to find possible key players in the process of malignant transformation. For this purpose, we analysed 23 HCC samples of different histological grades and adjacent non-tumourous (NT) liver tissues with regard to the above-mentioned aspects. With this approach, we identified different genes and gene regulatory networks linked to distinct biological features observed in HCCs with low and high proliferative profiles. These findings provide additional candidate genes for the development of novel, subtype-dependent options for HCC therapy. Based on our results and comprehensive analysis of available published data, we also hypothesise key molecular events that underlie HCC onset and progression.

Methods

Tumour samples

The samples for this study comprised tissue from partial hepatectomy of 15 HCC patients treated at the Department of Abdominal Surgery of the Inselspital (Bern, Switzerland) in 2002–2008 (clinicopathological data in Additional file 1). Twenty-three tumour samples were collected from these surgical specimens, as well as 17 NT samples adjacent to the carcinoma nodules. Tissue samples were formalin fixed and paraffin embedded following standard procedures. Edmonson–Steiner grading [11] was applied using haematoxylin and eosin (H&E) stained slides. For molecular analyses, tumours and paired NT tissues were macro-dissected after defining the regions of interest by an experienced pathologist to ensure a content of tumour cells of at least 75%.

Gene expression analyses and data evaluation

Total RNA was isolated using the Qiagen AllPrep DNA/RNA FFPE kit strictly according to the manufacturer’s protocol. DNA was checked for contamination with qPCR using Brilliant SYBRGreen QPCR Master Mix (Stratagene) and qPCR normalisation primers for single-copy regions of non-coding genomic DNA from the SideStep II Cell Lysis Analysis Kit (Agilent). From the kit, Primer Set 1 (233 bp) and Primer Set 2 (244 bp) were used, and the PCR cycling program was 95 °C for 10 min followed by 40 cycles of 95 °C for 30 s, 60 °C for 30 s, and 72 °C for 30 s. PCR reactions were run on a Versant kPCR real-time PCR instrument (Siemens) with MxPro™ QPCR Software for the Mx3000P/3005P QPCR System (Stratagene). Analysis of RNA quality was performed using an Agilent RNA 6000 Nano Assay on an Agilent 2100 Bioanalyzer. RNA integrity numbers for the isolated RNAs were 2.1–2.5.

Further procedures strictly followed the Gene Expression FFPE Workflow v. 2.0.1 protocol developed by Agilent Technologies. Briefly, a complementary DNA (cDNA) library was generated from the total RNA using the TransPlex™ Whole Transcriptome Amplification Kit (Sigma). cDNA was labelled with Cy3 using the Genomic DNA ULS Labeling Kit (Agilent) and purified with the Agilent KREApure column. Next, 500 ng of Cy3-cDNA was hybridized to one-colour Agilent SurePrint G3 Gene Expression v2 8 × 60 K Microarray (G4851B). The array contained 50,559 probes covering over 40,000 transcripts, including lncRNAs and transcripts of unknown coding potential (TUCPs). Hybridization, washing, and scanning procedures were performed exactly as described in the One-Colour Microarray-Based Gene Expression Analysis Protocol, v. 6.6 (Agilent). Washed arrays were immediately scanned on the Agilent C scanner and Agilent Feature Extraction software v. 11.0.1.1 was used to extract signal intensities from images.

Microarray data evaluation was performed using the United States Food and Drug Administration’s genomic tool ArrayTrack with embedded PCA and HCA [12]. Raw gene expression values were normalized using the 75th percentile of the overall signal value, as recommended by the manufacturer, and log2 transformed prior to further statistical and bioinformatic analyses. Genes with significant differences in their expressions were identified by one-way ANOVA in multiple-group testing, or by unpaired Welch’s t test in pair-wise comparisons. For multiple probe sets for one gene, the median value was calculated.

For functional analysis of differently expressed genes, we employed Pathway Enrichment Analysis implemented in ArrayTrack which is based on canonical pathways supplied by the KEGG database. Fisher’s exact test was applied to identify biological pathways with statistically significant enrichment in regulated genes, the cut off being P ≤ 0.05.

Bioinformatics search for transcriptional regulators and master regulators of gene networks

To identify potential transcriptional regulators and master regulator molecules, we applied promoter analysis and upstream network analysis of differentially expressed genes provided by the GeneXplain platform (https://genexplain.com/) with integrated MATCH™ tool [13], TRANSFAC® [14] and TRANSPATH® [15] databases (release 2.4.1).

Gene promoters were defined as sequences from − 1000 to + 100 relative to the transcription start sites (Ensemble database). Promoter sequences of co-regulated genes were searched for potential transcription factor binding sites (TFBSs) using MATCH™ tool and TRANSFAC® library of positional weight matrices. The frequency of putative TFBSs in the genes differentially expressed in tumours (Yes set) was compared to the corresponding value for the default set of 300 human housekeeping genes of which expression is kept unchanged (No set). Transcription factors with overrepresented binding sites in the Yes set versus No set (ratio > 1) were considered as potential regulators.

Master regulator molecules in the signal transduction pathways upstream of the dysregulated genes were identified using the curated database of pathways and protein interactions TRANSPATH® with default parameters. Molecules with the lowest Rank Sum values were considered as the most likely candidates.

Real-time polymerase chain reaction (qPCR)

For validation of gene expression data obtained with microarrays, we employed qPCR for selected genes. Equal amounts of cDNAs from 12 NTs, 8 GE1-HCCs and 12 GE2-HCCs were pooled into the corresponding groups, and 10–20 ng of each cDNA pool were used to perform three independent qPCR experiments, each experiment in triplicate. We used a Versant kPCR real-time PCR instrument (Siemens) with the MxPro™ QPCR Software for Mx3000P/3005P QPCR System (Stratagene). PCR assays were performed for MKI67, HJURP, KPNA2, XPOT and TSNARE1 (internal control) with predesigned TaqMan® Gene Expression Assays and the TaqMan® Gene Expression Master Mix (Applied Biosystems) according to the manufacturer’s protocol. Assays for non-coding RNAs H19, HNF-AS1, SNAR-A3, SNAR-B2, SNAR-H were performed with Brilliant II SYBR® Green QPCR Low Rox Master Mix (Agilent) according to the manufacturer’s ‘Recommended Protocol with Two-Step Cycling (All Targets)’ with subsequent ‘Dissociation Program for All Targets’. The SNAR gene primers were derived from Parrott et al. [16]. We designed H19, HNF-AS1 and TSNARE1 (an internal control) primers with the NCBI Primer-BLAST tool (https://www.ncbi.nlm.nih.gov/tools/primer-blast/) and purchased them from Biomers (Ulm, Germany). The sequences are as follows: SNAR-A3, F 5′-AGCCATTGTGGCTCAGGC-3′, R 5′-TTTTCCGACCCATGTGGACC-3′; SNAR-B2, F 5′-GCCATTGTGGCTCCGGC-3′, R 5′-AACCCATGTGGACCAGGTTG-3′; SNAR-H, F 5′-CCACTGTGGCTCCGGC-3′, R 5′-AATGTGGACCAGGTTGGCCT-3′; H19, F 5′-GACGTGACAAGCAGGACATGA-3′, R 5′-TAAGGTGTTCAGGAAGGCCG-3′; HNF1A-AS1, F 5′-ACTAAAATTCGGGCGAGGCA-3′, R 5′-GACTGGCTGAAGGGACACTC-3′; and TSNARE1, F 5′-GAAGAAAATTGCAGAAAAGTCCAGA-3′, R 5′-GTCACTCCCGTTAAAGACCTTC-3′. All PCR products were checked with an agarose gel. We used the comparative threshold cycle method (ΔΔCt) to quantify the relative amount of product transcripts. ∆Ct values = ((Ct target gene)—(Ct reference gene)) from three independent experiments were evaluated with t test to identify the statistically significant differences between the means of the sample groups. We calculated the FC for each gene as 2−∆∆Ct values, where ΔΔCt = ΔCtHCC − ΔCtNT.

Array-based comparative genomic hybridisation (aCGH)

aCGH was performed exactly as previously described [17]; the detailed protocol is also freely accessible in the ArrayExpress database under E-MTAB-8886. Cy5-labelled genomic DNA from tissue samples and Cy3-labeled human male genomic DNA (Promega, Germany) as a reference were hybridised to Agilent G4450A arrays. We normalised, visualised and analysed data with the Agilent Genomic Workbench v7.0 software using ADM-2 algorithm. We set cut-offs for chromosomal amplifications/deletions to the minimum number of probes in aberration interval ≥ 3 and average absolute log2 ratio ≥ 0.2, which corresponds to the presence of single copy aberration in about one third of analysed cell population.

Nuclear size measurement

Nuclear size measurement was done using Leica Application Suite, v.3.7.0 on 2-µM thick H&E-stained tissue sections in the same tumour and non-tumour regions used for molecular analyses. At least 100 cells were measured.

Fluorescence in situ hybridisation (FISH).

FISH was performed on 4 µm thick tissue sections using locus-specific probes for centromeres, CEP1, CEP3, CEP7, CEP8, and CEP17 (all supplied by Abbott, Wiesbaden, Germany) according to the supplier’s instructions. An Axio Imager microscope (Zeiss, Jena, Germany) was used for evaluation of fluorescent signals. A total of 100 nuclei were evaluated in each sample for calculation of the mean value.

Results

Gene expression analysis divided HCC into two distinct groups: GE1 and GE2

Analysis of 23 HCC and 17 NT samples (Additional file 1) yielded 458 differently expressed probes with a mean absolute fold change (FC) ≥ 2 and false discovery rate (FDR) ≤ 0.05. These probes represented 321 unique genes: 131 were upregulated and 190 were downregulated in tumours (Additional file 2). Principal component analysis (PCA) (Fig. 1a) and hierarchical clustering analysis (HCA) (Fig. 1b) based on the gene expression of 458 probes divided all samples into two main groups of 13 HCC samples and another group of 10 HCC and 17 NT samples. The full image for HCA analysis with gene expression values is shown in Additional file 3.

Fig. 1
figure 1

Two distinctive subtypes of hepatocellular carcinoma (HCC) were defined by a principal component analysis and b hierarchical cluster analysis. Gene expression values were analysed from 23 HCC samples of various histological grade, as well as 17 surrounding non-tumourous (NT) liver samples. We selected probe sets differentially expressed between 23 HCC samples and 17 NT samples, using a mean absolute ratio ≥ 2 and false discovery rate ≤ 0.05 for the analyses. Based on the results of the analyses, 10 HCC samples with gene expression patterns similar to NT tissues were assigned to the GE1-HCC subgroup, while 13 HCC samples that clearly demonstrated a different gene expression pattern from both GE1-HCC and NT samples were assigned to the GE2-HCC subgroup

Gene expression patterns of 10 HCC samples (GE1-HCC) were rather similar to the NT tissues. In contrast, gene expression pattern of 13 HCCs (GE2-HCC) clearly differed from GE1-HCC and adjacent NT tissues. After analyses, one tumour sample, HCC-17, was re-classified as a cholangiocellular carcinoma (CCC). The expression profile of this sample was very close to GE1-HCC.

To identify genes specifically deregulated in the two HCC subtypes, we performed three pair-wise t tests—GE1-HCC versus NT, GE2-HCC versus NT and GE2-HCC vs GE1-HCC—and multiple comparison of GE1-HCC, GE2-HCC and NT groups using one-way analysis of variance (ANOVA). The cut-offs for a significant change in gene expression for GE2-HCC versus NT and GE2-HCC versus GE1-HCC were FDR ≤ 0.05 and mean absolute FC ≥ 2. For the GE1-HCC versus NT comparison, we applied a less stringent cut-off, namely P ≤ 0.01, instead of a more stringent FDR threshold to recover enough differentially expressed genes for further analyses. All genes identified in the three pairwise t tests showed a significant change in a multiple-group testing (P ≤ 0.05), with the exception of two genes. Additional files 4–6 list the details for each gene.

In general, there was a strong perturbation in gene expression with regard to the number of regulated genes as well as expression fold changes in the GE2-HCC group. In contrast, in GE1-HCC there were only moderate gene expression changes and far fewer differentially regulated genes compared with surrounding NT tissue. In GE1-HCC, there were 209 differentially expressed probes with a mean absolute FC in the range 2 to 4.9, despite applying the less stringent P value cut-off. These probes represented 141 unique genes: 26 upregulated and 115 downregulated. In GE2-HCC, 1254 probes were differentially regulated in comparison with NT adjacent tissues, with an absolute FC from 2 to 31.7. They represented 859 unique genes: 395 upregulated and 464 downregulated. Analysis of the differentially expressed genes in tumours showed that a part of them were significantly regulated in both groups of tumours (Fig. 2a).

Fig. 2
figure 2

Comparison of differentially expressed genes and deregulated pathways in the GE1 and GE2 hepatocellular carcinoma (HCC) subgroups. a The two gene lists resulting from the pair-wise t tests—10 GE1-HCC versus 17 NT and 13 GE2-HCC versus 17 NT—were compared via Venn Diagram analysis. b Pathway enrichment analysis was performed using the same two gene lists, and the Kyoto Encyclopedia of Genes and Genomes (KEGG) was used to identify pathways with the significant enrichment in the deregulated genes (Fisher’s P value ≤ 0.05) in GE1 and GE2 HCC subgroups

The direction of the common gene expression changes, i.e. activation or repression, was concordant in both tumour groups, with the exception of long non-coding RNA H19, which was repressed in GE1-HCC and increased in GE2-HCC. Differentially expressed genes in GE1-HCC and GE2-HCC were further used for analyses of functional pathways, gene networks and gene promoters. Pathway enrichment analysis (Fisher’s exact test) revealed five pathways specifically deregulated in GE1-HCC (Fig. 2b). In contrast, 26 pathways were affected in GE2-HCC. The dysregulated pathways in GE1-HCC were linked to immune response and NAD metabolism, both reported to be involved in cancer [18, 19]. The prominent feature of GE2-HCC was strong perturbation in the cell cycle, in particular DNA replication and the p53 signalling pathway. Additionally, cytochrome P450–catalysed drug metabolism, retinoic acid metabolism, fatty acid metabolism, cysteine and methionine metabolism and the metabolism of several other amino acids were specifically affected in GE2-HCC. Pathways related to cancer and immune diseases were also significantly changed in GE2-HCC. There were only two commonly regulated pathways in both HCC subgroups: complement and coagulation cascades and arachidonic acid metabolism. These pathways are involved in immune defence and inflammation and reportedly have a significant role in the development and progression of numerous malignancies [20, 21]. In general, the functional analysis of genes regulated in tumours and associated pathways indicated that GE2-HCC tumours exhibited a high proliferative capacity and strongly perturbed cell metabolism. They likely represent a much more aggressive phenotype compared with GE1-HCC tumours.

Array-based comparative genomic hybridisation (aCGH) revealed deletions to be typical for GE2-HCC

Numerous chromosomal aberrations were identified, including both gains and losses, which were shared by 30–81.8% of the HCC samples; some of these chromosomal aberrations were also found in the adjacent non-tumorous tissue (Table 1, Additional files 7–8). There were no frequently occurring deletions in NT samples, but some gains, e.g., 7q, 8q, 11p, and 19p, were observed in 50%-69.2% of non-tumorous tissues, derived from both GE1 and GE2 groups.

Table 1 Chromosomal aberrations in hepatocellular carcinoma (HCC) tumours compared to adjacent non-tumourous tissues (NT)

When analysing the GE1-HCC and GE2-HCC subgroups separately, obvious differences in their genomic patterns are observed (Table 2, Additional files 9–10). Most gains were found repeatedly in both the GE1-HCC and GE2-HCC subgroups. In contrast, losses were typical for the GE2-HCC subgroup, in which they occurred in 40–80% of the samples. Some loci, e.g., 1p, 16q, and 17p, which demonstrated recurrent gains in GE1 samples showed recurrent losses in the GE2-HCC subgroup. In the GE1-HCC subgroup, the only repeatedly identified loss was at 8p, which was shared by 30–50% of the samples.

Table 2 Chromosomal aberrations in hepatocellular carcinoma (HCC) subgroups with a low (GE1) and high (GE2) proliferation gene expression profile

Some amplified chromosomal loci encode genes with tumour-promoting functions that were strongly upregulated in tumours, as denoted by gene expression analysis (Table 3).

Table 3 Selected upregulated genes with oncogenic functions located in amplified chromosomal regions

Nuclear size was significantly increased in tumours, most notably in GE2-HCC

Nuclear size, which is an important diagnostic criterion for tumour grading, was significantly larger in all HCC samples when compared to the adjacent NT tissues (Fig. 3). The median nuclear size in NT tissues was 7.44 µM, which falls in the range reported for the average normal hepatocyte nucleus of 7–8 µM [22]. Meanwhile, the median nuclear size values in the GE1-HCC and GE2-HCC subgroups fell above this range, 8.49 µM and 10.50 µM, respectively. The interquartile ranges were 0.4 µM, 1.4 µM, and 1.8 µM for the NT, GE1-HCC and GE2-HCC groups, respectively, indicating a much more variable nuclear size in the HCC groups.

Fig. 3
figure 3

Nuclear sizes in the GE1 and GE2 hepatocellular carcinoma (HCC) subtypes and non-tumourous (NT) tissue. Nuclear size (µM) was determined in NT tissue (n = 10), as well as in two distinctive subgroups of HCC that were defined by gene expression profiling, GE1 (n = 10) and GE2 (n = 11); 100 cells per sample were evaluated. The box plot diagram shows the distribution of nuclear sizes in each tissue group, where the horizontal line demonstrates median nuclear size, the boxes span the interquartile range, and the “antennas” represent the minimum and maximum observed values

GE2-HCC showed high-level aneuploidy

Fluorescence in situ hybridisation (FISH) for the five selected chromosomes revealed a mean number of signals in GE1-HCC close to NT samples, with the highest value of 2.73 (Table 4). In contrast, GE2-HCC, which typically also had a higher histological grade and enlarged nuclei compared with GE1-HCC, were highly aneuploid, showing a variable increased number of chromosomes (up to 5.44).

Table 4 Fluorescence in situ hybridisation (FISH) analyses of selected chromosomes and histological grade in GE1-HCC and GE2-HCC

Notably, histological grading and molecular group assignment did not match completely. Whereas all grade 1 HCC samples were assigned to GE1-HCC, based on gene expression profiling, and all grade 3 or 4 HCC samples were allotted to GE2-HCC, grade 2 tumours were in both molecular groups. Furthermore, GE2-HCC included more samples with a hepatitis background, although there was no pivotal difference between the two groups. Overall, 77% of GE2-HCC and 56% of GE1-HCC samples (1 sample had unknown status) were derived from patients with positive hepatitis serology.

Validation of microarray gene expression changes by real-time polymerase chain reaction (qPCR)

Microarray gene expression data were confirmed by qPCR analyses for selected genes and non-coding RNAs that were overexpressed in the GE2-HCC subgroup. (Table 5). There was good concordance between qPCR and microarray data for most genes; in particular, the strong up regulation of these genes in GE2 HCC was confirmed. In a number of cases, the fold changes in gene expression obtained via the qPCR method were much larger than those identified via microarray analyses, possibly due to the higher sensitivity of the qPCR assay, especially in cases of genes expressed at low levels. For example, in the case of both MKI67 and HJURP, microarray analysis revealed significant increases in expression in the GE2-HCC subgroup only, whereas qPCR analysis detected upregulation in both of the HCC subgroups, although a much higher relative level was detected in the GE2-HCC subgroup as well.

Table 5 Validation of microarray gene expression data by real-time polymerase chain reaction (qPCR) analysis

Master regulator molecules of gene networks in GE1-HCC and GE2-HCC

Master regulator molecules in the signal transduction pathways, at a distance of up to 10 steps upstream of the dysregulated genes, were identified for up- and downregulated genes in the GE1-HCC and GE2-HCC subgroups (Table 6, Additional files 11–14) using the curated database of pathways and protein interactions, TRANSPATH®, integrated in the GeneXplain platform. The top three master regulators identified for the 26 genes upregulated in the GE1-HCC subgroup included two cell-surface proteins involved in pro-survival signaling in the liver and E2F1, a key regulator of the cell cycle. Moderate and strong overexpression of E2F1 was observed in the GE1-HCC and GE2-HCC subgroups, respectively. The top three master regulator molecules identified for the 395 upregulated genes in the GE2-HCC subgroup comprised essential regulators of progression through the mitotic phase of the cell cycle, and overexpression of the genes coding these proteins was specifically observed in the GE2-HCC subgroup.

Table 6 Top three master regulators upstream of dysregulated genes in GE1-HCC and GE2-HCC

The top regulatory molecules identified in the networks of genes downregulated in the GE1-HCC (115 input genes) and GE2-HCC (464 input genes) were related to reduced anti-tumour immune response and inflammation.

Transcriptional regulators of gene networks in GE1-HCC and GE2-HCC

A bioinformatic search for potential transcription factor binding sites (TFBSs) using the MATCH™ tool and TRANSFAC® library of positional weight matrices integrated in the GeneXplain platform identified a number of transcription factors potentially involved in the coordinated regulation of gene expression in the GE1-HCC and GE2-HCC subgroups (Additional files 15–18). The majority of the top potential transcriptional regulators identified have already been found to play roles in various cancers, including HCC (Tables 7 and 8), thus demonstrating the validity of our in silico findings.

Table 7 Top* TFBSs enriched in genes upregulated in GE1-HCC
Table 8 Top* TFBSs enriched in genes upregulated in GE2-HCC

The results in GE1-HCC (Table 7, Additional file 15) focus at first on androgen receptor signalling (AR) (Yes/No ratio = 14.1), whereas its role in GE2-HCC seems to be low, as indicated by a Yes/No ratio of 1.1 (Additional file 16). Furthermore, expression data showed AR downregulation (mean FC = -3.22) in advanced GE2-HCC (Additional file 5).

Upregulated genes in GE1-HCC also showed enrichment in TFBSs for progesterone (PGR). Unlike AR, PGR binding sites enrichment was found also in GE2-HCC (Table 8). We also identified potential transcriptional factors with known oncogenic roles in HCC, e.g. ZEB1, or in other cancer types, e.g. HES1, NR6A1, PATZ1, PAX4 and MTF1, in GE1-HCC. A novel finding was the transcription factor ZNF219; its binding site showed a high Yes/No ratio of 4.4 and was found in promoters of more than half of upregulated genes in GE1-HCC.

In GE2-HCC, top potential transcriptional activators included transcription factors with reported tumour-promoting activity in HCC and/or other cancer types, e.g., NKX6-1, POU2F1/2, cJUN, E2F1, HSF2, SRF, TFCP2 and NFY (Table 8).

Some transcription factors seem to have an important role in activation of gene expression in both GE1-HCC and GE2-HCC. These were general transcription factors (GTF2A1/2) with a role in general enhancement of transcription and PGR, as mentioned above. A few transcription factors with high rank in one group (Tables 7 and 8) might also play a less significant role in the other group, where they were ranked lower (Additional files 15 and 16). This phenomenon relates to ZEB1 and PATZ1, which binding sites were highly presented in genes upregulated in GE1-HCC but less so in GE2-HCC. By contrast, the rank of E2F1, GABP and TFCP2 in regulation of gene networks in liver carcinomas strongly increased from GE1-HCC to GE2-HCC.

Discussion

As described by Edmondson and Steiner in 1954, HCC includes a heterogeneous group of carcinomas [11]. Morphological dedifferentiation is accompanied by increased genomic instability, in particular chromosomal instability, and a worse prognosis [6,7,8]. In this study, we demonstrated that molecular tumour typing based on gene expression profiling correlated to histological grade and chromosomal instability—defining two main types of HCC—and may be used for detection of molecular markers to improve HCC therapy options.

DNA level/chromosomal instability

Previously reported HCC chromosomal changes include gains of 1q, 6p, 8q, 17q and 20q and losses of 1p, 4q, 6q, 8p and 13q. Furthermore, 4q and 13 q deletions have been repeatedly correlated to HCC dedifferentiation [17, 60,61,62]. We also observed these characteristic aberrations in our set of hepatic malignancies. Amplification of genomic material could contribute to ectopic expression of genes with essential roles in cell growth, survival, and proliferation (Table 3). Notably, while gains of genetic material were observed in both the GE1-HCC and GE2-HCC subgroups, losses were observed mostly in the GE2-HCC subgroup (i.e., high-grade tumours). We assume that the loss of genetic material observed in GE2-HCCs is a result of severely dysregulated control of mitotic mechanisms and chromosomal missegregation resulting from progressive malignant derailment. Interestingly, recurrent gains at 7q, 8q, 11p, 19p, and 19q were also observed in the adjacent NT tissues from both HCC sample groups and most likely represent initial cancer-predisposing events.

Similar to the results in our previous study of childhood HCCs [17] we observed recurrent gain of chromosome 19 in GE1-HCC (100% frequency) and GE2-HCC (60% frequency) as well as in 50% of adjacent NT tissues. This finding may indicate a particular role for this chromosomal aberration in the development of liver carcinogenesis.

RNA level

Overall, gene expression analysis revealed two main HCC groups: GE1-HCC and GE2-HCC, with GE1-HCC closer to NT than to GE2-HCC—the latter typically having a higher histological grade and higher level of chromosomal instability (as described above). There were a number of dysregulated genes in GE1-HCC and GE2-HCC, suggesting their essential role in the initiation and development of liver carcinogenesis. By contrast, other genes that were only dysregulated in GE2-HCC indicated their specific roles in cancer progression.

Genes upregulated in GE1-HCC and GE2-HCC

The genes significantly dysregulated in GE1-HCC and GE2-HCC revealed changes in the same direction; most of the dysregulation was more pronounced in GE2-HCC. The only exception was the long non-coding RNA (lncRNA) H19, which is frequently reported as dysregulated in cancer including HCC, sometimes up- and sometimes downregulated [63, 64]. In our samples, H19 was highly expressed in NT, downregulated in GE1-HCC and upregulated in GE2-HCC. These findings support the notion about the essential but context-specific role of H19 during all steps of liver carcinogenesis. Most of the genes upregulated in both GE1-HCC and GE-2 HCC have been reported to exert an oncogenic action. For example, the lncRNA HNF1A-AS1 is functionally involved in various carcinomas including HCC [65,66,67,68,69]. Another example is MDK (encodes a secreted growth factor): its upregulation correlated with a worse prognosis and chemotherapy resistance in diverse malignant tumours including HCC [70,71,72]. MDK-targeted molecular therapy approaches are under investigation [73]. Other genes with increased expression in GE2-HCC compared with GE1-HCC included transcription factor E2F1, with a crucial role in cell cycle regulation, BIRC5 (survivin), an inhibitor of apoptosis, and helicase RECQL4, essential for maintaining genomic stability. The overexpression of these genes is critical for the development of various malignancies (including HCC) [47, 74,75,76,77,78,79]. Moreover, there have been reported direct interactions of E2F1 and retinoblastoma protein (RB) with the promoter of RECQL4 in prostate cancer cells [77] and RECQL4 with BIRC5 in breast cancer cells [78]. These data suggest a ↑E2F1 → ↑RECQL4 → ↑BIRC5 axis in oncogenic derailment. A new finding in our study was upregulation of C1orf35, a novel potential oncogene [80]. To our knowledge, our study is the first to show its overexpression in solid tumours. Notably, we found the genomic loci coding for H19 (11p15.5), RECQL4 (8q24.3), E2F1 (20q11.2), BIRC5 (17q25.3) and C1orf35 (1q42.13) to be frequently amplified in both GE1-HCC and GE2-HCC. Furthermore, loci 11p15.5 and 8q24.3 were frequently amplified in NT tissues. These findings suggest the role of gene amplification in the overexpression of these genes.

Genes downregulated in GE1-HCC and GE2-HCC

The genes repressed in both GE1-HCC and GE2-HCC included those with supposed tumour-inhibiting activity, e.g. CXCL14, CCBE1, IGFBP1, RND3, BMP10 and COLEC10, which encode proteins involved in extracellular matrix remodelling, migration and the immune response. Their decreased expression in various cancers has been reported [81,82,83,84,85,86].

Genes upregulated in GE1-HCC

A few genes specifically upregulated in GE1-HCC have been found to promote carcinogenesis of various cancer types when overexpressed. Some examples are CACNA1H, DBP, B3GnT8, ASAP3 and CYHR1, with CACNA1H and DBP linked to chemoresistance [87, 88]. One exception is DNM3, a gene that encodes a protein with growth-inhibiting function in HCC cells [89].

Genes upregulated in GE2-HCC: cell proliferation

Most of the genes strongly upregulated only in the GE2-HCC subgroup (Additional files 5 and 6) encode proteins involved in the cell cycle and its regulation. Many of these genes have previously been linked to advanced stages of different cancer types including HCC. These genes enclose KIFC1 [90, 91], GPC3 [92], PEG10 [93,94,95], and UBE2C [96,97,98].

In GE2-HCC, there was marked upregulation (up to 17 fold) in the expression of several small non-coding RNAs—SNAR-A3, SNAR-B2, SNAR-H, SNAR-I, SNAR-D and SNAR-G2—located on 19q13.33, a genomic region that showed recurrent gain in HCC and NT samples. An in vivo study showed that the SNAR family members are associated with NF90, a double-stranded RNA (dsRNA) binding protein implicated in transcriptional and translational control [99] and suggested to be involved in viral replication, e.g. hepatitis C virus [100]. ILF3, which encodes the NF90 protein, was also upregulated in GE2-HCC and located on the amplified 19p13.2 region. A recent study reported ectopic expression of some SNAR family members in HCC patients. This dysregulation promoted invasion and migration in HCC cells [101].

Genes upregulated in GE2-HCC: nuclear size and transport

The distinct feature of tumour cells is an enlarged nucleus, although the molecular mechanisms that regulate nuclear size in normal and cancer cells remains unknown. We found that the expression of two major nuclear transport receptor genes, importin alpha 1—also known as karyopherin alpha 2 (KPNA2) —and exportin (XPOT), were significantly increased specifically in GE2-HCC. These changes could lead to profound dysregulation of nuclear-cytoplasmic transport [102]. KPNA2 is the most abundant nuclear import receptor for proteins. An excess of this protein directly increased the nuclear size in a Xenopus model [103]. It contributes to enhanced nuclear entry of structural proteins necessary for building up the nuclear envelope. The signalling pathways are also dependent on the nuclear transport system, namely import of activated transcription factors and signalling proteins and export of target gene products. Thus, KPNA2 overexpression might strongly deregulate gene transcription associated with increased proliferation and survival in GE2 carcinomas. Increased KPNA2 expression has been found in HCC patients, and its overexpression correlated with the expression of essential mitotic proteins CCNB2 and CDK1 in HCC cells [104]. These findings are consistent with our study. Similar results have been reported for exportin-t (encoded by XPOT), a nuclear export receptor responsible for most export of mature tRNA from the nucleus to cytoplasm. The enhanced export of mature tRNA could hasten protein synthesis in the cytoplasm. A study revealed XPOT upregulation in HCC tumour tissues, poor outcome for patients with the high expression of the protein and association of XPOT expression with expression of molecules that regulate the cell cycle (i.e. several cyclins and cyclin-dependent kinases) in HCC cells [105].

Genes upregulated in GE2-HCC: chromosomal instability

GE2-HCC showed a high level of chromosomal instability, with large genomic losses and aneuploidy. This dysfunction might be linked to overexpression of genes involved in DNA repair, e.g. HJURP and PARP1, and in chromosome segregation and spindle checkpoint function, e.g. CDC20, ANAPC1, ANAPC11, TTK, BUB1B, MAD2L1, KIFC1, PTTG1 and UBE2C. HJURP encodes a protein involved in the repair of double-stranded DNA breaks. Its overexpression leads to mitotic defects in human cells [106]. HJURP has already been shown to be an independent prognostic marker in HCC [107, 108]. PARP1 is involved in microhomology-mediated end joining (MMEJ), an error-prone DNA repair pathway. PARP1 is aberrantly expressed in different cancers, where its overexpression rather than underexpression is associated with genomic instability, likely due to accumulation of errors through increased MMEJ [109]. Higher PARP1 expression has been correlated to a higher stage in HCC [110]. PARP1 inhibitors are already in clinical use to suppress various tumours [109]. We also observed high expression of some other genes involved in the spindle checkpoint mechanism and chromosome segregation, e.g. CDC20, MAD2L1 and TTK. Their overexpression, as opposed to mutation or underexpression, has been found in cancer cells with genomic instability and mitotic spindle defects [111], and has also been reported in HCC, where it was linked to low patient survival [112, 113].

Master regulators in GE1-HCC

Master regulators identified for upregulated gene networks in GE1-HCC included membrane-type metalloproteinases (MT-MMPs), TGM2 and E2F1. TGM2 is a cell-surface protein involved in cell binding to fibronectin. Fibronectin is a reactive extracellular matrix component implicated in cell adhesion, migration and signalling in the liver [114]. Abnormal fibronectin accumulation in the liver due to sustained injury or disease has been suggested to elicit aberrant cell signalling that promotes further tissue damage and contributes to tumour genesis [114, 115]. E2F1 plays a critical role in the control of cellular proliferation and can activate genes that promote cell proliferation or apoptosis. Pro-survival signalling might abrogate the pro-apoptotic capacity of E2F1 and thus favour E2F1-mediated cell proliferation. MMP15 (associated with the plasma membrane) and TGM2, besides their involvement in invasion and metastasis, can convey cell signalling to enhance cell survival [116, 117]. Therefore, MT-MMPs, TGM2 and E2F1 might be conjoined by pathological processes in the liver to trigger inappropriate cell proliferation and tumour development. In our study, there was a nearly twofold increase in E2F1 in GE1-HCC. We discuss its pivotal role in the process of liver transformation below.

Transcriptional regulators in GE1-HCC

A bioinformatics search identified AR as an important transcription factor involved in upregulation of genes in GE1-HCC, with BIRC5 (an apoptosis inhibitor) and C1orf35 (a transcript with transforming activity) as its possible target genes. Androgen-dependent and androgen-independent AR activation has been reported in hepatocarcinogenesis [24]. However, anti-androgenic treatment did not benefit male patients with advanced HCC [118]. Consistent with this observation, our analysis showed that in GE2-HCC, AR gene expression and its role in gene transcription (Yes/No ratio in regulated genes) were significantly reduced compared with GE1-HCC. Information about PGR role in HCC is rare [26]. The results of our study support the notion that PGR may be implicated in HCC carcinogenesis at early and advanced stages of the disease. Understanding the roles of AR and PGR in HCC is important because AR and PGR inhibitors are already in clinical use [24, 27]. A number of other potential transcriptional regulators identified in GE1-HCC—e.g. ZEB1, HES1, NR6A1, PATZ1, PAX4 and MTF1—have a reported oncogenic role in cancer types other than HCC [28,29,30,31,32,33,34,35]. Our results implicate them in hepatocarcinogenesis too. Transcription factor ZNF219, whose potential binding sites are found in several genes with an obvious role in carcinogenesis, i.e. E2F1, MDK, REQL4 and C1orf35, has emerged as a new potential transcriptional regulator in GE1-HCC.

Master regulators in GE2-HCC

The most significant master regulators identified for upregulated gene networks in GE2-HCC are important regulators of the cell cycle, namely the G2/M transition and mitotic spindle formation. The kinase PKMYT1 inhibits the G2/M transition, allowing time for DNA damage recovery. It is overexpressed rather than underexpressed in HCC, and exerts oncogenic activity in HCC cells [119]. Targeted inhibition of PKMYT1 activity would shorten the time between checkpoint abrogation and mitotic entry and, therefore, should preferably damage cancer cells with an already defective G1 checkpoint mechanism. Efforts have been made to develop small molecule inhibitors of PKMYT1 as a promising anti-cancer therapy approach [120]. CCNB2 overexpression hyperactivates the centrosomal kinases AURKA and PLK1, leading to accelerated centrosome separation, lagging chromosomes and aneuploidy [121]. The APC/C complex bound to the regulatory protein CDC20 initiates anaphase entry for chromosome separation, and CDC20 is a critical molecule in the spindle assembly checkpoint mechanism. The ubiquitin-conjugating enzyme UBE2C collaborates with the APC/C–CDC20 complex in the removal of mitotic regulators and thereby contributes to cell cycle progression. Increased CDC20 expression has been associated with defective spindle formation and progression of multiple tumours, including HCC [122]. UBE2C overexpression also leads to chromosomal missegregation and tumour formation [123]. Nath et al. [124] linked UBE2C overexpression and chromosomal instability in cancer cells to excess E2F1, which recruits the CDC20–APC/C to upregulate UBE2C transcription. This phenomenon highlights the role of deregulated RB–E2F1 pathway in premature anaphase, chromosomal abnormalities and aneuploidy beyond its well-documented role in G1/S progression. All aforementioned genes, i.e. CCNB2, PKMYT, some components of the APC/C complex (e.g. ANAPC1 and ANAPC11), CDC20, UBE2C, E2F1 and many others involved in progression and control of G2/M phase, were strongly upregulated in GE2-HCC. They might represent candidates for targeted therapy in advanced HCCs with high-proliferation signature.

Transcriptional regulators in GE2-HCC

Promoter analysis of genes upregulated in GE2-HCC suggests the activation of transcription factor NKX6.1. A few upregulated genes with potential TFBSs for NKX6-1 in their promoters are critical genes essential for cell proliferation (CDK1 and two replication-dependent histones), growth (KPNA2 and XPOT) and motility (ROBO1) and might mediate the oncogenic effects of NKX6-1. NKX6-1 overexpression in HCC is associated with progressive features and an unfavourable prognosis [37].

Transcription factor POU2F1 might be pro-oncogenic in multiple contexts. It has been implicated in cell proliferation, immune modulation, oxidative and cytotoxic stress resistance, metabolic regulation, stem cell identity and a poised transcription state [38, 125]. It has been hypothesised to integrate multiple signal inputs through diverse posttranslational modifications and interaction with various partners to direct changes in gene transcription in general, by opening chromatin, and in cell-specific contexts. In prostate cancer, POU2F1 is a co-regulator of AR, leading to AR hypersensitivity and driving androgen-independent cancer progression [126]. A similar phenomenon might occur in liver carcinogenesis. Accumulation of DNA damage and cell stress during the disease progression might activate POU2F1, leading, in turn, to changes of gene transcription induced by other transcription factors through modifying their interaction with ligands and/or binding to DNA. Enhanced POU2F1 activity has been described in different epithelial malignancies as a result of focal amplification, increased mRNA, augmented protein level and enhanced activity [38]. We found amplification of the 1q24.2 locus, which encodes POU2F1, in more than 80% of all HCCs, but its transcriptional level was not changed. These data suggest the involvement of other mechanisms in the enhancement of its transcriptional activity. POU2F1 has already emerged as a prognostic marker and a potential therapeutic target in several types of cancer including HCC [39]. The fact that POU2F1 is important for the cell response to genotoxic and oxidative stress, but is not critical under standard conditions, makes its inhibition an attractive anti-cancer therapeutic approach. One tactic is targeting the POU2F1-binding DNA sequence by pyrrole-imidazole polyamides, which has been successfully implemented in mice [126].

The second highest transcriptional regulator in GE2-HCC was the proto-oncogene JUN, overexpression of which has been shown to reprogramme oestrogen receptor signalling in breast cancer cells and confer them resistance to tamoxifen [41]. JUN overexpression in HCC has been linked to sorafenib resistance [42].

Another significant transcription factor was E2F1, the expression of which was also strongly increased in GE2-HCC. As discussed in the section about master regulator molecules, the E2F family plays a crucial role in controlling the cell cycle and the action of tumour suppressor proteins [74]. When overexpressed, it can be responsible for promoting the cell cycle as well as genomic instability [124]. Sustained activation of E2F1 and its target genes might be induced by impaired regulatory mechanisms and aberrant signalling due to chronic liver disorders or injury or by DNA tumour virus oncoproteins, e.g. the hepatitis B virus X protein known to target the Rb–E2F1 pathway [127]. Subtle yet significant gains in E2F1 and E2F3 copy number rather than mutations have been found in advanced HCC, and a direct, cell-autonomous role for E2f activators in driving HCC in mice has been demonstrated [47]. In our HCC cohort, the genetic locus 20q11.2, which encodes E2F1, was recurrently amplified in tumours, E2F1 expression was moderately and strongly upregulated in GE1-HCC and GE2-HCC, respectively.

To achieve growth regulation, E2F1 requires partners. One of these is NFY [128]. More than a half of the genes upregulated in GE2-HCC with a high proliferation signature were enriched in CCAAT box and NFY binding motif for CCAAT-binding transcription factor NFY. This finding agrees with studies of other cancers where CCAAT boxes were over-represented and bound to NFY in variety of upregulated genes [57]. NFY is an essential regulator of cell cycle progression. It activates transcription of numerous genes that regulate cell cycle, including E2F1. It also has a direct non-transcriptional role in the overall efficiency of DNA replication. NFY increases cell type–specific gene expression by promoting chromatin accessibility for cell type–specific master transcription factors [129]. Therefore, NFY activation might contribute to a broad augmentation of gene expression changes, first of all, E2F1 overexpression, associated with cancer progression in GE2-HCC. Considerable efforts have being made to find compounds that specifically inhibit NFY activity in cancer cells [57].

The transcription factor GABP is a negative prognostic biomarker in HCC and inhibits HCC cell migration and invasion [53]. On the other hand, GABP strongly binds and activates a mutant promoter of the telomerase reverse transcriptase gene (TERT), reactivating its expression and thereby increasing the replicative potential of tumour cells [51]. Two specific non-coding mutations in the TERT promoter required for the activation of TERT transcription by GABP occur with high frequency in aggressive cancer types. In one study, these mutations were found in 47% of HCCs [130]. In our study, the promoter analysis identified GABP as potential important transcriptional regulator in GE2-HCC, but we did not find a significant difference in TERT transcription among GE1-HCC, GE2-HCC and NT samples.

Two additional transcriptional regulators upregulated in GE2-HCC were SRF and TFCP2. Increased SRF and TFCP2 levels accelerate tumour cell migration and invasion and are linked to cancer progression [49, 55] and to high-grade HCC [50, 56]. TFCP2 is involved in regulation of cell proliferation, invasion, angiogenesis, metastasis and chemoresistance in HCC [56]. Small molecular inhibitors of TCPF2 have emerged as promising potent and effective therapeutics for HCC [131].

Candidate molecules for targeted anti-tumour therapy in GE1-HCC and GE2-HCC

As described above, the GE1-HCC and GE2-HCC subgroups were determined based on specific changes in histological, DNA and RNA levels that reflect their different biology. As a result, we uncovered many potentially valuable targets for anti-cancer therapy, as already discussed earlier and listed in Table 9.

Table 9 Candidate molecules for targeted anti-tumour therapy in GE1-HCC and GE2-HCC

GE gene expression, HCC hepatocellular carcinoma, MR master regulator, TR transcriptional regulator

Our findings point to a demand for differentiated approaches for treatment of HCC with low and high proliferation. Whereas E2F1, MDK, BIRC5, IL10, PGR, C1orf35 and TFCP2 (among others) seem to represent promising targets for intervention in all HCCs, therapy directed against KIFC1, PAFAH1B3, PKMYT1, PEG10, PARP1, POU2F1, NFY or TNF might be effective specifically in advanced GE2-HCC. Conversely, anti-androgen therapy, anti-ZNF219 and SOSC3-peptidomimetics may have a therapeutic potential for HCCs with low proliferation signature.

Conclusion

The findings from this study, in accordance with abundant previously published data, argue for a pivotal role of dysregulation of the E2F1 pathway in liver carcinogenesis. This dysfunction, due to diverse pathological processes in the liver, is capable of initiating both inappropriate cell proliferation and chromosomal instability.

A dedifferentiation switch manifested by strong propagation of gene expression changes and genomic instability might be linked to turning on transcriptional co-regulators, e.g. POU2F1 (OCT1) and NFY, as a response to accumulating cell stress during malignant development (Fig. 4).

Fig. 4
figure 4

Proposed scheme for key steps in liver cancer development and progression

Activation of transcriptional co-activators might explain, at least in part, the insensitivity of advanced HCC to anti-androgen treatment.

Our findings highlight the demand for different approaches to treat HCC entities that exhibit low or high proliferation signatures and provide new, promising candidates for developing targeted HCC therapy.