Patients with polyclonal hepatocellular carcinoma are at a high risk of early recurrence and have a poor recurrence-free survival period

Background/purpose of the study Tumor heterogeneity based on copy number variations is associated with the evolution of cancer and its clinical grade. Clonal composition (CC) represents the number of clones based on the distribution of B-allele frequency (BAF) obtained from a genome-wide single nucleotide polymorphism (SNP) array. A higher CC number represents a high degree of heterogeneity. We hypothesized and evaluated that the CC number in hepatocellular carcinoma (HCC) tissues might be associated with the clinical outcomes of patients. Methods Somatic mutation, whole transcriptome, and CC number based on copy number variations of 36 frozen tissue samples of operably resected HCC tissues were analyzed by targeted deep sequencing, transcriptome analysis, and SNP array. Results The samples were classified into the heterogeneous tumors as poly-CC (n = 26) and the homogeneous tumors as mono-CC (n = 8). The patients with poly-CC had a higher rate of early recurrence and a significantly shorter recurrence-free survival period than the mono-CC patients (7.0 months vs. not reached, p = 0.0084). No differences in pathogenic non-synonymous mutations, such as TP53, were observed between the two groups when targeted deep sequencing was applied. A transcriptome analysis showed that cell cycle-related pathways were enriched in the poly-CC tumors, compared to the mono-CC tumors. Poly-CC HCC is highly proliferative and has a high risk of early recurrence. Conclusion CC is a possible candidate biomarker for predicting the risk of early postoperative recurrence and warrants further investigation. Supplementary Information The online version contains supplementary material available at 10.1007/s12072-021-10278-4.


Introduction
Liver cancer is one of the most common malignancies, with more than 800,000 new cases diagnosed globally each year [1]. Hepatocellular carcinoma (HCC) accounts for approximately 75% of liver cancers. Although surgical resection improves the survival of HCC patients, early recurrence after hepatic resection is a poor prognostic factor for patients with HCC. HCC is a highly heterogeneous cancer [2][3][4]. Molecular heterogeneity and a lack of biomarkers for recurrence have contributed to the poor prognosis of HCC patients. The intratumoral heterogeneity of cancer cells remains largely unexplored. Copy number variations reflect genomic structural changes that give rise to gene amplification or deletion. Chromosomal microarray and other array-based approaches have been widely adopted for the detection of whole-genome copy number variations [5,6]. The OncoScan FFPE Assay Kit (Thermo Fisher Scientific, Wilmington, DE) relies on molecular inversion prove (MIP) technology to detect genome-wide copy number alterations, loss of heterozygosity, and somatic mutation [7]. This assay provides the B-allele frequency (BAF), the log2 ratio (log2R), and the copy number for each of over 220,000 analyzed polymorphic genomic locations. The copy number is derived from the log2R and BAF values. The clonal composition of a tumor can be analyzed based on the BAF and log2R, which can be determined from the whole-genome copy number profiles obtained using the OncoScan FFPE Assay Kit. We previously demonstrated that the whole-genome single nucleotide polymorphism (SNP) array could be applied to the detection of the clonal composition of human ovarian cancer. Genome-wide segmentation data consisting of the log2R and BAF have been previously used to estimate the clonal composition (CC) number for individual tumors [8]. A similar approach resulted in a report indicating that the clonal composition can be derived from both somatic mutations as well as the log2R and BAF for loci with an aberrant copy number [9]. The CC number is not equivalent to the number of clones in the tumor tissue, but instead represents and estimates the number of clones that show copy number changes. For instance, a BAF value of 0.5 indicates a heterozygous genotype (AB), whereas values of 0 and 1 indicate homozygous genotypes (AA and BB, respectively). The BAF distribution in segments with copy number changes suggests the presence of different clones with different BAF patterns intratumorally. In segments with copy number gains, the percentage of aberrant cells with different BAF patterns due to BAF variation is difficult to evaluate because the copy number gain is not constant. On the other hand, there is no such limitation in evaluating the BAF in segments with copy number loss, and the clonal composition number can be derived from the presence of aberrant cells with a different BAF pattern.
We hypothesized that HCC with a high CC would exhibit highly aggressive behaviors when compared with HCC with a low CC. We examined the CC numbers of previously collected frozen tissue samples of HCC and investigated the association with early recurrence in postoperative HCC patients. In addition, we also used NGS-based deep sequencing to detect somatic mutations and to examine the whole transcriptome of HCC tissues to investigate any associations with CC.

Hepatocellular carcinoma samples
Frozen tumor tissue samples were obtained from 36 patients who underwent surgical resection at Kansai Medical University Hospital between August 2011 and December 2013. All the patients provided written informed consent to participate in the study, including the collection of tumor frozen tissues for analysis. Tumor stage is defined by General Rules for the Clinical and Pathological Study of Primary Liver Cancer [10] as stage I-IVa. All cases including five cases of IVa tumors were completely resected. The early recurrence of postoperative HCC has been previously defined as recurrence within 1 year of resection [11,12]. The recurrence-free survival (RFS) and overall survival (OS) were defined as the time interval between the date of hepatectomy and the date of recurrence or death, respectively. After surgery, patients were followed in the outpatient clinic every 3 months for the first 2 years and every 6 months thereafter. Early recurrence was defined as the appearance of a new lesion compatible with HCC on radiologic examination during the follow-up period.

Isolation of genomic DNA and RNA
A 0.5 cm square tumor legion without necrosis visually was sectioned and used for the analysis. DNA and RNA were isolated from frozen tissues using an AllPrep DNA/RNA Mini Kit (Qiagen, Valencia, CA) according to the manufacturer's instructions. The quality and quantity of the nucleic acid were verified using a NanoDrop 2000 device, PicoGreen dsDNA Reagent, and RiboGreen RNA reagent (all from Thermo Scientific).

Microarray-based comparative genomic hybridization assay
A MIPs array for the genome-wide estimation of copy number aberrations was performed using the OncoScan FFPE Assay Kit (Thermo Fisher Scientific), as previously reported [8]. Briefly, 80 ng of DNA was subjected to annealing with these MIPs for 16-18 h, followed by enzyme digestion and two separate gap-fill reactions. The circular MIPs were then separately linearized for each gap fill with a cleavage enzyme and amplified using PCR. The PCR products were subjected to enzymatic cleavage and fragmentation, followed by hybridization for 16-18 h with two OncoScan arrays (one for each gap fill). The arrays were then stained and washed using a GeneChip Fluidics Station 450 and loaded into a GeneChip Scanner 3000 7G (Affymetrix). Array fluorescence intensity (CEL) files were generated using Affymetrix GeneChip Command Console (AGCC) software, version 4.0, and the CEL files were converted to OSCHP files using OncoScan Console software 1.3 and visualized with Chromosome Analyses Suite software (version 4.0). A representative whole-genome view is shown in Fig. S1.

Clonal composition analysis
Clonal composition numbers were calculated using BAF and log2R information obtained from the OncoScan FFPE Assay using the Onco Clone Composition program, as described previously [8]. A given copy number segment can be associated with a percentage of aberrant tumor cells (%AC) and is assumed to result from a single underlying event attributable to a single clone; in other words, the copy number segments are associated with the same %AC and belong to the same clone. The clonal composition number is calculated by identifying the number of different %AC detected among the aberrant segments. The number of minor alleles (NOMA) at a heterozygous site can be defined as the BAF of a segment. The calculation of clone composition number only uses NOMA = 0, which represents a segment with 100% loss of heterozygosity and a normal copy number. If only one of such alleles is present, the BAF for such a segment will be zero. The Onco Clone Composition program provides analytical estimations of clonal composition using a clustering approach that clusters all segments with the same logR and BAF and then combines clusters corresponding to the same %AC. A graphical representation of the aberrated segments and their association with %AC is shown in Supplementary Fig. S2.

Targeted DNA sequencing
A targeted DNA library comprising approximately 1.2-Mb of the coding regions of 409 genes for panel sequencing was constructed using an Ion AmpliSeq Comprehensive Cancer Panel (CCP) (Thermo Fisher Scientific), as described previously [13]. Briefly, 40 ng of DNA was subjected to multiplex PCR amplification using an Ion AmpliSeq Library Kit 2.0 and the Ion AmpliSeq Comprehensive Cancer Panel (Thermo Fisher Scientific), which covers all exons in 409 genes. After multiplex PCR, Ion Xpress Barcode Adapters (Thermo Fisher Scientific) were ligated to the PCR products, which were then purified using Agencourt AMPure XP beads (Beckman Coulter, Brea, CA). The purified libraries were pooled and then sequenced using an Ion Torrent S5 instrument and the Ion 550 Chip Kit (all from Thermo Fisher Scientific). DNA sequencing data were accessed through the Torrent Suite ver. 5.10 program (Thermo Fisher Scientific). Reads were aligned against the hg19 human reference genome, and variants were called using Variant Caller, ver. 5.10. Raw variant calls were filtered with a quality score of < 100 and were manually checked using the integrative genomics viewer (IGV; Broad Institute, Cambridge, MA). Germline mutations were excluded using the Human Genetic Variation Database (http:// www. genome. med. kyoto-u. ac. jp/ SnpDB) and the Exome Aggregation Consortium database. Variants with FATHMM scores greater than 0.7 were classified as pathogenic [14]. The tumor mutation burden (TMB mutations/Mb) was assessed using Ion Reporter Software, ver. 5.10 (Thermo Fisher Scientific).

Whole transcriptome analysis
The whole transcriptome analysis was performed using the AmpliSeq Transcriptome Human Gene Expression Kit (Thermo Fisher Scientific). For library preparation, cDNA was generated using the SuperScript VILO cDNA Synthesis kit (Thermo Fisher Scientific) from 10 ng of total RNA. Then, cDNA was amplified for 12 cycles by adding PCR Master Mix and the AmpliSeq human transcriptome gene expression primer pool (Thermo Fisher Scientific). After multiplex PCR, Ion Xpress Barcode Adapters (Thermo Fisher Scientific) were ligated to the PCR products, which were then purified using Agencourt AMPure XP beads (Beckman Coulter). The purified libraries were pooled and then sequenced using an Ion Torrent S5 instrument and the Ion 550 Chip Kit (all from Thermo Fisher Scientific). The Ion Torrent Suite v5.10 software (Thermo Fisher Scientific) was used for base calling, alignment to the human reference genome (hg19), and quality control. Raw reads were then analyzed automatically using the AmpliSeqRNA plugin to generate gene-level expression values for all 20,802 RefSeq human genes.

Gene set enrichment analysis (GSEA)
A gene-set enrichment analysis (GSEA) was performed to identify pathways enriched in the Molecular Signatures Database (MSigDB) Hallmark gene set [15,16]. A nominal p value of < 0.05 and an FDR (false discovery rate) q value of < 0.05 were considered statistically significant.

Gene selection and pathway analysis
A total of 2408 genes that were differentially expressed according to CC numbers were selected using the following criteria: p value < 0.05 and FDR q value < 0.5. Among these 2408 genes, the top 500 markers of differential expression were identified based on the signal-to-noise ratio. Hierarchical clustering was performed using one minus the Pearson correlation coefficient as a distance measure and the average linkage method. To explore the potential biological pathways, genes in a cluster were submitted to the Metascape tool (https:// metas cape. org/ gp/ index. html) [17].

Statistical analysis
Categorical variables were compared using the Fisher exact test. Continuous variables were compared between groups with the Mann-Whitney U test. Variables with a statistical significance in the univariate analysis were included in the multiple logistic regression analysis. Survival was estimated using the Kaplan-Meier method and the log-rank test method. All the statistical analyses were performed using JMP software, version 14.2 (SAS Institute, Cary, NC), and Prism software, version 8.4 (GraphPad Software, San Diego, CA). A p value of < 0.05 was considered statistically significant.

Clonal composition number of resectable hepatocellular cancer
We analyzed 36 frozen samples of HCC using a wholegenome SNP array (Fig. S1) and estimated the clonal composition number using the Onco Clone Composition estimation program (Fig. S2). The mean ± SD value for the CC number was 1.0 ± 0.8 (range 0-3). It should be noted that a case with a CC number of 0 indicates a homogenous tumor mass. The CC number could not be estimated in two cases because the copy number data did not plot onto the fitting curve in the program. Representative plots for the CC profile are shown in Fig. S2. The HCC samples were categorized into mono-CC (CC = 0, n = 8) and poly-CC (CC ≥ 1, n = 26) groups that respectively reflected homogenous and heterogenous clones.

Clinicopathological features associated with clonal composition status
The clinicopathological characteristics of patients with mono-CC or poly-CC are summarized in Table 2. No significant differences were observed in sex, mean age, liver function status (Child-Pugh A or B), or stage as determined using the Fisher exact test. The presence of microscopic vascular invasion (p = 0.0180), moderately to poorly differentiated HCC (p = 0.0469), absence of neoadjuvant therapy (p = 0.0374), the presence of single tumor nodule (p = 0.0374), and higher levels of PIVKA-II (p = 0.0351) were significantly frequent in poly-CC group. This result suggests that poly-CC shows an aggressive tumor phenotype compared to mono-CC group. Early recurrence was significantly frequent in the poly-CC group than in the mono-CC group (p = 0.0127). There was no difference in the incidence of extrahepatic metastasis between poly-CC and mono-CC. The RFS and OS of the mono-and poly-CC groups were compared (Fig. 1). The median RFS was not reached in the mono-CC HCC groups and 7.0 months in the poly-CC HCC groups. The RFS of patients with poly-CC HCC was significantly shorter than that of the mono-CC patients (Fig. 1a, p= 0.0084). The OS of patients with poly-CC tended to be shorter than that of patients with mono-CC, although the difference was not significant (Fig. 1b, p= 0.2966). Since poly-CC status appeared to be associated with the poorly differentiated HCC, we investigated the relationship with progenitor markers, cytokeratin 19 (CK19) and epithelial cell adhesion molecule (EpCAM). No significant difference of CK19 expressions was observed between mono-and poly-CC groups determined by immunohistochemistry. On the other hand, higher expression of EpCAM was observed in mono-CC than poly-CC significantly (p = 0.0304). Thus, CC status may not indicate the features of progenitor cells (Fig. S3). were frequently identified mutations seen in the 36 samples that were analyzed, as reported previously [21]. No significant association was seen between the presence of a pathogenic TP53 gene mutation and grouping in the mono-and poly-CC groups (12.5% vs. 30.8%, p = 0.4030). Pathogenic CTNNB1 mutations tended to occur frequently in the mono-CC group, but the difference was not significant (37.5% vs. 7.7%, p = 0.0721). The tumor mutation burden (TMB) has  been reported as the total number of non-synonymous variants per tumor genomic region [22]. The TMB level of HCC is relatively low, compared with other types of solid cancers [23]. The TMB score of the presently reported cohort was also not high (median TMB = 6.7), and no difference in TMB was observed between the poly-CC (median 6.7; range 2.5-12.7) and the mono-CC (median 6.7; range 4.2-12.6) groups.

Gene expression profile and functional enrichment analysis
Thus far, our findings have indicated that the biological behavior of mono-and poly-CC tumors may differ. To investigate the biological functional differences between monoand poly-CC tumors, we performed gene-set enrichment and pathway analysis using gene expression data from the Ion AmpliSeq Transcriptome Human Gene Expression assay. GSEA was performed for the enrichment of 50 hallmark gene sets from the MSigDB database. The enriched up-and down-regulated hallmarks with nominal p < 0.05 and a false discovery rate (FDR) with q < 0.05 are shown in Table 3. The top five enriched gene set in the poly-CC group were related to E2F targets, the G2M checkpoint, Myc targets, the mitotic spindle, and DNA repair (Fig. 3a). On the other hand, the top five down-regulated signatures in the poly-CC group were related Allograft rejection, IFNγ response, Inflammatory response, TNFα signaling via NFκB, and IL6-JAK-STAT3 signaling (Fig. 3b), indicating that the immune response is upregulated in mono-CC group.
The top 100 up-regulated and 100 down-regulated genes (p < 0.05) in poly-CC group are listed in Table S1. Among the 100 up-regulated genes in poly-CC, the number of genes corresponding to the E2F target, G2M checkpoint,   CCL2, and MARCO), respectively. The analysis of immune infiltrates affecting CC status showed a significant correlation between mono-CC and infiltration of macrophages (Fig. S4). Previous reports revealed the association between the expression of OATP1B3, which is encoded by the gene SLCO1B3, was strongly associated with Wnt/β-catenin signaling [28]. SLCO1B3 gene expression was compared between mono-CC and poly-CC. Increased expression of SLCO1B3 was observed in mono-CC (mean ± SD, 2.98 ± 2.08) compared with poly-CC (mean ± SD, 1.69 ± 2.24) (p = 0.1173). Although not significantly different, expression of SLCO1B3 tended to be higher in mono-CC. In addition, expression of SLCO1B3 was higher in the CTNNB1 mutation positive cases (mean ± SD, 4.15 ± 2.73) than negative cases (mean ± SD, 1.62 ± 1.97) (p = 0.0691). This suggests that mono-CC may be associated with Wnt/β-catenin activated subtype HCC. To further explore this notion, we analyzed significant gene sets in the mono-and poly-CC tumors. A hierarchical clustering analysis of 500 differentially expressed genes revealed the enrichment of cluster I and cluster II genes in mono-and poly-CC tumors, respectively (Fig. 4a). Early recurrent patients exhibited the enrichment of cluster II genes. We conducted a gene ontology (GO) analysis using Metascape and found significant enrichment of the cellular pathways associated with the immune response and cytokine-related pathways in cluster I, which was enriched in mono-CC tumors (Fig. 4b). On the other hand, cell cycle-related pathways including those related to E2F targets, the G2M checkpoint, and DNA replication were enriched in cluster II, which was enriched in poly-CC tumors (Fig. 4c).

Univariate and multivariate analyses for risk factors of early recurrence
To identify the risk factors contributing to early postoperative recurrence of HCC, univariate and multivariate analysis were performed (Table 4). Univariate analysis showed that NBNC HCC, large tumor size, and elevated PIVKA-II level were significantly correlated with early recurrence. In multivariate analysis, large tumor size was found to be a significant risk factor; CC status was not significant, but there was a trend toward a higher risk of early recurrence in poly-CC (p = 0.0846).

Discussion
Tumor heterogeneity is considered to be associated with a poor prognosis and outcome in cancer patients. In this study, we evaluated the CC number, a surrogate marker of intratumor heterogeneity, to investigate its clinical relevance in HCC. In general, HCC is reported to be highly heterogeneous [2][3][4]. In our sample set, the rate of homogeneous tumors with a CC number of 0 and the rate of heterogeneous tumors with a CC number of greater than 1 were 23.5% and 76.5%, respectively, suggesting that HCC was often a heterogeneous tumor in our cohort. Heterogeneous tumors with poly-CC had a significantly higher rate of early recurrence. In addition, patients with poly-CC tumors were significantly associated with a poor postsurgical RFS. This finding suggests that HCC with high heterogeneity has a higher risk of recurrence after surgery. The CC number may be useful for predicting the risk of recurrence, but further studies are essential.
A gene mutation analysis was performed on the same sample set, and pathogenic mutations were selected using the FATHMM score. Therefore, the frequencies of TP53 and CTNNB1 mutations were lower than previously reported [21]. The gene mutation analysis showed no association between the CC number and the gene mutations status, including the TMB. The calculation of CC number is based on the copy number loss, suggesting that the diversity of copy number variations is reflected by the CC number. In lung cancer, subclonal evolution based on copy number changes is reportedly associated with the RFS and OS to a greater degree than those based on single nucleotide mutations [24]. Evaluation of the CC number based on copy number loss may be more informative for predicting clinical outcome than the detection of single nucleotide variants in hepatocellular cancer.
In an enriched transcriptome analysis using a supervised approach, cell cycle-related pathways were among the top 5 upregulated pathways in poly-CC tumors. Oshi et al. showed that G2/M scoring is associated with cell proliferation-related factors and the breast cancer grade using an enrichment analysis based on a gene expression analysis of metastatic estrogen-receptor-positive breast cancer [25]. The pathways enriched in our cohort were related to the entire cell cycle progression in the G1, S and G2/M phases, suggesting active involvement in cell cycle progression and proliferation. The BIRC5, which was common to all three pathways and showed the highest difference, plays important rules in the E2F, G2M, and mitosis pathways, highlighting the importance of BIRC5 involvement in poly-CC tumor, particularly with regards to enhanced functions in cell cycle progression, division, and proliferation [21]. The other four genes have also been reported to be involved in proliferation in hepatocellular carcinoma [22][23][24][25], suggesting that these genes could be related to the increased proliferative phenotype of poly-CC. Recently, Llovet et al. [26]. reviewed the molecular classification of HCC that classifies HCC into "proliferation" and "non-proliferation" classes based on a previous transcriptome analysis [27]. The proliferation class is characterized by enriched pathways related to the cell cycle, TP53 mutation, and HBV-positive populations. Compared with the mono-CC group, the poly-CC group in our sample cohort tended to exhibit enriched E2F and other cell cycle pathways, a high TP53 mutation rate ( , although these differences were not significant because of the small sample size. Thus, poly-CC tumors (cluster II) are consistent with these molecular and etiological characteristics of the "proliferative class." On the other hand, the non-proliferation class is associated with HCV and a molecular profile characterized by enhanced IL6-JAK-STAT signaling or CTNNB1 mutationmediated Wnt/β-catenin signaling associated with inflamed pathogenicity [28][29][30]. Mono-CC tumors, compared with  , and enriched cytokine-related pathways. The analysis of immune infiltrates affecting CC status showed a significant correlation between mono-CC and infiltration of macrophages (Fig. S4). This result might be linked to the enrichment of IFNγ-and IL-6-mediated immune response pathway in mono-CC [31,32]. Thus, mono-CC tumors (cluster I) are consistent with these molecular and etiological characteristics of the "non-proliferative class". Tumor heterogeneity of HCC may be different between tumor tissue of different etiology because of the disease features and the period of fibrogenesis as well the extent of chromosomal mutations and this phenomena is exemplified in this study which includes a heterogenous population HCC patients with different etiologies. The limitations of this study include its retrospective design and small sample size. Future studies examining larger sample sizes are needed to confirm the findings of this study and differential signatures of HCC with various etiologies. In the present study, pathway-related analysis was performed to characterize poly-CC tumors. However, the further analysis will be planned in the next study. In addition, isolation and transplantation of different poly-CC HCC cells in animals may elucidate copy number relevance in infiltration and proliferation rate.
In conclusion, HCCs with poly-CCs, a surrogate of heterogeneity based on copy number variation, are enriched in genes involved in the cell cycle pathway, have a proliferative phenotype, and have a higher risk of early postoperative recurrence.
Author contributions MK contributed substantially to the conception, design, and planning of the study, acquisition of data, analysis of the data, interpretation of the results, and drafting the manuscript; KS contributed substantially to the conception, design, and planning of the study, acquisition of data, analysis of the data, interpretation of the results, and drafting the manuscript; KN contributed substantially to the conception, design, or planning of the study, analysis of the data, Poly-CC/mono-CC 20/2 (90.9/9.1) 6/6 (50.0/50.0) 0.0127* 0.0846 interpretation of the results, and drafting the manuscript; HM, HK, KM, MAV, and MS contributed substantially to the acquisition of data, and interpretation of the results. All authors contributed substantially to critically reviewing or revising the manuscript for important intellectual content, and approved the final manuscript.
Funding This work was supported in part by a Grant-in Aid for Scientific Research (C) from the Japan Society for the Promotion of Science Grant Numbers JP19K07722 (KS).
Data Availability Yes. Data generated or analyzed during this study are available from the corresponding author on reasonable request.