Journal of Molecular Medicine, Volume 97, Issue 6, pp 879–888

Random gene sets in predicting survival of patients with hepatocellular carcinoma

  • Timo Itzel
  • Rainer Spang
  • Thorsten Maass
  • Stefan Munker
  • Stephanie Roessler
  • Matthias P. Ebert
  • Hans J. Schlitt
  • Wolfgang Herr
  • Matthias Evert
  • Andreas Teufel (corresponding author)
Original Article


Despite multiple publications, molecular signatures predicting the course of hepatocellular carcinoma (HCC) have not yet been integrated into routine clinical decision-making. Given the diversity of published signatures, the optimal number of genes, the best gene combinations, and the benefit of functional associations between genes in prognostic signatures remain to be defined. We investigated a vast number of randomly chosen gene sets (ranging from 1 to 10,000 genes) to cover the full range of possible prognostic gene sets in 242 transcriptomic profiles of patients with HCC. Depending on the selected size, 4.7 to 23.5% of all random gene sets exhibited prognostic potential, separating patient subgroups with significantly different survival. This was further substantiated by investigating curated gene sets and signaling pathways, which likewise yielded a comparably high number of significantly prognostic gene sets. However, combining multiple random gene sets using “swarm intelligence” resulted in significantly improved predictability for approximately 63% of all patients. In these patients, approx. 70% of all random 50-gene sets resulted in concordant and stable prediction of survival. For all other patients, a reliable prediction seems highly unlikely with any selected gene set. Using a machine learning and independent validation approach, we demonstrated the high reliability of random gene sets and swarm intelligence in HCC prognosis. Finally, these findings were validated in two independent patient cohorts and on independent technical platforms (microarray, RNASeq). In conclusion, we demonstrate that using the “swarm intelligence” of multiple gene sets for prognosis prediction may be not only superior but also more robust.
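The abstract does not spell out how a single random gene set is judged prognostic. The sketch below illustrates the general idea under explicit assumptions that are not taken from the paper: expression values are already z-scored per gene, patients are split at the median of the mean gene-set score (a simple stand-in for the unsupervised clustering mentioned in Supplemental figure 2), and separation is tested with a two-group log-rank test at alpha = 0.05. The function names `logrank_p`, `split_by_score`, and `fraction_prognostic` are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence


def logrank_p(times: Sequence[float], events: Sequence[int], group: Sequence[int]) -> float:
    """Two-group log-rank test; returns the chi-square (1 df) p-value."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0  # observed minus expected deaths in group 1
    variance = 0.0
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if group[i] == 1)
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and group[i] == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance is undefined for a single patient at risk
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def split_by_score(expr: Dict[str, List[float]], genes: Sequence[str]) -> List[int]:
    """Assign group 1 to patients at or above the median mean score of `genes`.

    Hypothetical stand-in for clustering; assumes per-gene z-scored values.
    """
    n_patients = len(expr[genes[0]])
    scores = [sum(expr[g][p] for g in genes) / len(genes) for p in range(n_patients)]
    cutoff = sorted(scores)[n_patients // 2]
    return [1 if s >= cutoff else 0 for s in scores]


def fraction_prognostic(expr: Dict[str, List[float]], times: Sequence[float],
                        events: Sequence[int], set_size: int, n_sets: int = 1000,
                        alpha: float = 0.05, seed: int = 0) -> float:
    """Share of random gene sets of `set_size` genes whose split gives p < alpha."""
    rng = random.Random(seed)
    all_genes = sorted(expr)
    hits = 0
    for _ in range(n_sets):
        chosen = rng.sample(all_genes, set_size)
        groups = split_by_score(expr, chosen)
        if logrank_p(times, events, groups) < alpha:
            hits += 1
    return hits / n_sets
```

Calling `fraction_prognostic` once per set size (for example 1, 10, 50, 100, and so on) would give a curve of the share of prognostic random sets by size. The 4.7 to 23.5% range reported above comes from the authors' own procedure, not from this sketch.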

Key messages

  • Molecular signatures predicting the course of HCC have not yet been integrated into routine clinical practice

  • Depending on the selected size, 4.7 to 23.5% of all random gene sets exhibit prognostic potential, independent of the technical platform (microarray, RNASeq)

  • Combining multiple random gene sets using “swarm intelligence” resulted in significantly improved predictability for approximately 63% of all patients

  • In patients with improved predictability, approx. 70% of all random 50-gene sets resulted in concordant and stable survival prediction

  • Overall, “swarm intelligence” is superior and more robust for predictive purposes in HCC
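The “swarm intelligence” idea can be illustrated as a per-patient vote across many gene sets. This is a minimal sketch, not the authors' implementation: it assumes each gene set has already assigned every patient a "good" or "poor" label (with the split oriented so that "poor" denotes the worse-surviving group), and the 0.7 agreement threshold is a hypothetical choice loosely echoing the approx. 70% concordance mentioned above. `swarm_consensus` is a hypothetical name.

```python
from collections import Counter
from typing import List, Optional, Sequence


def swarm_consensus(assignments: Sequence[Sequence[str]],
                    min_agreement: float = 0.7) -> List[Optional[str]]:
    """Majority vote per patient across gene-set predictions.

    assignments[s][p] is the label ("good" or "poor") that gene set s gives
    patient p. Returns the majority label when at least `min_agreement` of
    all gene sets agree, otherwise None (prognosis undetermined).
    """
    n_sets = len(assignments)
    n_patients = len(assignments[0])
    consensus: List[Optional[str]] = []
    for p in range(n_patients):
        votes = Counter(labels[p] for labels in assignments)
        label, count = votes.most_common(1)[0]
        consensus.append(label if count / n_sets >= min_agreement else None)
    return consensus


# Four gene sets voting on three patients.
votes = [
    ["poor", "good", "good"],
    ["poor", "good", "poor"],
    ["poor", "good", "good"],
    ["poor", "poor", "poor"],
]
print(swarm_consensus(votes))  # ['poor', 'good', None]
```

Patients returned as `None` would loosely correspond to the "undetermined prognosis" group listed in Supplemental table 6; whether the authors used a fixed agreement threshold is not stated in the abstract.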


Keywords

HCC, Liver cancer, Prognostic, Signature, Gene set, Bioinformatics, Transcriptome, Profiling, Random, Swarm intelligence, Microarray, RNA Seq



Acknowledgements

The authors thank Dr. Snorri Thorgeirsson, NIH/NCI, Bethesda, MD, for his generous support and for providing clinical parameters for the GSE4024 and GSE1898 data sets. S.R. was supported by the German Research Foundation (DFG) CRC SFB/TR 209 Liver Cancer, project B01.

Authors’ contribution

Study concept and design: TI, RS, TM, SST, and AT; acquisition of data (public expression data), analysis, and interpretation of data: TI, RS, TM, SM, SR, MPE, ME, and AT; drafting of the manuscript: TI, RS, TM, SM, HJS, WH, ME, and AT; critical revision of the manuscript for important intellectual content: TI, RS, TM, SM, SR, SST, MPE, HJS, WH, ME, and AT; statistical analysis: RS; obtained funding: WH, ME, HJS, and AT.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary material

Supplemental figure 1: Differences in clinical characteristics between patients with good and poor prognosis. (PNG 69 kb)
Supplemental figure 2: Unsupervised clustering based on 50-gene sets separating prognostic subgroups with high significance (p < 0.0001). (PNG 1126 kb)
Supplemental table 1: Randomly chosen gene sets ranging from 1 to 10,000 genes (compare Fig. 1) were investigated for prognostic capability in patients with HCC. Fivefold re-iteration demonstrated stable results. (PDF 56 kb)
Supplemental table 2: Summary of the theoretically possible number of gene sets and the percentage of gene sets evaluated in 500,000 (5 × 100,000) re-iterations. (PDF 41 kb)
Supplemental table 3: Analysis of gene sets and signaling pathways obtained from KEGG, Biocarta, Reactome, and PID as well as GO terms (Biological Process (BP), Cellular Component (CC), and Molecular Function (MF)) for prognostic capability in patients with HCC. (PDF 5756 kb)
Supplemental table 4: 397 randomly chosen 50-gene sets evaluated for survival prediction at a significance level of p = 0.0001. (PDF 299 kb)
Supplemental table 5: Reduced data set containing only patients whose samples were assigned to either the good or the poor prognosis group. Randomly chosen gene sets ranging from 1 to 10,000 genes (compare Fig. 1) were investigated for prognostic capability in patients with HCC. Fivefold re-iteration demonstrated stable results. (PDF 94 kb)
Supplemental table 6: Average clinical characteristics of patient groups with good, poor, or undetermined prognosis. (PDF 12 kb)
Supplemental table 7: Example listing of 100 re-iterations of the machine learning approach and results for validation of our swarm intelligence approach. Left: heatmap of the learning approach for the randomly chosen gene set. Middle: survival analysis (Kaplan–Meier) of learning samples. Right: survival analysis (Kaplan–Meier) of test samples. The full procedure comprised 5 independent runs of 1000 re-iterations each. (PDF 24297 kb)
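Supplemental table 2 relates the number of evaluated gene sets to the number theoretically possible. Assuming that number is the binomial coefficient C(G, k) for G measured genes and set size k, a quick calculation suggests why only a vanishing fraction of the space can be sampled for all but the smallest set sizes. The gene count of 20,000 below is a hypothetical round figure, not taken from the analysed data sets, and `max_coverage` is a hypothetical helper.

```python
import math


def max_coverage(n_genes: int, set_size: int, n_draws: int = 500_000) -> float:
    """Upper bound on the share of all C(n_genes, set_size) gene sets that
    n_draws random draws can reach (duplicate draws would only lower it)."""
    possible = math.comb(n_genes, set_size)
    return min(1.0, n_draws / possible)


# Hypothetical platform with 20,000 measured genes.
for k in (1, 2, 5, 50):
    print(k, max_coverage(20_000, k))
```

For single genes, 500,000 draws exceed the 20,000 possible sets, so coverage is capped at 1.0; already at k = 2 there are 199,990,000 possible pairs, and at k = 50 the bound falls below 1e-100.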


  1. 1.
    Cao H, Phan H, Yang LX (2012) Improved chemotherapy for hepatocellular carcinoma. Anticancer Res 32:1379–1386Google Scholar
  2. 2.
    Llovet JM, Montal R, Sia D, Finn RS. Molecular therapies and precision medicine for hepatocellular carcinoma. Nat Rev Clin Oncol 2018 Google Scholar
  3. 3.
    Teufel A, Staib F, Kanzler S, Weinmann A, Schulze-Bergkamen H, Galle PR (2007) Genetics of hepatocellular carcinoma. World J Gastroenterol 13:2271–2282CrossRefGoogle Scholar
  4. 4.
    Marquardt JU, Galle PR, Teufel A (2012) Molecular diagnosis and therapy of hepatocellular carcinoma (HCC): an emerging field for advanced technologies. J Hepatol 56:267–275CrossRefGoogle Scholar
  5. 5.
    Teufel A, Marquardt JU, Galle PR (2012) Novel insights in the genetics of HCC recurrence and advances in transcriptomic data integration. J Hepatol 56:279–281CrossRefGoogle Scholar
  6. 6.
    Kim K, Zakharkin SO, Allison DB (2010) Expectations, validity, and reality in gene expression profiling. J Clin Epidemiol 63:950–959CrossRefGoogle Scholar
  7. 7.
    Lee JS, Chu IS, Heo J, Calvisi DF, Sun Z, Roskams T, Durnez A, Demetris AJ, Thorgeirsson SS (2004a) Classification and prediction of survival in hepatocellular carcinoma by gene expression profiling. Hepatology 40:667–676CrossRefGoogle Scholar
  8. 8.
    Samur MK (2014) RTCGAToolbox: a new tool for exporting TCGA firehose data. PLoS One 9:e106397CrossRefGoogle Scholar
  9. 9.
    Lee JS, Heo J, Libbrecht L, Chu IS, Kaposi-Novak P, Calvisi DF, Mikaelyan A, Roberts LR, Demetris AJ, Sun Z, Nevens F, Roskams T, Thorgeirsson SS (2006) A novel prognostic subtype of human hepatocellular carcinoma derived from hepatic progenitor cells. Nat Med 12:410–416CrossRefGoogle Scholar
  10. 10.
    Yamashita T, Forgues M, Wang W, Kim JW, Ye Q, Jia H, Budhu A, Zanetti KA, Chen Y, Qin LX, Tang ZY, Wang XW (2008) EpCAM and alpha-fetoprotein expression defines novel prognostic subtypes of hepatocellular carcinoma. Cancer Res 68:1451–1461CrossRefGoogle Scholar
  11. 11.
    Ayers M, Symmans WF, Stec J, Damokosh AI, Clark E, Hess K, Lecocke M, Metivier J, Booser D, Ibrahim N, Valero V, Royce M, Arun B, Whitman G, Ross J, Sneige N, Hortobagyi GN, Pusztai L (2004) Gene expression profiles predict complete pathologic response to neoadjuvant paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide chemotherapy in breast cancer. J Clin Oncol 22:2284–2293CrossRefGoogle Scholar
  12. 12.
    Gerlinger M, Rowan AJ, Horswell S, Math M, Larkin J, Endesfelder D, Gronroos E, Martinez P, Matthews N, Stewart A, Tarpey P, Varela I, Phillimore B, Begum S, McDonald N, Butler A, Jones D, Raine K, Latimer C, Santos CR, Nohadani M, Eklund AC, Spencer-Dene B, Clark G, Pickering L, Stamp G, Gore M, Szallasi Z, Downward J, Futreal PA, Swanton C (2012) Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med 366:883–892CrossRefGoogle Scholar
  13. 13.
    Ioannidis JP (2010) Expectations, validity, and reality in omics. J Clin Epidemiol 63:945–949CrossRefGoogle Scholar
  14. 14.
    Wooden B, Goossens N, Hoshida Y, Friedman SL (2017) Using big data to discover diagnostics and therapeutics for gastrointestinal and liver diseases. Gastroenterology 152:53–67 e3CrossRefGoogle Scholar
  15. 15.
    Roessler S, Budhu A, Wang XW (2014) Deciphering cancer heterogeneity: the biological space. Front Cell Dev Biol 3:2–12Google Scholar
  16. 16.
    Itzel T, Scholz P, Maass T, Krupp M, Marquardt JU, Strand S, Becker D, Staib F, Binder H, Roessler S, Wang XW, Thorgeirsson S, Müller M, Galle PR, Teufel A (2015) Translating bioinformatics in oncology: guilt-by-profiling analysis and identification of KIF18B and CDCA3 as novel driver genes in carcinogenesis. Bioinformatics 31:216–224CrossRefGoogle Scholar
  17. 17.
    Zhang Y, Wang S, Li D, Zhnag J, Gu D, Zhu Y, He F (2011) A systems biology-based classifier for hepatocellular carcinoma diagnosis. PLoS One 6:e22426CrossRefGoogle Scholar
  18. 18.
    Consortium M, Shi L, Reid LH et al (2006) The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 24:1151–1161CrossRefGoogle Scholar

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Timo Itzel (1)
  • Rainer Spang (2)
  • Thorsten Maass (3)
  • Stefan Munker (4)
  • Stephanie Roessler (5)
  • Matthias P. Ebert (6)
  • Hans J. Schlitt (7)
  • Wolfgang Herr (8)
  • Matthias Evert (9)
  • Andreas Teufel (1), corresponding author

  1. Division of Hepatology & Division of Clinical Bioinformatics, Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
  2. Statistical Bioinformatics, Department of Functional Genomics, University Medical Center, Regensburg, Germany
  3. Hepacult GmbH, Regensburg, Germany
  4. Department of Medicine II, Großhadern University Medical Center, Ludwig Maximilians University, Munich, Germany
  5. Institute of Pathology, Heidelberg University, Heidelberg, Germany
  6. Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
  7. Department of Surgery, University Medical Center, Regensburg, Germany
  8. Department of Medicine III, University Medical Center, Regensburg, Germany
  9. Department of Pathology, University of Regensburg, Regensburg, Germany
