
Gene Expression High-Dimensional Clustering Towards a Novel, Robust, Clinically Relevant and Highly Compact Cancer Signature

  • Conference paper
Bioinformatics and Biomedical Engineering (IWBBIO 2019)

Abstract

Precision medicine, a highly disruptive paradigm shift in healthcare aimed at personalizing treatment, relies heavily on genomic data. However, several factors constitute a major bottleneck for the clinical implementation of precision medicine: the complexity of biological interactions, the large number of genes, and the lack of substantial clinical data on patients. In this work, we introduce a generic, low-dimensional gene signature that adequately represents the tumor type. Our gene signature is produced using the LP-stability algorithm, a high-dimensional, center-based unsupervised clustering algorithm working in the dual domain. It is very versatile because it can use any arbitrary distance metric between genes. The gene signature produced by LP-stability reports at least 10 times better statistical significance and 35% better biological significance than those produced by two reference unsupervised clustering methods. Moreover, our experiments demonstrate that our low-dimensional biomarker (27 genes) significantly surpasses existing state-of-the-art methods in terms of both qualitative and quantitative assessment. It also provides better associations to tumor types than methods widely used in the literature that rely on several types of omics data.
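The claim that a center-based clustering can use any arbitrary distance between genes can be illustrated with a small sketch.

It is **not** LP-stability. The authors' algorithm works in the dual domain of a linear program. This stand-in is plain k-medoids, which alternates an assignment step and a medoid update, uses a deterministic farthest-first initialization, and needs the number of clusters `k` fixed in advance. What it shares with a center-based method is that it only consumes a precomputed gene-by-gene distance matrix, so the metric is a swappable parameter.

Some choices here are assumptions, not taken from the paper:
- The distance is assumed to be 1 minus the Pearson correlation of expression profiles across samples.
- The function names (`pearson_distance`, `distance_matrix`, `k_medoids`) are hypothetical.
- The idea that the selected centers (actual genes) could serve as a compact signature is an assumption made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Profile = Sequence[float]  # expression of one gene across samples


def pearson_distance(a: Profile, b: Profile) -> float:
    """Assumed metric: 1 - Pearson correlation (0 = co-expressed, 2 = anti-correlated)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 1.0  # constant profile: treat as uncorrelated
    return 1.0 - cov / (sa * sb)


def distance_matrix(genes: Sequence[Profile],
                    metric: Callable[[Profile, Profile], float]) -> List[List[float]]:
    """Pairwise gene-gene distances; any callable metric can be plugged in."""
    return [[metric(g, h) for h in genes] for g in genes]


def _assign(dist: List[List[float]], medoids: List[int]) -> List[int]:
    # Each gene joins the cluster whose center (medoid) is closest to it.
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist: List[List[float]], k: int,
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Hypothetical stand-in for a center-based clustering; NOT LP-stability.

    Returns (medoid indices, cluster label per gene).
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of genes")
    # Farthest-first initialization: most central gene, then genes
    # farthest from the centers chosen so far.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(max(candidates,
                           key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = _assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old center if a cluster empties
                new_medoids.append(medoids[c])
                continue
            # New center: the member with the smallest total in-cluster distance.
            new_medoids.append(min(members,
                                   key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, _assign(dist, medoids)


if __name__ == "__main__":
    # Toy data: two co-expression modules (rising vs falling across 4 samples).
    genes = [[1, 2, 3, 4], [2, 4, 6, 8], [1.5, 2.5, 3.2, 4.1],
             [4, 3, 2, 1], [8, 6, 4, 2], [4.2, 2.9, 2.1, 0.8]]
    centers, labels = k_medoids(distance_matrix(genes, pearson_distance), k=2)
    print("centers:", centers, "labels:", labels)
```

To use a different metric, pass a different callable to `distance_matrix`. For example, `distance_matrix(genes, math.dist)` clusters on Euclidean distance instead. Because medoids are real genes rather than averaged profiles, each cluster is represented by a single measurable gene.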

Acknowledgements

We acknowledge the partial support of Amazon Web Services and thank Prof. Stefano Soatto for fruitful discussions. We also thank Y. Boursin, M. Azoulay and the Gustave Roussy Cancer Campus DTNSI team for providing the infrastructure resources used in this work. This work was supported by the Fondation pour la Recherche Médicale (FRM; no. DIC20161236437).

Author information

Corresponding author: Enzo Battistella

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Battistella, E. et al. (2019). Gene Expression High-Dimensional Clustering Towards a Novel, Robust, Clinically Relevant and Highly Compact Cancer Signature. In: Rojas, I., Valenzuela, O., Rojas, F., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2019. Lecture Notes in Computer Science, vol 11465. Springer, Cham. https://doi.org/10.1007/978-3-030-17938-0_41

  • DOI: https://doi.org/10.1007/978-3-030-17938-0_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-17937-3

  • Online ISBN: 978-3-030-17938-0

  • eBook Packages: Computer Science, Computer Science (R0)
