Abstract
Precision medicine, a highly disruptive paradigm shift in healthcare aimed at personalizing treatment, relies heavily on genomic data. However, the complexity of biological interactions, the large number of genes, and the lack of substantial patient clinical data constitute a major bottleneck for the clinical implementation of precision medicine. In this work, we introduce a generic, low-dimensional gene signature that adequately represents the tumor type. Our gene signature is produced using the LP-stability algorithm, a high-dimensional, center-based unsupervised clustering algorithm that works in the dual domain and is highly versatile, as it can use any arbitrary distance metric between genes. The gene signature produced by LP-stability achieves at least 10 times better statistical significance and \(35\%\) better biological significance than those produced by two reference unsupervised clustering methods. Moreover, our experiments demonstrate that our low-dimensional biomarker (27 genes) significantly surpasses existing state-of-the-art methods in both qualitative and quantitative assessment, while providing better associations with tumor types than methods widely used in the literature that rely on several omics data.
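To make the idea of center-based clustering with an arbitrary gene-to-gene distance concrete, the following minimal sketch selects one representative gene per cluster and treats the set of representatives as a compact signature. It is an illustration under stated assumptions, not the LP-stability algorithm itself: a plain k-medoids loop is swapped in as a simpler center-based method, the number of clusters `k` is fixed by hand, and the gene names, expression values, and the Kendall-based distance are all hypothetical choices made for the example.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def kendall_distance(x: Vector, y: Vector) -> float:
    """Return 1 - tau_a, with tau_a Kendall's rank correlation (tied pairs count as 0)."""
    score = 0
    pairs = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        score += (product > 0) - (product < 0)  # +1 concordant, -1 discordant
        pairs += 1
    return 1.0 - score / pairs


def assign(names: List[str], medoids: List[str], d: Dict[Tuple[str, str], float]) -> Dict[str, List[str]]:
    """Attach every gene to its closest medoid; a medoid always stays in its own cluster."""
    clusters = {m: [m] for m in medoids}
    for gene in names:
        if gene not in clusters:
            closest = min(medoids, key=lambda m: d[gene, m])
            clusters[closest].append(gene)
    return clusters


def k_medoids(profiles: Dict[str, Vector], k: int, distance: Distance,
              max_iter: int = 100) -> Tuple[List[str], Dict[str, List[str]]]:
    """Plain k-medoids (a stand-in for LP-stability) over any user-supplied distance."""
    names = sorted(profiles)
    # Precompute all pairwise distances once; the metric is an arbitrary callable.
    d = {(a, b): distance(profiles[a], profiles[b]) for a in names for b in names}
    medoids = names[:k]  # deterministic initialisation, chosen only for reproducibility
    for _ in range(max_iter):
        clusters = assign(names, medoids, d)
        # The new center of a cluster is the member minimising total distance to the others.
        updated = [min(members, key=lambda c: sum(d[c, g] for g in members))
                   for members in clusters.values()]
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(names, medoids, d)


# Hypothetical expression profiles: six genes measured on five samples.
profiles = {
    "gene_a": [1, 2, 3, 4, 5],
    "gene_b": [2, 4, 6, 8, 11],
    "gene_c": [1, 3, 2, 4, 5],
    "gene_d": [5, 4, 3, 2, 1],
    "gene_e": [9, 7, 5, 3, 0],
    "gene_f": [5, 4, 2, 3, 1],
}

signature, clusters = k_medoids(profiles, k=2, distance=kendall_distance)
print("signature genes:", signature)
for center, members in clusters.items():
    print(center, "represents", sorted(members))
```

Because the distance is passed in as a function, swapping `kendall_distance` for another metric requires no change to the clustering loop. Unlike this sketch, the method described in the abstract is not presented as requiring a hand-picked `k` or an initial guess of the centers.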
Acknowledgements
We would like to acknowledge the partial support of Amazon Web Services and Pr. Stefano Soatto for fruitful discussions. We also thank Y. Boursin, M. Azoulay and Gustave Roussy Cancer Campus DTNSI team for providing the infrastructure resources used in this work. This work was supported by the Fondation pour la Recherche Médicale (FRM; no. DIC20161236437).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Battistella, E. et al. (2019). Gene Expression High-Dimensional Clustering Towards a Novel, Robust, Clinically Relevant and Highly Compact Cancer Signature. In: Rojas, I., Valenzuela, O., Rojas, F., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2019. Lecture Notes in Computer Science, vol 11465. Springer, Cham. https://doi.org/10.1007/978-3-030-17938-0_41
DOI: https://doi.org/10.1007/978-3-030-17938-0_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17937-3
Online ISBN: 978-3-030-17938-0
eBook Packages: Computer Science, Computer Science (R0)