UMAP guided topological analysis of transcriptomic data for cancer subtyping

Rather, Arif Ahmad; Chachoo, Manzoor Ahmad

doi:10.1007/s41870-022-01048-y

Original Research
Published: 26 August 2022

Volume 14, pages 2855–2865, (2022)

International Journal of Information Technology

Abstract

Clustering cancer patients into homogeneous subgroups can facilitate the development of subgroup-specific therapies, which is the fundamental principle of personalised medicine. However, the process is complex because patients vary widely in their phenotypic and genotypic characteristics, even within the same cancer type. Consequently, most proposed methods fail to guarantee that the resulting subtypes are separable in terms of their Kaplan–Meier survival plots. In this study, we propose a novel clustering framework for patient subtyping based on ideas from algebraic topology and manifold learning. The proposed method discovers subtypes with statistically significant differences in survival outcome. The methodology is tested on three cancer datasets obtained from The Cancer Genome Atlas, and the results are quantified in terms of the Restricted Life Expectancy Difference and the Cox log-rank p value. The novelty of our methodology is that it is independent of the notion of similarity used and can discover subtypes that differ significantly in their Kaplan–Meier survival plots even when only a single omics profile of the patients is used.
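To make the survival-based evaluation concrete, the following is a minimal sketch of how a restricted life expectancy difference between two subtypes could be computed. It assumes, since the abstract does not give the formula, that RLED is the difference between the restricted mean survival times (RMST) of two groups, where RMST is the area under the Kaplan–Meier curve up to a horizon `tau`. The function names `km_curve`, `rmst` and `rled` and the toy data are hypothetical and are not taken from the article.

```python
import numpy as np

def km_curve(time, event):
    """Kaplan-Meier estimate: distinct event times and S(t) just after each."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])      # sorted distinct event times
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)           # patients still under observation
        deaths = np.sum((time == t) & event)  # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)

def rmst(time, event, tau):
    """Restricted mean survival time: area under the KM step curve on [0, tau]."""
    t, s = km_curve(time, event)
    keep = t < tau
    grid = np.concatenate(([0.0], t[keep], [tau]))  # step boundaries
    heights = np.concatenate(([1.0], s[keep]))      # S(t) on each step
    return float(np.sum(np.diff(grid) * heights))

def rled(time_a, event_a, time_b, event_b, tau):
    """Assumed definition: RMST of group A minus RMST of group B."""
    return rmst(time_a, event_a, tau) - rmst(time_b, event_b, tau)

# Toy example (made-up follow-up times, all events observed).
subtype_a = ([1, 2, 3, 4], [1, 1, 1, 1])
subtype_b = ([2, 4, 6, 8], [1, 1, 1, 1])
print(rled(*subtype_a, *subtype_b, tau=4.0))
```

A larger absolute value indicates that the two subtypes differ more in expected survival within the first `tau` time units; the sign shows which group fares better. The horizon `tau` is a free choice here and would normally be set no later than the shortest maximum follow-up among the groups compared.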

Abbreviations

TCGA: The Cancer Genome Atlas
CoD: Curse of Dimensionality
RLED: Restricted Life Expectancy Difference
RMST: Restricted Mean Survival Time
UMAP: Uniform Manifold Approximation and Projection
t-SNE: t-Distributed Stochastic Neighbor Embedding
SNF: Similarity Network Fusion
RSC: Robust and Sparse Correlation
PCA: Principal Component Analysis

Author information

Authors and Affiliations

Department of Computer Sciences, University of Kashmir, Srinagar, JK, India
Arif Ahmad Rather & Manzoor Ahmad Chachoo

Corresponding authors

Correspondence to Arif Ahmad Rather or Manzoor Ahmad Chachoo.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Rather, A.A., Chachoo, M.A. UMAP guided topological analysis of transcriptomic data for cancer subtyping. Int. j. inf. tecnol. 14, 2855–2865 (2022). https://doi.org/10.1007/s41870-022-01048-y

Received: 13 April 2022
Accepted: 22 July 2022
Published: 26 August 2022
Issue Date: October 2022
DOI: https://doi.org/10.1007/s41870-022-01048-y
