Integrative clustering methods of multi-omics data for molecule-based cancer classifications

Wang, Dongfang; Gu, Jin

doi:10.1007/s40484-016-0063-4

Review
Published: 04 March 2016

Volume 4, pages 58–67, (2016)

Quantitative Biology

Dongfang Wang¹ &
Jin Gu¹

Abstract

One goal of precision oncology is to reclassify cancers by their molecular features rather than by tissue of origin. Integrative clustering of large-scale multi-omics data is an important approach to molecule-based cancer classification. Data heterogeneity and the complexity of inter-omics variation are two major challenges for integrative clustering analysis. Based on the strategies used to address these challenges, we group the clustering methods into three major categories: direct integrative clustering, clustering of clusters, and regulatory integrative clustering. We also discuss practical considerations for data pre-processing, post-clustering analysis, and pathway-based analysis.

Author information

Authors and Affiliations

Ministry of Education Key Laboratory of Bioinformatics and Bioinformatics Division, Center for Synthetic and Systems Biology, Tsinghua National Laboratory for Information Science and Technology/Department of Automation, Tsinghua University, Beijing, 100084, China
Dongfang Wang & Jin Gu

Corresponding author

Correspondence to Jin Gu.

Additional information

This article is dedicated to the Special Collection of Recent Advances in Next-Generation Bioinformatics (Ed. Xuegong Zhang).

About this article

Cite this article

Wang, D., Gu, J. Integrative clustering methods of multi-omics data for molecule-based cancer classifications. Quant Biol 4, 58–67 (2016). https://doi.org/10.1007/s40484-016-0063-4

Received: 17 November 2015
Revised: 13 January 2016
Accepted: 23 January 2016
Published: 04 March 2016
Issue Date: March 2016
DOI: https://doi.org/10.1007/s40484-016-0063-4
