Abstract
The path to personalized medicine requires stratifying patients by their genetic, molecular, and other characteristics in order to individualize the treatment of complex diseases such as breast cancer. The identification of single “biomarkers” as the driving forces behind the appearance of cancer has therefore been widely pursued over the last fifteen years, but without robust results across different studies. Existing biological knowledge, such as gene interaction networks and regulatory pathways, can be of great help, since it has been argued that cancer is caused by the deregulation of multiple biological processes in the cell. In this study we explore the use of such biological knowledge for tuning and adapting breast cancer classification tasks in both a supervised setting (classifying unknown samples according to a predetermined taxonomy) and an unsupervised setting (clustering new data to identify new categories). The proposed methodology starts from an initial list of “seed” genes and expands their “neighborhoods” according to the topology of a given biological network. The expansion process operates in a supervised manner to construct the first level of a two-level classification scheme. The first-level base classifiers are built using the network structure and a “random walk” search strategy to select the genes used in these classifiers. At the second level, a meta-classifier is trained to combine the results of the base classifiers in the best possible way. The proposed approach therefore aims to strengthen the predictive ability of the initial list of genes and to generalize more robustly. In the unsupervised setting, the gene neighborhoods extracted around the initial “seeds” are treated as modules of genes that interact strongly within a group but are strongly independent across groups. This assumption allows the introduction of a sparse Gaussian mixture model for assigning breast cancer samples to a set of unknown clusters. Our methodology is explained in full detail, and promising results are obtained in breast cancer related scenarios.
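As an illustration of the neighborhood expansion step, the following is a minimal sketch of how a random walk could rank the genes around a single seed. It assumes a random walk with restart on an unweighted, undirected network; the abstract only speaks of a “random walk” search strategy, so the restart variant, the function names, the `restart` value, and the toy five-gene network are all assumptions made for this example and not the authors' implementation.

```python
import numpy as np

def random_walk_with_restart(adj, seed_idx, restart=0.3, tol=1e-10, max_iter=1000):
    """Visiting probabilities of a walker that jumps back to the seed
    with probability `restart` at every step (assumed variant)."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # avoid division by zero for isolated genes
    W = adj / col_sums                 # column-normalised transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[seed_idx] = 1.0                 # the walk starts at the seed gene
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

def neighborhood(adj, seed_idx, size, **kwargs):
    """Indices of the `size` genes visited most often, seed included."""
    scores = random_walk_with_restart(adj, seed_idx, **kwargs)
    order = np.argsort(-scores, kind="stable")
    return order[:size].tolist()

# Hypothetical toy network: five genes connected in a chain 0-1-2-3-4.
adj = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
scores = random_walk_with_restart(adj, seed_idx=0)
hood = neighborhood(adj, seed_idx=0, size=3)
print(hood)
```

In this toy chain the visiting probability decays with distance from the seed, so the three-gene neighborhood consists of the seed and its two nearest genes. In a two-level scheme such as the one described above, each neighborhood would supply the features of one base classifier.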
Notes
Using the logarithm (instead of only changing the sign, or taking the reciprocal) is not really necessary, but it transforms the values into a more “manageable” range, usually between 1 and 10.
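The effect described in this note can be checked with a small sketch. It assumes that the transformed values are small positive scores (for example p-values or probabilities) and that the transformation is the negative base-10 logarithm; neither the base nor the kind of value is stated in the note, and the sample values below are hypothetical.

```python
import math

# Hypothetical small positive values (assumed, not taken from the article).
values = [0.05, 1e-3, 1e-6, 1e-10]

reciprocal = [1.0 / v for v in values]        # preserves the ranking, huge range
neg_log = [-math.log10(v) for v in values]    # same ranking, compact range

for v, r, n in zip(values, reciprocal, neg_log):
    print(f"value={v:g}  reciprocal={r:g}  -log10={n:.2f}")
```

All three options (sign change, reciprocal, negative logarithm) reverse the ordering of the values in the same way; only the logarithm compresses values spanning many orders of magnitude into a narrow interval.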
Acknowledgments
This research is partially supported by the “ONCOSEED” project funded by the NSRF2007-13 Program of the Greek Ministry of Education, Lifelong Learning and Religious Affairs.
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
Additional information
This article is part of the Topical Collection on Systems Medicine
Cite this article
Sfakianakis, S., Bei, E.S. & Zervakis, M. Exploratory analysis of local gene groups in breast cancer guided by biological networks. Health Technol. 7, 119–132 (2017). https://doi.org/10.1007/s12553-016-0155-1
DOI: https://doi.org/10.1007/s12553-016-0155-1