Exploratory analysis of local gene groups in breast cancer guided by biological networks

  • Original Paper
  • Health and Technology

Abstract

The path to personalized medicine requires stratifying patients by their genetic, molecular, and other characteristics in order to individualize the treatment of complex diseases such as breast cancer. The identification of single “biomarkers” as the driving forces behind the appearance of cancer has therefore been widely pursued over the last fifteen years, but without robust results across different studies. Existing biological knowledge, such as gene interaction networks and regulatory pathways, can be of great help, since it has been argued that cancer is caused by the deregulation of multiple biological processes in the cell. In this study we explore the use of such biological knowledge for tuning and adapting breast cancer classification tasks in both a supervised setting (classifying unknown samples according to a predetermined taxonomy) and an unsupervised setting (clustering new data in order to identify new categories). The proposed methodology starts from an initial list of “seed” genes and expands their “neighborhoods” according to the topology of a given biological network. The expansion process operates in a supervised manner to construct the first level of a two-level classification scheme. The first-level base classifiers are built using the network structure and a “random walk” search strategy that selects the genes used in each classifier. At the second level, a meta-classifier is trained to combine the outputs of the base classifiers in the best possible way. The proposed approach therefore aims to strengthen the predictive ability of the initial list of genes and to generalize more robustly. In the unsupervised setting, the gene neighborhoods extracted around the initial “seeds” are treated as modules of genes that interact strongly within a group but are largely independent across groups. This assumption allows the introduction of a sparse Gaussian mixture model for assigning breast cancer samples to a set of unknown clusters. The methodology is explained in full detail, and promising results are obtained in breast cancer-related scenarios.
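The neighborhood expansion around a seed gene can be illustrated with a small sketch. The abstract does not say which random walk variant is used, so the code below assumes one common choice, a random walk with restart, and ranks nodes by their stationary visiting probability. The function names, the restart probability of 0.3, and the toy adjacency matrix are all hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np


def random_walk_with_restart(adj, seed_idx, restart=0.3, tol=1e-10, max_iter=1000):
    """Return visiting probabilities of a walk that restarts at the seed node.

    adj is a symmetric 0/1 adjacency matrix; restart is an assumed parameter.
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # avoid division by zero for isolated nodes
    transition = adj / col_sums        # each column sums to 1 (or 0 if isolated)

    start = np.zeros(adj.shape[0])
    start[seed_idx] = 1.0              # the walk always restarts at the seed
    probs = start.copy()
    for _ in range(max_iter):
        updated = (1.0 - restart) * transition @ probs + restart * start
        if np.abs(updated - probs).sum() < tol:
            return updated
        probs = updated
    return probs


def expand_neighborhood(adj, seed_idx, size, **kwargs):
    """Pick the `size` nodes with the highest visiting probability."""
    scores = random_walk_with_restart(adj, seed_idx, **kwargs)
    ranked = np.argsort(-scores, kind="stable")
    return [int(i) for i in ranked[:size]]


# Toy network (hypothetical): edges 0-1, 0-2, 1-2, 2-3, 3-4.
toy_adj = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])

if __name__ == "__main__":
    print(expand_neighborhood(toy_adj, seed_idx=0, size=3))
```

In a pipeline like the one described, each such neighborhood would supply the gene subset for one base classifier. The supervised selection criterion and the meta-classifier are not shown here.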

Notes

  1. Using the logarithm (instead of only changing the sign, or taking the reciprocal) is not really necessary, but it transforms the values into a more “manageable” range, usually between 1 and 10.

  2. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE45965

  3. http://www.ncbi.nlm.nih.gov/unigene

  4. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE52604

  5. http://software.broadinstitute.org/gsea/msigdb/

  6. http://www.itb.cnr.it/breastcancer/

  7. http://www.disgenet.org/

  8. http://fundo.nubic.northwestern.edu/

  9. http://pantherdb.org/

Acknowledgments

This research is partially supported by the “ONCOSEED” project funded by the NSRF2007-13 Program of the Greek Ministry of Education, Lifelong Learning and Religious Affairs.

Author information

Corresponding author

Correspondence to Stelios Sfakianakis.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

This article is part of the Topical Collection on Systems Medicine

Cite this article

Sfakianakis, S., Bei, E.S. & Zervakis, M. Exploratory analysis of local gene groups in breast cancer guided by biological networks. Health Technol. 7, 119–132 (2017). https://doi.org/10.1007/s12553-016-0155-1

  • DOI: https://doi.org/10.1007/s12553-016-0155-1
