Abstract
We introduce a network-based approach to identify sub-networks of functionally related genes for predicting the 5-year survivability of breast cancer patients treated with chemotherapy, hormone therapy, or a combination of the two. A gene expression dataset and a protein-protein interaction network are integrated to construct a weighted graph, in which each edge weight expresses how well the two corresponding genes predict the class. We propose a scoring criterion that measures the density of a weighted sub-graph and also serves as an estimate of its predictive power. Using this criterion, we identify an optimally dense sub-network for each seed gene and then evaluate that sub-network with a classification method. Finally, among the sub-networks whose classification performance is greater than a given threshold, we search for an optimal set of sub-networks that further improves classification performance via a voting scheme. Our results improve significantly on those of existing approaches. For each type of treatment, our best prediction model reaches an accuracy of 85% or more. Many of the selected sub-networks used to construct the voting models contain genes related to breast cancer or other cancers, including SP1, TP53, MYC, NOG, and many more, providing evidence for downstream analysis.
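The pipeline described in the abstract (weighted graph, density score, seed-based sub-network search, voting) can be illustrated with a minimal sketch. The abstract does not give the scoring criterion or the search procedure, so everything here is an assumption made for illustration: density is taken as total internal edge weight divided by the number of nodes, the search is a greedy expansion from the seed, ties in the vote go to the positive class, and all function names and edge weights are hypothetical.

```python
from collections import defaultdict


def build_graph(weighted_edges):
    """Adjacency map {gene: {neighbor: weight}} from (gene_a, gene_b, weight) triples."""
    graph = defaultdict(dict)
    for a, b, w in weighted_edges:
        graph[a][b] = w
        graph[b][a] = w
    return graph


def density(graph, nodes):
    """Assumed score: total internal edge weight divided by the number of nodes."""
    nodes = set(nodes)
    total = sum(
        w
        for u in nodes
        for v, w in graph[u].items()
        if v in nodes and u < v  # count each undirected edge once
    )
    return total / len(nodes)


def grow_subnetwork(graph, seed):
    """Greedily add the neighbor that most increases density; stop when none does."""
    selected = {seed}
    best = density(graph, selected)
    while True:
        frontier = {v for u in selected for v in graph[u]} - selected
        candidates = [(density(graph, selected | {v}), v) for v in sorted(frontier)]
        if not candidates:
            break
        score, gene = max(candidates)
        if score <= best:
            break
        selected.add(gene)
        best = score
    return selected, best


def majority_vote(predictions):
    """Combine per-sub-network 0/1 predictions for one patient; ties go to class 1."""
    return int(2 * sum(predictions) >= len(predictions))


# Invented weights on genes named in the abstract, for illustration only.
edges = [("TP53", "MYC", 0.9), ("MYC", "SP1", 0.8),
         ("TP53", "SP1", 0.7), ("SP1", "NOG", 0.1)]
subnet, score = grow_subnetwork(build_graph(edges), "TP53")
```

In this toy graph the weakly connected gene lowers the average weight per node, so the greedy search stops before adding it. A real implementation would derive edge weights from the expression data and would score each sub-network with a trained classifier before voting.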
Acknowledgements
This work has been supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Quang Pham, H., Rueda, L., Ngom, A. (2020). A Data Integration Approach for Detecting Biomarkers of Breast Cancer Survivability. In: Rojas, I., Valenzuela, O., Rojas, F., Herrera, L., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2020. Lecture Notes in Computer Science, vol 12108. Springer, Cham. https://doi.org/10.1007/978-3-030-45385-5_5
DOI: https://doi.org/10.1007/978-3-030-45385-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45384-8
Online ISBN: 978-3-030-45385-5
eBook Packages: Computer Science, Computer Science (R0)