Abstract
With the rapid accumulation of multi-dimensional disease data, integrating multiple similarity networks is essential for understanding disease development and identifying disease subtypes. Similarity network fusion (SNF), a recent and computationally efficient method, is well suited to integrating similarity networks and has been widely applied in bioinformatics analysis. However, the computational and space complexity of SNF grows with the number of samples. In this research, we develop a parallel SNF algorithm named paraSNF to improve the speed and scalability of SNF. Experimental results on two large-scale simulation datasets show that paraSNF is 30x–100x faster than serial SNF, and 8x–15x faster than SNF running on multiple cores with multiple threads. Furthermore, paraSNF saves more than 60% of memory space, which greatly improves the scalability of SNF.
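To make the scaling concern concrete, the rough cost model below estimates memory and arithmetic for dense n-by-n similarity networks. It is only a sketch under assumptions that are not taken from the paper: float64 storage, dense matrix products, and a hypothetical `matrices_per_view` parameter for how many n-by-n matrices each data type keeps in memory. All function names are illustrative.

```python
# Back-of-the-envelope cost model for fusing several dense n-by-n similarity networks.
# Assumptions (not from the paper): float64 storage, dense matrix products,
# and `matrices_per_view` dense matrices kept in memory per data type.

BYTES_PER_FLOAT64 = 8


def dense_matrix_bytes(n_samples: int) -> int:
    """Memory needed for one dense n-by-n float64 matrix."""
    return BYTES_PER_FLOAT64 * n_samples * n_samples


def fusion_memory_bytes(n_samples: int, n_views: int, matrices_per_view: int = 2) -> int:
    """Total memory if every data type keeps `matrices_per_view` dense matrices."""
    return n_views * matrices_per_view * dense_matrix_bytes(n_samples)


def matmul_flops(n_samples: int) -> int:
    """Approximate flops for one dense n-by-n product (n^3 multiply-add pairs)."""
    return 2 * n_samples ** 3


if __name__ == "__main__":
    # Print estimates for a few sample counts with three data types.
    for n in (1_000, 10_000, 50_000):
        gib = fusion_memory_bytes(n, n_views=3) / 2**30
        print(f"n={n:>6}: {gib:8.2f} GiB, {matmul_flops(n):.2e} flops per product")
```

Under these assumptions, doubling the number of samples quadruples memory and multiplies the cost of each dense product by eight, which is the kind of growth that motivates a parallel, memory-saving implementation.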
This work was mainly supported by the National Natural Science Foundation of China (Grant No. U1435219), the National Key Research and Development Program of China (No. 2016YFB0200401), the Major Research Plan of the National Natural Science Foundation of China (No. U1435222), the National Natural Science Foundation of China (No. 61572515), and the Major Research Plan of the National Key R&D Program of China (No. 2016YFC0901600).
X. Shen and S. He contributed equally to this work.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Shen, X., He, S., Fang, M., Wen, Y., Bo, X., Dou, Y. (2018). paraSNF: An Parallel Approach for Large-Scale Similarity Network Fusion. In: Li, C., Wu, J. (eds) Advanced Computer Architecture. ACA 2018. Communications in Computer and Information Science, vol 908. Springer, Singapore. https://doi.org/10.1007/978-981-13-2423-9_12
DOI: https://doi.org/10.1007/978-981-13-2423-9_12
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2422-2
Online ISBN: 978-981-13-2423-9
eBook Packages: Computer Science, Computer Science (R0)