Abstract
Experimentally identifying disease genes is time-consuming and expensive, so computational methods for predicting disease genes are appealing. Many existing methods predict new disease genes from protein-protein interaction (PPI) networks. However, PPIs change over a cell's lifetime, so relying only on static PPI networks may degrade the performance of such algorithms. In this study, we propose dgCSN, an algorithm that predicts disease genes from centrality features extracted from clinical single sample-based PPI networks. dgCSN first constructs a single sample-based network for each case sample from a universal static PPI network and that sample's clinical gene expression. It then fuses these networks into one network according to how frequently each edge appears across all single sample-based networks. Next, centrality-based features are extracted from the fused network to capture the topological properties of each gene. Finally, regression analysis is performed to predict the probability that each gene is disease-associated. In our experiments, dgCSN achieves AUC values of 0.893 and 0.807 on breast cancer and Alzheimer's disease, respectively, outperforming two competing methods. Further analysis of the top 10 prioritized genes also demonstrates that dgCSN is effective at predicting new disease genes.
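The four stages summarized above (single-sample networks, edge-frequency fusion, centrality features, regression) can be illustrated with a short plain-Python sketch. It is hypothetical and is not the dgCSN implementation. The abstract does not say how each single-sample network is built, which centrality measures are used, or which regression model is fitted. The following choices are therefore assumptions made only for illustration: the expression-threshold rule in `single_sample_network`, the `min_freq` cutoff in `fuse_networks`, the three features in `centrality_features` (degree, weighted degree, local clustering coefficient), and the gradient-descent logistic regression in `fit_logistic`. All data in the usage block are made up.

```python
import math
from collections import Counter, defaultdict


def single_sample_network(ppi_edges, expression, threshold):
    """Keep a static PPI edge in one sample's network if both endpoints are 'active'.

    ASSUMPTION: 'active' means expression above a fixed threshold; this is a
    placeholder rule, not the construction used by dgCSN.
    """
    return {
        (u, v)
        for u, v in ppi_edges
        if expression.get(u, 0.0) > threshold and expression.get(v, 0.0) > threshold
    }


def fuse_networks(networks, min_freq):
    """Fuse per-sample networks; edge weight = fraction of samples containing the edge.

    Edges present in fewer than `min_freq` (a hypothetical fraction) of samples are dropped.
    """
    counts = Counter(edge for net in networks for edge in net)
    n = len(networks)
    return {edge: c / n for edge, c in counts.items() if c / n >= min_freq}


def centrality_features(fused, genes):
    """Illustrative per-gene features: degree, weighted degree, local clustering coefficient."""
    adj = defaultdict(dict)
    for (u, v), w in fused.items():
        adj[u][v] = w
        adj[v][u] = w
    feats = {}
    for g in genes:
        nbrs = list(adj[g])
        k = len(nbrs)
        # Number of edges among g's neighbours (i.e. triangles through g).
        links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
        clustering = 2.0 * links / (k * (k - 1)) if k > 1 else 0.0
        feats[g] = [float(k), float(sum(adj[g].values())), clustering]
    return feats


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient-descent logistic regression on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(w, b, x):
    """Probability that a gene with feature vector x is disease-associated."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


if __name__ == "__main__":
    # Made-up toy data: static PPI edges and expression values for three case samples.
    ppi = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
    samples = [
        {"A": 5, "B": 6, "C": 7, "D": 1, "E": 4},
        {"A": 4, "B": 5, "C": 6, "D": 3, "E": 2},
        {"A": 6, "B": 1, "C": 5, "D": 4, "E": 5},
    ]
    nets = [single_sample_network(ppi, s, threshold=2.5) for s in samples]
    fused = fuse_networks(nets, min_freq=0.5)
    feats = centrality_features(fused, ["A", "B", "C", "D", "E"])
    labels = {"A": 1, "B": 1, "D": 0, "E": 0}  # hypothetical known disease / non-disease genes
    w, b = fit_logistic([feats[g] for g in labels], list(labels.values()))
    print("P(C is disease-associated) =", round(predict_proba(w, b, feats["C"]), 3))
```

Logistic regression is used here only because it yields a per-gene probability, matching the abstract's description of the final step. The sketch avoids external libraries so that it runs as-is. A real pipeline would more likely use a graph library and a tested regression implementation.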
Acknowledgments
This work is supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC), the China Scholarship Council (CSC), the National Natural Science Foundation of China under Grant Nos. 61571052 and 61602386, and the Natural Science Foundation of Shaanxi Province under Grant No. 2017JQ6008.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Luo, P., Tian, LP., Chen, B., Xiao, Q., Wu, FX. (2018). Predicting Disease Genes from Clinical Single Sample-Based PPI Networks. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2018. Lecture Notes in Computer Science, vol 10813. Springer, Cham. https://doi.org/10.1007/978-3-319-78723-7_21
DOI: https://doi.org/10.1007/978-3-319-78723-7_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78722-0
Online ISBN: 978-3-319-78723-7
eBook Packages: Computer Science, Computer Science (R0)