Abstract
Protein-protein interactions (PPIs) play an important role in regulating cells and signals. Deregulation of PPIs can lead to many diseases, including pernicious anemia and cancer. Despite ongoing experimental efforts, persistent data incompleteness limits our ability to understand the molecular roots of human disease. A computational method that detects PPIs accurately and quickly is therefore urgently needed. In this paper, a highly efficient model is proposed for predicting PPIs from a heterogeneous network by combining local features with global features. The heterogeneous network is assembled from several valuable datasets and contains five types of nodes and nine types of interactions among them. Local features are extracted from protein sequences by the k-mer method. Global features are extracted from the heterogeneous network by LINE (Large-scale Information Network Embedding). Each protein representation is obtained by concatenating its local and global features. Finally, a random forest is trained to classify and predict potential interacting protein pairs. The proposed method is evaluated on the STRING dataset, where it achieves an average prediction accuracy of 86.55% with an AUC of 0.9308. Extensive comparison experiments are performed with different protein representations and different classifiers. The experimental results indicate that the proposed method is viable and provides a new perspective for future research.
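The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the value of k, the 20-letter amino acid alphabet, the frequency normalization, the function names, and the way two protein representations are joined into a pair vector are all assumptions made for illustration. The network embedding is taken as a given list of numbers (for example, one produced by LINE), and no classifier is trained here.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino acids; the paper may use a different alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_features(sequence, k=2, alphabet=AMINO_ACIDS):
    """Return normalized k-mer frequencies of a sequence in a fixed vocabulary order."""
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Windows containing symbols outside the alphabet are ignored.
    total = sum(counts[w] for w in vocab)
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[w] / total for w in vocab]


def protein_representation(sequence, embedding, k=2):
    """Concatenate the local (k-mer) feature with a global (network embedding) feature."""
    return kmer_features(sequence, k) + list(embedding)


def pair_features(rep_a, rep_b):
    """Hypothetical pair encoding: join the two protein representations end to end."""
    return list(rep_a) + list(rep_b)


if __name__ == "__main__":
    # Toy sequences and toy 2-dimensional embeddings, purely illustrative.
    rep_a = protein_representation("ACAC", [0.1, 0.2])
    rep_b = protein_representation("MKVL", [0.3, 0.4])
    print(len(rep_a), len(pair_features(rep_a, rep_b)))
```

Pair vectors built this way could then be passed, together with interaction labels, to any standard classifier such as a random forest. With k = 2 the local feature has 400 dimensions; larger k grows the vocabulary as 20^k, which is one reason reduced alphabets are sometimes used in practice.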
Funding
This work was supported in part by the NSFC Excellent Young Scholars Program under Grant 61722212, in part by the National Natural Science Foundation of China under Grants 61702444 and 62002297, in part by the Chinese Postdoctoral Science Foundation under Grant 2019M653804, and in part by the West Light Foundation of the Chinese Academy of Sciences under Grant 2018-XBQNXZ-B-008.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Su, XR., You, ZH., Chen, ZH., Yi, HC., Guo, ZH. (2021). Protein-Protein Interaction Prediction by Integrating Sequence Information and Heterogeneous Network Representation. In: Huang, DS., Jo, KH., Li, J., Gribova, V., Premaratne, P. (eds) Intelligent Computing Theories and Application. ICIC 2021. Lecture Notes in Computer Science, vol 12838. Springer, Cham. https://doi.org/10.1007/978-3-030-84532-2_55
DOI: https://doi.org/10.1007/978-3-030-84532-2_55
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-84531-5
Online ISBN: 978-3-030-84532-2
eBook Packages: Computer Science, Computer Science (R0)