Protein-Protein Interaction Prediction by Integrating Sequence Information and Heterogeneous Network Representation

  • Conference paper
Intelligent Computing Theories and Application (ICIC 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12838)

Abstract

Protein-protein interactions (PPIs) play an important role in regulating cells and signals. PPI deregulation can lead to many diseases, including pernicious anemia and cancer. Despite the ongoing efforts of bioassay groups, persistent data incompleteness limits our ability to understand the molecular roots of human disease. It is therefore urgent to develop computational methods that detect PPIs accurately and quickly. In this paper, a highly efficient model is proposed that predicts PPIs from a heterogeneous network by combining local and global features. The heterogeneous network is assembled from several valuable datasets and contains five types of nodes and nine types of interactions among them. The local feature is extracted from the protein sequence with the k-mer method. The global feature is extracted from the heterogeneous network with LINE (Large-scale Information Network Embedding). Each protein representation is obtained by concatenating the local and global features. Finally, a random forest is trained to classify protein pairs and predict potential interactions. The proposed method was evaluated on the STRING dataset and achieved an average prediction accuracy of 86.55% with an AUC of 0.9308. Extensive comparison experiments were performed with different protein representations and different classifiers. The experimental results indicate that the proposed method is economically viable and provides a new perspective for future research.
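
The feature construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The value of k, the 20-letter amino acid alphabet, the frequency normalization, and the way two proteins are joined into one pair vector are all assumptions, since the abstract does not specify them. The global embedding is a hypothetical stand-in for a vector that LINE would produce, and the function names are illustrative only.

```python
from itertools import product

# Assumption: the standard 20-letter amino acid alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_features(sequence, k=3, alphabet=AMINO_ACIDS):
    """Return normalized k-mer frequencies in a fixed lexicographic order."""
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0.0] * len(vocab)
    total = 0
    # Slide a window of width k over the sequence.
    for start in range(len(sequence) - k + 1):
        pos = index.get(sequence[start:start + k])
        if pos is not None:  # skip windows containing non-standard residues
            counts[pos] += 1.0
            total += 1
    if total == 0:
        return counts  # sequence shorter than k, or no valid window
    return [c / total for c in counts]


def protein_representation(sequence, global_embedding, k=3):
    """Concatenate the local (k-mer) and global (network embedding) features."""
    return kmer_features(sequence, k) + list(global_embedding)


def pair_features(rep_a, rep_b):
    """Assumption: a protein pair is encoded by concatenating both vectors."""
    return list(rep_a) + list(rep_b)


# Hypothetical sequences and two-dimensional embeddings, for illustration only.
rep_a = protein_representation("MKVLAAGIV", [0.12, -0.40], k=2)
rep_b = protein_representation("MSTNPKPQR", [0.05, 0.33], k=2)
x = pair_features(rep_a, rep_b)
print(len(x))  # 2 * (20**2 + 2)
```

Vectors such as `x`, one per labeled protein pair, would then be passed to a random forest classifier, for example from a library such as scikit-learn.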

Funding

This work was supported in part by the NSFC Excellent Young Scholars Program under Grant 61722212, in part by the National Natural Science Foundation of China under Grants 61702444 and 62002297, in part by the Chinese Postdoctoral Science Foundation under Grant 2019M653804, and in part by the West Light Foundation of the Chinese Academy of Sciences under Grant 2018-XBQNXZ-B-008.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Su, XR., You, ZH., Chen, ZH., Yi, HC., Guo, ZH. (2021). Protein-Protein Interaction Prediction by Integrating Sequence Information and Heterogeneous Network Representation. In: Huang, DS., Jo, KH., Li, J., Gribova, V., Premaratne, P. (eds) Intelligent Computing Theories and Application. ICIC 2021. Lecture Notes in Computer Science, vol 12838. Springer, Cham. https://doi.org/10.1007/978-3-030-84532-2_55

  • DOI: https://doi.org/10.1007/978-3-030-84532-2_55

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-84531-5

  • Online ISBN: 978-3-030-84532-2

  • eBook Packages: Computer Science, Computer Science (R0)
